Seaborn Heatmap Design - Complete Blueprint
1. Data Structure Design
INPUT (JoinedObservationRecord tuples)
├── tech_stack: ("Python", "SQL", "AWS")
├── seniority_level: "Mid-level"
└── [100+ more records]
PROCESSING
├── Step 1: Cross-tabulate (tech × seniority)
├── Step 2: Count co-occurrences
└── Step 3: Build DataFrame (top 10 skills × all seniorities)
OUTPUT (pandas.DataFrame)
             Entry-level  Mid-level  Senior
Python                 8         12      15
SQL                    5         15      12
AWS                    2         18       8
React                  6         14      10
...
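The three processing steps can be sketched with pandas as follows. This is a minimal illustration, not the project's actual build_tech_seniority_pivot_from_records: the Record dataclass is a hypothetical stand-in for JoinedObservationRecord that assumes only the two fields shown above, and the name build_pivot_sketch is invented for the example.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for JoinedObservationRecord (only two fields assumed)."""

    tech_stack: tuple[str, ...]
    seniority_level: str


def build_pivot_sketch(records, top_n_skills=10):
    # Step 1: one row per (skill, seniority) pair, so a job listing three
    # skills contributes three rows.
    pairs = pd.DataFrame(
        [(tech, r.seniority_level) for r in records for tech in r.tech_stack],
        columns=["tech", "seniority"],
    )
    # Step 2: count how often each skill co-occurs with each seniority level.
    pivot = pd.crosstab(pairs["tech"], pairs["seniority"])
    # Step 3: keep the top N skills by total count, most frequent first.
    totals = pivot.sum(axis=1).sort_values(ascending=False, kind="stable")
    return pivot.loc[totals.head(top_n_skills).index]


records = [
    Record(("Python", "SQL", "AWS"), "Mid-level"),
    Record(("Python", "SQL"), "Senior"),
    Record(("Python",), "Entry-level"),
]
pivot = build_pivot_sketch(records, top_n_skills=2)
print(pivot)
```

Because each job contributes one row per skill, the cell values count skill mentions rather than distinct jobs; a job with three skills appears in three cells of its seniority column.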
2. Complete Data Flow
ReportMetrics & RawRecords
            ↓
build_tech_seniority_pivot_from_records()
            ↓
pd.DataFrame (pivot table)
            ↓
       ┌────┴────┐
       ↓         ↓
    Seaborn    Plotly
       ↓         ↓
  mpl.Figure  go.Figure
       ↓         ↓
    base64    JSON/HTML
       ↓         ↓
    <img/>    <script/>
3. Generating from ReportMetrics: Three Options
| Level | Approach | Effort | Accuracy | Changes required |
|---|---|---|---|---|
| L1: Current | Use aggregates only (no cross-tab) | 5 min | Low | None; tech_stack_counts and seniority_counts are used independently |
| L2: Recommended | Pass raw records through pipeline | 2 hrs | High | Modify MetricsBuildResult, update function signatures |
| L3: Future | Pre-compute in database | 1 day | High | Add materialized view to curated schema |
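The "Low" accuracy of L1 follows from a basic property of contingency tables: per-skill totals and per-seniority totals do not determine the cross-tabulation. The toy numbers below are invented purely to illustrate this; they are not taken from the dataset.

```python
import numpy as np

# Two different (skill x seniority) tables: rows are skills, columns are levels.
table_a = np.array([[10, 0],
                    [0, 10]])
table_b = np.array([[5, 5],
                    [5, 5]])

# Both tables produce identical aggregates, which is all L1 has access to.
assert (table_a.sum(axis=1) == table_b.sum(axis=1)).all()  # per-skill totals
assert (table_a.sum(axis=0) == table_b.sum(axis=0)).all()  # per-seniority totals

# The best L1 can do is assume independence: the outer product of the totals
# divided by the grand total. That reproduces table_b but not table_a.
row_totals = table_a.sum(axis=1)
col_totals = table_a.sum(axis=0)
independent_estimate = np.outer(row_totals, col_totals) / table_a.sum()
print(independent_estimate)
```

Any pattern of interest in the heatmap (a skill concentrated at one seniority level, as in table_a) is exactly the information that the aggregates discard, which is why L2 passes the raw records through.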
L1: Quick Mock (for testing UI)
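# Create a synthetic pivot from the two separate count lists (random values, UI testing only)
import numpy as np
import pandas as pd

techs = [tech for tech, _ in metrics.tech_stack_counts[:8]]
levels = [sen for sen, _ in metrics.seniority_counts]
pivot = pd.DataFrame(
    # Size the array from the labels so the shapes always match
    np.random.randint(0, 20, size=(len(techs), len(levels))),
    index=techs,
    columns=levels,
)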
L2: Recommended (with access to records)
records = dataset.records # From CuratedDatasetReader
pivot = build_tech_seniority_pivot_from_records(records)
4. Seaborn Heatmap Best Practices
Color Mapping
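sns.heatmap(
    pivot,
    cmap="RdYlGn",  # ✓ Diverging palette: low values red, high values green
    vmin=0,         # ✓ Fair scale (colors start at zero, not at the data minimum)
    annot=True,     # ✓ Show actual counts
    fmt="d",        # ✓ Integer format (the pivot must have an integer dtype)
)
Why RdYlGn?
- Green = High skill demand (matplotlib maps the highest values to green)
- Yellow = Medium demand
- Red = Low demand for this level
- To show high demand in red ("hot" skills) instead, use the reversed map cmap="RdYlGn_r"
- Readers immediately see patterns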
Styling Choices
sns.heatmap(
    pivot,
    linewidths=1,                     # ✓ Clear cell boundaries
    linecolor="white",                # ✓ High contrast
    square=False,                     # ✓ Allow readable skill names
    cbar_kws={"label": "Job Count"},  # ✓ Labeled colorbar
)
Layout Tips
fig, ax = plt.subplots(figsize=(12, 7), dpi=100) # ✓ Good screen resolution
ax.set_title("...", fontsize=16, fontweight="bold", pad=20)
ax.tick_params(axis="x", rotation=45) # ✓ Prevent label overlap
plt.tight_layout() # ✓ Prevent label cutoff
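Put together, the color, styling, and layout settings above might be combined as in the sketch below. The function name create_heatmap_sketch and its signature are assumptions made for this example (the real create_seaborn_heatmap may differ), and the Agg backend is selected only so the snippet runs without a display. The sample values are the first two rows of the output table in section 1.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, so this runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def create_heatmap_sketch(pivot, title, cmap="RdYlGn"):
    fig, ax = plt.subplots(figsize=(12, 7), dpi=100)
    sns.heatmap(
        pivot,
        ax=ax,  # draw on our own axes rather than the implicit current one
        cmap=cmap,
        vmin=0,
        annot=True,
        fmt="d",  # requires an integer-dtype pivot
        linewidths=1,
        linecolor="white",
        square=False,
        cbar_kws={"label": "Job Count"},
    )
    ax.set_title(title, fontsize=16, fontweight="bold", pad=20)
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    return fig


pivot = pd.DataFrame(
    {"Entry-level": [8, 5], "Mid-level": [12, 15], "Senior": [15, 12]},
    index=["Python", "SQL"],
)
fig = create_heatmap_sketch(pivot, "Tech Skills by Seniority Level")
```

Passing ax explicitly and calling fig.tight_layout() instead of plt.tight_layout() keeps the function independent of pyplot's global "current figure" state, which matters when several charts are built in one process.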
5. Converting to Base64 PNG for HTML Embedding
Flow
matplotlib.Figure
↓
savefig → BytesIO buffer (in memory, no temp files)
↓
read buffer → bytes
↓
base64.b64encode() → string
↓
Prepend "data:image/png;base64," → data URI
↓
Direct HTML embedding: <img src="{data_uri}" />
Code
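import base64
import io

import matplotlib.pyplot as plt

def figure_to_base64_seaborn(fig: plt.Figure) -> str:
    """Render a matplotlib figure to an in-memory PNG and return it as a data URI."""
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=100, bbox_inches="tight")
    buffer.seek(0)  # rewind so read() starts at the first byte
    img_bytes = buffer.read()
    b64_string = base64.b64encode(img_bytes).decode("utf-8")
    return f"data:image/png;base64,{b64_string}"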
HTML Usage
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAA..."
alt="Skills Heatmap"
style="max-width: 100%; height: auto;" />
Advantages
- ✓ No external server needed
- ✓ Single HTML file self-contained
- ✓ Works offline
- ✓ Fast (no HTTP request)
Disadvantages
- ✗ Base64 adds about 33% to the PNG size; the data URI can reach ~2-5MB for large images (usually OK)
- ✗ Less cacheable than URL-based images
Size Tips
- figsize=(12, 7), dpi=100 → ~200-300KB base64
- dpi=150 → ~500KB (for print quality)
- Reduce with figsize=(8, 5) if needed
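The relationship between PNG size and data URI size is simple arithmetic: base64 emits 4 characters for every 3 input bytes, padding the final group. The helper below is a small sketch of that calculation; the name data_uri_length is invented for this example, and the zero-filled payload is only a stand-in for real PNG bytes.

```python
import base64
import math

PREFIX = "data:image/png;base64,"


def data_uri_length(png_size_bytes: int) -> int:
    """Length in characters of a PNG data URI for a payload of the given size."""
    # Base64 emits 4 characters per 3 input bytes, padding the last group.
    return len(PREFIX) + 4 * math.ceil(png_size_bytes / 3)


# Cross-check the formula against the real encoder with a dummy payload.
payload = bytes(200_000)  # stand-in for roughly 200 KB of PNG data
actual = len(PREFIX + base64.b64encode(payload).decode("ascii"))
print(data_uri_length(len(payload)), actual)
```

This makes it easy to estimate the effect of a figsize or dpi change: measure the PNG once with len(img_bytes) and the embedded size follows directly.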
6. Complete Sample Code: Records → Base64
# 1. Load records
from mexico_linkedin_jobs_portfolio.analytics.dataset import CuratedDatasetReader
from mexico_linkedin_jobs_portfolio.config import CuratedStorageConfig
reader = CuratedDatasetReader()
dataset = reader.load(CuratedStorageConfig(...))
records = dataset.records
# 2. Build pivot
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    build_tech_seniority_pivot_from_records,
)
pivot = build_tech_seniority_pivot_from_records(
    records,
    top_n_skills=10,
)
# 3. Create heatmap
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    create_seaborn_heatmap,
)
fig = create_seaborn_heatmap(
    pivot,
    title="Tech Skills by Seniority Level",
    cmap="RdYlGn",
)
# 4. Convert to base64
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    figure_to_base64_seaborn,
)
data_uri = figure_to_base64_seaborn(fig)
# 5. Embed in HTML template
html = f"""
<!DOCTYPE html>
<html>
<head>
<title>Job Market Insights</title>
<style>
body {{ font-family: Arial, sans-serif; margin: 40px; }}
img {{ max-width: 100%; height: auto; border: 1px solid #ddd; }}
.chart-container {{ margin: 20px 0; }}
</style>
</head>
<body>
<h1>Mexico Tech Job Market</h1>
<div class="chart-container">
<h2>Skills in Demand by Experience Level</h2>
<img src="{data_uri}" alt="Skills Heatmap" />
</div>
</body>
</html>
"""
with open("report.html", "w", encoding="utf-8") as f:
    f.write(html)
print("✓ Report saved: report.html")
7. Comparison: Seaborn vs Plotly
| Aspect | Seaborn | Plotly |
|---|---|---|
| Setup | pip install seaborn | pip install plotly |
| Interaction | Static | Interactive (hover, zoom) |
| Styling | Matplotlib customization | JSON-based layouts |
| Web Ready | Needs base64 conversion | Native HTML/JSON |
| File Size | Smaller (~200-300KB PNG) | Larger (figure JSON plus plotly.js, which adds a few MB if inlined rather than loaded from a CDN) |
| Accessibility | Image (alt text needed) | Vector/HTML (screen readers) |
| Learning Curve | Familiar to data scientists | Growing standard for web |
| Embedding | <img src="..." /> | <script>Plotly.newPlot(...)</script> |
Recommendation: For final reports → Plotly (interactive, web-native). For quick exports → Seaborn (familiar, simpler).
8. Integration Path: Next Steps
Current State
ReportMetrics → chart functions → Plotly figures
✗ No raw record access
Minimal Integration (v1)
# Create a wrapper function that accepts optional records
from typing import Optional

import plotly.graph_objects as go


def create_seniority_skills_heatmap(
    metrics: ReportMetrics,
    records: Optional[tuple[JoinedObservationRecord, ...]] = None,
    use_plotly: bool = True,
) -> go.Figure:
    if records and use_plotly:
        # New: use real data
        pivot = build_tech_seniority_pivot_from_records(records)
        return create_plotly_heatmap(pivot)
    else:
        # Fallback: current placeholder (no records passed, or Plotly disabled)
        return _placeholder_heatmap()
Full Integration (v2)
# Modify MetricsBuildResult to include records
from dataclasses import dataclass


@dataclass
class MetricsBuildResult:
    metrics: ReportMetrics
    latest_jobs: tuple[LatestJobRecord, ...]
    records: tuple[JoinedObservationRecord, ...]  # ← NEW
# Update chart pipeline
charts = create_all_charts(metrics, records=build_result.records)
9. Key Files Created
heatmap_design.py (364 lines) - Core implementation
- Both seaborn and Plotly approaches
- Complete workflows
- Documentation & best practices
test_heatmap_design.py (217 lines) - Comprehensive test suite
- Sample data generation
- All workflows demonstrated
- Styling variations tested
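As an illustration of the "sample data generation" idea, a test fixture could build reproducible synthetic records along these lines. Everything here is a hypothetical sketch rather than the contents of test_heatmap_design.py: SampleRecord stands in for JoinedObservationRecord with only two fields, and the skill and seniority lists are taken from the examples in section 1.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRecord:
    """Hypothetical stand-in for JoinedObservationRecord (two fields only)."""

    tech_stack: tuple[str, ...]
    seniority_level: str


SKILLS = ("Python", "SQL", "AWS", "React")
LEVELS = ("Entry-level", "Mid-level", "Senior")


def make_sample_records(n: int, seed: int = 0) -> tuple[SampleRecord, ...]:
    rng = random.Random(seed)  # seeded so test runs are reproducible
    return tuple(
        SampleRecord(
            # Each job lists between one and three distinct skills.
            tech_stack=tuple(rng.sample(SKILLS, k=rng.randint(1, 3))),
            seniority_level=rng.choice(LEVELS),
        )
        for _ in range(n)
    )


def test_sample_records_are_well_formed():
    records = make_sample_records(100)
    assert len(records) == 100
    assert all(1 <= len(r.tech_stack) <= 3 for r in records)
    assert all(r.seniority_level in LEVELS for r in records)
```

Using a private random.Random instance rather than the module-level functions keeps the fixture deterministic even when other tests also draw random numbers.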
10. Quick Start
# Run tests (from a shell)
python -m pytest tests/test_heatmap_design.py -v
# Within your code (Python)
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    seaborn_complete_workflow,
    plotly_complete_workflow,
)
# Get base64 PNG for HTML embedding
data_uri = seaborn_complete_workflow(records)
# Get interactive Plotly figure
fig = plotly_complete_workflow(records)
Status: ✓ Design complete, implementation ready, test suite provided
Next Actions:
1. Run tests to validate
2. Choose integration path (v1 wrapper vs v2 full)
3. Update charts.py with new heatmap function
4. Test with real curated data