Deliverables Summary: Seaborn Heatmap Design
What Was Created
mx-jobs-insights/
├── src/mexico_linkedin_jobs_portfolio/analytics/
│   └── heatmap_design.py ← Core implementation (364 lines)
│       ├── build_tech_seniority_pivot_from_records()
│       ├── create_seaborn_heatmap()
│       ├── create_plotly_heatmap()
│       ├── figure_to_base64_seaborn()
│       ├── seaborn_complete_workflow()
│       ├── plotly_complete_workflow()
│       └── Integration helpers
│
├── tests/
│   └── test_heatmap_design.py ← Test suite (217 lines)
│       ├── Test 1: Pivot building
│       ├── Test 2: Seaborn + base64
│       ├── Test 3: Plotly interactive
│       ├── Test 4: Complete workflows
│       └── Test 5: Styling variations
│
└── docs/
    ├── HEATMAP_DESIGN.md ← Detailed blueprint (420 lines)
    └── HEATMAP_QUICK_START.md ← Quick reference (this file)
Test Results
TEST EXECUTION SUMMARY
======================
✓ TEST 1: Building Pivot Table from Records
  - Created 105 sample records
  - Built 8 skills × 3 seniority levels matrix
  - Total 116 jobs across all cells
✓ TEST 2: Seaborn Heatmap & Base64 Encoding
  - Generated matplotlib figure
  - Converted to base64 PNG
  - Data URI: 48,690 characters (~50KB)
  - Ready for HTML <img> tag embedding
✓ TEST 3: Plotly Heatmap (Interactive)
  - Generated Plotly Figure
  - JSON representation: 7,787 characters
  - Supports interactive hover, zoom, pan
✓ TEST 4: Complete Workflows
  - End-to-end: records → visualization → output
  - Both seaborn and Plotly paths working
✓ TEST 5: Styling Variations
  - Tested colormaps: RdYlGn, YlOrRd, viridis, cool
  - All working correctly
✓ ALL TESTS PASSED
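The size reported in Test 2 can be sanity-checked with base64 arithmetic, since every 4 encoded characters carry 3 raw bytes. The sketch below assumes the 48,690 characters include the `data:image/png;base64,` prefix; `estimated_png_bytes` is an illustrative helper, not part of `heatmap_design.py`.

```python
PREFIX = "data:image/png;base64,"


def estimated_png_bytes(data_uri_length: int) -> int:
    """Upper bound on the decoded size; base64 padding can remove up to 2 bytes."""
    payload_chars = data_uri_length - len(PREFIX)
    # 4 base64 characters encode 3 raw bytes.
    return payload_chars * 3 // 4


raw_bytes = estimated_png_bytes(48_690)
print(f"PNG payload: at most {raw_bytes} bytes inside a 48,690-character URI")
```

Under that assumption, the "~50KB" figure describes the encoded URI, while the PNG itself is roughly 36,500 bytes.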
Design Answers (Your 5 Questions)
1) Data Structure Needed
# Pivot Table (pandas.DataFrame)
#         Entry-level  Mid-level  Senior
# Python            8         12      15
# SQL               5         15      12
# AWS               2         18       8
# React             6         14      10
# Structure:
# - Index: Top N tech skills (rows)
# - Columns: Seniority levels (sorted logically)
# - Values: Job counts (filled with 0 for missing combinations)
2) Generate from ReportMetrics
# Current: ReportMetrics holds separate counts only
metrics.tech_stack_counts  # (Python: 50, SQL: 48, ...)
metrics.seniority_counts  # (Mid-level: 25, Senior: 20, ...)
# ✗ No cross-tab: two marginal counts cannot be combined into per-cell counts
# Solution: Use raw records instead
records = dataset.records  # JoinedObservationRecord tuples
pivot = build_tech_seniority_pivot_from_records(records)  # ✓ Works!
# How it works:
# 1. Count all (tech, seniority) combinations with a Counter
# 2. Select top N skills by total frequency
# 3. Create DataFrame with natural seniority ordering
# 4. Zero-fill missing cells
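The four steps above can be sketched with the standard library alone. This is a minimal illustration, not the code in `heatmap_design.py`: the `Record` class, the `SENIORITY_ORDER` list, and `build_pivot_sketch` are hypothetical stand-ins, and the result is a nested dict rather than a `pandas.DataFrame`.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed ordering; the real module may define its own list of levels.
SENIORITY_ORDER = ["Entry-level", "Mid-level", "Senior"]


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for JoinedObservationRecord."""
    job_id: str
    tech_stack: tuple
    seniority_level: str


def build_pivot_sketch(records, top_n_skills=10):
    # 1. Count every (tech, seniority) combination.
    pair_counts = Counter(
        (tech, rec.seniority_level) for rec in records for tech in rec.tech_stack
    )
    # 2. Rank skills by total frequency across all levels and keep the top N.
    skill_totals = Counter()
    for (tech, _level), count in pair_counts.items():
        skill_totals[tech] += count
    top_skills = [tech for tech, _ in skill_totals.most_common(top_n_skills)]
    # 3 + 4. Emit columns in seniority order; a Counter returns 0 for missing pairs.
    return {
        tech: {level: pair_counts[(tech, level)] for level in SENIORITY_ORDER}
        for tech in top_skills
    }


records = [
    Record("1", ("Python", "SQL"), "Senior"),
    Record("2", ("Python",), "Mid-level"),
    Record("3", ("AWS",), "Mid-level"),
    Record("4", ("SQL",), "Entry-level"),
]
pivot = build_pivot_sketch(records, top_n_skills=2)
```

A nested dict of this shape can be handed to `pandas.DataFrame.from_dict(pivot, orient="index")` to get one row per skill and one column per seniority level.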
3) Best Practices for Seaborn
# Layout: create the figure first so the heatmap draws on its axes
fig, ax = plt.subplots(figsize=(12, 7), dpi=100)  # Good screen resolution
sns.heatmap(
    pivot,
    cmap="RdYlGn",  # Diverging, shows contrasts
    vmin=0,  # Fair scale (not auto-normalized)
    linewidths=1,  # Clear cell boundaries
    linecolor="white",  # High contrast
    annot=True,  # Show actual counts
    fmt="d",  # Integer format
    square=False,  # Allow readable skill names
    ax=ax,  # Draw on the axes created above
)
ax.tick_params(axis="x", rotation=45)  # Prevent overlap
fig.tight_layout()  # Prevent label cutoff (call after plotting)
4) Seaborn to Base64 PNG
import base64
import io

import matplotlib.pyplot as plt

def figure_to_base64_seaborn(fig: plt.Figure, dpi: int = 100) -> str:
    # Step 1: Render to memory buffer (no temp files)
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi, bbox_inches="tight")
    buffer.seek(0)
    # Step 2: Read bytes
    img_bytes = buffer.read()
    # Step 3: Encode as base64
    b64_string = base64.b64encode(img_bytes).decode("utf-8")
    # Step 4: Create data URI
    return f"data:image/png;base64,{b64_string}"
# Result: ~50KB data URI
# Embed in HTML: <img src="{data_uri}" />
# Works: ✓ Offline, ✓ No server, ✓ Fast, ✓ Self-contained
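A data URI produced this way can be checked without a browser by decoding it and looking for the 8-byte PNG signature. The sketch below uses only the standard library; `decode_png_data_uri` is a hypothetical helper, not part of `heatmap_design.py`, and the sample payload is just the signature rather than a real image.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
PREFIX = "data:image/png;base64,"


def decode_png_data_uri(data_uri: str) -> bytes:
    """Return the raw bytes of a base64 PNG data URI, or raise ValueError."""
    if not data_uri.startswith(PREFIX):
        raise ValueError("not a base64 PNG data URI")
    # validate=True rejects characters outside the base64 alphabet.
    raw = base64.b64decode(data_uri[len(PREFIX):], validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("payload does not start with the PNG signature")
    return raw


# Stand-in payload: the signature only, not a renderable image.
sample_uri = PREFIX + base64.b64encode(PNG_SIGNATURE).decode("ascii")
decoded = decode_png_data_uri(sample_uri)
```

A check like this could sit in a test next to the base64 conversion, so that a truncated or mislabeled URI fails early instead of rendering as a broken image.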
5) Complete Sample Code
# ===== COMPLETE WORKFLOW: Records → Base64 =====
# 1. Load raw records
from mexico_linkedin_jobs_portfolio.analytics.dataset import CuratedDatasetReader
from mexico_linkedin_jobs_portfolio.config import CuratedStorageConfig
reader = CuratedDatasetReader()
dataset = reader.load(CuratedStorageConfig(...))
records = dataset.records
# 2. Build cross-tab
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    build_tech_seniority_pivot_from_records,
)
pivot = build_tech_seniority_pivot_from_records(
    records,
    top_n_skills=10,
)
# 3. Create heatmap
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    create_seaborn_heatmap,
)
fig = create_seaborn_heatmap(
    pivot,
    title="Tech Skills by Seniority Level",
    figsize=(12, 7),
    cmap="RdYlGn",
    annot=True,
)
# 4. Convert to base64
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    figure_to_base64_seaborn,
)
data_uri = figure_to_base64_seaborn(fig, dpi=100)
# 5. Create HTML report
html_report = f"""
<!DOCTYPE html>
<html>
<head>
<title>Mexico Tech Jobs - Skills Analysis</title>
<style>
body {{ font-family: Arial, sans-serif; margin: 40px; }}
img {{ max-width: 100%; height: auto; border: 1px solid #ddd; }}
</style>
</head>
<body>
<h1>Tech Skills by Seniority Level</h1>
<p>How many jobs require each technology at each experience level.</p>
<img src="{data_uri}" alt="Skills Heatmap" />
</body>
</html>
"""
# 6. Save and done!
with open("skill_heatmap_report.html", "w", encoding="utf-8") as f:
    f.write(html_report)
print("✓ Report created: skill_heatmap_report.html")
# Open in a browser: the image is embedded, so the file works offline with no external assets.
# ===== ALTERNATIVES =====
# Plotly version (interactive):
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    create_plotly_heatmap,
)
fig = create_plotly_heatmap(pivot)
fig.show()  # Or: display in Streamlit/HTML
# One-shot workflow:
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    seaborn_complete_workflow,
)
data_uri = seaborn_complete_workflow(records, "/tmp/report.html")
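Step 5 of the workflow interpolates literal text into the HTML template. If the title ever comes from data rather than a hard-coded string, escaping it first avoids broken markup. A minimal sketch using only the standard library; `render_report` is a hypothetical helper and the placeholder URI is not a real image.

```python
from html import escape


def render_report(data_uri: str, title: str) -> str:
    """Build a self-contained HTML page around an embedded image."""
    safe_title = escape(title)
    # escape() also converts quotes by default, which keeps attribute values intact.
    safe_uri = escape(data_uri)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{safe_title}</title></head>\n'
        f"<body><h1>{safe_title}</h1>\n"
        f'<img src="{safe_uri}" alt="{safe_title}" /></body></html>\n'
    )


page = render_report("data:image/png;base64,AAAA", "Skills & <Seniority>")
```

The characters used in a base64 PNG data URI are not affected by `escape`, so the embedded image is passed through unchanged.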
Architecture Overview
Data Flow:
┌─────────────────────────────────────────────────────────────┐
│ JoinedObservationRecord tuple                               │
│ - job_id, tech_stack (tuple), seniority_level (str)         │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
        ┌──────────────────────────────────────┐
        │ build_tech_seniority_pivot_from...() │
        │ - Counter (tech, seniority)          │
        │ - Top N skills selection             │
        │ - Logical seniority ordering         │
        │ - Zero-fill matrix                   │
        └──────────────────┬───────────────────┘
                           │
                           ▼
               ┌────────────────────────┐
               │ pandas.DataFrame       │
               │ tech_stack × seniority │
               │ with job counts        │
               └───────────┬────────────┘
                           │
                ┌──────────┴──────────┐
                │                     │
             Seaborn               Plotly
          (static PNG)          (interactive)
                │                     │
        matplotlib.Figure         go.Figure
                │                     │
           base64 PNG             JSON/HTML
                │                     │
         data:image/...         <script> tag
                │                     │
           <img> embed          Direct render
Both Approaches Compared
| Aspect | Seaborn | Plotly |
|---|---|---|
| Visualization | Static PNG | Interactive |
| File Format | Base64 data URI | JSON/HTML |
| Size | ~50KB | ~8-15KB JSON |
| Web Ready | Needs conversion | Native |
| Interactivity | None | Hover, zoom, pan |
| Screen Reader | Image only | HTML, accessible |
| Learning Curve | Familiar | Growing standard |
| Best For | Exports, reports | Web apps, dashboards |
Recommendation: Use Plotly for web, Seaborn for exports.
Next Steps
Immediate (No Architecture Changes)
# Already works, use it:
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    seaborn_complete_workflow,
)
data_uri = seaborn_complete_workflow(your_records)
Short-term (Minimal Integration)
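# Update charts.py:
from typing import Optional

def create_seniority_skills_heatmap(
    metrics: ReportMetrics,
    records: Optional[tuple] = None,  # ← Add this
) -> go.Figure:
    # Build the real heatmap only when raw records are supplied
    if records:
        pivot = build_tech_seniority_pivot_from_records(records)
        return create_plotly_heatmap(pivot)
    # Otherwise keep returning the existing placeholder chart
    return _placeholder()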
Long-term (Full Integration)
# Modify MetricsBuildResult:
@dataclass
class MetricsBuildResult:
    metrics: ReportMetrics
    latest_jobs: tuple[LatestJobRecord, ...]
    records: tuple[JoinedObservationRecord, ...]  # ← Add this
# Then update chart pipeline:
charts = create_all_charts(metrics, records=build_result.records)
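One way to add the field without breaking existing callers is to give it an empty default. The sketch below uses simplified stand-in types (a plain dict and tuples, not the project's `ReportMetrics`, `LatestJobRecord`, or `JoinedObservationRecord`), so the class name, field types, and `has_crosstab_data` helper are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class MetricsBuildResultSketch:
    """Simplified stand-in for MetricsBuildResult."""
    metrics: dict[str, Any]
    latest_jobs: tuple[Any, ...]
    # New field with a default, so existing two-argument construction still works.
    records: tuple[Any, ...] = ()


def has_crosstab_data(result: MetricsBuildResultSketch) -> bool:
    """Charts that need raw records can fall back to a placeholder when this is False."""
    return len(result.records) > 0


old_style = MetricsBuildResultSketch(metrics={}, latest_jobs=())
new_style = MetricsBuildResultSketch(metrics={}, latest_jobs=(), records=("rec",))
```

With a default in place, call sites can be migrated one at a time, and the heatmap simply stays on its placeholder until records are passed through.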
Checklist
- ✓ 1. Data structure designed (pandas DataFrame cross-tab)
- ✓ 2. Pivot building implemented (from raw records)
- ✓ 3. Seaborn best practices applied (RdYlGn, annotations, styling)
- ✓ 4. Base64 conversion working (matplotlib → data URI)
- ✓ 5. Complete sample code provided (records → HTML)
- ✓ 6. Alternative Plotly approach included
- ✓ 7. Tests written and passing
- ✓ 8. Documentation comprehensive
- ✓ 9. No external dependencies added (seaborn already in viz)
- ✓ 10. Ready for production use
Documentation Files
- heatmap_design.py (364 lines)
  - Core implementation with elegant API
  - Extensive docstrings with examples
  - Toggle between Seaborn and Plotly
- test_heatmap_design.py (217 lines)
  - 5 test modules covering all functionality
  - Sample data generation
  - Usage patterns shown
- docs/HEATMAP_DESIGN.md (420 lines)
  - Detailed architecture & rationale
  - Best practices explained
  - Integration paths documented
- docs/HEATMAP_QUICK_START.md (this file)
  - Copy-paste ready code
  - Common Q&A
  - Visual reference
Status: COMPLETE & TESTED
All functionality implemented, tested, and documented. Ready for immediate use or integration into charts.py.
No further work needed unless you want to:
- Integrate into existing chart pipeline (recommended)
- Add additional statistical overlays (heatmap annotations)
- Create interactive dashboard version (Streamlit/Dash)
Start using immediately with:
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    seaborn_complete_workflow,
)
data_uri = seaborn_complete_workflow(records)
Questions? All design decisions documented in HEATMAP_DESIGN.md.