Seaborn Heatmap - Quick Reference Guide

What Was Delivered

Item	Location	Status
Design Module	`heatmap_design.py`	✓ Complete (364 lines)
Test Suite	`test_heatmap_design.py`	✓ Complete, all tests pass
Design Doc	`docs/HEATMAP_DESIGN.md`	✓ Complete (detailed blueprint)
Functionality	Both Seaborn + Plotly	✓ Both working
Base64 Conversion	Data URI embedding	✓ Working (48KB per image)

5-Minute Summary

The Problem

Your ReportMetrics has separate tech_stack_counts and seniority_counts tuples (aggregates only). You need cross-tabulation (tech × seniority) to visualize skill demand patterns across experience levels.

The Solution

# 1. Get raw records (they have tech_stack + seniority_level fields)
records = dataset.records  # from CuratedDatasetReader

# 2. Build cross-tab pivot table
pivot = build_tech_seniority_pivot_from_records(records)
#
#            Entry-level  Mid-level  Senior
# Python              8         12        15
# SQL                 5         15        12
# AWS                 2         18         8

# 3. Create visualization
fig = create_seaborn_heatmap(pivot)         # matplotlib version
# OR
fig = create_plotly_heatmap(pivot)          # interactive version

# 4. Convert to base64 for HTML embedding
data_uri = figure_to_base64_seaborn(fig)
html = f'<img src="{data_uri}" />'          # Direct embedding, no server needed

Key Concepts

1. Data Structure

Input: JoinedObservationRecord tuples
  - job_id, tech_stack (tuple), seniority_level (str), ...

Processing:
  - Counter("Python", "Mid-level") = 12 jobs match both
  - Build matrix with all combinations

Output: pandas DataFrame
  Index = Tech Skills (rows)
  Columns = Seniority Levels
  Values = Job counts

2. Two Approaches

SEABORN (Static PNG)

fig = create_seaborn_heatmap(pivot, cmap="RdYlGn")
data_uri = figure_to_base64_seaborn(fig)
# Result: ~50KB base64 PNG, embeds in <img> tag

PLOTLY (Interactive)

fig = create_plotly_heatmap(pivot)
# Result: Interactive chart, hover for values, zoom/pan

3. Color Theory

RdYlGn: Red (high demand) → Yellow → Green (low demand)
YlOrRd: Better for showing intensity only
Viridis: Colorblind-friendly, sequential

Recommendation: Use RdYlGn (diverging) to show contrasts (where skills are hot vs. cold).

4. Base64 Flow

matplotlib Figure
      ↓
savefig(BytesIO)     [No temp files]
      ↓
read bytes
      ↓
base64.b64encode()   [Standard library]
      ↓
Prepend "data:image/png;base64,"
      ↓
<img src="data:..." />   [Works anywhere HTML works]

Complete Example: Records → HTML Report

# Step 1: Load data
from mexico_linkedin_jobs_portfolio.analytics.dataset import CuratedDatasetReader
reader = CuratedDatasetReader()
dataset = reader.load(config)

# Step 2: Build chart
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    build_tech_seniority_pivot_from_records,
    create_seaborn_heatmap,
    figure_to_base64_seaborn,
)

pivot = build_tech_seniority_pivot_from_records(dataset.records, top_n_skills=10)
fig = create_seaborn_heatmap(pivot)
data_uri = figure_to_base64_seaborn(fig)

# Step 3: Generate HTML report
html_content = f"""
<!DOCTYPE html>
<html>
<head>
    <title>Mexico Jobs - Tech Skills Analysis</title>
    <style>
        body {{ font-family: 'Segoe UI', sans-serif; margin: 40px; background: #f5f5f5; }}
        .container {{ background: white; padding: 30px; border-radius: 8px; box-shadow: 0 2px 4px rgba(0,0,0,0.1); }}
        h1 {{ color: #333; border-bottom: 3px solid #007bff; padding-bottom: 10px; }}
        img {{ max-width: 100%; height: auto; }}
        .footer {{ margin-top: 20px; color: #666; font-size: 12px; }}
    </style>
</head>
<body>
    <div class="container">
        <h1>Tech Skills by Seniority Level</h1>
        <p>This heatmap shows how many jobs at each experience level require specific technologies.</p>
        <p><strong>Interpretation:</strong> Darker colors = more jobs requiring that skill at that level.</p>
        <img src="{data_uri}" alt="Skills Heatmap" />
        <div class="footer">
            Generated on 2026-03-24 | Data from Mexico LinkedIn Jobs Dataset
        </div>
    </div>
</body>
</html>
"""

# Step 4: Save or serve
with open("skills_report.html", "w") as f:
    f.write(html_content)
print("✓ Report saved: skills_report.html")  # Open in browser!

Function Reference

Core Functions

# Main entry point
pivot = build_tech_seniority_pivot_from_records(
    records: tuple[JoinedObservationRecord, ...],
    top_n_skills: int = 10,
    top_n_seniorities: Optional[int] = None,
) → pd.DataFrame

# Seaborn pipeline
fig = create_seaborn_heatmap(
    pivot_df: pd.DataFrame,
    title: str = "...",
    figsize: tuple = (12, 7),
    cmap: str = "RdYlGn",      # Color scheme
    annot: bool = True,         # Show cell values
) → plt.Figure

data_uri = figure_to_base64_seaborn(
    fig: plt.Figure,
    format: str = "png",       # png, jpg, webp
    dpi: int = 100,            # 100=screen, 150=print
) → str                         # "data:image/png;base64,..."

# Plotly alternative
fig = create_plotly_heatmap(
    pivot_df: pd.DataFrame,
    locale: str = "en",
) → go.Figure

# One-shot workflows
data_uri = seaborn_complete_workflow(records)
fig = plotly_complete_workflow(records, locale="en")

Integration with Your Project

Current Architecture

ReportMetrics (aggregates only)
    ↓
chart functions
    ↓
Plotly figures
Issue: No cross-tabulation, no raw records passed

Minimal Fix (Recommended)

# In charts.py, update function signature:
def create_seniority_skills_heatmap(
    metrics: ReportMetrics,
    records: Optional[tuple[JoinedObservationRecord, ...]] = None,
) → go.Figure:
    if records:
        pivot = build_tech_seniority_pivot_from_records(records)
        return create_plotly_heatmap(pivot)
    else:
        return _placeholder()  # Fallback

# In metrics.py, pass records:
charts = create_all_charts(
    metrics,
    records=build_result.records  # ← Need to add this
)

Future Integration (v2)

# Modify MetricsBuildResult:
@dataclass
class MetricsBuildResult:
    metrics: ReportMetrics
    latest_jobs: tuple[LatestJobRecord, ...]
    records: tuple[JoinedObservationRecord, ...]  # ← ADD THIS

# Then no changes needed in chart functions

Testing & Validation

# Run all tests
python -m pytest tests/test_heatmap_design.py -v

# Results:
# ✓ Pivot building from records works
# ✓ Seaborn styling works
# ✓ Plotly generation works
# ✓ Base64 conversion produces valid data URIs
# ✓ All 4 colormaps work (RdYlGn, YlOrRd, viridis, cool)
# ✓ Complete workflows execute end-to-end

Files Overview

`heatmap_design.py` (364 lines)

Pivot building: Cross-tab logic using defaultdict + Counter
Seaborn heatmap: Full styling with best practices applied
Plotly heatmap: Interactive alternative with tooltips
Base64 conversion: Buffer → bytes → base64 → data URI
Complete workflows: Records → visualization end-to-end
Integration helpers: Ready-to-use functions for charts.py

`test_heatmap_design.py` (217 lines)

Sample data: Realistic JoinedObservationRecord generators
5 test suites: Pivot, seaborn, Plotly, workflows, styling
Validation: All components tested with real data flow
Usage examples: Copy-paste ready code snippets

`docs/HEATMAP_DESIGN.md` (420 lines)

Complete blueprint: Design decisions explained
Data flow diagrams: Visual architecture
Best practices: Seaborn styling theory & application
Integration paths: Current state → recommended solution
Quick start: Copy-paste to get started

Common Questions

Q: Should I use Seaborn or Plotly? A: Plotly for production (interactive, web-native). Seaborn for quick exports (familiar to data scientists).

Q: How big is the base64 image? A: ~50-60KB for figsize=(12,7), dpi=100. Add more if needed.

Q: Can I embed this directly in a Streamlit app? A: Yes! st.plotly_chart(fig) works directly.

Q: What if ReportMetrics doesn't have raw records? A: Use mock synthetic data for UI testing, or implement L2/L3 integration path to access records.

Q: Why RdYlGn colormap? A: Diverging colormaps show contrasts better (skill is hot at Senior but cold at Entry-level), plus it's somewhat colorblind-friendly.

Status: ✓ READY TO USE

All implementations tested and working. No additional dependencies needed (seaborn already in viz optional-dependencies).

Next Step: Choose integration path and update charts.py to pass raw records to heatmap function.