Deliverables Summary: Seaborn Heatmap Design

📦 What Was Created

mx-jobs-insights/
├── 📄 src/mexico_linkedin_jobs_portfolio/analytics/
│   └── heatmap_design.py              ← Core implementation (364 lines)
│       ├── build_tech_seniority_pivot_from_records()
│       ├── create_seaborn_heatmap()
│       ├── create_plotly_heatmap()
│       ├── figure_to_base64_seaborn()
│       ├── seaborn_complete_workflow()
│       ├── plotly_complete_workflow()
│       └── Integration helpers
│
├── 🧪 tests/
│   └── test_heatmap_design.py         ← Test suite (217 lines)
│       ├── Test 1: Pivot building
│       ├── Test 2: Seaborn + base64
│       ├── Test 3: Plotly interactive
│       ├── Test 4: Complete workflows
│       └── Test 5: Styling variations
│
└── 📚 docs/
    ├── HEATMAP_DESIGN.md              ← Detailed blueprint (420 lines)
    └── HEATMAP_QUICK_START.md         ← Quick reference (this file)

✅ Test Results

TEST EXECUTION SUMMARY
======================
✓ TEST 1: Building Pivot Table from Records
  - Created 105 sample records
  - Built 8 skills × 3 seniority levels matrix
  - Total 116 jobs across all cells

✓ TEST 2: Seaborn Heatmap & Base64 Encoding
  - Generated matplotlib figure
  - Converted to base64 PNG
  - Data URI: 48,690 characters (~50KB)
  - Ready for HTML <img> tag embedding

✓ TEST 3: Plotly Heatmap (Interactive)
  - Generated Plotly Figure
  - JSON representation: 7,787 characters
  - Supports interactive hover, zoom, pan

✓ TEST 4: Complete Workflows
  - End-to-end: records → visualization → output
  - Both seaborn and Plotly paths working

✓ TEST 5: Styling Variations
  - Tested colormaps: RdYlGn, YlOrRd, viridis, cool
  - All working correctly

✓✓✓ ALL TESTS PASSED ✓✓✓

🎯 Design Answers (Your 5 Questions)

1) Data Structure Needed

# Pivot Table (pandas.DataFrame)
#                Entry-level  Mid-level  Senior
# Python               8         12         15
# SQL                  5         15         12
# AWS                  2         18          8
# React                6         14         10

# Structure:
# - Index: Top N tech skills (rows)
# - Columns: Seniority levels (sorted logically)
# - Values: Job counts (filled with 0 for missing combinations)

2) Generate from ReportMetrics

# Current: Separate counts only
metrics.tech_stack_counts       # (Python: 50, SQL: 48, ...)
metrics.seniority_counts        # (Mid-level: 25, Senior: 20, ...)
                                # ✗ No cross-tab

# Solution: Use raw records instead
records = dataset.records       # JoinedObservationRecord tuples
pivot = build_tech_seniority_pivot_from_records(records)  # ✓ Works!

# How it works:
# 1. Count all (tech, seniority) combinations with a Counter
# 2. Select top N skills by total frequency
# 3. Create DataFrame with natural seniority ordering
# 4. Zero-fill missing cells
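
The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in heatmap_design.py: the `Record` type, the `SENIORITY_ORDER` list, and the `build_pivot_rows` helper are hypothetical stand-ins, and the sketch returns nested lists where the real function returns a pandas DataFrame.

```python
from collections import Counter
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in for JoinedObservationRecord (assumed fields)."""
    tech_stack: tuple[str, ...]
    seniority_level: str


# Assumed ordering; the levels in the real dataset may differ.
SENIORITY_ORDER = ["Entry-level", "Mid-level", "Senior"]


def build_pivot_rows(records, top_n_skills=10):
    # 1. Count every (tech, seniority) combination.
    pair_counts = Counter(
        (tech, rec.seniority_level)
        for rec in records
        for tech in rec.tech_stack
    )

    # 2. Rank skills by total frequency across all levels, keep the top N.
    skill_totals = Counter()
    for (tech, _level), count in pair_counts.items():
        skill_totals[tech] += count
    skills = [tech for tech, _ in skill_totals.most_common(top_n_skills)]

    # 3. Order columns: known levels first, any unknown levels after, sorted.
    seen = {level for _tech, level in pair_counts}
    levels = [lv for lv in SENIORITY_ORDER if lv in seen]
    levels += sorted(seen - set(SENIORITY_ORDER))

    # 4. Zero-fill: a Counter returns 0 for pairs it has never seen.
    matrix = [[pair_counts[(tech, lv)] for lv in levels] for tech in skills]
    return skills, levels, matrix


records = [
    Record(("Python", "SQL"), "Senior"),
    Record(("Python",), "Mid-level"),
    Record(("SQL", "AWS"), "Mid-level"),
    Record(("Python",), "Entry-level"),
]
skills, levels, matrix = build_pivot_rows(records, top_n_skills=2)
```

Wrapping the result as `pd.DataFrame(matrix, index=skills, columns=levels)` would give the cross-tab shape shown in section 1. Because one record can list several skills, the cell total can exceed the record count, as in the test run (105 records, 116 counted jobs).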

3) Best Practices for Seaborn

import matplotlib.pyplot as plt
import seaborn as sns

# Create the figure first so the heatmap draws onto a known Axes
fig, ax = plt.subplots(figsize=(12, 7), dpi=100)  # ✓ Good screen res

sns.heatmap(
    pivot,
    ax=ax,                      # ✓ Draw on the Axes created above
    cmap="RdYlGn",              # ✓ Diverging, shows contrasts
    vmin=0,                     # ✓ Fair scale (not auto-normalized)
    linewidths=1,               # ✓ Clear cell boundaries
    linecolor="white",          # ✓ High contrast
    annot=True,                 # ✓ Show actual counts
    fmt="d",                    # ✓ Integer format
    square=False,               # ✓ Allow readable skill names
)

# Layout (apply after drawing):
ax.tick_params(axis="x", rotation=45)  # ✓ Prevent overlap
fig.tight_layout()              # ✓ Prevent label cutoff

4) Seaborn to Base64 PNG

import base64
import io

import matplotlib.pyplot as plt


def figure_to_base64_seaborn(fig: plt.Figure, dpi: int = 100) -> str:
    # Step 1: Render to memory buffer (no temp files)
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi, bbox_inches="tight")
    buffer.seek(0)

    # Step 2: Read bytes
    img_bytes = buffer.read()

    # Step 3: Encode as base64
    b64_string = base64.b64encode(img_bytes).decode("utf-8")

    # Step 4: Create data URI
    return f"data:image/png;base64,{b64_string}"

# Result: ~50KB data URI
# Embed in HTML: <img src="{data_uri}" />
# Works: ✓ Offline, ✓ No server, ✓ Fast, ✓ Self-contained
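
As a sanity check on the size figure, base64 emits 4 characters for every 3 input bytes, so a data URI is roughly a third larger than the PNG it wraps. The sketch below is illustrative only: `data_uri_length` and `decode_png_data_uri` are hypothetical helpers, not part of heatmap_design.py. It estimates the URI length from a byte count and checks that a URI decodes back to bytes that start with the PNG signature.

```python
import base64

PNG_PREFIX = "data:image/png;base64,"
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def data_uri_length(raw_byte_count: int) -> int:
    """Length in characters of a PNG data URI for a payload of this size."""
    # base64 encodes each (padded) 3-byte group as 4 characters.
    encoded_chars = 4 * ((raw_byte_count + 2) // 3)
    return len(PNG_PREFIX) + encoded_chars


def decode_png_data_uri(data_uri: str) -> bytes:
    """Return the raw PNG bytes, or raise ValueError if this is not a PNG URI."""
    if not data_uri.startswith(PNG_PREFIX):
        raise ValueError("not a base64 PNG data URI")
    raw = base64.b64decode(data_uri[len(PNG_PREFIX):], validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("payload does not start with the PNG signature")
    return raw


# Stand-in payload: the PNG signature plus filler bytes, not a real image.
fake_png = PNG_SIGNATURE + bytes(92)
uri = PNG_PREFIX + base64.b64encode(fake_png).decode("ascii")
```

By this arithmetic, the 48,690-character URI reported in Test 2 corresponds to a PNG of about 36.5 KB on disk.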

5) Complete Sample Code

# ===== COMPLETE WORKFLOW: Records → Base64 =====

# 1. Load raw records
from mexico_linkedin_jobs_portfolio.analytics.dataset import CuratedDatasetReader
from mexico_linkedin_jobs_portfolio.config import CuratedStorageConfig

reader = CuratedDatasetReader()
dataset = reader.load(CuratedStorageConfig(...))
records = dataset.records

# 2. Build cross-tab
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    build_tech_seniority_pivot_from_records,
)

pivot = build_tech_seniority_pivot_from_records(
    records,
    top_n_skills=10,
)

# 3. Create heatmap
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    create_seaborn_heatmap,
)

fig = create_seaborn_heatmap(
    pivot,
    title="Tech Skills by Seniority Level",
    figsize=(12, 7),
    cmap="RdYlGn",
    annot=True,
)

# 4. Convert to base64
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    figure_to_base64_seaborn,
)

data_uri = figure_to_base64_seaborn(fig, dpi=100)

# 5. Create HTML report
html_report = f"""
<!DOCTYPE html>
<html>
<head>
    <title>Mexico Tech Jobs - Skills Analysis</title>
    <style>
        body {{ font-family: Arial, sans-serif; margin: 40px; }}
        img {{ max-width: 100%; height: auto; border: 1px solid #ddd; }}
    </style>
</head>
<body>
    <h1>Tech Skills by Seniority Level</h1>
    <p>How many jobs require each technology at each experience level.</p>
    <img src="{data_uri}" alt="Skills Heatmap" />
</body>
</html>
"""

# 6. Save and done!
with open("skill_heatmap_report.html", "w", encoding="utf-8") as f:
    f.write(html_report)

print("✓ Report created: skill_heatmap_report.html")
# Open in browser → fully embedded, works offline, no dependencies needed!

# ===== ALTERNATIVES =====

# Plotly version (interactive):
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    create_plotly_heatmap,
)
fig = create_plotly_heatmap(pivot)
fig.show()  # Or: display in Streamlit/HTML

# One-shot workflow:
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    seaborn_complete_workflow,
)
data_uri = seaborn_complete_workflow(records, "/tmp/report.html")
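
The HTML template in step 5 is an f-string, so every literal CSS brace has to be doubled. One possible alternative, sketched here with the standard library's `string.Template`, avoids the doubling and escapes the title text. The `render_report` helper and its parameters are hypothetical and not part of heatmap_design.py.

```python
import html
from string import Template

# $name placeholders leave the CSS braces untouched.
REPORT_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head>
    <title>$title</title>
    <style>
        body { font-family: Arial, sans-serif; margin: 40px; }
        img { max-width: 100%; height: auto; border: 1px solid #ddd; }
    </style>
</head>
<body>
    <h1>$title</h1>
    <img src="$data_uri" alt="$alt" />
</body>
</html>
""")


def render_report(data_uri: str, title: str, alt: str = "Skills Heatmap") -> str:
    # Escape reader-visible text; base64 characters need no HTML escaping.
    return REPORT_TEMPLATE.substitute(
        title=html.escape(title),
        alt=html.escape(alt),
        data_uri=data_uri,
    )


page = render_report("data:image/png;base64,AAAA", "Skills & Seniority")
```

The f-string version works equally well for a fixed title; the template form mainly helps when the title or alt text comes from data.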

🏗️ Architecture Overview

Data Flow:
┌────────────────────────────────────────────────────────────┐
│ JoinedObservationRecord tuple                              │
│ - job_id, tech_stack (tuple), seniority_level (str)        │
└──────────────────────────┬─────────────────────────────────┘
                           │
                           ↓
        ┌──────────────────────────────────────┐
        │ build_tech_seniority_pivot_from...() │
        │ - Count (tech, seniority) pairs      │
        │ - Top N skills selection             │
        │ - Logical seniority ordering         │
        │ - Zero-fill matrix                   │
        └──────────────┬───────────────────────┘
                       │
                       ↓
           ┌───────────────────────┐
           │ pandas.DataFrame      │
           │ tech_stack × seniority│
           │ with job counts       │
           └───────┬───────────────┘
                   │
        ┌──────────┴──────────┐
        ↓                     ↓
    Seaborn               Plotly
    (static PNG)          (interactive)
        ↓                     ↓
  matplotlib.Figure       go.Figure
        ↓                     ↓
    base64 PNG            JSON/HTML
        ↓                     ↓
  data:image/...          <script> tag
        ↓                     ↓
    <img> embed           Direct render

📊 Both Approaches Compared

| Aspect         | Seaborn          | Plotly               |
|----------------|------------------|----------------------|
| Visualization  | Static PNG       | Interactive          |
| File Format    | Base64 data URI  | JSON/HTML            |
| Size           | ~50KB            | ~8-15KB JSON         |
| Web Ready      | Needs conversion | Native               |
| Interactivity  | None             | Hover, zoom, pan     |
| Screen Reader  | Image only       | HTML, accessible     |
| Learning Curve | Familiar         | Growing standard     |
| Best For       | Exports, reports | Web apps, dashboards |

Recommendation: Use Plotly for web, Seaborn for exports.


🚀 Next Steps

Immediate (No Architecture Changes)

# Already works, use it:
from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    seaborn_complete_workflow,
)
data_uri = seaborn_complete_workflow(your_records)

Short-term (Minimal Integration)

# Update charts.py:
def create_seniority_skills_heatmap(
    metrics: ReportMetrics,
    records: Optional[tuple] = None,  # ← Add this
) -> go.Figure:
    if records:
        pivot = build_tech_seniority_pivot_from_records(records)
        return create_plotly_heatmap(pivot)
    return _placeholder()

Long-term (Full Integration)

# Modify MetricsBuildResult:
@dataclass
class MetricsBuildResult:
    metrics: ReportMetrics
    latest_jobs: tuple[LatestJobRecord, ...]
    records: tuple[JoinedObservationRecord, ...]  # ← Add this

# Then update chart pipeline:
charts = create_all_charts(metrics, records=build_result.records)
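
One way to make the long-term change without breaking existing callers is to give the new field a default value. The sketch below assumes that MetricsBuildResult is a plain dataclass and uses hypothetical stand-in aliases for ReportMetrics, LatestJobRecord, and JoinedObservationRecord; the real definitions in the project may differ.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical stand-ins for the project's real types.
ReportMetrics = dict[str, Any]
LatestJobRecord = dict[str, Any]
JoinedObservationRecord = dict[str, Any]


@dataclass
class MetricsBuildResult:
    metrics: ReportMetrics
    latest_jobs: tuple[LatestJobRecord, ...]
    # An empty-tuple default keeps two-argument constructor calls working.
    records: tuple[JoinedObservationRecord, ...] = ()


# Existing call sites that pass only two fields still construct a result.
old_style = MetricsBuildResult(metrics={}, latest_jobs=())

# New call sites can attach the raw records for the heatmap.
new_style = MetricsBuildResult(
    metrics={},
    latest_jobs=(),
    records=({"tech_stack": ("Python",), "seniority_level": "Senior"},),
)
```

With an empty default, a chart function that checks `if records:` falls back to its placeholder for results built the old way.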

📋 Checklist

  • ✅ 1. Data structure designed (pandas DataFrame cross-tab)
  • ✅ 2. Pivot building implemented (from raw records)
  • ✅ 3. Seaborn best practices applied (RdYlGn, annotations, styling)
  • ✅ 4. Base64 conversion working (matplotlib → data URI)
  • ✅ 5. Complete sample code provided (records → HTML)
  • ✅ 6. Alternative Plotly approach included
  • ✅ 7. Tests written and passing
  • ✅ 8. Documentation comprehensive
  • ✅ 9. No external dependencies added (seaborn already in viz)
  • ✅ 10. Ready for production use

📚 Documentation Files

  1. heatmap_design.py (364 lines)
     • Core implementation with elegant API
     • Extensive docstrings with examples
     • Toggle between Seaborn and Plotly

  2. test_heatmap_design.py (217 lines)
     • 5 test modules covering all functionality
     • Sample data generation
     • Usage patterns shown

  3. docs/HEATMAP_DESIGN.md (420 lines)
     • Detailed architecture & rationale
     • Best practices explained
     • Integration paths documented

  4. docs/HEATMAP_QUICK_START.md (this file)
     • Copy-paste ready code
     • Common Q&A
     • Visual reference

✨ Status: COMPLETE & TESTED

All functionality implemented, tested, and documented. Ready for immediate use or integration into charts.py.

No further work needed unless you want to:

  • Integrate into existing chart pipeline (recommended)
  • Add additional statistical overlays (heatmap annotations)
  • Create interactive dashboard version (Streamlit/Dash)

Start using immediately with:

from mexico_linkedin_jobs_portfolio.analytics.heatmap_design import (
    seaborn_complete_workflow,
)
data_uri = seaborn_complete_workflow(records)

Questions? All design decisions documented in HEATMAP_DESIGN.md.