LinkedIn Job Scraping & Data Export Pipeline

web-scraping, automation, data-engineering

Problem

Teams often need recurring job-market or competitive-market datasets, but manual collection is slow, inconsistent, and hard to reuse.

Solution

Built a Python scraping pipeline that collects job listings, normalizes records, persists run history, and exports reusable datasets through CLI workflows.
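
The normalize, persist, and export stages described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (normalize, persist_run, export_csv), the runs and listings tables, and the title/company/location fields are all assumptions made for the example, and the scraping step itself is left out.

```python
import csv
import io
import sqlite3
from datetime import datetime, timezone


def normalize(raw: dict) -> dict:
    """Lowercase the keys and trim whitespace from one raw listing (hypothetical fields)."""
    return {key.strip().lower(): str(value).strip() for key, value in raw.items()}


def persist_run(conn: sqlite3.Connection, listings: list[dict]) -> int:
    """Record one scrape run plus its listings and return the run id."""
    # Hypothetical schema: one row per run, one row per listing seen in that run.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs (id INTEGER PRIMARY KEY, started_at TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings "
        "(run_id INTEGER, title TEXT, company TEXT, location TEXT)"
    )
    cur = conn.execute(
        "INSERT INTO runs (started_at) VALUES (?)",
        (datetime.now(timezone.utc).isoformat(),),
    )
    run_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO listings VALUES (?, ?, ?, ?)",
        [(run_id, item["title"], item["company"], item["location"]) for item in listings],
    )
    conn.commit()
    return run_id


def export_csv(conn: sqlite3.Connection, run_id: int) -> str:
    """Return the listings of one run as CSV text, sorted by title."""
    rows = conn.execute(
        "SELECT title, company, location FROM listings WHERE run_id = ? ORDER BY title",
        (run_id,),
    ).fetchall()
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "company", "location"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping each stage as a plain function makes it easy to wrap them in CLI commands and to test them against an in-memory SQLite database.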

Deliverables

  • Scraping library
  • CLI commands
  • SQLite-backed persistence
  • Export workflows
  • Managed artifacts
  • Documentation
  • CI/docs/release automation
  • Optional OpenAI enrichment

Why it matters

  • Turns manual job-market research into an automated, repeatable data product
  • CLI-driven, so anyone on the team can run it, not just the person who built it
  • SQLite history lets you track market changes over time without re-scraping from scratch
  • Optional OpenAI enrichment adds structured tagging to raw scraped records
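
As one illustration of how stored run history can surface market changes, the sketch below compares two runs and reports which listings appeared or disappeared. It assumes a hypothetical listings table with run_id, title, and company columns; the real schema and the diff_runs helper name are not taken from the project.

```python
import sqlite3


def diff_runs(conn: sqlite3.Connection, old_run: int, new_run: int) -> dict:
    """Compare two stored runs by (title, company) and list what changed."""

    def listing_keys(run_id: int) -> set:
        # Hypothetical schema: listings(run_id, title, company).
        rows = conn.execute(
            "SELECT title, company FROM listings WHERE run_id = ?", (run_id,)
        )
        return set(rows.fetchall())

    old_keys = listing_keys(old_run)
    new_keys = listing_keys(new_run)
    return {
        "added": sorted(new_keys - old_keys),    # present only in the newer run
        "removed": sorted(old_keys - new_keys),  # present only in the older run
    }
```

Because both runs are already in the database, the comparison needs no new scraping; it only reads rows that earlier runs persisted.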

Tech Stack

Python, SQLite, CLI tooling, TOML config, GitHub Actions, PyPI, OpenAI API

Services

Web Scraping, Data Extraction, Python Automation, ETL Pipelines, Reporting Automation