LinkedInWebScraper
LinkedInWebScraper provides a reusable workflow for scraping LinkedIn job listings, normalizing the results, persisting run history, and exporting datasets, so that runs can be repeated safely over time.
What It Does
- Scrapes LinkedIn search result pages and job detail pages
- Cleans and normalizes job metadata such as locations, job IDs, and extracted fields
- Supports single scrapes and daily multi-city runs
- Persists run history to SQLite through a clean application storage port
- Writes managed artifacts under `artifacts/jobs`, `artifacts/logs`, and `artifacts/state`
- Keeps OpenAI enrichment optional and isolated behind an extra plus a runtime toggle
- Keeps runnable examples under `examples/`
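The "storage port" mentioned above is a hexagonal-architecture pattern: the application records runs against an interface, and SQLite sits behind an adapter. A minimal sketch of that idea follows; `RunHistoryPort`, `SqliteRunHistory`, and the table schema are illustrative names and assumptions, not the project's actual API.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Protocol


class RunHistoryPort(Protocol):
    """Application-side port: callers record runs without knowing the backend."""

    def record_run(self, city: str, jobs_found: int) -> None: ...
    def run_count(self) -> int: ...


class SqliteRunHistory:
    """SQLite adapter implementing the port (hypothetical schema)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS runs ("
            "  id INTEGER PRIMARY KEY,"
            "  city TEXT NOT NULL,"
            "  jobs_found INTEGER NOT NULL,"
            "  ran_at TEXT NOT NULL)"
        )

    def record_run(self, city: str, jobs_found: int) -> None:
        self.conn.execute(
            "INSERT INTO runs (city, jobs_found, ran_at) VALUES (?, ?, ?)",
            (city, jobs_found, datetime.now(timezone.utc).isoformat()),
        )
        self.conn.commit()

    def run_count(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM runs").fetchone()[0]


history: RunHistoryPort = SqliteRunHistory()
history.record_run("Berlin", 42)
print(history.run_count())  # → 1
```

Because the scraper depends only on the port, the SQLite adapter can be swapped for an in-memory fake in tests without touching scraping code.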
Runtime Surfaces
The project has two supported runtime modes:
- Programmatic library usage through `JobScraperConfig`, `LinkedInJobScraper`, and `DailyScrapeService`
- TOML-driven CLI usage through `linkedin-webscraper scrape once`, `scrape daily`, and `export`
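For the TOML-driven mode, a runtime config might look like the sketch below. The section and key names here are assumptions for illustration, not the project's documented schema; see Configuration for the real config models.

```toml
# Hypothetical runtime config for `linkedin-webscraper scrape daily`
[scrape]
query = "data scientist"
cities = ["Berlin", "Munich"]

[storage]
db_path = "artifacts/state/linkedin_jobs.sqlite"

[enrichment]
openai_enabled = false  # requires the optional extra and OPENAI_API_KEY
```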
The root scripts remain available for direct execution:
- `python main.py` -> default daily run
- `python process_ds_jobs.py` -> default single-location run
Defaults
- Bare log filenames resolve under `artifacts/logs`
- Bare CSV filenames resolve under `artifacts/jobs`
- Bare SQLite filenames resolve under `artifacts/state`
- Default managed DB path is `artifacts/state/linkedin_jobs.sqlite`
- OpenAI enrichment requires the optional extra and `OPENAI_API_KEY`
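The bare-filename convention above can be illustrated with a small helper: a name with no directory component resolves under the managed directory for its extension, while an explicit path passes through unchanged. `resolve_artifact` is a hypothetical function, not part of the package.

```python
from pathlib import Path

# Assumed mapping from file extension to managed artifacts directory
ARTIFACT_DIRS = {
    ".log": "artifacts/logs",
    ".csv": "artifacts/jobs",
    ".sqlite": "artifacts/state",
}


def resolve_artifact(name: str) -> Path:
    """Resolve a bare filename under its managed directory; explicit paths pass through."""
    p = Path(name)
    if p.parent != Path("."):  # already has a directory component
        return p
    base = ARTIFACT_DIRS.get(p.suffix)
    return Path(base) / p if base else p


print(resolve_artifact("run.log"))           # artifacts/logs/run.log
print(resolve_artifact("jobs.csv"))          # artifacts/jobs/jobs.csv
print(resolve_artifact("custom/db.sqlite"))  # custom/db.sqlite (unchanged)
```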
Next Steps
- Follow Getting Started for library and CLI usage
- Use Configuration for config models, runtime TOML, and env overrides
- Use Runtime and Deployment for CLI, dry-run, and Docker workflows
- See API Reference for generated module documentation