Runtime and Deployment
CLI Commands
The canonical CLI entrypoint is:
linkedin-webscraper <command>
Supported commands:
- scrape once: single-location scrape with persistence and CSV export
- scrape daily: multi-city daily scrape with persistence and CSV export
- export: export a persisted run from SQLite to CSV
Examples:
linkedin-webscraper scrape once --config runtime.toml
linkedin-webscraper scrape daily --config runtime.toml
linkedin-webscraper export --config runtime.toml --run-id <run-id>
Dry Run
Use --dry-run to validate the resolved runtime plan without hitting LinkedIn:
linkedin-webscraper scrape once --config runtime.toml --dry-run
linkedin-webscraper scrape daily --config runtime.toml --dry-run
linkedin-webscraper export --config runtime.toml --run-id <run-id> --dry-run
A dry run prints the resolved command, the key runtime inputs, the output path, and the storage URL.
Compatibility Wrappers
These scripts remain valid during the migration:
- python main.py -> defaults to scrape daily
- python process_ds_jobs.py -> defaults to scrape once
They delegate to the canonical package CLI instead of carrying separate runtime logic.
Runtime Config File
Use runtime.example.toml as the starting point for a local or scheduled runtime.toml.
Typical workflow:
copy runtime.example.toml runtime.toml
linkedin-webscraper scrape daily --config runtime.toml
You can also point to a config path through LINKEDIN_WEB_SCRAPER_CONFIG.
GitHub Actions Runtime
The scheduled automation uses .github/runtime/daily.toml.
That workflow-specific config keeps three contracts stable:
- SQLite state stays under artifacts/state
- CSV exports stay under artifacts/jobs
- logs stay under artifacts/logs
The workflow then copies persisted state and dated exports onto the data branch.
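The copy step can be sketched in a few lines. The artifacts/ paths below are the documented ones, but the staging directory and the dated layout are illustrative assumptions, not the workflow's actual commands:

```python
import shutil
from datetime import date
from pathlib import Path


def stage_for_data_branch(dest: Path = Path("data-branch")) -> Path:
    """Copy persisted state and a dated CSV export into a staging
    directory that the workflow would then commit to the data branch.

    artifacts/state and artifacts/jobs are the documented locations;
    everything under `dest` is a hypothetical layout.
    """
    day = date.today().isoformat()
    shutil.copytree("artifacts/state", dest / "state", dirs_exist_ok=True)
    shutil.copytree("artifacts/jobs", dest / "jobs" / day, dirs_exist_ok=True)
    return dest
```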
Docker
The repo includes a slim Dockerfile and .dockerignore.
Build the image:
docker build -t linkedin-webscraper .
Build with the optional OpenAI extra:
docker build --build-arg INSTALL_EXTRAS=openai -t linkedin-webscraper:openai .
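The repo's actual Dockerfile is not reproduced here; as a rough sketch only, a slim build honouring the documented INSTALL_EXTRAS build arg might look like the following (base image, install command, and entrypoint are all assumptions):

```dockerfile
# Hypothetical sketch, not the repo's Dockerfile.
FROM python:3.12-slim
ARG INSTALL_EXTRAS=""
WORKDIR /app
COPY . .
# Install the package, appending an optional extras group such as [openai]
# when INSTALL_EXTRAS is non-empty.
RUN pip install --no-cache-dir ".${INSTALL_EXTRAS:+[$INSTALL_EXTRAS]}"
ENTRYPOINT ["linkedin-webscraper"]
```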
Run with mounted artifacts and runtime config:
docker run --rm \
-v ${PWD}/artifacts:/app/artifacts \
-v ${PWD}/runtime.toml:/app/runtime.toml \
-e LINKEDIN_WEB_SCRAPER_CONFIG=/app/runtime.toml \
-e OPENAI_API_KEY=$OPENAI_API_KEY \
linkedin-webscraper:openai scrape daily
State And Artifacts
The default managed locations are the same in a local runtime and inside the container:
- artifacts/jobs
- artifacts/logs
- artifacts/state
That keeps local runs, containers, and future CI schedulers on the same path contract.
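A runtime that honours this contract only needs to create three directories before writing anything. A minimal helper sketch (function name is illustrative, the layout is the documented one):

```python
from pathlib import Path


def ensure_artifact_dirs(root: Path = Path("artifacts")) -> list[Path]:
    """Create the managed artifact directories if missing and return them.

    The jobs/logs/state layout is the documented path contract shared by
    local runs, containers, and CI schedulers.
    """
    dirs = [root / "jobs", root / "logs", root / "state"]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    return dirs
```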