Getting Started
Installation
Base install:
pip install LinkedInWebScraper
Install the optional OpenAI extra when you need enrichment:
pip install LinkedInWebScraper[openai]
For local development:
pip install -e .[dev]
Programmatic Scrape
Use canonical imports for new code:
from linkedin_web_scraper import (
    JobScraperConfig,
    LinkedInJobScraper,
    RemoteType,
    configure_logging,
)
logger = configure_logging(filename="example.log")
config = JobScraperConfig(
    position="Data Scientist",
    location="Monterrey",
    remote=RemoteType.REMOTE,
)
jobs = LinkedInJobScraper(logger=logger, config=config).run()
print(jobs.head())
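Since run() appears to return a pandas DataFrame (jobs.head() above), standard DataFrame operations work for post-processing. A minimal sketch, assuming hypothetical column names ("title", "company") that are not necessarily the library's schema:

```python
import pandas as pd

# Stand-in for the DataFrame returned by LinkedInJobScraper.run();
# the columns here are illustrative assumptions, not the library's schema.
jobs = pd.DataFrame(
    {"title": ["Data Scientist", "Data Analyst"], "company": ["Acme", "Globex"]}
)

# Filter and persist results with ordinary pandas calls.
scientists = jobs[jobs["title"].str.contains("Scientist")]
jobs.to_csv("jobs_snapshot.csv", index=False)
```

Any pandas workflow (deduplication, grouping by company, merging runs) applies from here.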
Enable OpenAI Enrichment
config = JobScraperConfig(
    position="Data Scientist",
    location="Monterrey",
    remote=RemoteType.REMOTE,
    openai_enabled=True,
    openai_model="gpt-4o-mini",
)
Set OPENAI_API_KEY in the environment before running the scraper. The library does not load .env files during import.
For the current PowerShell session on Windows:
$env:OPENAI_API_KEY = "sk-..."
python examples/example_openai.py
To persist the key for your user account on Windows without committing it:
[Environment]::SetEnvironmentVariable("OPENAI_API_KEY", "sk-...", "User")
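Because the library reads the key from the process environment and does not load .env files, it can be worth failing fast before a long scrape starts. A pre-flight sketch (the helper name is ours, not a library API):

```python
import os

def require_openai_key(env=os.environ) -> str:
    """Fail fast if OPENAI_API_KEY is missing (hypothetical helper,
    not part of linkedin_web_scraper)."""
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; OpenAI enrichment will fail.")
    return key
```

Call it once before constructing a JobScraperConfig with openai_enabled=True.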
CLI Quickstart
Use the runtime template as a starting point:
copy runtime.example.toml runtime.toml
Preview the once workflow without hitting LinkedIn:
linkedin-webscraper scrape once --config runtime.toml --dry-run
Run the daily workflow:
linkedin-webscraper scrape daily --config runtime.toml
Export a persisted run from SQLite:
linkedin-webscraper export --config runtime.toml --run-id <run-id>
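When driving the CLI from a scheduler or wrapper script, the commands above can be assembled programmatically. A sketch (the builder function is ours; the flags mirror the quickstart commands):

```python
import shlex

def build_scrape_cmd(workflow: str, config: str, dry_run: bool = False) -> list[str]:
    """Build the argv list for a scrape invocation (hypothetical helper).
    Pass the result to subprocess.run()."""
    cmd = f"linkedin-webscraper scrape {workflow} --config {config}"
    if dry_run:
        cmd += " --dry-run"
    return shlex.split(cmd)

args = build_scrape_cmd("once", "runtime.toml", dry_run=True)
```

shlex.split keeps the quoting rules consistent with what a shell would produce.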
Examples
The runnable examples live under examples/:
python examples/example.py
python examples/example_advanced_config.py
python examples/example_openai.py
Root Runtime Scripts
The root runtime scripts remain available for direct execution:
python main.py
python process_ds_jobs.py
main.py defaults to the daily workflow. process_ds_jobs.py defaults to the single-location once workflow.
Managed Artifacts
By default:
- logs go to artifacts/logs
- CSV exports go to artifacts/jobs
- SQLite state goes to artifacts/state
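The default layout can be expressed with pathlib when a wrapper script needs to locate these directories. A sketch based on the paths listed above (the library's own resolution lives in its infra/paths module, so treat this as an assumption):

```python
from pathlib import Path

# Default artifact layout as documented; not read from the library itself.
ARTIFACTS = Path("artifacts")
LAYOUT = {
    "logs": ARTIFACTS / "logs",    # log files
    "jobs": ARTIFACTS / "jobs",    # CSV exports
    "state": ARTIFACTS / "state",  # SQLite state
}
```

For example, LAYOUT["jobs"].glob("*.csv") would enumerate exported runs.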
Validate Local Changes
Default local checks:
python -m pytest -q
python -m ruff check .
python -m mkdocs build --strict
Current enforced Pyrefly seam:
python -m pyrefly check src/linkedin_web_scraper/config/job_scraper_config.py src/linkedin_web_scraper/config/job_scraper_advanced_config.py src/linkedin_web_scraper/config/job_scraper_config_factory.py src/linkedin_web_scraper/config/openai.py src/linkedin_web_scraper/config/storage.py src/linkedin_web_scraper/config/options.py src/linkedin_web_scraper/config/runtime.py src/linkedin_web_scraper/application/daily_scrape_service.py src/linkedin_web_scraper/application/linkedin_job_scraper.py src/linkedin_web_scraper/application/storage.py src/linkedin_web_scraper/application/runtime_runner.py src/linkedin_web_scraper/domain/job_data_cleaner.py src/linkedin_web_scraper/domain/job_title_classifier.py src/linkedin_web_scraper/infra/logging.py src/linkedin_web_scraper/infra/paths.py src/linkedin_web_scraper/infra/http/policy.py src/linkedin_web_scraper/infra/http/utils.py src/linkedin_web_scraper/infra/http/job_scraper.py src/linkedin_web_scraper/infra/openai/models.py src/linkedin_web_scraper/infra/openai/openai_handler.py src/linkedin_web_scraper/infra/openai/job_description_processor.py src/linkedin_web_scraper/infra/storage/models.py src/linkedin_web_scraper/infra/storage/sqlite.py