Release and Automation
This repository ships four GitHub Actions workflows that cover validation, docs publishing, package publishing, and the scheduled daily scrape.
Workflow Inventory
- ci.yml: runs the tox matrix on every push and pull request.
- docs.yml: builds MkDocs and deploys the generated site to GitHub Pages on pushes to main and on manual dispatch.
- release.yml: auto-triggers from successful CI and Docs runs on main, creates the GitHub Release object, and publishes to PyPI with trusted publishing.
- daily-scrape.yml: runs the scheduled multi-city scrape, preserves SQLite state on the data branch, uploads artifacts, and opens a failure issue when the automation breaks.
One-Time GitHub Setup
GitHub Pages
- In repository Settings > Pages, set the source to GitHub Actions before the first docs deployment.
- Keep mkdocs.yml aligned with the published Pages URL.
Trusted Publishing
- Create a GitHub environment named pypi if you want environment-level approval or separation.
- Configure a trusted publisher in PyPI so it trusts .github/workflows/release.yml from this repository.
- No PyPI username/password or API token secret is required when trusted publishing is enabled.
- If you choose token-based publishing instead, that is a separate workflow path.
Recommended release posture:
- use workflow_dispatch for controlled PyPI recovery or manual release runs from main
- let the workflow_run path publish automatically when the version in pyproject.toml increases and both CI and Docs are green on main
- create the GitHub Release object before the PyPI publish step
Scheduled Runtime
- The scheduled scrape reads .github/runtime/daily.toml.
- Optional OpenAI use still requires OPENAI_API_KEY as a repository secret.
- The workflow keeps OpenAI disabled by default in TOML so the scheduled run stays resilient without external API dependencies.
Repository Permissions
- Allow GitHub Actions to push to the data branch so the scheduled workflow can commit state.
- Keep the contents: write and issues: write permissions in the workflow file.
CI Flow
ci.yml is the push/PR gate.
It runs:
- py311, py312, py313, and py314
- lint
- type
- docs
- build
This keeps the local tox contract and the GitHub CI contract identical.
Docs Publish Flow
docs.yml performs two jobs:
- install the docs build dependencies and run python -m mkdocs build --strict
- upload the site/ artifact and deploy it with the GitHub Pages deployment actions
The workflow is intentionally limited to main pushes and manual dispatch so preview behavior stays on the normal PR checks instead of publishing every branch.
Release Flow
release.yml now supports two release paths:
- workflow_run on CI and Docs completions for automatic PyPI releases from main
- workflow_dispatch for controlled PyPI recovery or release reruns
The automated release job sequence is:
- confirm the current commit is on main and both CI and Docs succeeded for the same SHA
- read the version from pyproject.toml and compare it with the latest published release
- skip if the version is not newer or the tag already exists
- build the sdist and wheel through tox -e build
- create the GitHub Release object and upload the built wheel and sdist
- publish the same built artifacts to PyPI with trusted publishing
Manual dispatch uses the same artifact flow, but it still respects the version gate so duplicate releases are skipped.
Rollback
PyPI does not allow overwriting a released version.
Rollback guidance:
- if a release is bad, yank it on PyPI
- fix the issue in the repo
- cut a new version and publish that replacement
- keep the GitHub Release notes clear about the superseding version
Daily Automation
daily-scrape.yml runs at 30 12 * * *, which is 12:30 UTC every day.
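To make the schedule concrete, the sketch below computes the next 12:30 UTC start for a given moment. It is only an illustration of how the cron expression reads; next_daily_run is a hypothetical helper, and GitHub may start scheduled workflows later than the nominal time when runners are busy.

```python
from datetime import datetime, time, timedelta, timezone


def next_daily_run(now: datetime, hour: int = 12, minute: int = 30) -> datetime:
    """Return the next scheduled start strictly after `now`, in UTC.

    `now` should be timezone-aware; it is converted to UTC before comparing.
    """
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(
        now_utc.date(), time(hour, minute), tzinfo=timezone.utc
    )
    if candidate <= now_utc:
        # Today's slot has already passed, so move to tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_daily_run(datetime.now(timezone.utc)).isoformat())
```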
The workflow sequence is:
- install the package with the optional OpenAI extra available
- attach a data branch worktree
- restore the previous SQLite state from data/state
- initialize the SQLite schema before the scrape
- run a CLI dry run for visibility
- run linkedin-webscraper scrape daily --config .github/runtime/daily.toml
- copy artifacts/state back to data/state
- copy current CSV exports to both data/exports/latest and data/exports/YYYY-MM-DD
- commit and push the updated automation state back to data
- upload workflow artifacts and summarize the run
Failure Handling
The workflow includes:
- a dedicated concurrency group so two daily runs do not overlap
- artifact retention for 14 days
- an issue-based failure notification that opens or updates [automation] Daily scrape failure
- automatic closure of that issue once a later run succeeds
Operating Notes
- Keep secrets out of TOML and out of the repo.
- If you enable OpenAI for scheduled runs later, do it by combining a repo secret with openai_enabled = true in .github/runtime/daily.toml or a workflow env override.
- The data branch is the current persistence contract for GitHub-hosted automation. A future cloud database can replace it without changing the CLI surface.
- Use python -m tox -e preflight before risky pushes or merges. That local gate runs the same smoke, lint, type, docs, and build checks that the repo expects before release work.