Release and Automation

This repository ships four GitHub Actions workflows that cover validation, docs publishing, package publishing, and the scheduled daily scrape.

Workflow Inventory

  • ci.yml: runs the tox matrix on every push and pull request.
  • docs.yml: builds MkDocs and deploys the generated site to GitHub Pages on pushes to main and on manual dispatch.
  • release.yml: auto-triggers from successful CI and Docs runs on main, creates the GitHub Release object, and publishes to PyPI with trusted publishing.
  • daily-scrape.yml: runs the scheduled multi-city scrape, preserves SQLite state on the data branch, uploads artifacts, and opens a failure issue when the automation breaks.
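The trigger surface of release.yml described above can be sketched as follows. The workflow names listed under `workflows:` must match the `name:` fields declared in ci.yml and docs.yml; `CI` and `Docs` are assumed here:

```yaml
# Sketch of the release.yml triggers; workflow names are assumptions.
on:
  workflow_run:
    workflows: [CI, Docs]     # must match the name: fields in ci.yml and docs.yml
    types: [completed]
    branches: [main]          # only react to runs on main
  workflow_dispatch: {}       # manual recovery/release path
```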

One-Time GitHub Setup

GitHub Pages

  • In repository Settings > Pages, set the source to GitHub Actions before the first docs deployment.
  • Keep mkdocs.yml aligned with the published Pages URL.

Trusted Publishing

  • Create a GitHub environment named pypi if you want environment-level approval or separation.
  • Configure a trusted publisher in PyPI so it trusts .github/workflows/release.yml from this repository.
  • No PyPI username/password or API token secret is required when trusted publishing is enabled.
  • If you choose token-based publishing instead, that requires a separate workflow path and a PyPI API token stored as a repository secret.
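A publish job using trusted publishing typically looks like the sketch below. The job and artifact names are illustrative; the `pypi` environment and `id-token: write` permission are the pieces PyPI's trusted publisher configuration relies on:

```yaml
# Sketch of a trusted-publishing job; job/artifact names are illustrative.
publish:
  runs-on: ubuntu-latest
  environment: pypi          # matches the GitHub environment configured on PyPI
  permissions:
    id-token: write          # required for OIDC-based trusted publishing
  steps:
    - uses: actions/download-artifact@v4
      with:
        name: dist
        path: dist/
    - uses: pypa/gh-action-pypi-publish@release/v1
      # no username/password or API-token inputs are needed with trusted publishing
```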

Recommended release posture:

  • use workflow_dispatch for controlled PyPI recovery or manual release runs from main
  • let the workflow_run path publish automatically when the version in pyproject.toml increases and both CI and Docs are green on main
  • create the GitHub Release object before the PyPI publish step

Scheduled Runtime

  • The scheduled scrape reads .github/runtime/daily.toml.
  • Optional OpenAI use still requires OPENAI_API_KEY as a repository secret.
  • The workflow keeps OpenAI disabled by default in the TOML config so the scheduled run stays resilient and free of external API dependencies.
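If OpenAI is enabled later, the repository secret can be scoped to just the scrape step. A minimal sketch, assuming the step name and that the CLI reads the key from the environment (the scrape invocation itself is the one used by the workflow):

```yaml
# Sketch: expose the OPENAI_API_KEY secret to the scrape step only.
- name: Run daily scrape
  run: linkedin-webscraper scrape daily --config .github/runtime/daily.toml
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```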

Repository Permissions

  • Allow GitHub Actions to push to the data branch so the scheduled workflow can commit state.
  • Keep the contents: write and issues: write permissions in the workflow file.
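In workflow YAML, those permissions look like this:

```yaml
# Minimum extra permissions the daily workflow needs.
permissions:
  contents: write   # commit and push state to the data branch
  issues: write     # open/update/close the failure issue
```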

CI Flow

ci.yml is the push/PR gate.

It runs:

  • py311, py312, py313, and py314
  • lint
  • type
  • docs
  • build

This keeps the local tox contract and the GitHub CI contract identical.
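A matrix job mirroring that contract might look like the following sketch. Job names and action versions are illustrative, and a real matrix would pin the matching interpreter for each pyXY environment rather than a single version:

```yaml
# Illustrative matrix job; tox environment names match the list above.
test:
  runs-on: ubuntu-latest
  strategy:
    matrix:
      tox-env: [py311, py312, py313, py314, lint, type, docs, build]
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
      with:
        python-version: "3.12"   # simplification: pin per-env interpreters in practice
    - run: python -m pip install tox
    - run: python -m tox -e ${{ matrix.tox-env }}
```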

Docs Publish Flow

docs.yml performs two jobs:

  1. install the docs build dependencies and run python -m mkdocs build --strict
  2. upload the site/ artifact and deploy it with the GitHub Pages deployment actions

The workflow is intentionally limited to pushes to main and manual dispatch, so preview builds stay in the normal PR checks instead of publishing every branch.
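The two jobs map onto the standard GitHub Pages deployment actions. A sketch, with dependency installation simplified to MkDocs alone:

```yaml
# Sketch of the two-job Pages flow; action versions are illustrative.
build:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: python -m pip install mkdocs   # plus any theme/plugin deps
    - run: python -m mkdocs build --strict
    - uses: actions/upload-pages-artifact@v3
      with:
        path: site/

deploy:
  needs: build
  runs-on: ubuntu-latest
  permissions:
    pages: write
    id-token: write
  environment:
    name: github-pages
  steps:
    - uses: actions/deploy-pages@v4
```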

Release Flow

release.yml now supports two release paths:

  • workflow_run on CI and Docs completions for automatic PyPI releases from main
  • workflow_dispatch for controlled PyPI recovery or release reruns

The automated release job sequence is:

  1. confirm the current commit is on main and both CI and Docs succeeded for the same SHA
  2. read the version from pyproject.toml and compare it with the latest published release
  3. skip if the version is not newer or the tag already exists
  4. build the sdist and wheel through tox -e build
  5. create the GitHub Release object and upload the built wheel and sdist
  6. publish the same built artifacts to PyPI with trusted publishing

Manual dispatch uses the same artifact flow, but it still respects the version gate so duplicate releases are skipped.
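The version gate in steps 2 and 3 can be sketched as a single workflow step. This is a hypothetical simplification: it only compares against the latest GitHub Release tag via the gh CLI, whereas the real gate also checks that the tag does not already exist:

```yaml
# Hypothetical version-gate step: skip publishing unless pyproject.toml
# carries a version different from the latest GitHub Release tag.
- name: Check version gate
  id: gate
  run: |
    new=$(python -c "import tomllib; print(tomllib.load(open('pyproject.toml','rb'))['project']['version'])")
    latest=$(gh release view --json tagName -q .tagName 2>/dev/null | sed 's/^v//') || latest=""
    if [ "$new" = "$latest" ]; then
      echo "release=false" >> "$GITHUB_OUTPUT"
    else
      echo "release=true" >> "$GITHUB_OUTPUT"
    fi
  env:
    GH_TOKEN: ${{ github.token }}
```

Later jobs can then condition on `steps.gate.outputs.release == 'true'`.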

Rollback

PyPI does not allow overwriting a released version.

Rollback guidance:

  • if a release is bad, yank it on PyPI
  • fix the issue in the repo
  • cut a new version and publish that replacement
  • keep the GitHub Release notes clear about the superseding version

Daily Automation

daily-scrape.yml runs on the cron schedule 30 12 * * *, which is 12:30 UTC every day.
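In the workflow file, that schedule looks like this (the manual-dispatch trigger is an assumption, included because reruns are useful when the scrape fails):

```yaml
on:
  schedule:
    - cron: "30 12 * * *"   # minute hour day month weekday -> 12:30 UTC daily
  workflow_dispatch: {}      # assumption: allow manual reruns
```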

The workflow sequence is:

  1. install the package with the optional OpenAI extra available
  2. attach a data branch worktree
  3. restore the previous SQLite state from data/state
  4. initialize the SQLite schema before the scrape
  5. run a CLI dry run for visibility
  6. run linkedin-webscraper scrape daily --config .github/runtime/daily.toml
  7. copy artifacts/state back to data/state
  8. copy current CSV exports to both data/exports/latest and data/exports/YYYY-MM-DD
  9. commit and push the updated automation state back to data
  10. upload workflow artifacts and summarize the run
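Steps 2, 7 to 9 of the sequence above can be sketched with a git worktree. This is a hypothetical sketch: the worktree path, the artifacts/exports source directory, and the commit identity are all assumptions; the data/ target paths come from the sequence itself:

```yaml
# Hypothetical sketch of the data-branch persistence steps.
- name: Attach data worktree
  run: |
    git fetch origin data
    git worktree add .data origin/data -b data
- name: Persist state and exports
  run: |
    mkdir -p .data/state .data/exports/latest ".data/exports/$(date -u +%F)"
    cp -r artifacts/state/. .data/state/                     # step 7
    cp -r artifacts/exports/. .data/exports/latest/          # step 8 (source path assumed)
    cp -r artifacts/exports/. ".data/exports/$(date -u +%F)/"
- name: Commit and push
  run: |
    cd .data
    git add -A
    git -c user.name=github-actions -c user.email=actions@github.com \
      commit -m "chore: daily scrape state $(date -u +%F)" || true   # no-op if nothing changed
    git push origin HEAD:data                                # step 9
```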

Failure Handling

The workflow includes:

  • a dedicated concurrency group so two daily runs do not overlap
  • artifact retention for 14 days
  • an issue-based failure notification that opens or updates [automation] Daily scrape failure
  • automatic closure of that issue once a later run succeeds
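The concurrency guard and retention setting correspond to two small pieces of workflow YAML. Group and artifact names are illustrative:

```yaml
# Illustrative concurrency guard: a second scheduled run queues instead of
# overlapping a still-running scrape.
concurrency:
  group: daily-scrape
  cancel-in-progress: false

# ...and artifact retention on the upload step:
# - uses: actions/upload-artifact@v4
#   with:
#     name: daily-scrape-artifacts
#     path: artifacts/
#     retention-days: 14
```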

Operating Notes

  • Keep secrets out of TOML and out of the repo.
  • If you enable OpenAI for scheduled runs later, do it by combining a repo secret with openai_enabled = true in .github/runtime/daily.toml or a workflow env override.
  • The data branch is the current persistence contract for GitHub-hosted automation. A future cloud database can replace it without changing the CLI surface.
  • Use python -m tox -e preflight before risky pushes or merges. That local gate runs the same smoke, lint, type, docs, and build checks that the repo expects before release work.