What the curriculum.yml GitHub Workflow Does in AI Engineering from Scratch: 4 Automated Jobs Explained
The curriculum.yml GitHub workflow in the AI Engineering from Scratch repository automatically audits lesson structure, synchronizes README statistics, rebuilds site data, and warns contributors about README drift on every push and pull request to main.
The curriculum.yml GitHub workflow is the automation backbone that keeps the curriculum coherent and publish-ready without manual intervention. Defined in .github/workflows/curriculum.yml, it runs four specialized jobs that enforce invariants and self-heal documentation. This article breaks down exactly what each job does, when it triggers, and how you can run the same checks locally.
Workflow Triggers and Scope
The workflow fires on every push to main and every pull request targeting main, but only when curriculum-related files change. This includes lessons, scripts, documentation, and the site generator. By filtering path changes, the workflow avoids unnecessary runs on unrelated updates.
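The path filter can be approximated locally to predict whether a set of changes would start a run. The sketch below is illustrative only: the function names and the pattern list are assumptions built from paths this article mentions, and lesson directories are left out because their names are not given here. The authoritative patterns are the ones in .github/workflows/curriculum.yml.

```shell
#!/bin/sh
# Hypothetical path filter; the real patterns live in .github/workflows/curriculum.yml.

# Succeeds when a changed path falls under an assumed curriculum-related location.
is_curriculum_path() {
  case "$1" in
    scripts/*|site/*|README.md|.github/workflows/curriculum.yml) return 0 ;;
    *) return 1 ;;
  esac
}

# Reads changed paths on stdin and succeeds if any of them matches the filter.
any_curriculum_change() {
  while IFS= read -r acc_path; do
    if is_curriculum_path "$acc_path"; then
      return 0
    fi
  done
  return 1
}
```

Piping the output of git diff --name-only origin/main...HEAD into any_curriculum_change would then report whether the current branch touches any of the assumed paths.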
The Four Jobs in curriculum.yml
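The workflow’s responsibilities are split into four jobs. The audit job runs on every trigger, readme-counts-sync and site-rebuild run in sequence on pushes to main, and readme-counts-drift provides advisory feedback during code review.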
audit: Validating the Curriculum Contract
The audit job runs on both pushes and pull requests. It executes scripts/audit_lessons.py to verify that every lesson follows the strict curriculum contract.
This check validates metadata, tests, and file layout. If any rule is broken, the script exits with a non-zero status and blocks the workflow.
python3 scripts/audit_lessons.py
Run this locally to lint all lesson directories before committing.
readme-counts-sync: Auto-Fixing README Statistics
The readme-counts-sync job runs only on pushes to main. It re-generates the lesson catalog and runs scripts/check_readme_counts.py --fix to rewrite the lesson-count tables in README.md.
After fixing the tables, the job commits the changes back to the branch. This prevents manual drift in the top-level README statistics. The push logic includes a safe retry loop with rebase handling, and it deliberately skips commits when the most recent message already indicates a bot-generated change.
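The retry-with-rebase push and the bot-commit guard described above might look roughly like the following. This is a sketch under stated assumptions, not the workflow's actual step: the function names are hypothetical, and treating the two commit messages used in this article as the bot prefixes is a guess.

```shell
#!/bin/sh
# Hypothetical sketch; not the repository's actual workflow step.

# Succeeds when a commit subject matches an assumed bot-message prefix.
is_bot_commit() {
  case "$1" in
    "chore(readme): sync counts"*|"chore(site): rebuild data.js"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Pushes HEAD to branch $2, retrying up to $1 times and rebasing between attempts.
push_with_retry() {
  pwr_attempts="$1"
  pwr_branch="$2"
  pwr_try=1
  while [ "$pwr_try" -le "$pwr_attempts" ]; do
    if git push origin "HEAD:${pwr_branch}"; then
      return 0
    fi
    # The remote moved: replay the local commit on top of the new tip.
    git pull --rebase origin "$pwr_branch" || return 1
    pwr_try=$((pwr_try + 1))
  done
  return 1
}
```

A caller could pass the output of git log -1 --pretty=%s to is_bot_commit and skip the commit when it succeeds, then call push_with_retry 3 main. Rebasing between attempts keeps the generated commit on top of whatever landed on main in the meantime.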
To replicate this fix locally:
python3 scripts/build_catalog.py
python3 scripts/check_readme_counts.py --fix
git diff README.md
git add README.md
git commit -m "chore(readme): sync counts"
git push
site-rebuild: Keeping Site Data in Sync
The site-rebuild job runs on pushes to main after readme-counts-sync completes. It executes node site/build.js to rebuild site/data.js from the current catalog.
The rebuilt data file is then committed back to the repository. This ensures the public website always reflects the latest curriculum state.
The equivalent steps can be run locally:
node site/build.js
git diff site/data.js
git add site/data.js
git commit -m "chore(site): rebuild data.js"
git push
readme-counts-drift: Advisory Checks for Pull Requests
The readme-counts-drift job runs only on pull requests. It builds the catalog and checks README counts, but instead of committing fixes, it emits a ::warning:: annotation if the README is out of sync.
This gives contributors early feedback that main will self-heal the counts on merge, avoiding unnecessary manual edits in the PR.
Simulate the advisory check locally with:
python3 scripts/build_catalog.py
if ! python3 scripts/check_readme_counts.py; then
  echo "README counts out of sync – main branch will self-heal on merge"
fi
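Inside CI, the same check can surface as an annotation rather than a plain log line. The sketch below shows one way to wrap a checker so that a failure prints a GitHub Actions ::warning:: workflow command while the step still succeeds. The function names, the annotation title, and the message wording are assumptions, not text taken from the workflow.

```shell
#!/bin/sh
# Hypothetical sketch of an advisory (non-blocking) check step.

# Prints a GitHub Actions warning annotation attached to README.md.
emit_drift_warning() {
  echo "::warning file=README.md,title=README counts drift::$1"
}

# Runs the given checker command; warns on failure but always succeeds.
advisory_check() {
  if ! "$@"; then
    emit_drift_warning "README counts are out of sync; main will self-heal on merge."
  fi
  return 0
}
```

For example, advisory_check python3 scripts/check_readme_counts.py would print the annotation only when the checker exits non-zero, and the zero return status keeps the job from blocking the pull request.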
Key Files Behind the Automation
Several source files work together to power the curriculum.yml GitHub workflow:
- .github/workflows/curriculum.yml: Defines the CI orchestration, job dependencies, and trigger conditions.
- scripts/audit_lessons.py: Validates lesson structure and metadata against the curriculum contract.
- scripts/build_catalog.py: Generates the temporary lesson catalog consumed by count checks and site rebuilds.
- scripts/check_readme_counts.py: Checks and optionally fixes the lesson-count tables in README.md.
- site/build.js: Re-creates site/data.js for the public website.
- README.md: The human-readable overview that the workflow auto-maintains.
Summary
- The curriculum.yml GitHub workflow triggers on pushes and PRs to main when curriculum files change.
- The audit job enforces lesson metadata and layout rules via scripts/audit_lessons.py.
- The readme-counts-sync job auto-fixes and commits README count tables after every merge.
- The site-rebuild job regenerates site/data.js via node site/build.js and commits the result.
- The readme-counts-drift job warns PR authors if README counts are out of sync without blocking merges.
- All scripts can be run locally to verify curriculum integrity before pushing.
Frequently Asked Questions
What triggers the curriculum.yml GitHub workflow in AI Engineering from Scratch?
The workflow triggers on every push to main and every pull request targeting main, but only when files affecting the curriculum are modified. This includes lesson content, scripts, documentation, and the site generator.
Why does the README update automatically instead of requiring manual edits?
The readme-counts-sync job runs scripts/check_readme_counts.py --fix on every push to main. This prevents human error and drift by automatically regenerating the lesson-count tables and committing the corrected README.md back to the repository.
Can I run the same checks locally that the GitHub workflow runs?
Yes. You can run python3 scripts/audit_lessons.py to validate lessons, python3 scripts/build_catalog.py followed by python3 scripts/check_readme_counts.py --fix to sync README counts, and node site/build.js to rebuild site data. These commands mirror the CI jobs in .github/workflows/curriculum.yml.
What happens if the audit job fails during a pull request?
If scripts/audit_lessons.py finds a curriculum contract violation, it exits with a non-zero status and fails the audit job. This blocks the workflow run and signals the contributor to fix the lesson structure, metadata, or file layout before merging.