
How to Locally Validate AI Engineering Lessons Before Submitting a PR

June 8, 2026 · rohitg00/ai-engineering-from-scratch

Run the audit_lessons.py, check_readme_counts.py, and lesson_run.py scripts from the repository's scripts/ directory to catch naming violations, documentation errors, and syntax issues before opening a pull request.

Contributing to the AI Engineering from Scratch curriculum requires that every new lesson follow strict structural invariants and that curriculum metadata stay synchronized. Before submitting a PR, run the repository's built-in validation scripts locally; they mirror the CI checks. These Python utilities verify everything from folder naming conventions to README badge counts, giving you immediate feedback without waiting for GitHub Actions.

Run the Lesson-Wide Invariant Checker (audit_lessons.py)

The scripts/audit_lessons.py script is the first line of defense against structural violations. It iterates over every phases/*/NN-slug/ directory to enforce the repository's content standards.

Execute the checker from the repository root:

python3 scripts/audit_lessons.py

The script validates the following invariants for each lesson:

Folder naming: The directory must match the NN-slug pattern (two-digit number followed by a hyphen and URL-safe slug).
Documentation requirements: docs/en.md must exist, be valid UTF-8, contain a top-level H1 heading, and be at least 200 bytes.
Code presence: The code/ folder must contain at least one non-ignored file.
Quiz schema: quiz.json must follow the canonical structure (exactly six questions with valid correct option indices).
Link integrity: All internal Markdown links must resolve to real files inside the repository.
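To make the first three invariants concrete, here is a minimal sketch of how such checks could be written. It is not the code in audit_lessons.py: the regular expression, the check_lesson helper, and the way the H1 is detected are assumptions for illustration, and the sketch does not model whatever ignore rules the real script applies to code/.

```python
import re
from pathlib import Path

# Assumed reading of "NN-slug": two digits, a hyphen, then lowercase
# letters, digits, and single hyphens. The real pattern may differ.
LESSON_DIR_RE = re.compile(r"^\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*$")
MIN_DOC_BYTES = 200


def check_lesson(lesson_dir: Path) -> list[str]:
    """Return human-readable problems found in one lesson folder."""
    problems = []
    if not LESSON_DIR_RE.match(lesson_dir.name):
        problems.append(f"{lesson_dir.name}: folder name does not match NN-slug")

    doc = lesson_dir / "docs" / "en.md"
    if not doc.is_file():
        problems.append(f"{doc}: missing")
    else:
        raw = doc.read_bytes()
        if len(raw) < MIN_DOC_BYTES:
            problems.append(f"{doc}: shorter than {MIN_DOC_BYTES} bytes")
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            problems.append(f"{doc}: not valid UTF-8")
        else:
            # Simplification: any line starting with "# " counts as an H1.
            if not any(line.startswith("# ") for line in text.splitlines()):
                problems.append(f"{doc}: no top-level H1 heading")

    # Simplification: every file counts; ignore rules are not modeled.
    code_dir = lesson_dir / "code"
    has_code = code_dir.is_dir() and any(p.is_file() for p in code_dir.rglob("*"))
    if not has_code:
        problems.append(f"{code_dir}: no files found")
    return problems
```

Calling check_lesson(Path("phases/<phase>/<lesson>")) on a folder you just created gives a quick preview of the kind of problems the full audit is likely to report.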

A non-zero exit status indicates at least one lesson failed validation. Review the concise printed report to identify specific paths and errors.
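The quiz invariant (exactly six questions, each with a valid correct option index) can be previewed with a small validator. The field names questions, options, and correct are hypothetical, since the canonical schema is not spelled out here; adjust them to match an existing quiz.json in the repository.

```python
import json

EXPECTED_QUESTIONS = 6


def check_quiz(raw: str) -> list[str]:
    """Validate quiz JSON text against an assumed schema; return problems."""
    try:
        quiz = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    # Hypothetical layout: {"questions": [{"options": [...], "correct": 0}, ...]}
    questions = quiz.get("questions") if isinstance(quiz, dict) else None
    if not isinstance(questions, list):
        return ["missing 'questions' list"]

    problems = []
    if len(questions) != EXPECTED_QUESTIONS:
        problems.append(
            f"expected {EXPECTED_QUESTIONS} questions, found {len(questions)}"
        )
    for i, question in enumerate(questions):
        if not isinstance(question, dict):
            problems.append(f"question {i}: not an object")
            continue
        options = question.get("options")
        correct = question.get("correct")
        if not isinstance(options, list) or not options:
            problems.append(f"question {i}: no options")
        elif (
            not isinstance(correct, int)
            or isinstance(correct, bool)
            or not 0 <= correct < len(options)
        ):
            problems.append(f"question {i}: correct index out of range")
    return problems
```

An empty list means the file passes this sketch; it does not guarantee that audit_lessons.py will accept it.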

Synchronize README Badge Counts (check_readme_counts.py)

The README.md file displays hard-coded badge counts for lessons, phases, skills, and prompts. These numbers must stay synchronized with the generated catalog.json file.

Run the count validator to detect drift:

python3 scripts/check_readme_counts.py

If the script reports mismatches between the README's regex-patterned badges and the actual curriculum metadata, auto-correct the README with:

python3 scripts/check_readme_counts.py --fix

This updates the badge values in README.md to match the current state of catalog.json, ensuring that contributors see accurate statistics when browsing the repository.
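The detect-then-fix behavior can be illustrated with a short sketch. It assumes shields.io-style badge fragments such as lessons-120-blue and a plain mapping of expected counts; the actual regexes and the layout of catalog.json may differ, so treat BADGE_RE, find_mismatches, and fix_counts as hypothetical names.

```python
import re

# Hypothetical badge fragment: "<label>-<count>-<color>", e.g. "lessons-120-blue".
BADGE_RE = re.compile(r"(?P<label>lessons|phases|skills|prompts)-(?P<count>\d+)-")


def find_mismatches(readme: str, expected: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Map each drifting label to (count in README, expected count)."""
    mismatches = {}
    for match in BADGE_RE.finditer(readme):
        label, found = match["label"], int(match["count"])
        if label in expected and found != expected[label]:
            mismatches[label] = (found, expected[label])
    return mismatches


def fix_counts(readme: str, expected: dict[str, int]) -> str:
    """Rewrite badge counts in place, leaving unknown labels untouched."""
    def replace(match: re.Match) -> str:
        label = match["label"]
        count = expected.get(label, int(match["count"]))
        return f"{label}-{count}-"

    return BADGE_RE.sub(replace, readme)
```

Keeping detection and rewriting as separate functions mirrors the split between the default run and the --fix run: the first only reports, the second changes the file.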

Smoke Test Python Code for Syntax Errors (lesson_run.py)

Before opening a PR, run scripts/lesson_run.py to perform a syntax-only smoke test using Python's built-in py_compile module. A clean pass shows that every file parses, not that the lesson logic is correct.

Run the basic syntax check across all lessons:

python3 scripts/lesson_run.py

This walks all phases/*/*/code/**/*.py files and byte-compiles them, reporting any syntax errors without executing the actual logic. For stricter validation that fails immediately on the first error, add the --strict flag:

python3 scripts/lesson_run.py --strict
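For intuition, a syntax-only pass with py_compile can be sketched in a few lines. The syntax_check function and its strict parameter are illustrative stand-ins, not the contents of lesson_run.py; only py_compile.compile and PyCompileError are standard-library names.

```python
import py_compile
from pathlib import Path


def syntax_check(root: Path, strict: bool = False) -> list[str]:
    """Byte-compile lesson files under root; return one message per failure."""
    errors = []
    for path in sorted(root.glob("phases/*/*/code/**/*.py")):
        try:
            # doraise=True turns syntax errors into PyCompileError
            # instead of printing them to stderr.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            errors.append(f"{path}: {exc.msg}")
            if strict:
                break  # assumed meaning of --strict: stop at the first error
    return errors
```

Note that py_compile.compile writes .pyc files into __pycache__ directories next to the sources, so a repository's ignore rules need to cover them.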

Optionally, you can execute the entry scripts (main.*) to verify runtime behavior. Lessons that declare heavy external dependencies via a header comment (e.g., # requires: torch, transformers) are automatically skipped:

python3 scripts/lesson_run.py --execute
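The header-comment convention is easy to reason about with a small parser. This is a sketch under assumptions: the ten-line search window and both function names are hypothetical, and the check treats each declared name as an importable module name, which is not always the same as the package name on PyPI.

```python
import importlib.util
import re

REQUIRES_RE = re.compile(r"^#\s*requires:\s*(.+)$", re.IGNORECASE)


def declared_requirements(source: str, max_lines: int = 10) -> list[str]:
    """Return names from the first '# requires:' comment near the top of a file."""
    for line in source.splitlines()[:max_lines]:
        match = REQUIRES_RE.match(line.strip())
        if match:
            return [name.strip() for name in match.group(1).split(",") if name.strip()]
    return []


def missing_requirements(names: list[str]) -> list[str]:
    """Return the declared names that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

A runner could skip a lesson when declared_requirements is non-empty, or only when missing_requirements is non-empty; check the script itself to see which rule it applies.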

Execute Unit Tests for Individual Lessons

Each lesson ships with its own test suite under code/tests/. After modifying a specific lesson, navigate to its directory and run the local unit tests to verify your implementation satisfies the lesson specifications.

Replace the placeholders with your actual phase and lesson identifiers:

cd phases/<phase-number>-<phase-slug>/<lesson-number>-<lesson-slug>/code
python3 -m unittest discover tests -v

This step catches logic errors that syntax checks miss, ensuring that algorithms, data transformations, and API integrations behave as expected according to the lesson requirements.
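If you are adding tests to a lesson, a module shaped like the following would be picked up by the discovery command. The normalize function is a hypothetical stand-in for lesson code, defined inline only to keep the sketch self-contained; a real test would import it from the lesson's module instead.

```python
import unittest


def normalize(values):
    """Hypothetical lesson function: scale values so they sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [value / total for value in values]


class NormalizeTests(unittest.TestCase):
    def test_sums_to_one(self):
        self.assertAlmostEqual(sum(normalize([1, 2, 3])), 1.0)

    def test_rejects_zero_total(self):
        with self.assertRaises(ValueError):
            normalize([0, 0])
```

By default, unittest discovery only loads files whose names match test*.py, so save a module like this as, for example, tests/test_normalize.py.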

Regenerate the Catalog (Optional Sanity Check)

If you suspect metadata drift or want to verify that catalog.json accurately reflects the current lesson tree, regenerate the catalog manually:

python3 scripts/build_catalog.py

This rewrites catalog.json from the current filesystem state. Diff the result against the previous version to confirm that the catalog and the lesson tree agree.
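When a plain text diff of the catalog is too noisy, a key-level comparison can help. The sketch below assumes only that catalog.json is a JSON object at its top level; nothing else about its schema is assumed, and diff_top_level is a hypothetical helper.

```python
import json


def diff_top_level(old_text: str, new_text: str) -> dict[str, list[str]]:
    """Compare two JSON objects and list added, removed, and changed keys."""
    old, new = json.loads(old_text), json.loads(new_text)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

One way to obtain the previous text, assuming catalog.json is committed at the repository root, is git show HEAD:catalog.json.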

Summary

scripts/audit_lessons.py validates naming conventions, documentation structure, quiz schema, and internal link integrity across all lessons.
scripts/check_readme_counts.py ensures README badge counts match catalog.json, with an optional --fix flag to auto-correct discrepancies.
scripts/lesson_run.py performs syntax checks on all Python code and optionally executes entry points while skipping dependency-heavy lessons.
Lesson-specific unit tests live in phases/*/*/code/tests/ and verify implementation correctness for individual lessons.

Running these three core validation scripts, along with the unit tests for any lesson you changed, before pushing your branch catches the structural, metadata, and syntax problems that CI checks for, so you can submit a clean PR with confidence.
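To run the three scripts in one step, a thin wrapper along these lines may be convenient. It assumes it is launched from the repository root and that each script signals failure through a non-zero exit status, which is stated above only for audit_lessons.py; run_checks and the injectable runner parameter are illustrative.

```python
import subprocess
import sys

CHECKS = [
    ["scripts/audit_lessons.py"],
    ["scripts/check_readme_counts.py"],
    ["scripts/lesson_run.py"],
]


def run_checks(checks=CHECKS, runner=subprocess.run) -> list[str]:
    """Run each check with the current interpreter; return the ones that failed."""
    failed = []
    for args in checks:
        result = runner([sys.executable, *args])
        if result.returncode != 0:
            failed.append(" ".join(args))
    return failed
```

Calling run_checks() and exiting with a non-zero status when the returned list is non-empty makes the wrapper usable as a pre-push hook.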

Frequently Asked Questions

What does a non-zero exit status from audit_lessons.py indicate?

A non-zero exit status means at least one lesson violated the structural invariants. Check the printed report for specific failures such as missing docs/en.md files, invalid quiz schemas, broken internal links, or empty code/ directories. Fix the reported issues and re-run the script until it exits with status zero.

How can I automatically fix README badge count mismatches?

Pass the --fix argument to scripts/check_readme_counts.py. This updates the hard-coded counts in README.md to match the current state of catalog.json, eliminating manual editing and preventing CI failures due to stale statistics.

Can I execute lesson code locally without installing heavy dependencies like PyTorch?

Yes. When using python3 scripts/lesson_run.py --execute, the script respects dependency declarations in header comments (e.g., # requires: torch, transformers). Lessons declaring requirements that aren't satisfied are automatically skipped, allowing you to safely test lightweight lessons without installing heavy ML frameworks.

Where should unit tests be located for a new lesson?

Unit tests must reside in the tests/ subdirectory within the lesson's code/ folder, following the path pattern phases/<phase-number>-<phase-slug>/<lesson-number>-<lesson-slug>/code/tests/. The discovery command python3 -m unittest discover tests -v expects this structure to locate and execute your test modules.
