# How to Locally Validate AI Engineering Lessons Before Submitting a PR

> Validate AI engineering lessons locally before submitting a PR. Run audit, check_readme_counts, and lesson_run scripts to catch errors and ensure code quality before your pull request.

- Repository: [Rohit Ghumare/ai-engineering-from-scratch](https://github.com/rohitg00/ai-engineering-from-scratch)
- Tags: how-to-guide
- Published: 2026-06-08

---

**Run the [`audit_lessons.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py), [`check_readme_counts.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/check_readme_counts.py), and [`lesson_run.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/lesson_run.py) scripts from the repository's `scripts/` directory to catch naming violations, documentation errors, and syntax issues before opening a pull request.**

Contributing to the *AI Engineering from Scratch* curriculum means every new lesson must satisfy strict structural invariants, and the curriculum metadata must stay synchronized with the lesson tree. Before submitting a PR, you should **locally validate AI engineering lessons** with the built-in validation scripts that mirror the CI checks. These Python utilities verify everything from folder naming conventions to README badge counts and give you immediate feedback without waiting for GitHub Actions.

## Run the Lesson-Wide Invariant Checker (audit_lessons.py)

The [`scripts/audit_lessons.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py) script is the first line of defense against structural violations. It iterates over every `phases/*/NN-slug/` directory to enforce the repository's content standards.

Execute the checker from the repository root:

```bash
python3 scripts/audit_lessons.py
```

The script validates the following invariants for each lesson:

- **Folder naming**: The directory must match the `NN-slug` pattern (two-digit number followed by a hyphen and URL-safe slug).
- **Documentation requirements**: Each lesson's `docs/en.md` must exist, be valid UTF-8, contain a top-level H1 heading, and be at least 200 bytes.
- **Code presence**: The `code/` folder must contain at least one non-ignored file.
- **Quiz schema**: The lesson's `quiz.json` must follow the canonical structure (exactly six questions, each with a valid correct-option index).
- **Link integrity**: All internal Markdown links must resolve to real files inside the repository.
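To make the first four invariants concrete, here is a minimal sketch of how they could be checked for a single lesson directory. It is not the code in `audit_lessons.py`: the slug regex, the ignore list, the location of `quiz.json` at the lesson root, and the `questions`/`options`/`correct` field names are all assumptions made for illustration.

```python
import json
import re
from pathlib import Path

# Assumed NN-slug pattern: two digits, a hyphen, then lowercase words joined by hyphens.
LESSON_DIR_RE = re.compile(r"^\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*$")
IGNORED_NAMES = {".DS_Store", "__pycache__"}  # hypothetical ignore list


def check_lesson(lesson_dir: Path) -> list[str]:
    """Return human-readable problems for one lesson directory (empty list = pass)."""
    problems = []

    if not LESSON_DIR_RE.match(lesson_dir.name):
        problems.append(f"{lesson_dir}: folder name is not NN-slug")

    doc = lesson_dir / "docs" / "en.md"
    if not doc.is_file():
        problems.append(f"{doc}: missing")
    else:
        raw = doc.read_bytes()
        if len(raw) < 200:
            problems.append(f"{doc}: shorter than 200 bytes")
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            problems.append(f"{doc}: not valid UTF-8")
        else:
            if not any(line.startswith("# ") for line in text.splitlines()):
                problems.append(f"{doc}: no top-level H1 heading")

    # code/ must contain at least one file outside the ignore list.
    code_dir = lesson_dir / "code"
    files = []
    if code_dir.is_dir():
        files = [p for p in code_dir.rglob("*")
                 if p.is_file() and not IGNORED_NAMES.intersection(p.relative_to(code_dir).parts)]
    if not files:
        problems.append(f"{code_dir}: no non-ignored files")

    quiz = lesson_dir / "quiz.json"  # assumed location; only validated when present
    if quiz.is_file():
        try:
            data = json.loads(quiz.read_text(encoding="utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError) as exc:
            problems.append(f"{quiz}: unreadable ({exc})")
        else:
            questions = data.get("questions", []) if isinstance(data, dict) else []
            if len(questions) != 6:
                problems.append(f"{quiz}: expected 6 questions, found {len(questions)}")
            for i, q in enumerate(questions):
                options = q.get("options", []) if isinstance(q, dict) else []
                correct = q.get("correct") if isinstance(q, dict) else None
                # The correct index must be a plain int that points at an existing option.
                if not (type(correct) is int and 0 <= correct < len(options)):
                    problems.append(f"{quiz}: question {i} has an invalid correct index")

    return problems
```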

A non-zero exit status indicates at least one lesson failed validation. Review the concise printed report to identify specific paths and errors.
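The report-then-exit behaviour can be sketched in the same spirit, using link integrity as the example check. This hypothetical `broken_links`/`main` pair only understands inline `[text](target)` links, skips external URLs and same-page anchors, and treats a target as valid if it exists (file or directory) inside the repository; the real script may parse Markdown differently.

```python
import re
from pathlib import Path

# Inline Markdown links: [text](target). Reference-style links and titles are not handled.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_links(md_file: Path, repo_root: Path) -> list[str]:
    """Return messages for internal links in md_file that do not resolve inside repo_root."""
    problems = []
    root = repo_root.resolve()
    for target in LINK_RE.findall(md_file.read_text(encoding="utf-8")):
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue  # external links and same-page anchors are out of scope here
        path_part = target.split("#", 1)[0]  # drop any #section suffix
        resolved = (md_file.parent / path_part).resolve()
        if not resolved.is_relative_to(root) or not resolved.exists():
            problems.append(f"{md_file}: broken link -> {target}")
    return problems


def main(repo_root: Path) -> int:
    """Print one line per problem and return a CI-style exit status."""
    problems = []
    for doc in sorted(repo_root.glob("phases/*/*/docs/en.md")):
        problems.extend(broken_links(doc, repo_root))
    for line in problems:
        print(line)
    return 1 if problems else 0  # non-zero means at least one lesson failed
```

Wrapping it as `sys.exit(main(Path.cwd()))` gives the contract CI relies on: status zero means every internal link resolved. `Path.is_relative_to` requires Python 3.9 or newer.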

## Synchronize README Badge Counts (check_readme_counts.py)

The [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) file displays hard-coded badge counts for lessons, phases, skills, and prompts. These numbers must stay synchronized with the generated [`catalog.json`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/catalog.json) file.

Run the count validator to detect drift:

```bash
python3 scripts/check_readme_counts.py
```

If the script reports mismatches between the README's badge values (which it locates with regular expressions) and the actual curriculum metadata, auto-correct the README with:

```bash
python3 scripts/check_readme_counts.py --fix
```

This updates the badge values in [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) to match the current state of [`catalog.json`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/catalog.json), ensuring that contributors see accurate statistics when browsing the repository.
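Conceptually, the check and `--fix` modes share one step: find each badge number with a regular expression and compare it with the catalog. The sketch below assumes shields.io-style badge URLs such as `badge/lessons-123-blue` and takes the true counts as a plain dictionary; the actual badge format and the way the script reads `catalog.json` may differ.

```python
import re

# Hypothetical badge format: shields.io static badges such as
#   https://img.shields.io/badge/lessons-123-blue
BADGE_RE = re.compile(r"(badge/(lessons|phases|skills|prompts)-)(\d+)(-)")


def sync_badges(readme: str, counts: dict[str, int]) -> tuple[str, list[str]]:
    """Return (updated_readme, mismatch_messages) given the true counts from the catalog."""
    mismatches = []

    def replace(match: re.Match) -> str:
        label, shown = match.group(2), int(match.group(3))
        actual = counts.get(label, shown)  # labels missing from counts are left unchanged
        if actual != shown:
            mismatches.append(f"{label}: README says {shown}, catalog says {actual}")
        return f"{match.group(1)}{actual}{match.group(4)}"

    return BADGE_RE.sub(replace, readme), mismatches
```

In check mode a script built this way would print the mismatches and exit non-zero; in fix mode it would write the updated text back to `README.md`.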

## Smoke Test Python Code for Syntax Errors (lesson_run.py)

Lesson code has to compile before it can run. Execute [`scripts/lesson_run.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/lesson_run.py) to perform a syntax-only smoke test using Python's built-in `py_compile` module.

Run the basic syntax check across all lessons:

```bash
python3 scripts/lesson_run.py
```

This walks all `phases/*/*/code/**/*.py` files and byte-compiles them, reporting any syntax errors without executing the actual logic. For stricter validation that fails immediately on the first error, add the `--strict` flag:

```bash
python3 scripts/lesson_run.py --strict
```
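The syntax-only pass can be approximated with `py_compile.compile(..., doraise=True)`, which raises `py_compile.PyCompileError` instead of printing the error to stderr. This sketch mirrors the glob pattern described above; treating `strict` as "stop at the first failure" follows the description of `--strict`, but the script's internals are assumed. Note that `py_compile` writes `.pyc` files into `__pycache__/` as a side effect.

```python
import py_compile
from pathlib import Path


def syntax_check(root: Path, strict: bool = False) -> list[str]:
    """Byte-compile every lesson .py file under root and return error messages."""
    errors = []
    for path in sorted(root.glob("phases/*/*/code/**/*.py")):
        try:
            # doraise=True turns syntax errors into PyCompileError; the code is never executed.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            errors.append(f"{path}: {exc.msg}")
            if strict:
                break  # assumed meaning of --strict: stop at the first failure
    return errors
```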

Optionally, you can execute the entry scripts (`main.*`) to verify runtime behavior. Lessons that declare heavy external dependencies via a header comment (e.g., `# requires: torch, transformers`) are automatically skipped:

```bash
python3 scripts/lesson_run.py --execute
```
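Skipping by header comment might look like the following. This hypothetical sketch reads `# requires:` lines near the top of an entry script and skips the script when `importlib.util.find_spec` cannot locate a declared module, which matches the FAQ's description below (skip when a requirement is not installed). It assumes the declared names are importable module names; a distribution name such as `scikit-learn` (imported as `sklearn`) would need a mapping.

```python
import importlib.util
import re
from pathlib import Path

# Assumed header format, based on the example above: "# requires: torch, transformers"
REQUIRES_RE = re.compile(r"^#\s*requires:\s*(.+)$", re.IGNORECASE)


def declared_requirements(entry: Path, max_lines: int = 10) -> list[str]:
    """Collect module names from '# requires:' comments in the first few lines."""
    names = []
    with entry.open(encoding="utf-8") as fh:
        for _, line in zip(range(max_lines), fh):
            match = REQUIRES_RE.match(line.strip())
            if match:
                names += [n.strip() for n in match.group(1).split(",") if n.strip()]
    return names


def should_skip(entry: Path) -> bool:
    """Skip the entry script if any declared module is not importable here."""
    return any(importlib.util.find_spec(name) is None for name in declared_requirements(entry))
```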

## Execute Unit Tests for Individual Lessons

Each lesson ships with its own test suite under `code/tests/`. After modifying a specific lesson, navigate to its directory and run the local unit tests to verify your implementation satisfies the lesson specifications.

Replace the placeholders with your actual phase and lesson identifiers:

```bash
cd phases/<phase-number>-<phase-slug>/<lesson-number>-<lesson-slug>/code
python3 -m unittest discover tests -v
```

This step catches logic errors that syntax checks miss, ensuring that algorithms, data transformations, and API integrations behave as expected according to the lesson requirements.
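As an illustration of what `unittest discover` picks up, a test module saved as, say, `code/tests/test_softmax.py` could look like this; discovery's default pattern matches files named `test*.py`. The `softmax` function is a hypothetical stand-in defined inline so the example is self-contained. In a real lesson you would import it from the lesson code instead (for example `from main import softmax`), which works because `python3 -m unittest` run from `code/` puts that directory on `sys.path`.

```python
import math
import unittest


# Hypothetical stand-in for lesson code; a real test would import it from the lesson.
def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class TestSoftmax(unittest.TestCase):
    def test_sums_to_one(self):
        self.assertAlmostEqual(sum(softmax([1.0, 2.0, 3.0])), 1.0)

    def test_preserves_argmax(self):
        out = softmax([0.5, 2.0, -1.0])
        self.assertEqual(out.index(max(out)), 1)

    def test_large_inputs_do_not_overflow(self):
        out = softmax([1000.0, 1000.0])
        self.assertAlmostEqual(out[0], 0.5)
```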

## Regenerate the Catalog (Optional Sanity Check)

If you suspect metadata drift or want to verify that [`catalog.json`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/catalog.json) accurately reflects the current lesson tree, regenerate the catalog manually:

```bash
python3 scripts/build_catalog.py
```

This rewrites [`catalog.json`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/catalog.json) from the current filesystem state. Diff the result against the committed version (for example with `git diff catalog.json`) to confirm that the catalog and the lesson tree agree.
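If you want a comparison that ignores key order and whitespace, one option, sketched here with only the standard library, is to normalize both versions with `json.dumps(..., sort_keys=True)` before diffing. It makes no assumptions about the catalog's schema. You could capture the committed version first with `git show HEAD:catalog.json > catalog.before.json` (the file name is arbitrary).

```python
import difflib
import json


def catalog_diff(old_text: str, new_text: str) -> list[str]:
    """Return unified-diff lines between two catalog.json versions, ignoring key order and whitespace."""

    def canonical(text: str) -> list[str]:
        # Re-serialize with sorted keys and fixed indentation so only real changes show up.
        return json.dumps(json.loads(text), indent=2, sort_keys=True).splitlines()

    return list(difflib.unified_diff(canonical(old_text), canonical(new_text),
                                     fromfile="catalog.json (before)",
                                     tofile="catalog.json (after)", lineterm=""))
```

An empty list means the regenerated catalog matches the committed one.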

## Summary

- **[`scripts/audit_lessons.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py)** validates naming conventions, documentation structure, quiz schema, and internal link integrity across all lessons.
- **[`scripts/check_readme_counts.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/check_readme_counts.py)** ensures README badge counts match [`catalog.json`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/catalog.json), with an optional `--fix` flag to auto-correct discrepancies.
- **[`scripts/lesson_run.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/lesson_run.py)** performs syntax checks on all Python code and optionally executes entry points while skipping dependency-heavy lessons.
- **Lesson-specific unit tests** live in `phases/*/*/code/tests/` and verify implementation correctness for individual lessons.

Running these three core validation scripts before pushing your branch catches the most common CI-blocking issues (structural violations, stale badge counts, and syntax errors), so you can submit a clean PR with confidence.
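To make this a one-step habit, you could chain the three scripts in a small runner that stops at the first failure. The runner itself (for example a hypothetical `pre_pr_check.py` at the repository root) and the choice to stop early are assumptions; the script paths are the ones used throughout this guide.

```python
import subprocess
import sys

# The three checks described in this guide, run from the repository root.
CHECKS = [
    [sys.executable, "scripts/audit_lessons.py"],
    [sys.executable, "scripts/check_readme_counts.py"],
    [sys.executable, "scripts/lesson_run.py"],
]


def run_checks(checks: list[list[str]]) -> int:
    """Run each command in order; return 0 if all pass, else the first failing exit code."""
    for cmd in checks:
        print("$", " ".join(cmd), flush=True)
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"FAILED ({result.returncode}): {' '.join(cmd)}", file=sys.stderr)
            return result.returncode
    return 0
```

Call it as `sys.exit(run_checks(CHECKS))`, or invoke the same logic from a Git `pre-push` hook so the checks run automatically before every push.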

## Frequently Asked Questions

### What does a non-zero exit status from audit_lessons.py indicate?

A non-zero exit status means at least one lesson violated the structural invariants. Check the printed report for specific failures such as a missing `docs/en.md` file, an invalid quiz schema, broken internal links, or an empty `code/` directory. Fix the reported issues and re-run the script until it exits with status zero.

### How can I automatically fix README badge count mismatches?

Pass the `--fix` argument to [`scripts/check_readme_counts.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/check_readme_counts.py). This updates the hard-coded counts in [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) to match the current state of [`catalog.json`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/catalog.json), eliminating manual editing and preventing CI failures due to stale statistics.

### Can I execute lesson code locally without installing heavy dependencies like PyTorch?

Yes. When using `python3 scripts/lesson_run.py --execute`, the script respects dependency declarations in header comments (e.g., `# requires: torch, transformers`). Lessons declaring requirements that aren't satisfied are automatically skipped, allowing you to safely test lightweight lessons without installing heavy ML frameworks.

### Where should unit tests be located for a new lesson?

Unit tests must reside in the `tests/` subdirectory within the lesson's `code/` folder, following the path pattern `phases/<phase-number>-<phase-slug>/<lesson-number>-<lesson-slug>/code/tests/`. The discovery command `python3 -m unittest discover tests -v` expects this structure to locate and execute your test modules.