# How AI Engineering from Scratch Audits Lesson Structures: Automated Curriculum Validation

> Learn how ai-engineering-from-scratch audits lesson structures using an automated pipeline and GitHub Actions. Ensure curriculum integrity with strict validation rules.

- Repository: [Rohit Ghumare/ai-engineering-from-scratch](https://github.com/rohitg00/ai-engineering-from-scratch)
- Tags: how-to-guide
- Published: 2026-06-09

---

**The rohitg00/ai-engineering-from-scratch repository maintains curriculum integrity through an automated lesson-audit pipeline in [`scripts/audit_lessons.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py). The pipeline validates every lesson against 10+ strict rules, including directory naming, documentation standards, code presence, and quiz schema, and blocks merges via GitHub Actions until all violations are resolved.**

The rohitg00/ai-engineering-from-scratch project catalogs 435 lessons, so every module has to follow the same structural conventions. To prevent drift, the repository implements a **self-checking curriculum engine** that audits lesson directories against invariant rules on every pull request. The audit covers everything from file naming patterns to JSON schema compliance, and any structural violation fails the check before the change reaches the main branch.

## The Core Audit Pipeline

The validation logic resides in **[`scripts/audit_lessons.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py)**, which implements discrete check functions that inspect every lesson directory under `phases/`. When a rule violation is detected, the script instantiates an **`Issue`** object (defined at lines 38‑44) capturing the rule code, lesson path, file path, and a human-readable message. The script aggregates all issues and exits with **status 1** if any exist, causing dependent CI jobs to fail.
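
The issue-collection pattern described above can be sketched as follows. This is a minimal illustration, not the repository's code: the field names, the `render()` method, and the `exit_code()` helper are assumptions; only the four captured values and the exit status come from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Issue:
    """One rule violation. Field names are assumed, not copied from the repo."""
    rule: str      # rule code, e.g. "L004"
    lesson: str    # lesson directory the violation belongs to
    path: str      # file in which the violation was found
    message: str   # human-readable description

    def render(self) -> str:
        # Hypothetical formatter for a one-line report entry.
        return f"[{self.rule}] {self.path}: {self.message}"


def exit_code(issues: List[Issue]) -> int:
    """Return 1 when any issue was collected, otherwise 0."""
    return 1 if issues else 0


issues = [
    Issue(
        rule="L004",
        lesson="phases/01-foundations/01-math-prereqs",
        path="phases/01-foundations/01-math-prereqs/docs/en.md",
        message="docs/en.md missing top-level H1",
    )
]
report = [issue.render() for issue in issues]
print("\n".join(report))
print("exit status:", exit_code(issues))
```

Collecting every issue before deciding the exit status means a single run reports all violations at once instead of stopping at the first one.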

### Directory and Documentation Rules (L001‑L004)

The audit enforces strict organizational conventions through dedicated validation functions:

- **L001 – Directory Naming**: `check_lesson_dir_pattern()` (lines 85‑94) validates that every lesson folder matches the `NN-slug` pattern (e.g., `01-intro-to-nn`). This ensures lexical sorting correlates with curriculum progression.

- **L002‑L004 – Documentation Standards**: `check_docs_en_md()` (lines 97‑116) verifies that each lesson contains a `docs/en.md` file that is UTF-8 encoded, exceeds **200 bytes** in size, and begins with a top-level `#` heading. These constraints ensure that every lesson ships substantive, properly formatted English documentation.
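
The directory and documentation rules above can be sketched as follows. This is a minimal illustration under stated assumptions: the regex for `NN-slug`, the tuple return type, the function names, and the mapping of L002 and L003 to specific checks are all hypothetical (the article only shows the L004 message), and the exact 200-byte boundary is a guess.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Tuple

# Assumed regex for the NN-slug pattern: two digits, then a lowercase slug.
LESSON_DIR_RE = re.compile(r"^\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*$")
MIN_DOC_BYTES = 200  # threshold from the text; boundary handling is assumed


def check_lesson_dir(lesson: Path) -> List[Tuple[str, str]]:
    """L001: the folder name must look like NN-slug."""
    if LESSON_DIR_RE.match(lesson.name):
        return []
    return [("L001", f"{lesson.name} does not match NN-slug")]


def check_docs(lesson: Path) -> List[Tuple[str, str]]:
    """L002-L004: docs/en.md must exist, decode as UTF-8, be long enough, start with H1."""
    doc = lesson / "docs" / "en.md"
    if not doc.is_file():
        return [("L002", "docs/en.md is missing")]  # code assignment assumed
    raw = doc.read_bytes()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return [("L002", "docs/en.md is not valid UTF-8")]
    problems = []
    if len(raw) < MIN_DOC_BYTES:
        problems.append(("L003", f"docs/en.md is under {MIN_DOC_BYTES} bytes"))
    if not text.lstrip().startswith("# "):
        problems.append(("L004", "docs/en.md missing top-level H1"))
    return problems


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "01-intro-to-nn"
    (good / "docs").mkdir(parents=True)
    (good / "docs" / "en.md").write_text("# Intro\n\n" + "x" * 300, encoding="utf-8")

    bad = Path(tmp) / "intro"
    (bad / "docs").mkdir(parents=True)
    (bad / "docs" / "en.md").write_text("no heading", encoding="utf-8")

    good_result = check_lesson_dir(good) + check_docs(good)
    bad_codes = [code for code, _ in check_lesson_dir(bad) + check_docs(bad)]

print(good_result)
print(bad_codes)
```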

### Code and Quiz Validation (L005‑L009)

Beyond documentation, the audit verifies functional content and assessment integrity:

- **L005 – Code Presence**: `check_code_main()` (lines 119‑127) ensures the `code/` subdirectory contains at least one non-ignored source file, preventing empty lesson shells.

- **L006‑L009 – Quiz Schema**: `check_quiz()` (lines 129‑194) performs deep validation on each lesson's `quiz.json`. It verifies valid JSON syntax and enforces the canonical schema, which requires the fields `stage`, `question`, `options`, `correct`, and `explanation`. The function also checks that the `options` array contains between **2 and 6 items** (enforced via `MIN_OPTIONS = 2` and `MAX_OPTIONS = 6` at lines 76‑84) and that the `correct` index falls within range.

- **L007 – Legacy Format Detection**: Within `check_quiz()` (lines 57‑66), the audit detects obsolete key schemas like `q/choices/answer` and surfaces warnings, ensuring the curriculum migrates uniformly to the modern format.
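
The quiz rules above can be sketched as follows. This is a hedged illustration rather than the actual `check_quiz()`: it assumes `quiz.json` holds a list of question objects, that `correct` is a zero-based index, and that L006 covers syntax and missing fields while L009 covers the index check. Only `MIN_OPTIONS`, `MAX_OPTIONS`, the field names, the legacy keys, and the L007/L008 codes come from the article; the sample `stage` values are placeholders.

```python
import json
from typing import Any, List, Tuple

MIN_OPTIONS = 2
MAX_OPTIONS = 6
REQUIRED_KEYS = ("stage", "question", "options", "correct", "explanation")
LEGACY_KEYS = {"q", "choices", "answer"}


def check_quiz_text(raw: str) -> List[Tuple[str, str]]:
    """Validate quiz JSON text and return (rule code, message) pairs."""
    try:
        data: Any = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [("L006", f"invalid JSON: {exc.msg}")]
    if not isinstance(data, list):
        return [("L006", "expected a list of question objects")]

    issues: List[Tuple[str, str]] = []
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            issues.append(("L006", f"question {i} is not an object"))
            continue
        if LEGACY_KEYS.intersection(item):
            issues.append(("L007", f"question {i} uses legacy q/choices/answer keys"))
            continue
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            issues.append(("L006", f"question {i} missing {', '.join(missing)}"))
            continue
        options = item["options"]
        if not isinstance(options, list) or not MIN_OPTIONS <= len(options) <= MAX_OPTIONS:
            issues.append(("L008", f"question {i} needs {MIN_OPTIONS}-{MAX_OPTIONS} options"))
            continue
        correct = item["correct"]
        is_index = isinstance(correct, int) and not isinstance(correct, bool)
        if not is_index or not 0 <= correct < len(options):
            issues.append(("L009", f"question {i} has an out-of-range correct index"))
    return issues


sample = json.dumps([
    {"stage": "pre", "question": "2 + 2?", "options": ["3", "4"], "correct": 1, "explanation": "Basic sum."},
    {"q": "old style", "choices": ["a", "b"], "answer": 0},
    {"stage": "post", "question": "?", "options": ["only"], "correct": 0, "explanation": "x"},
    {"stage": "post", "question": "?", "options": ["a", "b"], "correct": 5, "explanation": "x"},
])
codes = [code for code, _ in check_quiz_text(sample)]
print(codes)
```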

### Link Integrity (L010)

The pipeline validates internal documentation links through **`check_internal_links()`** (lines 96‑112). This rule resolves every relative path in a lesson's `docs/en.md` against the repository filesystem, so cross-references between lessons stay valid as the curriculum evolves.
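
Relative-link resolution of this kind can be sketched as follows. The regex, the function name `broken_relative_links`, and the choice to skip URLs and in-page anchors are assumptions for illustration; the real `check_internal_links()` may parse Markdown differently.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Simplified inline-link matcher: [text](target). Real Markdown has more cases.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(doc: Path) -> List[str]:
    """Return relative link targets in doc that do not exist on disk."""
    broken = []
    for target in LINK_RE.findall(doc.read_text(encoding="utf-8")):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # skip absolute URLs (http:, mailto:) and in-page anchors
        relative = target.split("#", 1)[0]  # drop any fragment before resolving
        if not (doc.parent / relative).exists():
            broken.append(target)
    return broken


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("01-a", "02-b"):
        (root / name / "docs").mkdir(parents=True)
    (root / "02-b" / "docs" / "en.md").write_text("# B\n", encoding="utf-8")
    doc = root / "01-a" / "docs" / "en.md"
    doc.write_text(
        "# A\n"
        "See [next](../../02-b/docs/en.md), "
        "[gone](../../03-missing/docs/en.md), "
        "and [site](https://example.com).\n",
        encoding="utf-8",
    )
    broken = broken_relative_links(doc)

print(broken)
```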

## Continuous Integration Enforcement

The **[`.github/workflows/curriculum.yml`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/.github/workflows/curriculum.yml)** workflow automates the audit on every push to `main` and every pull request affecting curriculum files. The workflow executes three sequential steps: repository checkout, Python 3.12 setup, and execution of `python3 scripts/audit_lessons.py`. If the script returns a non-zero exit code due to rule violations, the workflow fails and blocks the PR from merging until developers resolve the underlying issues.

## Automated README Synchronization

To prevent documentation drift, the repository includes a complementary validator: **[`scripts/check_readme_counts.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/check_readme_counts.py)**. This script compares hard-coded lesson, phase, and skill counts in [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) against authoritative totals stored in **[`catalog.json`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/catalog.json)**.

The [`curriculum.yml`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/.github/workflows/curriculum.yml) workflow defines a separate **`readme-counts-sync`** job that executes only on pushes to `main`. This job runs the script with the **`--fix`** flag, which rewrites [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) in place with corrected statistics, then automatically commits the changes. This self-healing mechanism keeps the public-facing curriculum summary in line with the actual 435-lesson catalog without manual updates.
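
The compare-and-rewrite idea behind `--fix` can be sketched as follows. Everything here is hypothetical: the README phrasing (`<number> <label>`), the shape of the totals dictionary, and the function name `sync_counts` are assumptions, since the article does not show the structure of `catalog.json` or how counts appear in `README.md`.

```python
import re
from typing import Dict, Tuple


def sync_counts(readme: str, totals: Dict[str, int]) -> Tuple[str, bool]:
    """Rewrite '<number> <label>' phrases to match totals; report whether text changed."""
    updated = readme
    for label, total in totals.items():
        # Match the digits that directly precede the label, e.g. "430 lessons".
        pattern = re.compile(rf"\b\d+(?= {re.escape(label)}\b)")
        updated = pattern.sub(str(total), updated)
    return updated, updated != readme


# 435 is the lesson count cited in the article; the phase count is a placeholder.
totals = {"lessons": 435, "phases": 20}
stale = "This course has 430 lessons across 20 phases."

fixed, changed = sync_counts(stale, totals)
_, changed_again = sync_counts(fixed, totals)
print(fixed)
```

Returning a changed flag lets a caller decide whether to fail (check mode) or write the file and commit (fix mode), and a second run over already-correct text reports no change.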

## Local Execution and Debugging

Developers can run the audit locally to catch violations before submitting pull requests:

```bash
# Generate human-readable report
python3 scripts/audit_lessons.py

# Output JSON for integration with other tooling
python3 scripts/audit_lessons.py --json
```

When violations occur, the script emits structured messages like the following:

```
[L004] phases/01-foundations/01-math-prereqs/docs/en.md: docs/en.md missing top-level H1
```
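
If you want to summarize such a report in your own tooling, the line format shown above can be parsed with a small helper. This sketch assumes every entry follows the `[CODE] path: message` shape of the single sample line; `ReportLine` and `parse_report` are hypothetical names, not part of the repository.

```python
import re
from collections import Counter
from typing import List, NamedTuple

# Assumed line shape, inferred from the one sample entry: [L004] path: message
LINE_RE = re.compile(r"^\[(L\d{3})\]\s+(.+?):\s+(.*)$")


class ReportLine(NamedTuple):
    rule: str
    path: str
    message: str


def parse_report(text: str) -> List[ReportLine]:
    """Extract (rule, path, message) from each matching report line."""
    parsed = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            parsed.append(ReportLine(*match.groups()))
    return parsed


sample_report = (
    "[L004] phases/01-foundations/01-math-prereqs/docs/en.md: "
    "docs/en.md missing top-level H1\n"
)
entries = parse_report(sample_report)
per_rule = Counter(entry.rule for entry in entries)
print(entries[0].path)
print(dict(per_rule))
```

For anything beyond a quick summary, the `--json` output is likely the better integration point, since it avoids depending on the text layout.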

To synchronize README counts manually:

```bash
python3 scripts/check_readme_counts.py --fix
git diff README.md  # Review changes before committing
```

## Summary

- **Automated Validation**: The [`scripts/audit_lessons.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py) pipeline enforces 10+ invariant rules (L001‑L010) covering directory naming (`NN-slug`), documentation requirements (UTF-8, ≥200 bytes, H1 heading), code presence, and strict quiz JSON schemas.
- **CI Blocking**: The [`.github/workflows/curriculum.yml`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/.github/workflows/curriculum.yml) workflow executes the audit on every PR, exiting with status 1 when violations are detected to prevent broken lessons from merging.
- **Schema Rigor**: Quiz validation includes bounds checking (2‑6 options) and legacy format detection, while a separate link-resolution rule (L010) keeps cross-references valid across the curriculum.
- **Self-Healing Docs**: The [`check_readme_counts.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/check_readme_counts.py) utility with `--fix` automatically synchronizes [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) statistics against [`catalog.json`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/catalog.json) on every push to `main`.

## Frequently Asked Questions

### What triggers the lesson audit in AI Engineering from Scratch?

The audit triggers automatically via the **`curriculum`** GitHub Actions workflow defined in [`.github/workflows/curriculum.yml`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/.github/workflows/curriculum.yml). It runs on every push to the `main` branch and every pull request that modifies curriculum files, executing `python3 scripts/audit_lessons.py` to scan all lesson directories under `phases/`. If the script detects any rule violations, the workflow fails and prevents the PR from merging.

### How does the audit handle quiz schema violations?

The `check_quiz()` function validates that every lesson's `quiz.json` file contains valid JSON adhering to the canonical schema with fields for `stage`, `question`, `options`, `correct`, and `explanation`. It enforces **L008** by verifying that the `options` array contains between 2 and 6 items using the `MIN_OPTIONS` and `MAX_OPTIONS` constants, and it ensures the `correct` index points to a valid option. Legacy key formats such as `q/choices/answer` trigger **L007** warnings to prompt migration to the current schema.

### Can I run the lesson audit locally before submitting a PR?

Yes. Execute `python3 scripts/audit_lessons.py` from the repository root to receive a human-readable report, or append `--json` for machine-parseable output. This is the same command the CI workflow runs, with the same **Issue** aggregation (lines 38‑44) and exit code behavior, so you can fix structural errors, such as invalid directory names or missing `docs/en.md` files, before the workflow blocks your pull request.

### Why does the README.md update automatically after pushes to main?

The **`readme-counts-sync`** job in the curriculum workflow runs only on the `main` branch, executing `python3 scripts/check_readme_counts.py --fix`. This command compares the hard-coded lesson counts in [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) against the authoritative [`catalog.json`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/catalog.json) and rewrites the markdown file in-place with accurate statistics. The workflow then commits these changes automatically, ensuring the repository documentation remains synchronized with the actual curriculum structure without manual intervention.