# Invariant Validation Checks and Rules in audit_lessons.py: Complete Guide to the Curriculum Auditor

> Master invariant validation checks including L001–L010 rules in audit_lessons.py. Ensure lesson directory structure, docs, code, and quiz schema integrity for the AI Engineering curriculum.

- Repository: [Rohit Ghumare/ai-engineering-from-scratch](https://github.com/rohitg00/ai-engineering-from-scratch)
- Tags: how-to-guide
- Published: 2026-06-04

---

**The [`scripts/audit_lessons.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py) script enforces ten structural invariant rules (L001–L010) that validate lesson directory naming, documentation completeness, code presence, and quiz schema integrity across the AI Engineering from Scratch curriculum.**

The [`scripts/audit_lessons.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py) file in the `rohitg00/ai-engineering-from-scratch` repository serves as the gatekeeper for curriculum quality: it walks every lesson directory under `phases/` and checks that each one follows the same structure. These **invariant validation checks** confirm that each lesson meets the repository's standards for documentation, runnable code, and assessment data before changes are merged. Understanding the rules helps contributors fix validation errors and maintain the repository's pedagogical integrity.

## How the Audit System Works

The validation engine centers on the `audit_lesson()` function (lines 14–22), which orchestrates the inspection sequence. For every lesson discovered, the auditor runs the checks in this order: **L001 → L004 → L005 → L006 → L008–L009 → L010**. Each failed check registers an issue tagged with its canonical rule code, which enables precise error tracking and automated CI integration.
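
To make the idea of "an issue tagged with a canonical rule code" concrete, here is a minimal sketch of what such a record and a per-rule summary could look like. The `Issue` class and `summarize_by_rule` helper are hypothetical names chosen for illustration, not taken from the script; the field names simply mirror the `rule`, `file`, and `message` keys used in the JSON-parsing example later in this guide.

```python
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical issue record; the real script may store issues differently."""
    rule: str     # canonical rule code, e.g. "L003"
    file: str     # path of the offending file
    message: str  # human-readable description


def summarize_by_rule(issues: list[Issue]) -> dict[str, int]:
    """Count issues per rule code, sorted by code."""
    return dict(sorted(Counter(i.rule for i in issues).items()))


issues = [
    Issue("L008", "quiz.json", "options length must be 2..6 (got 1)"),
    Issue("L003", "docs/en.md", "docs/en.md shorter than 200 bytes (got 128)"),
]
print(summarize_by_rule(issues))  # {'L003': 1, 'L008': 1}
print(asdict(issues[0])["rule"])  # L008
```

Keeping the rule code as a separate field, rather than embedding it in the message, is what makes grouping and CI filtering straightforward.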

## The 10 Invariant Validation Rules (L001–L010)

### Naming and Directory Structure (L001–L002)

**L001 – Lesson Directory Naming**
Directory names must match the pattern `NN-slug`: a two-digit numeric prefix, a hyphen, and then lowercase alphanumerics or hyphens. This convention ensures consistent URL slugs and sorting across the curriculum. The regex validation appears in [`scripts/audit_lessons.py`, lines 85–93](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py#L85-L93).
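
The `NN-slug` rule can be sketched with a single regular expression. This is an illustration under stated assumptions, not the script's actual pattern: `LESSON_DIR_RE` and `is_valid_lesson_dir` are hypothetical names, and requiring the slug to start with an alphanumeric character is an extra assumption.

```python
import re

# Assumed pattern: two digits, a hyphen, then lowercase alphanumerics or hyphens.
# Requiring an alphanumeric first slug character is an assumption of this sketch.
LESSON_DIR_RE = re.compile(r"\d{2}-[a-z0-9][a-z0-9-]*")


def is_valid_lesson_dir(name: str) -> bool:
    """Return True if `name` looks like NN-slug."""
    return LESSON_DIR_RE.fullmatch(name) is not None


print(is_valid_lesson_dir("04-conditional-gans-pix2pix"))  # True
print(is_valid_lesson_dir("4-Conditional_GANs"))           # False
```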

**L002 – Presence of docs/en.md**
Every lesson must contain a markdown file at `docs/en.md`, relative to the lesson directory. This file serves as the primary English-language learning material. The existence check runs at [lines 99–101](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py#L99-L101).

### Documentation Quality Standards (L003–L004, L010)

**L003 – Minimum Documentation Size**
The lesson's `docs/en.md` file must contain at least **200 bytes** of content. Files falling below this threshold are flagged as incomplete stubs. This length check executes at [lines 107–113](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py#L107-L113).

**L004 – Top-Level Heading**
The markdown must contain at least one H1 heading (`# …`) to ensure the lesson displays a visible title in rendered documentation. The regex scan appears at [lines 114–116](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py#L114-L116).
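
The three `docs/en.md` checks described so far (existence, minimum size, H1 presence) can be sketched together as follows. This is a simplified illustration, not the script's code: `check_docs`, `MIN_DOC_BYTES`, and `H1_RE` are hypothetical names, and the H1 regex is an assumption that would also match a `# comment` line inside a fenced code block.

```python
import re
import tempfile
from pathlib import Path

MIN_DOC_BYTES = 200                         # threshold stated for L003
H1_RE = re.compile(r"^# \S", re.MULTILINE)  # assumed H1 pattern


def check_docs(lesson_dir: Path) -> list[str]:
    """Return the rule codes violated by lesson_dir/docs/en.md."""
    doc = lesson_dir / "docs" / "en.md"
    if not doc.is_file():
        return ["L002"]  # the remaining checks need the file to exist
    issues = []
    if doc.stat().st_size < MIN_DOC_BYTES:
        issues.append("L003")
    if not H1_RE.search(doc.read_text(encoding="utf-8")):
        issues.append("L004")
    return issues


with tempfile.TemporaryDirectory() as tmp:
    lesson = Path(tmp) / "01-intro"
    (lesson / "docs").mkdir(parents=True)
    print(check_docs(lesson))  # ['L002']
    (lesson / "docs" / "en.md").write_text("stub", encoding="utf-8")
    print(check_docs(lesson))  # ['L003', 'L004']
```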

**L010 – Internal Markdown Links**
All relative links inside a lesson's `docs/en.md` must resolve to existing files or directories within the repository. External URLs, mailto links, and data URIs are ignored. This prevents broken cross-references between lessons. The link resolution logic spans [lines 996–1012](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py#L996-L1012).
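
A rough sketch of this kind of link resolution is shown below. It is not the script's implementation: `LINK_RE`, `SKIP_PREFIXES`, and `unresolved_links` are hypothetical names, skipping in-page `#anchor` links is an added assumption, and the sketch does not verify that a resolved path stays inside the repository.

```python
import re
import tempfile
from pathlib import Path

# Captures the target of an inline Markdown link: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)")
# External URLs, mailto links, and data URIs are ignored; "#" is an assumption.
SKIP_PREFIXES = ("http://", "https://", "mailto:", "data:", "#")


def unresolved_links(doc: Path) -> list[str]:
    """Return relative link targets in `doc` that do not exist on disk."""
    broken = []
    for target in LINK_RE.findall(doc.read_text(encoding="utf-8")):
        if target.startswith(SKIP_PREFIXES):
            continue
        path_part = target.split("#", 1)[0]  # drop any fragment
        if not (doc.parent / path_part).exists():
            broken.append(target)
    return broken


with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "docs"
    docs.mkdir()
    (docs / "exists.md").write_text("ok", encoding="utf-8")
    doc = docs / "en.md"
    doc.write_text(
        "[a](./exists.md) [b](./nonexistent.md) [c](https://example.com)",
        encoding="utf-8",
    )
    print(unresolved_links(doc))  # ['./nonexistent.md']
```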

### Code Presence Validation (L005)

**L005 – Non-Empty Code Directory**
If a lesson includes a `code/` folder, it must contain at least one source or configuration file, ignoring a small whitelist of system files (like `.DS_Store` or `__pycache__`). This ensures that lessons advertising code examples actually provide runnable material. The emptiness check runs at [lines 119–127](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py#L119-L127).
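
As a sketch of the emptiness check, the function below treats a `code/` directory as empty when it contains nothing besides ignored entries. The names `IGNORED_NAMES` and `code_dir_is_empty` are hypothetical, the ignore list contains only the two examples mentioned above (the script's real list may differ), and looking only at top-level entries is an assumption.

```python
import tempfile
from pathlib import Path

# Only the two examples named above; the script's actual whitelist may differ.
IGNORED_NAMES = {".DS_Store", "__pycache__"}


def code_dir_is_empty(lesson_dir: Path) -> bool:
    """True if code/ exists but holds nothing beyond ignored entries."""
    code_dir = lesson_dir / "code"
    if not code_dir.is_dir():
        return False  # the rule applies only when code/ is present
    return not any(p.name not in IGNORED_NAMES for p in code_dir.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    lesson = Path(tmp) / "01-intro"
    (lesson / "code").mkdir(parents=True)
    (lesson / "code" / ".DS_Store").write_text("", encoding="utf-8")
    print(code_dir_is_empty(lesson))  # True
    (lesson / "code" / "main.py").write_text("print('hi')\n", encoding="utf-8")
    print(code_dir_is_empty(lesson))  # False
```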

### Quiz Schema Integrity (L006–L009)

**L006 – Quiz JSON Schema**
The lesson's `quiz.json` file (if present) must be valid JSON containing a non-empty `questions[]` array. Each question must include all canonical keys: `stage`, `question`, `options`, `correct`, and `explanation`. Missing keys, empty arrays, or malformed JSON trigger this error. See the validation at [lines 133–152](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py#L133-L152).
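
The schema check can be approximated as below. This is a sketch, not the script's logic: `REQUIRED_KEYS` lists the canonical keys named above, while `check_quiz_schema` and its message strings are hypothetical.

```python
import json

REQUIRED_KEYS = {"stage", "question", "options", "correct", "explanation"}


def check_quiz_schema(raw: str) -> list[str]:
    """Return a list of L006-style problems found in the quiz JSON text."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    questions = data.get("questions") if isinstance(data, dict) else None
    if not isinstance(questions, list) or not questions:
        return ["questions[] must be a non-empty array"]
    problems = []
    for i, q in enumerate(questions):
        # A non-dict question is treated as missing every key.
        missing = (REQUIRED_KEYS - set(q)) if isinstance(q, dict) else REQUIRED_KEYS
        if missing:
            problems.append(f"question[{i}] missing keys: {sorted(missing)}")
    return problems


print(check_quiz_schema('{"questions": []}'))
# ['questions[] must be a non-empty array']
```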

**L007 – Legacy Quiz Schema Detection**
This rule detects usage of deprecated schema keys (`q`, `choices`, `answer`) and triggers a warning encouraging migration to the current canonical schema. It helps maintain data consistency across the curriculum. The legacy-key detection appears at [lines 157–166](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py#L157-L166).
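
Detecting the deprecated keys amounts to a set intersection per question. The helper below is a hypothetical sketch (`LEGACY_KEYS` holds the three keys named above; `legacy_keys_used` is an illustrative name) and does not reproduce how the script reports the warning.

```python
LEGACY_KEYS = {"q", "choices", "answer"}


def legacy_keys_used(questions: list[dict]) -> dict[int, list[str]]:
    """Map question index to the legacy keys found in that question."""
    found = {}
    for i, question in enumerate(questions):
        hits = sorted(LEGACY_KEYS & set(question))
        if hits:
            found[i] = hits
    return found


questions = [
    {"question": "2 + 2?", "options": ["3", "4"], "correct": 1},
    {"q": "2 + 2?", "choices": ["3", "4"], "answer": 1},
]
print(legacy_keys_used(questions))  # {1: ['answer', 'choices', 'q']}
```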

**L008 – Options Length**
Each `options` array within a quiz question must contain **2 to 6** entries. Values outside this range indicate either insufficient choices or excessive complexity. The bounds check executes at [lines 176–184](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py#L176-L184).

**L009 – Correct Answer Index**
The `correct` field must be an integer index satisfying `0 ≤ correct < len(options)`, ensuring the correct answer points to a valid option position. This prevents out-of-bound references in quiz rendering. The index validation runs at [lines 186–193](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py#L186-L193).
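
The two bounds checks (L008 and L009) can be sketched for a single question as follows. `check_options_and_correct` is a hypothetical helper, and rejecting booleans for `correct` is an assumption of this sketch (in Python, `bool` is a subclass of `int`).

```python
def check_options_and_correct(question: dict) -> list[str]:
    """Return the rule codes (L008, L009) violated by one quiz question."""
    options = question.get("options")
    if not isinstance(options, list):
        return ["L008"]  # without a list, the index check has no meaning
    issues = []
    if not 2 <= len(options) <= 6:
        issues.append("L008")
    correct = question.get("correct")
    # Assumption: booleans are rejected even though bool subclasses int.
    is_int = isinstance(correct, int) and not isinstance(correct, bool)
    if not is_int or not 0 <= correct < len(options):
        issues.append("L009")
    return issues


print(check_options_and_correct({"options": ["a", "b"], "correct": 1}))  # []
print(check_options_and_correct({"options": ["a"], "correct": 0}))       # ['L008']
print(check_options_and_correct({"options": ["a", "b"], "correct": 2}))  # ['L009']
```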

## Running the Audit Locally

Execute the validator from the repository root to scan the entire curriculum:

```bash
# Human-readable report
python scripts/audit_lessons.py

# JSON output for CI pipelines
python scripts/audit_lessons.py --json

# Limit audit to a specific phase (e.g., Phase 19)
python scripts/audit_lessons.py --phase 19
```

**Sample output format:**

```
audit_lessons.py — 435 lesson(s) checked, 3 issue(s)

  [L003] phases/08-generative-ai/04-conditional-gans-pix2pix/docs/en.md: docs/en.md shorter than 200 bytes (got 128)
  [L008] phases/10-llms-from-scratch/12-inference-optimization/quiz.json: question[2] options length must be 2..6 (got 1)
  [L010] phases/19-capstone-projects/84-refusal-evaluation/docs/en.md: internal link does not resolve: ./nonexistent.md

Summary by rule:
  L003: 1
  L008: 1
  L010: 1
```

**Programmatically parsing JSON output:**

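```python
import json
import subprocess
import sys

# Run the auditor with the current interpreter and capture its JSON report.
# check=True is deliberately omitted: if the script exits non-zero when it
# finds issues (an assumption; common for linters), check=True would raise
# before the report could be parsed.
result = subprocess.run(
    [sys.executable, "scripts/audit_lessons.py", "--json"],
    capture_output=True, text=True
)
audit = json.loads(result.stdout)
print(f"Checked {audit['lessons_checked']} lessons, found {len(audit['issues'])} issues")

# Print one line per issue: rule code, file path, and message.
for issue in audit["issues"]:
    print(f"[{issue['rule']}] {issue['file']}: {issue['message']}")
```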

## Summary

- **[`scripts/audit_lessons.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py)** implements ten invariant validation rules (L001–L010) that enforce directory naming, documentation standards, code presence, and quiz integrity.
- **L001–L002** validate physical structure and required files (`NN-slug` naming and the existence of each lesson's `docs/en.md`).
- **L003–L004** ensure documentation quality (minimum 200 bytes, H1 presence).
- **L005** requires non-empty `code/` directories when present.
- **L006–L009** cover quiz JSON integrity: the canonical schema (L006), a warning for legacy keys (L007), valid options length of 2–6 (L008), and correct answer indexing (L009).
- **L010** prevents broken internal links by verifying all relative Markdown references resolve to existing repository paths.
- The `audit_lesson()` function at lines 14–22 orchestrates checks in a specific sequence, registering violations with canonical rule codes for actionable CI feedback.

## Frequently Asked Questions

### What happens if a lesson fails the L001 naming convention check?

The audit reports an L001 violation indicating that the directory name does not match the `NN-slug` pattern. Contributors must rename the folder so that it starts with a two-digit numeric prefix, followed by a hyphen and lowercase alphanumeric characters or hyphens, before the CI pipeline will pass.

### How does the audit handle missing quiz.json files?

Quiz validation rules (L006–L009) apply only when a lesson's `quiz.json` exists. Lessons without quizzes skip these checks entirely, but they must still satisfy documentation rules (L002–L004, L010) and the code presence rule (L005) if applicable.

### Can I run the audit on a specific phase only?

Yes. Use the `--phase` flag followed by the phase number to limit validation to a single phase. For example, `python scripts/audit_lessons.py --phase 19` audits only Phase 19, reducing execution time when developing or validating specific curriculum sections.

### What is the difference between L006 and L007 quiz validation rules?

**L006** enforces the current canonical schema requiring keys `stage`, `question`, `options`, `correct`, and `explanation`. **L007** specifically detects legacy schema usage (`q`, `choices`, `answer`) and issues a warning rather than an error, serving as a migration reminder for older lesson content.