How the rohitg00/ai-engineering-from-scratch Repository Is Organized by Lessons

The rohitg00/ai-engineering-from-scratch repository follows a strict hierarchical structure where 20 curriculum phases contain self-contained lessons, each standardized with documentation, code implementations, quizzes, and output artifacts in predictable folder locations.

The repository is a curriculum-style codebase designed for systematic AI engineering education. Every lesson follows a standardized folder structure defined by the Lesson Contract in AGENTS.md, which makes the content discoverable for both human learners and automated tooling. The same layout applies across all 20 phases of the curriculum, from "Math Foundations" through "Agent Engineering."

Hierarchical Phase-Lesson Structure

The repository organizes content under a phases/ directory using a strict naming convention that enables programmatic discovery and navigation.

Top-Level Organization

Each phase resides in phases/<phase-number>-<phase-slug>/, and each lesson sits inside its phase at <lesson-number>-<lesson-slug>/. For example, the first lesson of the Math Foundations phase lives at phases/01-math-foundations/01-linear-algebra-intuition/.
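The naming pattern above can be split back into its parts with a regular expression. The following is a minimal sketch, not code from the repository: the names LESSON_PATH, LessonRef, and parse_lesson_path are hypothetical, and the slug character set (lowercase letters, digits, hyphens) is an assumption based on the example path.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: phases/<phase-number>-<phase-slug>/<lesson-number>-<lesson-slug>/
LESSON_PATH = re.compile(
    r"^phases/(?P<phase_num>\d+)-(?P<phase_slug>[a-z0-9-]+)/"
    r"(?P<lesson_num>\d+)-(?P<lesson_slug>[a-z0-9-]+)/?$"
)


class LessonRef(NamedTuple):
    """Hypothetical container for the four parts of a lesson path."""
    phase_num: int
    phase_slug: str
    lesson_num: int
    lesson_slug: str


def parse_lesson_path(path: str) -> Optional[LessonRef]:
    """Return the parsed parts of a lesson path, or None if it does not match."""
    m = LESSON_PATH.match(path)
    if m is None:
        return None
    return LessonRef(
        int(m["phase_num"]), m["phase_slug"],
        int(m["lesson_num"]), m["lesson_slug"],
    )


ref = parse_lesson_path("phases/01-math-foundations/01-linear-algebra-intuition/")
print(ref)
```

Returning None for non-matching paths lets a caller skip stray files without raising.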

The README.md (lines 51-66) serves as the central navigation hub, listing all phases in collapsible markdown tables. Each table entry links directly to the lesson directory and specifies the implementation languages, such as Python, TypeScript, Rust, or Julia.

The 20-Phase Curriculum

According to the source documentation, the curriculum spans 20 phases numbered 0 through 19. The README.md presents these phases with markdown tables structured as:

| # | Lesson | Type | Lang |
|---|--------|------|------|
| 01 | [Linear Algebra Intuition](phases/01-math-foundations/01-linear-algebra-intuition/) | Learn | Python, Julia |
| 02 | [Vectors, Matrices & Operations](phases/01-math-foundations/02-vectors-matrices-operations/) | Build | Python, Julia |

This table format allows the site generator (site/build.js) to parse links and produce site/data.js for the live web interface.
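To illustrate why the table format is easy to parse, here is a rough Python sketch that extracts the fields from one table row. It is not the logic of site/build.js, which is a JavaScript file whose implementation is not shown here; the regular expression and the parse_row helper are assumptions based only on the two example rows above.

```python
import re

# Matches rows shaped like: | 01 | [Title](path/) | Type | Lang1, Lang2 |
ROW = re.compile(
    r"^\|\s*(?P<num>\d+)\s*\|\s*\[(?P<title>[^\]]+)\]\((?P<path>[^)]+)\)\s*"
    r"\|\s*(?P<type>[^|]+?)\s*\|\s*(?P<langs>[^|]+?)\s*\|\s*$"
)


def parse_row(line: str):
    """Return a dict for a lesson row, or None for headers and separators."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    return {
        "number": m["num"],
        "title": m["title"],
        "path": m["path"],
        "type": m["type"],
        "languages": [name.strip() for name in m["langs"].split(",")],
    }


row = (
    "| 01 | [Linear Algebra Intuition]"
    "(phases/01-math-foundations/01-linear-algebra-intuition/) "
    "| Learn | Python, Julia |"
)
print(parse_row(row))
```

Header and separator rows have no numeric first cell, so they fall through to None and can be ignored by the caller.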

The Four Core Components of Every Lesson

Every lesson directory contains four mandatory components that satisfy the Lesson Contract defined in AGENTS.md (lines 63-84).

Documentation (docs/en.md)

The docs/en.md file contains the human-readable narrative, learning objectives, and prerequisites. This file includes front-matter metadata, notably the **Languages:** declaration, which must match the implementation files present in the code/ directory.

Code Implementation (code/)

The code/ directory houses main.<ext>, the minimal reference implementation, with one file per declared language using the matching extension (.py, .ts, .rs, or .jl). The directory also includes a tests/ subdirectory containing unit tests that verify the implementation.
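The rule that declared languages must match the files in code/ can be checked mechanically. The sketch below is one possible check, not the repository's own audit. It assumes the declaration appears on a single line of the form `**Languages:** Python, Julia`, and the names EXTENSIONS, declared_languages, and languages_match are hypothetical.

```python
import re
from pathlib import Path
from typing import List

# Language names and extensions mentioned in this article.
EXTENSIONS = {"Python": ".py", "TypeScript": ".ts", "Rust": ".rs", "Julia": ".jl"}

# Assumed declaration format: "**Languages:** Python, Julia" on its own line.
LANG_LINE = re.compile(r"^\*\*Languages:\*\*\s*(.+)$", re.MULTILINE)


def declared_languages(doc_text: str) -> List[str]:
    """Return the language names declared in a lesson document."""
    m = LANG_LINE.search(doc_text)
    if m is None:
        return []
    return [name.strip() for name in m.group(1).split(",") if name.strip()]


def languages_match(lesson_dir: Path) -> bool:
    """True if every declared language has a main.<ext> file and no extras exist."""
    doc = lesson_dir / "docs" / "en.md"
    if not doc.is_file():
        return False
    declared = declared_languages(doc.read_text(encoding="utf-8"))
    expected = {EXTENSIONS.get(name) for name in declared}
    if not declared or None in expected:
        return False  # nothing declared, or a language this sketch does not know
    found = {p.suffix for p in (lesson_dir / "code").glob("main.*")}
    return expected == found
```

Comparing sets in both directions flags a declared language with no implementation as well as an implementation that was never declared.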

Assessment (quiz.json)

Each lesson includes a quiz.json file containing exactly six questions: one pre-assessment, three checkpoint questions, and two post-assessment questions. The questions appear in that fixed order, as outlined below (other fields are elided):

{
  "questions": [
    {"stage": "pre", ...},
    {"stage": "check", ...},
    {"stage": "check", ...},
    {"stage": "check", ...},
    {"stage": "post", ...},
    {"stage": "post", ...}
  ]
}

Generated Artifacts (outputs/)

The outputs/ directory stores reusable AI artifacts produced by completing the lesson. These may include prompts, skills, agents, or MCP servers that can be installed or deployed directly into production environments.
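Taken together, the four components give each lesson a fixed set of paths that a script can look for. The following is a minimal sketch of such a presence check, assuming only the paths named in this article; missing_components is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import List


def missing_components(lesson_dir: Path) -> List[str]:
    """List the mandatory lesson paths that are absent from lesson_dir."""
    missing = []
    if not (lesson_dir / "docs" / "en.md").is_file():
        missing.append("docs/en.md")
    code_dir = lesson_dir / "code"
    # Any main.<ext> file counts here; the extension is not checked in this sketch.
    if not any(code_dir.glob("main.*")):
        missing.append("code/main.<ext>")
    if not (code_dir / "tests").is_dir():
        missing.append("code/tests/")
    if not (lesson_dir / "quiz.json").is_file():
        missing.append("quiz.json")
    if not (lesson_dir / "outputs").is_dir():
        missing.append("outputs/")
    return missing
```

An empty list means all expected paths exist. The check looks only at presence, so the contents of each file still need their own validation.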

Programmatic Lesson Discovery

Because the repository follows strict naming conventions, you can discover and validate lessons programmatically. The following Python script mirrors the directory conventions described in AGENTS.md and extracts lesson metadata:

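import pathlib
import re

# Assumes this script lives one directory below the repository root (e.g. scripts/).
repo_root = pathlib.Path(__file__).resolve().parent.parent
# Lesson directories are named <NN>-<lesson-slug>.
lesson_pattern = re.compile(r"^\d{2}-(.+)$")

def read_title(doc_path):
    """Return the first level-one markdown heading in doc_path, or None."""
    if not doc_path.is_file():
        return None
    with doc_path.open(encoding="utf-8") as f:
        for line in f:
            if line.startswith("# "):
                return line[2:].strip()
    return None

def discover_lessons():
    lessons = []
    for phase_dir in sorted((repo_root / "phases").iterdir()):
        if not phase_dir.is_dir():
            continue
        for lesson_dir in sorted(phase_dir.iterdir()):
            # Skip files and directories that do not follow the lesson naming pattern.
            if not lesson_dir.is_dir() or not lesson_pattern.match(lesson_dir.name):
                continue
            title = read_title(lesson_dir / "docs" / "en.md")
            lessons.append({
                "phase": phase_dir.name,
                "lesson": lesson_dir.name,
                "title": title or "Untitled",
            })
    return lessons

if __name__ == "__main__":
    for info in discover_lessons():
        print(f"{info['phase']}/{info['lesson']}: {info['title']}")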

To validate that a lesson's quiz follows the required schema, use this validation function:

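import json
from pathlib import Path

# Required order of question stages in quiz.json.
EXPECTED_STAGES = ["pre", "check", "check", "check", "post", "post"]

def load_quiz(lesson_path: Path) -> dict:
    with (lesson_path / "quiz.json").open(encoding="utf-8") as f:
        return json.load(f)

def validate_quiz(quiz: dict) -> bool:
    questions = quiz.get("questions", [])
    if len(questions) != 6:
        return False
    # Use .get so a question without a "stage" key fails validation instead of raising.
    stages = [q.get("stage") for q in questions]
    return stages == EXPECTED_STAGES

# Example usage (run from the repository root so the relative path resolves)
lesson_dir = Path("phases/01-math-foundations/01-linear-algebra-intuition")
quiz = load_quiz(lesson_dir)
print("Quiz valid?", validate_quiz(quiz))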

The CI pipeline defined in .github/workflows/curriculum.yml automatically runs similar audits on every merge, keeping the lesson counts in the README synchronized and rebuilding the site data.

Summary

  • Hierarchical Structure: Lessons live at phases/<phase>-<slug>/<lesson>-<slug>/ following strict naming conventions.
  • Mandatory Components: Every lesson must contain docs/en.md, code/main.<ext> with tests, quiz.json, and an outputs/ directory.
  • Lesson Contract: AGENTS.md requires that declared languages match implementation files and that quizzes contain exactly six questions in the prescribed order; the CI workflow enforces these rules.
  • Automated Discovery: The consistent structure enables parsing by site/build.js and validation through CI workflows.

Frequently Asked Questions

What is the naming convention for lesson directories?

Lesson directories use the format <NN>-<lesson-slug> where NN is a two-digit number. This convention appears in both the folder structure and the markdown tables of README.md, allowing the site generator to parse paths programmatically.

How does the repository validate that lessons follow the required structure?

The Lesson Contract documented in AGENTS.md (lines 63-84) defines hard rules that are enforced by the CI workflow in .github/workflows/curriculum.yml. This pipeline audits lessons, verifies that front-matter languages match code implementations, and checks that each quiz.json contains exactly six questions in the correct sequence.

Where are the unit tests located within each lesson?

Unit tests reside in the code/tests/ subdirectory within each lesson folder. According to the repository conventions, every lesson must include tests that verify the main.<ext> implementation, with the file extension matching the languages declared in the docs/en.md front-matter.

How is the curriculum website generated from the repository structure?

The site/build.js script parses the markdown links in README.md, ROADMAP.md, and GLOSSARY.md to produce site/data.js. This data file powers the live website, and the process is automated via the curriculum.yml GitHub Actions workflow, which rebuilds the site on every merge to the main branch.
