# AI Engineering From Scratch Folder Structure: Complete 20-Phase Curriculum Guide

> Explore the AI Engineering From Scratch folder structure with this comprehensive 20-phase curriculum guide. Discover the organized layout of code, docs, and outputs in the rohitg00/ai-engineering-from-scratch repository.

- Repository: [Rohit Ghumare/ai-engineering-from-scratch](https://github.com/rohitg00/ai-engineering-from-scratch)
- Tags: architecture
- Published: 2026-06-06

---

**The rohitg00/ai-engineering-from-scratch repository follows a strict hierarchical structure organized into 20 sequential phases, where every lesson directory contains exactly three sub-folders (`code/`, `docs/`, and `outputs/`) and is automatically validated by CI tooling.**

This curriculum-style repository implements a **flat-to-deep** organization that supports progressive learning from development setup through advanced capstone projects. The folder structure is designed not only for human readability but also for automated processing by build tools and validation scripts. Below is a comprehensive breakdown of how the repository organizes its 20 phases, supporting infrastructure, and lesson artifacts.

## Root-Level Directory Organization

The repository root contains six top-level directories and three Markdown files that govern the entire curriculum:

- **`phases/`** – The core curriculum containing 20 sequentially numbered phases (00 through 19)
- **`site/`** – Static site generator with [`build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js), [`data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js), and HTML templates
- **`scripts/`** – Python automation tools for validation and catalog generation
- **`glossary/`** – Shared terminology files ([`terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md), [`myths.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/myths.md)) referenced across lessons
- **`outputs/`** – Global storage for reusable artifacts produced by lessons (skills, prompts, agents)
- **`.github/`** – CI workflows ([`workflows/curriculum.yml`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/.github/workflows/curriculum.yml)) and templates
- **[`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md)** – Front-page phase table that feeds the site navigation
- **[`ROADMAP.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md)** – Machine-readable lesson status tracker
- **[`AGENTS.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/AGENTS.md)** – Internal operating manual for the project

## The Phases Directory Structure

Inside `phases/`, the curriculum is divided into 20 numbered directories (prefixed with two-digit codes) covering the complete AI engineering stack:

### Phase 00: Setup and Tooling

Contains foundational lessons like `01-dev-environment/` that establish the development workflow used throughout the curriculum.

### Phase 01: Math Foundations

Houses 22 individual lessons including `01-linear-algebra-intuition/`, each following the standardized three-folder layout.

### Remaining Phases (02-19)

The sequence continues through specialized domains:
- **`02-ml-fundamentals/`** – Core machine learning concepts
- **`03-deep-learning-core/`** – Neural network implementations
- **`04-computer-vision/`** – CV-specific architectures
- **`05-nlp-foundations-to-advanced/`** – Natural language processing
- **`06-speech-and-audio/`** – Audio ML applications
- **`07-transformers-deep-dive/`** – Transformer architecture details
- **`08-generative-ai/`** – GenAI systems
- **`09-reinforcement-learning/`** – RL algorithms
- **`10-llms-from-scratch/`** – LLM construction
- **`11-llm-engineering/`** – LLM deployment
- **`12-multimodal-ai/`** – Cross-modal systems
- **`13-tools-and-protocols/`** – MCP and agent protocols
- **`14-agent-engineering/`** – Agent loop implementations
- **`15-autonomous-systems/`** – Self-governing AI
- **`16-multi-agent-and-swarms/`** – Distributed agent systems
- **`17-infrastructure-and-production/`** – Production deployment
- **`18-ethics-and-alignment/`** – AI safety
- **`19-capstone-projects/`** – Final projects
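
The two-digit prefix convention makes phase names easy to parse and check for gaps. The following is a minimal sketch, assuming every phase directory name has the form `NN-slug`; the helper names `parse_phase_name` and `missing_phase_numbers` are hypothetical and are not part of the repository's tooling.

```python
import re

# Assumed naming pattern: two digits, a hyphen, then a lowercase hyphenated slug.
PHASE_NAME = re.compile(r"(\d{2})-([a-z0-9]+(?:-[a-z0-9]+)*)")


def parse_phase_name(name):
    """Return (number, slug) for a phase directory name, or None if it does not match."""
    match = PHASE_NAME.fullmatch(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def missing_phase_numbers(names, expected=range(20)):
    """Return the expected phase numbers that none of the given names provide."""
    parsed = (parse_phase_name(name) for name in names)
    found = {item[0] for item in parsed if item is not None}
    return sorted(set(expected) - found)


print(parse_phase_name("07-transformers-deep-dive"))
print(missing_phase_numbers(["01-math-foundations", "02-ml-fundamentals"], expected=range(4)))
```

Passing the names returned by listing `phases/` to `missing_phase_numbers` would show whether any number between 00 and 19 is absent.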

## Standard Lesson Directory Layout

Every lesson within a phase follows an identical **three-folder contract**. For example, `phases/01-math-foundations/01-linear-algebra-intuition/` contains:

- **`code/`** – Runnable implementation files (Python, TypeScript, Rust, or Julia)
- **`docs/`** – Contains [`en.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/phases/01-math-foundations/01-linear-algebra-intuition/docs/en.md), the lesson narrative with YAML front-matter
- **`outputs/`** – Generated artifacts specific to that lesson (prompts, skills, agent specs)

This structure is enforced by [`scripts/audit_lessons.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py), which validates that no lesson is committed without all three components. The consistency allows the static site generator ([`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js)) to parse [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) links and automatically build the curriculum UI without per-lesson configuration.

## Automation and Tooling Directories

### CI Validation Scripts

The `scripts/` directory houses the authoritative tooling that maintains curriculum integrity:

- **[`audit_lessons.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py)** – Validates lesson layout, checks for required sub-folders, and verifies that tests exist
- **[`build_catalog.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/build_catalog.py)** – Regenerates `catalog.json` (git-ignored) from the current repository state
- **[`check_readme_counts.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/check_readme_counts.py)** – Ensures the README.md phase tables stay synchronized with actual lesson counts
- **[`install_skills.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/install_skills.py)** – Installs all generated SKILL.md files into the appropriate outputs directory
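
To illustrate the kind of check a count-synchronization script performs, here is a minimal sketch. It is not the repository's `check_readme_counts.py`: the function names are hypothetical, the declared counts are passed in as a plain dictionary rather than read from `README.md`, and it assumes that every sub-directory of a phase is a lesson.

```python
import tempfile
from pathlib import Path


def count_lessons(phases_root):
    """Return {phase_name: number of lesson directories} for every phase directory."""
    root = Path(phases_root)
    return {
        phase.name: sum(1 for child in phase.iterdir() if child.is_dir())
        for phase in sorted(root.iterdir())
        if phase.is_dir()
    }


def count_mismatches(actual, declared):
    """Return {phase: (declared, actual)} for every phase whose counts disagree."""
    names = sorted(set(actual) | set(declared))
    return {
        name: (declared.get(name), actual.get(name))
        for name in names
        if declared.get(name) != actual.get(name)
    }


# Demo on a throwaway tree; the second lesson name and the declared count are made up.
with tempfile.TemporaryDirectory() as tmp:
    for lesson in ("01-linear-algebra-intuition", "02-hypothetical-lesson"):
        (Path(tmp) / "01-math-foundations" / lesson).mkdir(parents=True)
    actual = count_lessons(tmp)

print(count_mismatches(actual, {"01-math-foundations": 3}))
```

A CI job built on this idea would exit with a non-zero status whenever the returned dictionary is non-empty.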

### Static Site Generation

The `site/` directory contains the build pipeline:
- **[`build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js)** – Node.js script that parses the Markdown tables in [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) to create [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js)
- **[`data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js)** – Auto-generated data file consumed by the interactive curriculum UI
- **[`index.html`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/index.html)** and **[`lesson.html`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/lesson.html)** – Presentation layer templates
- **`assets/`** – Images and SVG figures used in lesson rendering
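
The table-to-data step can be sketched as follows. The real generator is a Node.js script; this Python version only illustrates the idea of turning a pipe-delimited Markdown table into structured rows. The column names (`Phase`, `Lessons`) and the function names are assumptions made for the example, not the actual layout of `README.md`.

```python
import json


def is_separator(cells):
    """True for the header separator row, e.g. |---|:-:|."""
    return all(cell and set(cell) <= set("-:") for cell in cells)


def parse_markdown_table(text):
    """Parse the first pipe-delimited Markdown table in text into a list of row dicts."""
    header = None
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            if header is not None:
                break  # the first table has ended
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        elif not rows and is_separator(cells):
            continue  # skip the separator directly under the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Hypothetical table shape; only the lesson count of 22 comes from this guide.
README_SNIPPET = """
| Phase | Lessons |
|-------|---------|
| 01 Math Foundations | 22 |
"""

rows = parse_markdown_table(README_SNIPPET)
print(json.dumps(rows, indent=2))
```

Because the lesson layout is uniform, a parser like this needs no per-lesson configuration: the table rows alone are enough to drive navigation.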

### Global Outputs Storage

The top-level `outputs/` directory aggregates production artifacts from across all lessons:
- **`skills/`** – Reusable skill markdown files (e.g., [`skill-agent-loop.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/outputs/skills/skill-agent-loop.md) from Phase 14)
- **`prompts/`** – Reusable prompt templates
- **`agents/`** – Generated agent SDK specifications
- **`mcp-servers/`** – Model Context Protocol server definitions
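
Because the four sub-directories are fixed, the global artifacts are straightforward to inventory. The sketch below assumes the layout listed above; `index_outputs` is a hypothetical helper, and the demo builds a temporary tree instead of reading the real repository.

```python
import tempfile
from pathlib import Path

# Sub-directory names taken from the list above.
ARTIFACT_KINDS = ("skills", "prompts", "agents", "mcp-servers")


def index_outputs(outputs_root):
    """Return {kind: sorted relative file paths} for each artifact sub-directory."""
    root = Path(outputs_root)
    index = {}
    for kind in ARTIFACT_KINDS:
        folder = root / kind
        if folder.is_dir():
            files = (p for p in folder.rglob("*") if p.is_file())
            index[kind] = sorted(p.relative_to(folder).as_posix() for p in files)
        else:
            index[kind] = []  # a missing folder is reported as empty
    return index


# Demo: a temporary outputs tree containing a single placeholder skill file.
with tempfile.TemporaryDirectory() as tmp:
    skills = Path(tmp) / "skills"
    skills.mkdir()
    (skills / "skill-agent-loop.md").write_text("# placeholder\n")
    index = index_outputs(tmp)

print(index)
```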

## Programmatically Exploring the Structure

You can navigate the curriculum hierarchy using standard Python path operations. The following snippet, run from the repository root, lists phases and lessons and checks that a lesson contains the required sub-folders:

```python
from pathlib import Path

# Run from the repository root so the relative path resolves.
BASE = Path("phases")

# List all phases (directories only, skipping any stray files)
for phase_dir in sorted(p for p in BASE.iterdir() if p.is_dir()):
    print("Phase:", phase_dir.name)

# List lessons in a specific phase (e.g., Phase 01)
phase = BASE / "01-math-foundations"
for lesson_dir in sorted(p for p in phase.iterdir() if p.is_dir()):
    print("  Lesson:", lesson_dir.name)


def check_lesson(path):
    """Return True if the lesson directory contains all required sub-folders."""
    required = {"code", "docs", "outputs"}
    present = {p.name for p in Path(path).iterdir() if p.is_dir()}
    return required <= present


# Validate a specific lesson
print("Lesson 01-linear-algebra-intuition valid?",
      check_lesson("phases/01-math-foundations/01-linear-algebra-intuition"))
```

## Summary

- **The repository organizes 20 curriculum phases** (numbered 00-19) under the `phases/` directory, covering everything from development setup to capstone projects.
- **Each lesson follows a strict three-folder layout**: `code/` for implementations, `docs/` (holding `en.md`) for narrative content, and `outputs/` for generated artifacts.
- **Automation is centralized** in `scripts/` (Python validation) and `site/` (Node.js site generation), with [`audit_lessons.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py) enforcing the three-folder lesson layout.
- **Global resources** like [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) and the `outputs/` hierarchy support cross-lesson terminology and reusable AI components.
- **The entire structure is machine-readable**, allowing [`build_catalog.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/build_catalog.py) and [`check_readme_counts.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/check_readme_counts.py) to maintain synchronization between code and documentation without manual intervention.

## Frequently Asked Questions

### What is the standard folder structure inside each lesson directory?

Every lesson directory must contain exactly three sub-folders: `code/` for runnable implementations (Python, TypeScript, Rust, or Julia), `docs/` containing `en.md` (the lesson narrative with YAML front-matter), and `outputs/` for curriculum-produced artifacts like skills, prompts, or agent specs. This structure is strictly enforced by [`scripts/audit_lessons.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py) in the CI pipeline.

### How many phases are included in the ai-engineering-from-scratch curriculum?

The repository contains **20 phases** (numbered 00 through 19) stored in the `phases/` directory. Phase 00 covers setup and tooling, Phase 01 contains 22 lessons on math foundations, and the sequence progresses through deep learning, transformers, LLMs, agent engineering, and concludes with capstone projects in Phase 19.

### What is the purpose of the top-level outputs/ directory?

The `outputs/` directory serves as a global aggregation point for reusable AI artifacts generated across all lessons. It contains four sub-directories: `skills/` (reusable skill markdown files), `prompts/` (prompt templates), `agents/` (agent SDK specifications), and `mcp-servers/` (Model Context Protocol definitions). These artifacts can be imported and reused by other lessons or external projects.

### How does the repository validate that lessons follow the correct structure?

Validation is handled by [`scripts/audit_lessons.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/audit_lessons.py), which runs in the GitHub Actions workflow defined in [`.github/workflows/curriculum.yml`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/.github/workflows/curriculum.yml). This script checks that every lesson directory contains the required `code/`, `docs/`, and `outputs/` sub-folders and verifies that tests exist. A companion script, [`scripts/check_readme_counts.py`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/check_readme_counts.py), ensures that lesson counts match the tables in [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md).