AI Engineering From Scratch Folder Structure: Complete 20-Phase Curriculum Guide

The rohitg00/ai-engineering-from-scratch repository follows a strict hierarchical structure organized into 20 sequential phases, where every lesson directory contains exactly three sub-folders (code/, docs/, and outputs/) and is automatically validated by CI tooling.

This curriculum-style repository is organized for progressive learning, from development setup through advanced capstone projects. The folder structure is designed for both human readability and automated processing by build tools and validation scripts. The sections below break down how the repository organizes its 20 phases, supporting infrastructure, and lesson artifacts.

Root-Level Directory Organization

The repository root contains the following top-level directories and files, which together govern the entire curriculum:

  • phases/ – The core curriculum containing 20 sequentially numbered phases (00 through 19)
  • site/ – Static site generator with build.js, data.js, and HTML templates
  • scripts/ – Python automation tools for validation and catalog generation
  • glossary/ – Shared terminology files (terms.md, myths.md) referenced across lessons
  • outputs/ – Global storage for reusable artifacts produced by lessons (skills, prompts, agents)
  • .github/ – CI workflows (workflows/curriculum.yml) and templates
  • README.md – Front-page phase table that feeds the site navigation
  • ROADMAP.md – Machine-readable lesson status tracker
  • AGENTS.md – Internal operating manual for the project

The Phases Directory Structure

Inside phases/, the curriculum is divided into 20 numbered directories (prefixed with two-digit codes) covering the complete AI engineering stack:

Phase 00: Setup and Tooling

Contains foundational lessons like 01-dev-environment/ that establish the development workflow used throughout the curriculum.

Phase 01: Math Foundations

Houses 22 individual lessons including 01-linear-algebra-intuition/, each following the standardized three-folder layout.

Remaining Phases (02-19)

The sequence continues through specialized domains:

  • 02-ml-fundamentals/ – Core machine learning concepts
  • 03-deep-learning-core/ – Neural network implementations
  • 04-computer-vision/ – CV-specific architectures
  • 05-nlp-foundations-to-advanced/ – Natural language processing
  • 06-speech-and-audio/ – Audio ML applications
  • 07-transformers-deep-dive/ – Transformer architecture details
  • 08-generative-ai/ – GenAI systems
  • 09-reinforcement-learning/ – RL algorithms
  • 10-llms-from-scratch/ – LLM construction
  • 11-llm-engineering/ – LLM deployment
  • 12-multimodal-ai/ – Cross-modal systems
  • 13-tools-and-protocols/ – MCP and agent protocols
  • 14-agent-engineering/ – Agent loop implementations
  • 15-autonomous-systems/ – Self-governing AI
  • 16-multi-agent-and-swarms/ – Distributed agent systems
  • 17-infrastructure-and-production/ – Production deployment
  • 18-ethics-and-alignment/ – AI safety
  • 19-capstone-projects/ – Final projects
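The two-digit prefix convention lends itself to a mechanical check. The following is a minimal sketch and not part of the repository: the names parse_phase_name and missing_phase_numbers, and the regular expression itself, are hypothetical and assume that every phase directory is two digits, a hyphen, and a lowercase hyphenated slug.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumed convention: two digits, a hyphen, then a lowercase hyphenated slug.
PHASE_NAME = re.compile(r"(\d{2})-([a-z0-9]+(?:-[a-z0-9]+)*)")


class Phase(NamedTuple):
    number: int
    slug: str


def parse_phase_name(name: str) -> Optional[Phase]:
    """Return the phase number and slug, or None if the name does not match."""
    match = PHASE_NAME.fullmatch(name)
    if match is None:
        return None
    return Phase(int(match.group(1)), match.group(2))


def missing_phase_numbers(names: Iterable[str], expected=range(20)) -> list:
    """Return the expected phase numbers that have no matching directory name."""
    found = {p.number for p in map(parse_phase_name, names) if p is not None}
    return sorted(set(expected) - found)
```

Passing the directory names found under phases/ to missing_phase_numbers would report any gap in the 00 through 19 sequence; names that do not follow the convention are simply ignored.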

Standard Lesson Directory Layout

Every lesson within a phase follows an identical three-folder contract. For example, phases/01-math-foundations/01-linear-algebra-intuition/ contains:

  • code/ – Runnable implementation files (Python, TypeScript, Rust, or Julia)
  • docs/en.md – The lesson narrative with YAML front-matter
  • outputs/ – Generated artifacts specific to that lesson (prompts, skills, agent specs)

This structure is enforced by scripts/audit_lessons.py, which validates that no lesson is committed without all three components. The consistency allows the static site generator (site/build.js) to parse README.md links and automatically build the curriculum UI without per-lesson configuration.
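The kind of check described above can be sketched as a walk over every phase and lesson. This is an illustrative sketch only, not the actual contents of scripts/audit_lessons.py: the function names missing_components and audit and the REQUIRED tuple are assumptions based on the layout listed in this section.

```python
from pathlib import Path

# Required entries per lesson, taken from the three-part layout described above.
REQUIRED = ("code", "docs/en.md", "outputs")


def missing_components(lesson_dir: Path) -> list:
    """Return the required entries that are absent from one lesson directory."""
    return [rel for rel in REQUIRED if not (lesson_dir / rel).exists()]


def audit(phases_root: Path) -> dict:
    """Map each incomplete lesson (as 'phase/lesson') to its missing entries."""
    problems = {}
    for phase in sorted(p for p in phases_root.iterdir() if p.is_dir()):
        for lesson in sorted(p for p in phase.iterdir() if p.is_dir()):
            missing = missing_components(lesson)
            if missing:
                problems[f"{phase.name}/{lesson.name}"] = missing
    return problems


if __name__ == "__main__":
    root = Path("phases")  # assumes the script runs from the repository root
    if root.is_dir():
        for lesson, missing in audit(root).items():
            print(f"{lesson}: missing {', '.join(missing)}")
```

Returning a mapping rather than failing on the first problem lets a CI job print every incomplete lesson in a single run.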

Automation and Tooling Directories

CI Validation Scripts

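The scripts/ directory houses the authoritative tooling that maintains curriculum integrity:

  • audit_lessons.py – Validates that every lesson contains code/, docs/, and outputs/
  • check_readme_counts.py – Checks that lesson counts match the tables in README.md
  • build_catalog.py – Generates the curriculum catalog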

Static Site Generation

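The site/ directory contains the build pipeline:

  • build.js – Parses the README.md links and builds the curriculum UI
  • data.js and HTML templates – Supporting files for the generated site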

Global Outputs Storage

The top-level outputs/ directory aggregates production artifacts from across all lessons:

  • skills/ – Reusable skill markdown files (e.g., skill-agent-loop.md from Phase 14)
  • prompts/ – Reusable prompt templates
  • agents/ – Generated agent SDK specifications
  • mcp-servers/ – Model Context Protocol server definitions
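An inventory of these categories can be produced with a short directory walk. The sketch below is an assumption-laden illustration rather than repository code: list_artifacts is a hypothetical helper, and the only facts it relies on are the four category names listed above.

```python
from pathlib import Path

# Category folders named in the list above.
CATEGORIES = ("skills", "prompts", "agents", "mcp-servers")


def list_artifacts(outputs_root: Path) -> dict:
    """Return the files under each category; a missing category maps to []."""
    inventory = {}
    for category in CATEGORIES:
        folder = outputs_root / category
        files = folder.rglob("*") if folder.is_dir() else []
        # Store paths relative to the category folder, with forward slashes.
        inventory[category] = sorted(
            f.relative_to(folder).as_posix() for f in files if f.is_file()
        )
    return inventory
```

Calling list_artifacts(Path("outputs")) from the repository root would return one sorted list of file paths per category.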

Programmatically Exploring the Structure

You can navigate the curriculum hierarchy using standard Python path operations. The following snippets demonstrate how to validate the folder structure programmatically:

import pathlib

# Root of the curriculum; run this from the repository root.
BASE = pathlib.Path("phases")

# List all phases (directories only, so stray files are skipped)
for phase_dir in sorted(p for p in BASE.iterdir() if p.is_dir()):
    print("Phase:", phase_dir.name)

# List lessons in a specific phase (e.g., Phase 01)
phase = BASE / "01-math-foundations"
for lesson_dir in sorted(p for p in phase.iterdir() if p.is_dir()):
    print("  Lesson:", lesson_dir.name)

# Verify a lesson contains the required sub-folders.
# Note: this is a subset check, so extra sub-folders do not cause a failure.
def check_lesson(path):
    required = {"code", "docs", "outputs"}
    present = {p.name for p in pathlib.Path(path).iterdir() if p.is_dir()}
    return required <= present

# Validate a specific lesson
print("Lesson 01-linear-algebra-intuition valid?",
      check_lesson("phases/01-math-foundations/01-linear-algebra-intuition"))

Summary

  • The repository organizes 20 curriculum phases (numbered 00-19) under the phases/ directory, covering everything from development setup to capstone projects.
  • Each lesson follows a strict three-folder layout: code/ for implementations, docs/en.md for narrative content, and outputs/ for generated artifacts.
  • Automation is centralized in scripts/ (Python validation) and site/ (Node.js site generation), with audit_lessons.py enforcing the three-folder lesson layout.
  • Global resources like glossary/terms.md and the outputs/ hierarchy support cross-lesson terminology and reusable AI components.
  • The entire structure is machine-readable, allowing build_catalog.py and check_readme_counts.py to maintain synchronization between code and documentation without manual intervention.

Frequently Asked Questions

What is the standard folder structure inside each lesson directory?

Every lesson directory must contain exactly three sub-folders: code/ for runnable implementations (Python, TypeScript, Rust, or Julia), docs/ containing en.md (the lesson narrative with front-matter), and outputs/ for curriculum-produced artifacts like skills, prompts, or MCP server definitions. This structure is strictly enforced by scripts/audit_lessons.py in the CI pipeline.

How many phases are included in the ai-engineering-from-scratch curriculum?

The repository contains 20 phases (numbered 00 through 19) stored in the phases/ directory. Phase 00 covers setup and tooling, Phase 01 contains 22 lessons on math foundations, and the sequence progresses through deep learning, transformers, LLMs, agent engineering, and concludes with capstone projects in Phase 19.

What is the purpose of the top-level outputs/ directory?

The outputs/ directory serves as a global aggregation point for reusable AI artifacts generated across all lessons. It contains four sub-directories: skills/ (reusable skill markdown files), prompts/ (prompt templates), agents/ (agent SDK specifications), and mcp-servers/ (Model Context Protocol definitions). These artifacts can be imported and reused by other lessons or external projects.

How does the repository validate that lessons follow the correct structure?

Validation runs in the GitHub Actions workflow defined in .github/workflows/curriculum.yml. scripts/audit_lessons.py checks that every lesson directory contains the required code/, docs/, and outputs/ sub-folders and verifies that tests exist, while scripts/check_readme_counts.py ensures that lesson counts match the tables in README.md.
