AI Engineering From Scratch Folder Structure: Complete 20-Phase Curriculum Guide
The rohitg00/ai-engineering-from-scratch repository follows a strict hierarchical structure organized into 20 sequential phases, where every lesson directory contains exactly three sub-folders (code/, docs/, and outputs/) and is automatically validated by CI tooling.
This curriculum-style repository uses a shallow, uniform hierarchy (phase, then lesson, then three fixed sub-folders) that supports progressive learning from development setup through advanced capstone projects. The folder structure is designed for human readability and for automated processing by build tools and validation scripts. Below is a breakdown of how the repository organizes its 20 phases, supporting infrastructure, and lesson artifacts.
Root-Level Directory Organization
The repository root holds six key top-level directories and three Markdown files that govern the entire curriculum:
- phases/ – The core curriculum containing 20 sequentially numbered phases (00 through 19)
- site/ – Static site generator with build.js, data.js, and HTML templates
- scripts/ – Python automation tools for validation and catalog generation
- glossary/ – Shared terminology files (terms.md, myths.md) referenced across lessons
- outputs/ – Global storage for reusable artifacts produced by lessons (skills, prompts, agents)
- .github/ – CI workflows (workflows/curriculum.yml) and templates
- README.md – Front-page phase table that feeds the site navigation
- ROADMAP.md – Machine-readable lesson status tracker
- AGENTS.md – Internal operating manual for the project
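As a quick sanity check of a local clone, the root layout described above can be compared against the file system. The following is a minimal sketch, not part of the repository's tooling: the function name `missing_root_entries` is hypothetical, and the expected names are simply copied from the list above.

```python
from pathlib import Path

# Names copied from the list above; treat them as assumptions about the
# repository's current state, not as an authoritative manifest.
EXPECTED_DIRS = ["phases", "site", "scripts", "glossary", "outputs", ".github"]
EXPECTED_FILES = ["README.md", "ROADMAP.md", "AGENTS.md"]


def missing_root_entries(root="."):
    """Return the expected root entries that are absent under `root`."""
    root = Path(root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # Run from the repository root; an empty list means nothing is missing.
    print(missing_root_entries("."))
```

Returning a list instead of raising keeps the helper usable both interactively and inside a larger check.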
The Phases Directory Structure
Inside phases/, the curriculum is divided into 20 numbered directories (prefixed with two-digit codes) covering the complete AI engineering stack:
Phase 00: Setup and Tooling
Contains foundational lessons like 01-dev-environment/ that establish the development workflow used throughout the curriculum.
Phase 01: Math Foundations
Houses 22 individual lessons including 01-linear-algebra-intuition/, each following the standardized three-folder layout.
Remaining Phases (02-19)
The sequence continues through specialized domains:
- 02-ml-fundamentals/ – Core machine learning concepts
- 03-deep-learning-core/ – Neural network implementations
- 04-computer-vision/ – CV-specific architectures
- 05-nlp-foundations-to-advanced/ – Natural language processing
- 06-speech-and-audio/ – Audio ML applications
- 07-transformers-deep-dive/ – Transformer architecture details
- 08-generative-ai/ – GenAI systems
- 09-reinforcement-learning/ – RL algorithms
- 10-llms-from-scratch/ – LLM construction
- 11-llm-engineering/ – LLM deployment
- 12-multimodal-ai/ – Cross-modal systems
- 13-tools-and-protocols/ – MCP and agent protocols
- 14-agent-engineering/ – Agent loop implementations
- 15-autonomous-systems/ – Self-governing AI
- 16-multi-agent-and-swarms/ – Distributed agent systems
- 17-infrastructure-and-production/ – Production deployment
- 18-ethics-and-alignment/ – AI safety
- 19-capstone-projects/ – Final projects
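Because every phase directory carries a two-digit prefix, the numbering can be checked mechanically. The sketch below assumes directory names of the form `NN-slug` (as in `14-agent-engineering/`); the regular expression and the helper names are illustrative and do not come from the repository.

```python
import re
from pathlib import Path

# Assumed naming pattern: two digits, a hyphen, then a lowercase slug.
PHASE_RE = re.compile(r"^(\d{2})-([a-z0-9-]+)$")


def parse_phase_name(name):
    """Split '14-agent-engineering' into (14, 'agent-engineering'); None if no match."""
    match = PHASE_RE.match(name)
    return (int(match.group(1)), match.group(2)) if match else None


def phase_numbers(phases_dir="phases"):
    """Return the sorted phase numbers found under `phases_dir`."""
    base = Path(phases_dir)
    if not base.is_dir():
        return []
    parsed = (parse_phase_name(p.name) for p in base.iterdir() if p.is_dir())
    return sorted(number for number, _slug in filter(None, parsed))


def is_complete_sequence(numbers, expected=20):
    """True when the numbers are exactly 0..expected-1 with no gaps or repeats."""
    return numbers == list(range(expected))


if __name__ == "__main__":
    found = phase_numbers()
    print("Phases found:", found)
    print("Complete 00-19 sequence?", is_complete_sequence(found))
```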
Standard Lesson Directory Layout
Every lesson within a phase follows an identical three-folder contract. For example, phases/01-math-foundations/01-linear-algebra-intuition/ contains:
- code/ – Runnable implementation files (Python, TypeScript, Rust, or Julia)
- docs/en.md – The lesson narrative with YAML front-matter
- outputs/ – Generated artifacts specific to that lesson (prompts, skills, agent specs)
This structure is enforced by scripts/audit_lessons.py, which validates that no lesson is committed without all three components. The consistency allows the static site generator (site/build.js) to parse README.md links and automatically build the curriculum UI without per-lesson configuration.
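To make the three-folder contract concrete, here is a minimal sketch of a repository-wide check. It is not the actual scripts/audit_lessons.py, whose internals are not shown here; it assumes that every sub-directory of a phase is a lesson, and the name `audit_lessons` and the `REQUIRED` tuple are chosen for illustration only.

```python
from pathlib import Path

# Entries each lesson is expected to contain, per the layout described above.
REQUIRED = ("code", "docs/en.md", "outputs")


def audit_lessons(phases_dir="phases"):
    """Map each incomplete lesson path to the required entries it lacks."""
    problems = {}
    base = Path(phases_dir)
    if not base.is_dir():
        return problems
    for phase in sorted(p for p in base.iterdir() if p.is_dir()):
        # Assumption: every directory directly inside a phase is a lesson.
        for lesson in sorted(p for p in phase.iterdir() if p.is_dir()):
            missing = [entry for entry in REQUIRED if not (lesson / entry).exists()]
            if missing:
                problems[lesson.as_posix()] = missing
    return problems


if __name__ == "__main__":
    for lesson, missing in audit_lessons().items():
        print(f"{lesson}: missing {', '.join(missing)}")
```

An empty result means every lesson found has all three components; a CI job could fail the build whenever the dictionary is non-empty.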
Automation and Tooling Directories
CI Validation Scripts
The scripts/ directory houses the authoritative tooling that maintains curriculum integrity:
- audit_lessons.py – Validates lesson layout, checks for required sub-folders, and verifies test coverage
- build_catalog.py – Regenerates catalog.json (git-ignored) from the current repository state
- check_readme_counts.py – Ensures the README.md phase tables stay synchronized with actual lesson counts
- install_skills.py – Installs all generated SKILL.md files into the appropriate outputs directory
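The idea behind regenerating a catalog from the repository state can be sketched in a few lines. The schema below (`phases`, `lessons`, `lesson_count`, `total_lessons`) is hypothetical; the real catalog.json produced by build_catalog.py may look quite different.

```python
import json
from pathlib import Path


def build_catalog(phases_dir="phases"):
    """Collect phase and lesson directory names into a JSON-serialisable dict."""
    base = Path(phases_dir)
    phases = []
    if base.is_dir():
        for phase in sorted(p for p in base.iterdir() if p.is_dir()):
            lessons = sorted(entry.name for entry in phase.iterdir() if entry.is_dir())
            phases.append({
                "phase": phase.name,
                "lessons": lessons,
                "lesson_count": len(lessons),
            })
    total = sum(item["lesson_count"] for item in phases)
    return {"phases": phases, "total_lessons": total}


if __name__ == "__main__":
    # Print rather than write, so running the sketch never touches the repo.
    print(json.dumps(build_catalog(), indent=2))
```

The per-phase `lesson_count` values are the kind of figure a README count check would compare against the phase table.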
Static Site Generation
The site/ directory contains the build pipeline:
- build.js – Node.js script that parses README.md markdown tables to create site/data.js
- data.js – Auto-generated JSON consumed by the interactive curriculum UI
- index.html and lesson.html – Presentation layer templates
- assets/ – Images and SVG figures used in lesson rendering
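The table-parsing step can be illustrated independently of the real pipeline. The sketch below is written in Python for consistency with the other examples, not in Node.js like build.js, and it does not reproduce that script. The `SAMPLE` table and its column names are invented for illustration; the actual README.md layout may differ.

```python
# Hypothetical excerpt of a phase table; the real README.md columns may differ.
SAMPLE = """
| Phase | Name |
|-------|------|
| 00 | Setup and Tooling |
| 01 | Math Foundations |
"""


def split_row(line):
    """Split one pipe-delimited table row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_markdown_table(markdown):
    """Return the rows of the first pipe table in `markdown` as dicts keyed by header."""
    table = []
    for line in markdown.splitlines():
        if line.strip().startswith("|"):
            table.append(line)
        elif table:
            break  # the first table has ended
    if len(table) < 2:
        return []
    header = split_row(table[0])
    # table[1] is the |---|---| separator row, so data starts at index 2.
    return [dict(zip(header, split_row(line))) for line in table[2:]]


if __name__ == "__main__":
    for row in parse_markdown_table(SAMPLE):
        print(row)
```

This simple splitter does not handle escaped pipes inside cells, which a production parser would need to consider.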
Global Outputs Storage
The top-level outputs/ directory aggregates production artifacts from across all lessons:
- skills/ – Reusable skill markdown files (e.g., skill-agent-loop.md from Phase 14)
- prompts/ – Reusable prompt templates
- agents/ – Generated agent SDK specifications
- mcp-servers/ – Model Context Protocol server definitions
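To see what a local clone has accumulated in each category, the four sub-directories above can be listed directly. This is a small sketch under the assumption that artifacts are stored as files directly inside each category folder; `list_artifacts` is a hypothetical helper name.

```python
from pathlib import Path

# Category names taken from the list above.
CATEGORIES = ("skills", "prompts", "agents", "mcp-servers")


def list_artifacts(outputs_dir="outputs"):
    """Map each category to the sorted file names found directly inside it."""
    base = Path(outputs_dir)
    found = {}
    for category in CATEGORIES:
        folder = base / category
        if folder.is_dir():
            found[category] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            found[category] = []  # category folder absent in this checkout
    return found


if __name__ == "__main__":
    for category, names in list_artifacts().items():
        print(f"{category}: {len(names)} file(s)")
```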
Programmatically Exploring the Structure
You can navigate the curriculum hierarchy using standard Python path operations. The following snippets demonstrate how to validate the folder structure programmatically:
import pathlib

BASE = pathlib.Path("phases")

# List all phases (directories only, so stray files are skipped)
for phase_dir in sorted(p for p in BASE.iterdir() if p.is_dir()):
    print("Phase:", phase_dir.name)

# List lessons in a specific phase (e.g., Phase 01)
phase = BASE / "01-math-foundations"
for lesson_dir in sorted(p for p in phase.iterdir() if p.is_dir()):
    print("  Lesson:", lesson_dir.name)

# Verify a lesson contains the required sub-folders
def check_lesson(path):
    required = {"code", "docs", "outputs"}
    present = {p.name for p in pathlib.Path(path).iterdir() if p.is_dir()}
    return required <= present  # True when every required folder is present

# Validate specific lesson integrity
print("Lesson 01-linear-algebra-intuition valid?",
      check_lesson("phases/01-math-foundations/01-linear-algebra-intuition"))
Summary
- The repository organizes 20 curriculum phases (numbered 00-19) under the phases/ directory, covering everything from development setup to capstone projects.
- Each lesson follows a strict three-folder layout: code/ for implementations, docs/en.md for narrative content, and outputs/ for generated artifacts.
- Automation is centralized in scripts/ (Python validation) and site/ (Node.js site generation), with audit_lessons.py enforcing the three-folder lesson layout.
- Global resources like glossary/terms.md and the outputs/ hierarchy support cross-lesson terminology and reusable AI components.
- The entire structure is machine-readable, allowing build_catalog.py and check_readme_counts.py to keep code and documentation synchronized without manual intervention.
Frequently Asked Questions
What is the standard folder structure inside each lesson directory?
Every lesson directory must contain exactly three sub-folders: code/ for runnable implementations (Python, TypeScript, Rust, or Julia), docs/ containing en.md (the lesson narrative with front-matter), and outputs/ for curriculum-produced artifacts like skills, prompts, or MCP server definitions. This structure is strictly enforced by scripts/audit_lessons.py in the CI pipeline.
How many phases are included in the ai-engineering-from-scratch curriculum?
The repository contains 20 phases (numbered 00 through 19) stored in the phases/ directory. Phase 00 covers setup and tooling, Phase 01 contains 22 lessons on math foundations, and the sequence progresses through deep learning, transformers, LLMs, agent engineering, and concludes with capstone projects in Phase 19.
What is the purpose of the top-level outputs/ directory?
The outputs/ directory serves as a global aggregation point for reusable AI artifacts generated across all lessons. It contains four sub-directories: skills/ (reusable skill markdown files), prompts/ (prompt templates), agents/ (agent SDK specifications), and mcp-servers/ (Model Context Protocol definitions). These artifacts can be imported and reused by other lessons or external projects.
How does the repository validate that lessons follow the correct structure?
Validation is handled by scripts/audit_lessons.py, which runs in the GitHub Actions workflow defined in .github/workflows/curriculum.yml. The script checks that every lesson directory contains the required code/, docs/, and outputs/ sub-folders and verifies that tests exist. A companion script, scripts/check_readme_counts.py, ensures lesson counts match the tables in README.md.