The 20 Phases of the AI Engineering Curriculum: A Complete Developer Roadmap

The AI engineering curriculum is organized into 20 sequential phases that build from fundamental tooling to advanced capstone projects. Each phase lives under the phases/ directory and contains structured lessons with code implementations, documentation, and assessments.

The open-source repository rohitg00/ai-engineering-from-scratch provides an implementation-first educational framework for modern AI engineering. Across its 20 phases, the curriculum progresses from environment setup and mathematical foundations to autonomous multi-agent systems and production infrastructure, following the structure laid out in the repository's main README.md (lines 57-80).

Curriculum Architecture and Lesson Structure

Each phase follows a uniform directory structure to ensure consistency across the learning path. Within any phases/<NN>-<phase-name>/ directory, individual lessons contain four standardized components: a code/ folder with implementation files, a docs/en.md file with instructional content, an outputs/ directory for generated artifacts, and a quiz.json file for knowledge validation. This scaffolding is enforced by the LESSON_TEMPLATE.md file in the repository root, which guarantees that learners encounter predictable navigation patterns whether they are exploring phases/01-math-foundations or phases/14-agent-engineering.
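
The four-component layout described above can be checked with a few lines of Python. The following is a minimal sketch, not the repository's own audit logic: the function name missing_components and the REQUIRED_COMPONENTS tuple are hypothetical, and the only thing taken from the text is the list of component paths.

```python
from pathlib import Path

# Component paths as described in the text; the tuple itself is illustrative.
REQUIRED_COMPONENTS = ("code", "docs/en.md", "outputs", "quiz.json")


def missing_components(lesson_dir):
    """Return the required components that are absent from lesson_dir.

    Hypothetical helper: it only tests for existence and does not
    inspect the contents of any file or folder.
    """
    lesson = Path(lesson_dir)
    return [name for name in REQUIRED_COMPONENTS if not (lesson / name).exists()]
```

Called on a complete lesson folder, the function returns an empty list; anything else is the list of pieces still to be added.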

The repository uses automated tooling to maintain this structure. The scripts/audit_lessons.py script validates lesson completeness, while site/build.js generates the public-facing website aiengineeringfromscratch.com from the markdown curriculum. Progress tracking is maintained in ROADMAP.md, which documents completion status and upcoming work for each phase.

The 20 Phases: From Zero to Production

The phase folders are numbered 00 through 19, and each phase depends on competencies developed in earlier ones. Because folder numbering starts at zero, item N in the list below corresponds to Phase N-1 (item 1 is Phase 0, item 20 is Phase 19). The phases are:

  1. Setup & Tooling (phases/00-setup-and-tooling) – Development environment configuration, Git workflows, Docker containerization, Jupyter notebooks, and system profiling.

  2. Math Foundations (phases/01-math-foundations) – Linear algebra, calculus, probability theory, optimization algorithms, and graph theory fundamentals.

  3. ML Fundamentals (phases/02-ml-fundamentals) – Classical machine learning including regression, decision trees, SVMs, clustering algorithms, and scikit-learn pipelines.

  4. Deep Learning Core (phases/03-deep-learning-core) – Perceptrons, multi-layer neural networks, backpropagation mechanics, optimizer implementations, and construction of a mini deep-learning framework.

  5. Vision (phases/04-computer-vision) – Convolutional operations, CNN architectures, object detection, image segmentation, diffusion models, Vision Transformers (ViT), and 3D vision systems.

  6. NLP: Foundations to Advanced (phases/05-nlp-foundations-to-advanced) – Text tokenization, word embeddings, sequence-to-sequence models, attention mechanisms, LLM-style text generation, and Retrieval-Augmented Generation (RAG).

  7. Speech & Audio (phases/06-speech-and-audio) – Waveform processing, spectrogram analysis, Automatic Speech Recognition (ASR), OpenAI Whisper integration, Text-to-Speech (TTS), and voice cloning techniques.

  8. Transformers Deep Dive (phases/07-transformers-deep-dive) – Self-attention mechanisms, multi-head attention, positional encodings, BERT/GPT architecture implementations, Mixture of Experts (MoE), and KV-cache optimization.

  9. Generative AI (phases/08-generative-ai) – Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), diffusion models, latent diffusion, ControlNet, and video/audio generation systems.

  10. Reinforcement Learning (phases/09-reinforcement-learning) – Markov Decision Processes (MDPs), dynamic programming, Q-learning, Deep Q-Networks (DQN), policy gradients, PPO, RLHF, and multi-agent environments.

  11. LLMs from Scratch (phases/10-llms-from-scratch) – Byte-Pair Encoding (BPE) tokenizers, mini-GPT pre-training implementations, distributed training strategies, RLHF fine-tuning, and model quantization.

  12. LLM Engineering (phases/11-llm-engineering) – Advanced prompt engineering, RAG pipeline construction, LoRA fine-tuning, function calling APIs, and safety guardrails.

  13. Multimodal AI (phases/12-multimodal-ai) – Vision-language models (CLIP, BLIP-2), audio-language integration, video understanding, and omni-modal architectures.

  14. Tools & Protocols (phases/13-tools-and-protocols) – Tool-use interfaces, Model Context Protocol (MCP) fundamentals, server/client architectures, security considerations, and request routing.

  15. Agent Engineering (phases/14-agent-engineering) – Agent control loops, planning algorithms, memory system design, LangGraph implementations, AutoGen frameworks, and agent evaluation benchmarks.

  16. Autonomous Systems (phases/15-autonomous-systems) – Architectures for self-contained agents and autonomous system design patterns.

  17. Multi-Agent & Swarms (phases/16-multi-agent-and-swarms) – Inter-agent coordination protocols, hierarchical orchestration, and emergent swarm dynamics.

  18. Infrastructure & Production (phases/17-infrastructure-and-production) – Model deployment strategies, observability tooling, logging infrastructure, autoscaling, and CI/CD pipelines for AI services.

  19. Ethics & Alignment (phases/18-ethics-and-alignment) – AI safety protocols, bias mitigation techniques, interpretability methods (mechanistic interpretability), and Constitutional AI.

  20. Capstone Projects (phases/19-capstone-projects) – End-to-end real-world projects requiring integration of the full technology stack, from data ingestion to deployed inference.
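
The folder names in the list above share one pattern: a zero-padded two-digit number, a hyphen, and a lowercase slug. As a rough sketch, assuming every phase folder follows that pattern, a name can be split into its phase number and slug as shown below. The names PHASE_PATTERN and parse_phase are illustrative and do not come from the repository.

```python
import re

# Assumed naming pattern: two digits, a hyphen, then a lowercase slug.
PHASE_PATTERN = re.compile(r"^(\d{2})-([a-z0-9-]+)$")


def parse_phase(folder_name):
    """Split a folder name such as '07-transformers-deep-dive' into (7, slug)."""
    match = PHASE_PATTERN.match(folder_name)
    if match is None:
        raise ValueError(f"not a phase folder name: {folder_name!r}")
    return int(match.group(1)), match.group(2)


# Sort a shuffled sample of folder names by phase number.
folders = ["19-capstone-projects", "00-setup-and-tooling", "07-transformers-deep-dive"]
ordered = sorted(folders, key=lambda name: parse_phase(name)[0])
print(ordered)
# → ['00-setup-and-tooling', '07-transformers-deep-dive', '19-capstone-projects']
```

Sorting on the parsed integer rather than the raw string keeps the order correct even if a folder were ever named without zero padding.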

Programmatically Exploring the Curriculum

You can interact with the curriculum structure programmatically using the repository's predictable file organization. The phases/ directory contains numbered folders that allow for automated traversal and validation.

List All Lessons Within a Phase

To enumerate all lessons in a specific phase (for example, Phase 1), use Python to traverse the directory structure:

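from pathlib import Path

def list_lessons(phase_folder: str) -> list[str]:
    """Return the sorted lesson directory names inside one phase folder."""
    # Run from the repository root so that 'phases' resolves correctly.
    base = Path('phases') / phase_folder
    # Keep only subdirectories; loose files in the phase folder are skipped.
    return sorted(p.name for p in base.iterdir() if p.is_dir())

print(list_lessons('01-math-foundations'))   # → ['01-linear-algebra-intuition', ...]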

Inspect Lesson Documentation Metadata

Each lesson's documentation resides in docs/en.md with standardized front-matter. Extract the metadata header using standard Unix tools:


# Show the first 15 lines (the metadata header) of lesson 01 in Phase 1
sed -n '1,15p' phases/01-math-foundations/01-linear-algebra-intuition/docs/en.md
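
For readers who prefer to stay in Python, the same header extraction can be sketched as follows. The function name read_header and its max_lines parameter are hypothetical; the sketch only reads the first lines of a file and makes no assumption about the front-matter format.

```python
from itertools import islice
from pathlib import Path


def read_header(doc_path, max_lines=15):
    """Return the first max_lines lines of a file, without trailing newlines."""
    with Path(doc_path).open(encoding="utf-8") as handle:
        # islice stops after max_lines, so large files are not read in full.
        return [line.rstrip("\n") for line in islice(handle, max_lines)]
```

Passing the docs/en.md path used in the sed command should return the same 15 lines as a list of strings.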

Execute Lesson Implementations

Individual lessons contain runnable code in their code/ directories. For example, to run the perceptron implementation from Phase 3:


# Run the perceptron implementation from Phase 3
python phases/03-deep-learning-core/01-the-perceptron/code/perceptron.py

Query the Complete Phase Catalogue

To generate an index of all 20 phases programmatically, use Node.js to read the directory structure:

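const fs = require('fs');
const path = require('path');

// Run from the repository root. Keep only directories and sort them so the
// zero-padded prefixes (00-19) yield curriculum order.
const phases = fs.readdirSync('phases')
  .filter(name => fs.lstatSync(path.join('phases', name)).isDirectory())
  .sort();

console.log('All phases:', phases);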
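
The phase listing and the per-phase lesson listing can be combined into a single catalogue. The sketch below is one possible way to do that in Python, assuming only the phases/<phase>/<lesson> directory layout described in this article; build_catalogue is a hypothetical name, not a script shipped with the repository.

```python
from pathlib import Path


def build_catalogue(root="phases"):
    """Map each phase folder name to the sorted names of its lesson folders."""
    catalogue = {}
    phase_dirs = sorted((p for p in Path(root).iterdir() if p.is_dir()),
                        key=lambda p: p.name)
    for phase in phase_dirs:
        # Only subdirectories count as lessons; loose files are ignored.
        catalogue[phase.name] = sorted(l.name for l in phase.iterdir() if l.is_dir())
    return catalogue


# Print a lesson count per phase when run from the repository root.
if Path("phases").is_dir():
    for phase_name, lessons in build_catalogue().items():
        print(f"{phase_name}: {len(lessons)} lessons")
```

Because dictionaries preserve insertion order, iterating over the result walks the phases in curriculum order.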

Summary

  • The 20 phases of the AI engineering curriculum provide a sequential learning path from basic tooling (Phase 0) to complex capstone projects (Phase 19).
  • Each phase resides in a numbered folder under phases/ (e.g., phases/10-llms-from-scratch) and contains standardized lesson subdirectories with code/, docs/, outputs/, and quiz.json.
  • The curriculum structure is defined in the repository's README.md and enforced through templates (LESSON_TEMPLATE.md) and automation scripts (scripts/audit_lessons.py).
  • Key documentation includes ROADMAP.md for tracking completion status and glossary/terms.md for definitional references.
  • The repository supports programmatic exploration, allowing learners to list lessons, inspect documentation, and execute code implementations via Python, Bash, or Node.js.

Frequently Asked Questions

Do the phases need to be completed in order?

The curriculum is designed for strict sequential progression from Phase 0 through Phase 19. Each phase builds upon competencies developed in previous ones. For example, Phase 10 (LLMs from Scratch) requires an understanding of transformers from Phase 7 and deep learning fundamentals from Phase 3. The README.md presents these dependencies as a directed graph in a Mermaid diagram.
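
A dependency graph like this can be turned into a valid study order with a topological sort. The sketch below uses Python's standard graphlib module (Python 3.9+) and encodes only the two prerequisites named above; the dictionary STATED_PREREQUISITES and the function study_order are illustrative, and the full graph in the README may contain more edges.

```python
from graphlib import TopologicalSorter

# Only the edges stated in the text: Phase 10 requires Phases 3 and 7.
# Any further edges would have to be read from the README diagram.
STATED_PREREQUISITES = {10: {3, 7}}


def study_order(prerequisites):
    """Return phase numbers ordered so every prerequisite comes first.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(prerequisites).static_order())


print(study_order(STATED_PREREQUISITES))
```

In the printed order, Phases 3 and 7 both appear before Phase 10; their order relative to each other is not fixed by these two edges alone.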

How are lessons structured within each phase?

Every lesson follows a four-component template: implementation code in a code/ directory, instructional content in docs/en.md, output artifacts in outputs/, and assessment questions in quiz.json. This structure is standardized across all phases, from phases/00-setup-and-tooling to phases/19-capstone-projects, ensuring learners know exactly where to find practical implementations versus theoretical explanations.

What tools are available for tracking curriculum completion?

The repository includes automated validation scripts in the scripts/ directory, particularly audit_lessons.py, which checks that lessons adhere to the required structure. In addition, ROADMAP.md is a human-readable tracking document that marks phases as complete, work-in-progress, or planned, while site/build.js generates the public website that renders the curriculum state.

Does the curriculum cover production deployment and MLOps?

Yes, Phase 17 (Infrastructure & Production) specifically addresses deployment strategies, observability, logging, scaling, and CI/CD for AI services. Earlier phases (particularly Phase 13: Tools & Protocols) introduce the Model Context Protocol (MCP) and tool interfaces that are essential for production agent systems, while Phase 18 covers the ethical and alignment considerations necessary for safe deployment.
