Clear Explanation of the Algorithms Used in AI Engineering From Scratch: A Complete Review
Yes, the AI Engineering From Scratch repository provides a clear, step-by-step explanation of every algorithm it covers, pairing theoretical markdown documentation with minimal, self-contained reference implementations and unit tests for each of its 435 lessons.
The AI Engineering From Scratch curriculum, maintained by rohitg00, is an open-source educational project that teaches modern AI systems from first principles across 435 structured lessons. Readers looking for a clear explanation of the algorithms used will find that every lesson couples a theoretical docs/en.md write-up with a minimal code/ implementation and automated tests. This design ensures that mathematical derivations for tokenizers, transformers, reinforcement learning, and optimization techniques are always traceable to short, runnable Python files.
Clear Explanation of the Algorithms Used: Structure of Every Lesson
Each lesson in the repository follows a strict three-part structure that separates why an algorithm works from how to build it.
- Theory in plain English: The docs/en.md file inside each lesson directory introduces the conceptual foundation, prerequisite knowledge, and learning objectives. Equations are rendered in LaTeX and link back to original research papers, such as Sennrich et al. 2016 for BPE or Schulman et al. 2017 for PPO.
- Minimal reference code: The companion code/ script contains only a few dozen lines, starts with a header comment pointing to the documentation, and avoids third-party dependencies that the CI-enforced allow-list prohibits.
- Unit-test proof: The code/tests/ directory for each lesson includes at least five tests that exercise the implementation and can be executed with python3 -m unittest discover.
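To make the test layout concrete, here is a minimal sketch of what a five-test lesson suite could look like. The function `clip` and the class `ClipTest` are hypothetical stand-ins chosen for illustration; they are not names taken from the repository.

```python
import unittest


def clip(value, low, high):
    """Hypothetical lesson function: clamp `value` into [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


class ClipTest(unittest.TestCase):
    def test_value_inside_range_is_unchanged(self):
        self.assertEqual(clip(0.5, 0.0, 1.0), 0.5)

    def test_value_below_range_is_raised_to_low(self):
        self.assertEqual(clip(-3, 0, 1), 0)

    def test_value_above_range_is_lowered_to_high(self):
        self.assertEqual(clip(7, 0, 1), 1)

    def test_bounds_are_inclusive(self):
        self.assertEqual(clip(1, 0, 1), 1)

    def test_inverted_bounds_raise(self):
        with self.assertRaises(ValueError):
            clip(0, 1, 0)
```

Saved under a name matching `test*.py`, a file like this would be picked up by python3 -m unittest discover without further configuration.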
Because the entire curriculum is generated into a static site via site/build.js, the public README remains a browsable table that links directly to every algorithm’s derivation and source.
Algorithm Families Covered in the Curriculum
The repository spans foundational NLP, deep-learning architecture, reinforcement learning, distributed systems, and AI safety. Below are the major algorithm families and the exact paths where their clear explanations and implementations live.
- Byte-Pair Encoding (BPE) tokenizer: The curriculum explains the greedy compression algorithm repurposed for sub-word tokenization, including the merge-selection rule, special-token handling, and training data pipelines. The derivation lives in phases/10-llms-from-scratch/01-tokenizers/docs/en.md, and the reference implementation is in phases/10-llms-from-scratch/01-tokenizers/code/bpe.py.
- Transformer building blocks: Lessons cover scaled dot-product attention, multi-head attention, positional encoding, feed-forward blocks, residual connections, and layer normalization. The theory is documented in phases/07-transformers-deep-dive/05-full-transformer/docs/en.md, with illustrative code in phases/07-transformers-deep-dive/05-full-transformer/code/transformer.py.
- Speculative decoding: Readers learn about draft-model generation, the verification step, failure-mode analysis, and the speed-vs.-quality trade-off. See phases/10-llms-from-scratch/25-speculative-decoding/docs/en.md and phases/10-llms-from-scratch/25-speculative-decoding/code/speculative.py.
- Reinforcement Learning (RL) algorithms: PPO, Q-learning, Monte-Carlo methods, policy gradient, and RLHF are derived from first principles, including the clipped objective, advantage estimators, and the policy-gradient theorem. PPO theory is in phases/09-reinforcement-learning/08-ppo/docs/en.md with code in phases/09-reinforcement-learning/08-ppo/code/ppo.py; Monte-Carlo methods are explained in phases/09-reinforcement-learning/03-monte-carlo-methods/docs/en.md with code in phases/09-reinforcement-learning/03-monte-carlo-methods/code/mc.py.
- Multi-Agent Reinforcement Learning (MARL): MADDPG, QMIX, and MAPPO are taught alongside discussions of non-stationarity, credit assignment, and cooperative versus competitive settings. Documentation is in phases/16-multi-agent-and-swarms/20-marl-maddpg-qmix-mappo/docs/en.md, and the implementation is in phases/16-multi-agent-and-swarms/20-marl-maddpg-qmix-mappo/code/marl.py.
- Swarm optimization: Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), and Genetic Algorithms are mapped to prompt-parameter optimization, including fitness-function design and convergence diagnostics. The explanation is in phases/16-multi-agent-and-swarms/19-swarm-optimization-pso-aco/docs/en.md, with code in phases/16-multi-agent-and-swarms/19-swarm-optimization-pso-aco/code/swarm.py.
- Differential privacy for LLMs: The curriculum defines (ε, δ)-differential privacy and implements the DP-SGD algorithm with clipping and noise injection. Theory is in phases/18-ethics-safety-alignment/22-differential-privacy-for-llms/docs/en.md, and the implementation is in phases/18-ethics-safety-alignment/22-differential-privacy-for-llms/code/dp_sgd.py.
- Token-bucket rate limiting: A proof of burst handling, refill-rate math, and a practical API-gate implementation are provided in phases/11-llm-engineering/11-caching-cost/docs/en.md, with the algorithm implemented in phases/11-llm-engineering/11-caching-cost/code/token_bucket.py.
- All-reduce collective operations: The two-pass reduce-scatter plus all-gather algorithm, bandwidth-optimal variants, and NCCL topology hints are explained in phases/19-capstone-projects/76-collective-ops-from-scratch/docs/en.md and implemented in phases/19-capstone-projects/76-collective-ops-from-scratch/code/all_reduce.py.
- Constitutional AI self-improvement: The generate-evaluate-select loop, deterministic grading rubric, and policy-gradient on synthetic rewards are covered in phases/10-llms-from-scratch/09-constitutional-ai-self-improvement/docs/en.md, with code in phases/10-llms-from-scratch/09-constitutional-ai-self-improvement/code/constitutional_ai.py.
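Of the families listed above, DP-SGD is compact enough to sketch in a few lines. The following is an illustrative version of the clip-then-add-noise step, not the contents of dp_sgd.py; the function name `dp_sgd_gradient` and its parameters are assumptions made for this example.

```python
import numpy as np


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return a privatized average gradient (illustrative sketch).

    per_example_grads: array of shape (batch, dim), one gradient per example.
    clip_norm: maximum L2 norm allowed for any single example's gradient.
    noise_multiplier: Gaussian noise standard deviation in units of clip_norm.
    rng: a numpy.random.Generator.
    """
    grads = np.asarray(per_example_grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Scale each gradient down so its L2 norm is at most clip_norm.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale
    # Noise is calibrated to clip_norm, the most one example can contribute.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / grads.shape[0]
```

Clipping bounds the influence of any one training example, which is what lets the added Gaussian noise translate into an (ε, δ) guarantee; the accounting that maps `noise_multiplier` to a concrete ε is outside the scope of this sketch.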
Code-Level Examples of Key Algorithms
To demonstrate how the repository grounds theory in practice, the following short snippets illustrate the approach taken in the reference implementations.
BPE Merge Step
The bpe_merge function in phases/10-llms-from-scratch/01-tokenizers/code/bpe.py implements a single iteration of the Byte-Pair Encoding merge rule.
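# File: phases/10-llms-from-scratch/01-tokenizers/code/bpe.py
# See docs/en.md for the mathematical justification.
import re

def bpe_merge(vocab: dict[str, int], merges: list[tuple[str, str]]) -> dict[str, int]:
    """Perform a single BPE merge on `vocab`.

    Keys of `vocab` are space-separated symbol sequences, e.g. "l o w".
    """
    a, b = merges[0]  # the pair to merge, assumed to be the most frequent one
    new_token = a + b
    # Match the pair only on whole-symbol boundaries, so that ("a", "b")
    # does not fire inside "xa b".
    pattern = re.compile(r"(?<!\S)" + re.escape(a + " " + b) + r"(?!\S)")
    new_vocab: dict[str, int] = {}
    for token, freq in vocab.items():
        # Replace occurrences of the pair with the new token
        merged = pattern.sub(lambda _: new_token, token)
        # Accumulate in case two keys collapse to the same sequence
        new_vocab[merged] = new_vocab.get(merged, 0) + freq
    return new_vocab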
The accompanying lesson explains why the most frequent adjacent pair is selected, how the merge reduces total symbol count, and how the loop updates the merge table.
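The selection rule mentioned above can be sketched as a frequency count over adjacent symbols. This is a minimal illustration that assumes vocabulary keys are space-separated symbol sequences; the helper names `count_pairs` and `most_frequent_pair` are hypothetical and may not match the lesson's code.

```python
from collections import Counter


def count_pairs(vocab):
    """Count adjacent symbol pairs, weighted by word frequency.

    `vocab` maps space-separated symbol sequences to counts,
    e.g. {"l o w": 5}.
    """
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def most_frequent_pair(vocab):
    """Return the pair to merge next, or None when no pair remains."""
    pairs = count_pairs(vocab)
    if not pairs:
        return None
    # Ties on frequency are broken by the pair itself (largest tuple wins),
    # which keeps the result deterministic.
    return max(pairs, key=lambda p: (pairs[p], p))
```

Each merge removes one symbol from every occurrence of the chosen pair, so picking the most frequent pair gives the largest reduction in total symbol count for that step. The tie-breaking rule here is a design choice of the sketch, not something the article specifies.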
Scaled Dot-Product Attention
The scaled dot-product attention mechanism is implemented in phases/07-transformers-deep-dive/05-full-transformer/code/attention.py using NumPy.
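# File: phases/07-transformers-deep-dive/05-full-transformer/code/attention.py
# Minimal implementation of scaled dot-product attention.
# Leading batch or head axes are broadcast by the matrix products.
import numpy as np

def attention(Q, K, V, mask=None):
    """Compute attention(Q, K, V) = softmax(QKᵀ / √d_k) V."""
    dk = Q.shape[-1]
    # Swap the last two axes of K. NumPy's K.transpose(-2, -1) would not do
    # this: it is a no-op on 2-D arrays and an error on higher ranks.
    scores = Q @ np.swapaxes(K, -2, -1) / np.sqrt(dk)
    if mask is not None:
        # mask is True where attention is allowed
        scores = np.where(mask, scores, -1e9)
    # Numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V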
The lesson documentation derives the scaling factor √dₖ and describes the purpose of the mask before showing how multiple heads are concatenated.
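The head-splitting and concatenation step can be sketched in NumPy as follows. This is an illustrative self-attention layout under assumed shapes, with `x` of shape (seq_len, d_model) and square projection matrices; the function name `multi_head_attention` and its signature are assumptions, not the repository's API.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Self-attention over x of shape (seq_len, d_model)."""
    seq_len, d_model = x.shape
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads

    def split(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).swapaxes(0, 1)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    # Each head scales by the square root of its own key dimension.
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_head)
    heads = softmax(scores) @ v  # (num_heads, seq_len, d_head)
    # Concatenate heads back to (seq_len, d_model), then project.
    concat = heads.swapaxes(0, 1).reshape(seq_len, d_model)
    return concat @ w_o
```

With `num_heads=1` and identity projections this reduces to plain scaled dot-product attention, which makes a convenient sanity check.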
PPO Clipped Objective
Reinforcement learning in the curriculum culminates in the PPO clipped objective, which is implemented in phases/09-reinforcement-learning/08-ppo/code/ppo.py.
# File: phases/09-reinforcement-learning/08-ppo/code/ppo.py
# Core PPO update step.
import numpy as np

def ppo_loss(old_logp, new_logp, advantages, eps=0.2):
    """Clipped surrogate loss over arrays of per-sample log-probabilities."""
    ratio = np.exp(new_logp - old_logp)  # π_θ / π_θ_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    # Take the pessimistic bound; negate so the result can be minimized.
    return -np.mean(np.minimum(unclipped, clipped))
The documentation proves why clipping stabilizes training, and the unit tests compare this loss against a reference implementation.
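A small numeric experiment shows the effect of the clip. The sketch below evaluates the per-sample surrogate at a few probability ratios; the helper name `clipped_surrogate` is an assumption introduced for this illustration.

```python
import numpy as np


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-sample PPO surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    ratio = np.asarray(ratio, dtype=float)
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1 - eps, 1 + eps) * advantage)


ratios = np.array([0.5, 0.8, 1.0, 1.2, 1.5])

# Positive advantage: the objective stops growing once ratio > 1 + eps.
positive = clipped_surrogate(ratios, advantage=1.0)

# Negative advantage: the objective is flat once ratio < 1 - eps.
negative = clipped_surrogate(ratios, advantage=-1.0)

print(positive)
print(negative)
```

In both flat regions the gradient with respect to the ratio is zero, so a sample that has already moved the policy far enough in the favorable direction stops contributing to the update. The objective stays unclipped in the unfavorable direction, which is why the minimum acts as a pessimistic bound.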
Token-Bucket Rate Limiter
Engineering concepts are treated with the same rigor. The token-bucket algorithm in phases/11-llm-engineering/11-caching-cost/code/token_bucket.py demonstrates burst handling.
# File: phases/11-llm-engineering/11-caching-cost/code/token_bucket.py
import time

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # max tokens (largest allowed burst)
        self.tokens = capacity          # start with a full bucket
        self.refill_rate = refill_rate  # tokens per second
        # A monotonic clock avoids negative refills if the wall clock is adjusted.
        self.last_ts = time.monotonic()

    def allow(self, n=1):
        """Consume n tokens and return True if available, else return False."""
        now = time.monotonic()
        # Refill tokens based on elapsed time, never exceeding capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_ts) * self.refill_rate)
        self.last_ts = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
The lesson derives the refill-rate math and proves how the algorithm enables bursty traffic while guaranteeing a long-term average rate.
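The burst and average-rate behavior can be checked with a deterministic simulation. The sketch below restates the same refill logic but takes an injectable `clock` callable, which is an assumption added here so the example does not depend on real time; `FakeClock` is likewise hypothetical.

```python
class FakeClock:
    """Manually advanced clock, so the run is deterministic."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


class TokenBucket:
    def __init__(self, capacity, refill_rate, clock):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.clock = clock
        self.tokens = capacity
        self.last_ts = clock()

    def allow(self, n=1):
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_ts) * self.refill_rate)
        self.last_ts = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


clock = FakeClock()
bucket = TokenBucket(capacity=5, refill_rate=1.0, clock=clock)

# Burst: 8 back-to-back requests at t = 0; only `capacity` of them pass.
burst = sum(bucket.allow() for _ in range(8))

# Steady load: one request every 0.5 s for the next 10 s.
steady = 0
for _ in range(20):
    clock.now += 0.5
    steady += bucket.allow()

print(burst, steady)  # 5 10
```

The 15 admitted requests match the general bound for this scheme: over any window of length T, at most capacity + refill_rate × T requests can pass, here 5 + 1.0 × 10.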
How the Curriculum Maintains Synchronization
A frequent problem in educational repositories is documentation drift. The AI Engineering From Scratch project mitigates this by treating each lesson as a single commit, as defined in AGENTS.md.
- LESSON_TEMPLATE.md enforces standard front matter that lists learning objectives, prerequisites, and estimated time.
- A CI-enforced dependency allow-list prevents prohibited third-party packages from entering reference implementations.
- The static site generator (site/build.js → site/data.js) updates the public README automatically, ensuring browsable lesson tables always point to current doc and code paths.
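One way a dependency allow-list check could work is to parse each lesson file and compare its imports against an approved set. The sketch below uses Python's standard `ast` module; the allow-list contents and the function name `disallowed_imports` are assumptions for illustration, and the repository's actual CI check may be implemented differently.

```python
import ast

# Hypothetical allow-list; the project's real list is not given in this article.
ALLOWED_MODULES = {"math", "random", "time", "unittest", "numpy"}


def disallowed_imports(source, allowed=ALLOWED_MODULES):
    """Return sorted top-level modules imported by `source` but not allowed."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as `from . import helper`.
            found.add(node.module.split(".")[0])
    return sorted(found - set(allowed))
```

A CI job built on this idea would run the function over every file under code/ and fail the build when the returned list is non-empty.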
Summary
- The AI Engineering From Scratch repository contains 435 lessons that each explain one algorithm or system concept from first principles.
- Every lesson provides a clear explanation of the algorithms used in docs/en.md, a minimal implementation in code/, and unit tests in code/tests/.
- Core algorithm families include BPE, transformer attention, PPO, speculative decoding, MARL, swarm optimization, differential privacy, and distributed collective operations.
- File paths such as phases/10-llms-from-scratch/01-tokenizers/code/bpe.py and phases/09-reinforcement-learning/08-ppo/code/ppo.py directly link the theory to executable source.
- The repository uses automated tooling and a strict lesson template to prevent documentation drift.
Frequently Asked Questions
Does AI Engineering From Scratch explain the math behind each algorithm?
Yes. Every lesson includes a docs/en.md file that derives the underlying mathematics in LaTeX, such as the BPE merge rule, the attention-score formula, and the PPO clipped objective. These documents cite original research papers and directly reference the companion code files.
Are the code implementations runnable on their own?
Yes. Each code/ script is minimal and self-contained, typically only a few dozen lines. They avoid unnecessary dependencies through a CI-enforced allow-list, and every lesson includes at least five unit tests that can be executed with python3 -m unittest discover.
How does the repository prevent documentation from becoming outdated?
Each lesson is treated as a single commit per the guidelines in AGENTS.md. The curriculum is also generated into a static site via site/build.js, which produces site/data.js and updates the README automatically so that lesson tables always link to current documentation and implementations.
Which advanced algorithms are covered beyond basic transformers?
According to the source code, the curriculum covers speculative decoding, multi-agent reinforcement learning algorithms such as MADDPG and QMIX, swarm optimization including PSO and ACO, differential privacy via DP-SGD, and distributed all-reduce collective operations implemented from scratch.