What Does the `scaffold-lesson.sh` Automation Script Do?

The scaffold-lesson.sh automation script generates a standardized lesson skeleton (directories, starter markdown, and a Python stub) for the AI Engineering From Scratch repository, so that every new lesson follows the curriculum's strict layout conventions.

The scaffold-lesson.sh automation script lives in the scripts/ directory of the open-source rohitg00/ai-engineering-from-scratch curriculum. Its primary purpose is to eliminate repetitive manual setup and to support the repository's one-commit-per-lesson rule across all 435 lessons. When contributors invoke the script, it validates inputs, resolves paths from the repository root, generates a consistent folder hierarchy, and prints a checklist for the remaining manual steps.

Usage and Basic Syntax

The script accepts two required arguments and one optional argument: a phase directory, a lesson slug in NN-kebab-case format, and an optional human-readable title. If arguments are missing or malformed, it prints a usage help block (lines 4–15).


# Create lesson 03-tokenizers in the NLP foundations phase
scripts/scaffold-lesson.sh 05-nlp-foundations-to-advanced 03-tokenizers

# Provide an optional title for richer metadata
scripts/scaffold-lesson.sh 05-nlp-foundations-to-advanced 03-tokenizers "Tokenizers from Scratch"

After a successful run, the terminal displays a summary and a four-step checklist:

created phases/05-nlp-foundations-to-advanced/03-tokenizers/

next:
  1. edit phases/05-nlp-foundations-to-advanced/03-tokenizers/docs/en.md
  2. write phases/05-nlp-foundations-to-advanced/03-tokenizers/code/main.py
  3. add a markdown-link row to ROADMAP.md …
  4. atomic commit: git add phases/… ROADMAP.md && git commit -m "feat(phase-05/03): Tokenizers from Scratch"

How the Script Works

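The scaffold-lesson.sh automation script processes each lesson request through a fixed sequence of stages: input validation, path resolution, pre-condition checks, directory creation, starter-file generation, and a closing checklist.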

Input Validation

First, the script verifies that the caller supplied a phase directory and a properly formatted lesson slug, and it accepts an optional title. When a required argument is absent, it exits early and surfaces the built-in help text (lines 4–15). This gate prevents incomplete or malformed lesson entries from ever touching the filesystem.
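The argument gate described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function names usage and check_args and the wording of the help text are assumptions.

```shell
#!/bin/sh
# Hypothetical argument gate; the real help block lives in the script itself.

usage() {
    # Print a short help block to stderr.
    cat >&2 <<'EOF'
usage: scripts/scaffold-lesson.sh <phase-dir> <NN-lesson-slug> ["Title"]
EOF
}

check_args() {
    # A phase directory and a lesson slug are required; the title is optional.
    if [ "$#" -lt 2 ]; then
        usage
        return 1
    fi
    return 0
}

# Example: one argument is not enough, so the gate reports failure.
if check_args 05-nlp-foundations-to-advanced; then
    echo "arguments ok"
else
    echo "arguments rejected"
fi
```

Returning a status from a function, rather than calling exit inside it, keeps the check reusable; a real script would typically exit with a non-zero code right after the failed check.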

Repository Root and Path Resolution

The script locates the repository root via git rev-parse (line 22). It then computes absolute paths for the target phase and the new lesson directory (lines 28–30). Anchoring everything to the Git root guarantees that the script behaves correctly regardless of the caller's current working directory.
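A rough sketch of this path resolution is shown below. The article only says the script uses git rev-parse; the --show-toplevel flag (a standard Git option) and the helper names repo_root and lesson_dir are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not taken from the script.

repo_root() {
    # `git rev-parse --show-toplevel` prints the absolute path of the
    # working tree's top-level directory, whatever the current directory is.
    git rev-parse --show-toplevel
}

lesson_dir() {
    # $1 = repository root, $2 = phase directory, $3 = lesson slug
    printf '%s/phases/%s/%s\n' "$1" "$2" "$3"
}

# Compose the path from an explicit root so the example is reproducible
# outside a Git checkout. Inside one, "$(repo_root)" would replace /repo.
lesson_dir /repo 05-nlp-foundations-to-advanced 03-tokenizers
```

With the explicit root above, the call prints /repo/phases/05-nlp-foundations-to-advanced/03-tokenizers.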

Pre-Condition Checks

Before creating any files, the script enforces three hard constraints (lines 31–45):

  • The target phase directory must already exist.
  • The lesson slug must match the NN-kebab-case pattern.
  • The lesson directory must not already exist.

If any check fails, the script aborts immediately and reports the specific violation.
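The three checks above might look something like the following sketch. The regular expression is one plausible reading of "NN-kebab-case" (two digits, then lowercase words joined by hyphens); the exact pattern, the error messages, and the function names are assumptions.

```shell
#!/bin/sh
# Hypothetical pre-condition checks; pattern and messages are assumed.

valid_slug() {
    # Two digits, then one or more lowercase alphanumeric words joined by hyphens.
    printf '%s\n' "$1" | grep -Eq '^[0-9]{2}-[a-z0-9]+(-[a-z0-9]+)*$'
}

check_preconditions() {
    # $1 = absolute phase directory, $2 = lesson slug
    phase_dir=$1
    slug=$2
    if [ ! -d "$phase_dir" ]; then
        echo "error: phase directory not found: $phase_dir" >&2
        return 1
    fi
    if ! valid_slug "$slug"; then
        echo "error: lesson slug must be NN-kebab-case: $slug" >&2
        return 1
    fi
    if [ -e "$phase_dir/$slug" ]; then
        echo "error: lesson already exists: $phase_dir/$slug" >&2
        return 1
    fi
    return 0
}

# Example run against a throwaway directory standing in for a phase.
demo_phase=$(mktemp -d)
check_preconditions "$demo_phase" 03-tokenizers && echo "checks passed"
```

Each failure returns as soon as it is detected, which mirrors the abort-on-first-violation behavior described above.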

Directory Creation

Once validation passes, the script creates the required folder hierarchy under phases/<phase>/<lesson> (line 47). Every new lesson receives four standardized subdirectories:

  • code/ – Python source files
  • notebook/ – Jupyter notebooks
  • docs/ – curriculum markdown
  • outputs/ – generated artifacts
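Creating the four subdirectories listed above takes a single mkdir call. The sketch below assumes mkdir -p and a hypothetical helper named make_lesson_tree; the script's actual line 47 may be written differently.

```shell
#!/bin/sh
# Illustrative helper; the name make_lesson_tree is an assumption.

make_lesson_tree() {
    # $1 = absolute lesson directory. -p also creates missing parents
    # and does not fail if a directory is already present.
    mkdir -p "$1/code" "$1/notebook" "$1/docs" "$1/outputs"
}

# Example: build the tree under a temporary root and list it.
demo_root=$(mktemp -d)
make_lesson_tree "$demo_root/phases/05-nlp-foundations-to-advanced/03-tokenizers"
ls "$demo_root/phases/05-nlp-foundations-to-advanced/03-tokenizers"
```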

Generating Starter Files

The scaffold-lesson.sh automation script populates the new tree with three categories of starter content:

Curriculum Markdown. It writes a fully templated docs/en.md (lines 57–116) that includes the repository's mandated front-matter specification. Authors open this file and fill in lesson objectives, theory, and exercises without worrying about metadata formatting.

Python Stub. It drops a minimal code/main.py (lines 118–125) containing a single stub that raises NotImplementedError. This explicit placeholder reminds contributors exactly where to implement the lesson logic.
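One common way for a shell script to emit such a stub is a quoted here-document, sketched below. The body of main.py shown here is an assumption: the article only states that the generated file contains a single stub raising NotImplementedError.

```shell
#!/bin/sh
# Sketch: write a placeholder main.py. File contents are assumed.

write_stub() {
    # $1 = lesson directory. Quoting 'EOF' stops the shell from
    # expanding anything inside the Python source.
    mkdir -p "$1/code"
    cat > "$1/code/main.py" <<'EOF'
def main():
    raise NotImplementedError("implement this lesson")


if __name__ == "__main__":
    main()
EOF
}

# Example: generate the stub in a temporary lesson directory and show it.
demo_lesson=$(mktemp -d)
write_stub "$demo_lesson"
cat "$demo_lesson/code/main.py"
```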

Git Placeholders. To ensure Git tracks the otherwise empty notebook/ and outputs/ directories, the script touches placeholder files inside them (lines 127–128).
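Git records files rather than directories, so an empty folder needs at least one file to appear in a commit. A minimal sketch follows, assuming the widely used .gitkeep convention; the article does not name the placeholder files the script actually creates.

```shell
#!/bin/sh
# Sketch: the .gitkeep filename is an assumption, not confirmed by the source.

add_placeholders() {
    # $1 = lesson directory
    mkdir -p "$1/notebook" "$1/outputs"
    # touch creates an empty file if it does not exist yet.
    touch "$1/notebook/.gitkeep" "$1/outputs/.gitkeep"
}

# Example: add placeholders in a temporary lesson directory.
demo_lesson=$(mktemp -d)
add_placeholders "$demo_lesson"
ls -a "$demo_lesson/notebook"
```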

Post-Creation Checklist

Finally, once the starter files are written, the script prints a numbered checklist that guides the contributor through the remaining manual workflow: editing the markdown, writing the Python implementation, appending the lesson row to ROADMAP.md, and committing atomically.
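The commit subject in the sample output, feat(phase-05/03): Tokenizers from Scratch, combines the numeric prefixes of the phase and the lesson slug with the title. The sketch below shows one way to derive it with POSIX parameter expansion; how the script itself builds the string is not shown in the article, and commit_subject is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: derive the commit subject from the script's three arguments.

commit_subject() {
    # $1 = phase directory, $2 = lesson slug, $3 = title
    # ${var%%-*} removes everything from the first hyphen onward,
    # leaving the leading number (e.g. "05" or "03").
    phase_num=${1%%-*}
    lesson_num=${2%%-*}
    printf 'feat(phase-%s/%s): %s\n' "$phase_num" "$lesson_num" "$3"
}

commit_subject 05-nlp-foundations-to-advanced 03-tokenizers "Tokenizers from Scratch"
```

For the arguments used throughout this article, the call prints feat(phase-05/03): Tokenizers from Scratch, matching step 4 of the sample checklist.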

Key Files in the Scaffolding Workflow

The entire lesson-generation pipeline revolves around four critical paths in the rohitg00/ai-engineering-from-scratch repository:

  • scripts/scaffold-lesson.sh – The automation script itself.
  • phases/<phase>/<lesson>/docs/en.md – The templated curriculum file created by the script.
  • phases/<phase>/<lesson>/code/main.py – The NotImplementedError stub generated for every lesson.
  • ROADMAP.md – The master index where contributors must manually link the new lesson after scaffolding.

Summary

  • The scaffold-lesson.sh automation script validates the phase directory and lesson slug, and accepts an optional title, before any filesystem changes occur.
  • It resolves absolute paths from the Git repository root and enforces the NN-kebab-case naming convention.
  • The script creates a uniform four-directory hierarchy (code, notebook, docs, outputs) for every lesson.
  • It auto-generates a front-matter-ready docs/en.md, a NotImplementedError stub in code/main.py, and Git-safe placeholder files.
  • A printed post-creation checklist reminds contributors to update ROADMAP.md and commit atomically, preserving the one-commit-per-lesson policy.

Frequently Asked Questions

What arguments are required to run scaffold-lesson.sh?

The script requires a phase directory name and a lesson slug in NN-kebab-case format. An optional third argument supplies a human-readable title. If required arguments are missing, the script exits and prints usage help defined in lines 4–15.

What directory structure does the script create?

The script creates four subdirectories under phases/<phase>/<lesson>/: code/, notebook/, docs/, and outputs/. This layout is hard-coded in the script (line 47), so every lesson scaffolded with it in the 435-lesson repository follows the same architectural conventions.

Why does the generated main.py raise NotImplementedError?

According to lines 118–125 of scripts/scaffold-lesson.sh, the script intentionally emits a minimal Python stub containing raise NotImplementedError. This serves as an explicit reminder for the contributor to implement the lesson logic before submission.

How does scaffold-lesson.sh enforce the NN-kebab-case naming rule?

During pre-condition checks (lines 31–45), the script validates the lesson slug against the NN-kebab-case pattern. If the slug violates this convention or if the target lesson directory already exists, the script aborts immediately and reports the error to the caller.
