
How to Add New Terms to the Glossary in AI Engineering From Scratch

June 8, 2026 · rohitg00/ai-engineering-from-scratch

To add new terms to the glossary in AI Engineering From Scratch, edit glossary/terms.md and add an H3 heading followed by two bullet lines labeled "What people say" and "What it actually means". This is the exact format that the regex-based parseGlossary() function in site/build.js expects.

The AI Engineering From Scratch repository keeps its terminology in a single file, glossary/terms.md. An automated build pipeline turns that file into data for the curriculum website and the llms.txt knowledge base, and it relies on strict markdown conventions to do so.

Understanding the Glossary Parser

The build script at site/build.js contains the parseGlossary() function, which extracts entries with regular expressions. It detects term boundaries by matching level-3 ATX headings and captures the required content by matching the bold-labeled lines:

// site/build.js – parseGlossary()
const termMatch = line.match(/^###\s+(.+)/);
const saysMatch  = line.match(/\*\*What people say:\*\*\s*"?(.+?)"?\s*$/);
const meansMatch = line.match(/\*\*What it actually means:\*\*\s*(.+)/);

Only entries containing both a "What people say" line and a "What it actually means" line are retained in the generated output and published to the site.
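To make the retention rule concrete, here is a minimal sketch of how a line-by-line parser could combine the three regexes above. Only the regexes come from site/build.js; the loop structure, the `flush` helper, and the `{ term, says, means }` entry shape are assumptions, not the repository's actual code.

```javascript
// Hypothetical re-creation of the parsing loop. The three regexes are copied
// from the parseGlossary() snippet; everything else is illustrative.
function parseGlossary(markdown) {
  const entries = [];
  let current = null;

  // Keep the in-progress entry only if both required fields were captured.
  const flush = () => {
    if (current && current.says && current.means) entries.push(current);
  };

  for (const line of markdown.split("\n")) {
    const termMatch = line.match(/^###\s+(.+)/);
    if (termMatch) {
      // A new H3 heading closes the previous entry and starts a fresh one.
      flush();
      current = { term: termMatch[1].trim(), says: null, means: null };
      continue;
    }
    if (!current) continue; // Ignore text before the first term heading.

    const saysMatch = line.match(/\*\*What people say:\*\*\s*"?(.+?)"?\s*$/);
    if (saysMatch) current.says = saysMatch[1];

    const meansMatch = line.match(/\*\*What it actually means:\*\*\s*(.+)/);
    if (meansMatch) current.means = meansMatch[1].trim();
  }
  flush(); // The last entry has no following heading to trigger a flush.
  return entries;
}
```

Because the `saysMatch` pattern makes the surrounding quotes optional and the capture lazy, a value written as `"Hacking the AI with words"` is stored without its quotation marks, and a heading with only one of the two required lines never reaches `entries`.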

Required Entry Structure

Each glossary entry must follow this exact markdown pattern:

Header: An H3 heading (### Term Name) that captures the term identifier
Popular definition: A bullet line starting with **What people say:** describing common (often incorrect) usage
Technical definition: A bullet line starting with **What it actually means:** providing the precise technical explanation
Etymology (optional): A bullet line starting with **Why it's called that:** explaining the naming origin

Omitting either of the first two bullet lines causes the parser to discard the entry entirely, and the term will not appear in site/data.js or llms.txt.
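Because a missing line fails silently, it can help to check an entry before committing. The following is a hypothetical helper, not something the repository ships: it reuses the two required regexes from parseGlossary() and tests them one line at a time (mirroring the per-line matching in the parser, since `\s*` could otherwise match across a newline), returning the labels that never matched.

```javascript
// Hypothetical pre-commit helper; the two regexes are copied from the
// parseGlossary() snippet, the rest is illustrative.
const REQUIRED_PATTERNS = {
  "What people say": /\*\*What people say:\*\*\s*"?(.+?)"?\s*$/,
  "What it actually means": /\*\*What it actually means:\*\*\s*(.+)/,
};

// Returns the labels whose pattern matches no line of the entry.
function missingLabels(entryText) {
  const lines = entryText.split("\n");
  return Object.entries(REQUIRED_PATTERNS)
    .filter(([, pattern]) => !lines.some((line) => pattern.test(line)))
    .map(([label]) => label);
}

// Example: this entry lacks the popular definition, so the parser would drop it.
console.log(missingLabels("### Example Term\n- **What it actually means:** A precise definition."));
```

An empty result means both required lines are present in a form the parser's regexes accept.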

Step-by-Step Guide to Adding Terms

Follow this workflow to add entries that pass validation and appear across the curriculum:

Open glossary/terms.md and locate the appropriate alphabetical section (e.g., the P section for "Prompt Injection")
Insert the entry using this exact template:


### Prompt Injection

- **What people say:** "Hacking the AI with words"
- **What it actually means:** An attack where malicious text in the input overrides the system prompt or instructions. Direct injection: user types "Ignore previous instructions." Indirect injection: a retrieved document contains hidden instructions. The LLM equivalent of SQL injection. No complete solution exists — defense is layers of input validation, output filtering, and privilege separation.
- **Why it's called that:** The term mirrors "SQL injection" because the attacker injects unauthorized commands into the prompt text.

Save and commit using conventional commit format:

git add glossary/terms.md
git commit -m "feat(glossary): add Prompt Injection term"
git push

The CI pipeline automatically runs site/build.js, re-parses the glossary, and updates both the site data and knowledge base files.
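If you draft several entries at once, a small formatter keeps the layout identical to the template used in the steps above. This is a hypothetical helper; the field names `term`, `says`, `means`, and `why` are assumptions chosen for illustration. It wraps the popular definition in quotes, as the template does, and emits the etymology bullet only when one is provided.

```javascript
// Hypothetical formatter that reproduces the glossary entry template.
// Field names are illustrative; only the output layout follows the guide.
function formatEntry({ term, says, means, why }) {
  const lines = [
    `### ${term}`,
    "",
    `- **What people say:** "${says}"`,
    `- **What it actually means:** ${means}`,
  ];
  // The etymology bullet is optional, so it is added only when supplied.
  if (why) lines.push(`- **Why it's called that:** ${why}`);
  return lines.join("\n") + "\n";
}
```

Paste the returned text into the appropriate alphabetical section of glossary/terms.md.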

Build Pipeline Integration

When you push changes to glossary/terms.md, the following automation executes:

Validation: parseGlossary() scans each line, extracting only entries that match all required regex patterns
JSON Generation: Valid terms are serialized as JSON objects into site/data.js for the website frontend
Knowledge Base Update: The llms.txt file receives an updated searchable list for AI agent consumption
Silent Failure: Invalid entries (missing required lines) are skipped without breaking the build, though they will not appear in output
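The two outputs can be pictured as simple serializations of the retained entries. The sketch below is an assumption throughout: the `GLOSSARY` variable name, the pretty-printed JSON, and the `- Term: definition` layout for llms.txt are illustrative and not taken from site/build.js.

```javascript
// Hypothetical serializers for the two generated artifacts. Input is an array
// of retained entries shaped like { term, says, means }.
function toDataJs(entries) {
  // Emit a JavaScript source string that a site frontend could load.
  return `const GLOSSARY = ${JSON.stringify(entries, null, 2)};\n`;
}

function toLlmsTxt(entries) {
  // One plain-text line per term, using the technical definition.
  return entries.map((e) => `- ${e.term}: ${e.means}`).join("\n") + "\n";
}
```

Because these functions only receive entries that survived parsing, an invalid entry never reaches either file, which is the silent-failure behavior described above.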

The AGENTS.md file explicitly defines this contract, mandating that all new terminology reside in glossary/terms.md rather than being duplicated across lesson documentation.
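A contributor or CI job could enforce the single-source rule with a lint along these lines. This is a hypothetical check, not part of the repository: it takes a map of file paths to contents (reading them from disk is left out) and flags any file other than glossary/terms.md that contains a **What it actually means:** line. The lesson paths in the example are made up.

```javascript
// Hypothetical lint: glossary definitions should live only in glossary/terms.md.
// `files` maps a repository-relative path to that file's text.
function findDuplicatedDefinitions(files) {
  const definitionLine = /\*\*What it actually means:\*\*/;
  return Object.entries(files)
    .filter(([path]) => path !== "glossary/terms.md")
    .filter(([, text]) => definitionLine.test(text))
    .map(([path]) => path);
}

// Example with made-up lesson paths; only the first lesson duplicates a definition.
const offenders = findDuplicatedDefinitions({
  "glossary/terms.md": "- **What it actually means:** canonical definition",
  "lessons/example-a.md": "- **What it actually means:** duplicated here",
  "lessons/example-b.md": "Links to the glossary instead.",
});
console.log(offenders);
```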

Summary

Add new terms to the centralized glossary/terms.md file following the exact H3-plus-bullets structure required by the parser in site/build.js
Include both the "What people say" and "What it actually means" bullet lines, or the entry will be ignored
Place entries in alphabetical sections to maintain file organization
Pushing your commit triggers automatic regeneration of site/data.js and llms.txt via the parseGlossary() function
Never duplicate glossary definitions in individual lesson files; use the single source of truth pattern

Frequently Asked Questions

What happens if I omit the "What people say" line from a glossary entry?

The parseGlossary() function requires both regex matches for saysMatch and meansMatch to retain an entry. If either bold labeled line is missing, the parser skips that term during the build process, and it will not appear in the generated site/data.js or llms.txt output, even if the H3 header is present.

Can I add glossary terms anywhere in the `terms.md` file?

While the parser technically scans the entire file regardless of position, you should insert new terms within the appropriate alphabetical section (e.g., under the P section for "Prompt Engineering"). This maintains human-readable organization and simplifies manual editing, though the build process extracts entries based on pattern matching rather than section headers.
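Since placement is a convention rather than something the parser checks, a quick script can catch misplaced terms. The sketch below is hypothetical and assumes the terms are meant to appear in case-insensitive alphabetical order across the whole file; it reuses the H3 regex from parseGlossary() to collect term names and reports each adjacent pair that is out of order.

```javascript
// Hypothetical helper: list adjacent H3 terms that break alphabetical order.
// Assumes the whole file is meant to be sorted; adjust if sections differ.
function outOfOrderTerms(markdown) {
  const terms = markdown
    .split("\n")
    .map((line) => line.match(/^###\s+(.+)/))
    .filter(Boolean)
    .map((m) => m[1].trim());

  const problems = [];
  for (let i = 1; i < terms.length; i++) {
    // sensitivity "base" ignores case and accents when comparing names.
    if (terms[i - 1].localeCompare(terms[i], "en", { sensitivity: "base" }) > 0) {
      problems.push(`${terms[i - 1]} -> ${terms[i]}`);
    }
  }
  return problems;
}
```

An empty array means every term follows its predecessor alphabetically.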

How do I verify my glossary entry was processed correctly?

After your changes are pushed and the build has run, check the generated files to confirm inclusion. Valid entries appear in site/data.js as JSON objects and in site/llms.txt as formatted text. If your term is missing from these files despite being in glossary/terms.md, compare your markdown against the required patterns: a space after ###, and each label written in bold with the colon inside the markers (**What people say:** and **What it actually means:**). Whitespace after the closing ** is tolerated, but a misplaced colon or an extra space inside the bold markers breaks the regex match.
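The most common slips can be spotted mechanically. The function below is a hypothetical diagnostic derived from reading the regexes in parseGlossary(): each check targets a formatting mistake that would stop one of those patterns from matching. The list of mistakes is illustrative, not exhaustive.

```javascript
// Hypothetical diagnostic for a single line of glossary/terms.md.
// Returns a hint for a known slip, or null if none of these checks fire.
function diagnoseLine(line) {
  // `^###\s+` needs whitespace after the hashes; "###Term" never starts an entry.
  // `#` is excluded so that H4 headings ("####") are not flagged.
  if (/^###[^\s#]/.test(line)) return "add a space after ###";
  // The colon must sit inside the bold markers: **What people say:**
  if (/\*\*What (people say|it actually means)\*\*:/.test(line)) {
    return "move the colon inside the bold markers";
  }
  // "** What people say:**" breaks the literal "**What" prefix in the regexes.
  if (/\*\*\s+What (people say|it actually means)/.test(line)) {
    return "remove the space after the opening **";
  }
  return null;
}
```

Run it over each line of a term that is missing from the generated files; any non-null hint points at the line to fix.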

Why does the glossary use a centralized file instead of distributed definitions?

The AGENTS.md specification mandates a single source of truth in glossary/terms.md to ensure consistency across the curriculum. This prevents definition drift between lessons and enables the automated build pipeline to generate standardized knowledge bases for both human readers (via the website) and AI agents (via llms.txt) from one authority file, as implemented in rohitg00/ai-engineering-from-scratch.
