How to Add New Terms to the Glossary in AI Engineering From Scratch
To add new terms to the glossary in AI Engineering From Scratch, edit glossary/terms.md to include an H3 heading followed by bullet lines for "What people say" and "What it actually means"—the exact format required by the parseGlossary() regex parser in site/build.js.
The AI Engineering From Scratch repository maintains a centralized terminology system in glossary/terms.md. This single file powers the curriculum website and the llms.txt knowledge base through an automated build pipeline that enforces strict markdown conventions.
Understanding the Glossary Parser
The build script at site/build.js contains the parseGlossary() function that validates and extracts entries using specific regex patterns. The parser identifies term boundaries by scanning for level-3 ATX headers and validates required content by searching for bold labeled lines:
// site/build.js – parseGlossary()
const termMatch = line.match(/^###\s+(.+)/);
const saysMatch = line.match(/\*\*What people say:\*\*\s*"?(.+?)"?\s*$/);
const meansMatch = line.match(/\*\*What it actually means:\*\*\s*(.+)/);
Only entries containing both a "What people say" line and a "What it actually means" line are retained in the generated output and published to the site.
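To make the retention rule concrete, here is a minimal sketch of a line-by-line parser built around the three regexes quoted above. It is not the actual parseGlossary() from site/build.js: the function name parseGlossarySketch, the loop structure, and the { term, says, means } output shape are assumptions made for illustration.

```javascript
// Hypothetical sketch, NOT the real parseGlossary() in site/build.js.
// Only the three regexes come from the article; everything else is assumed.
function parseGlossarySketch(markdown) {
  const entries = [];
  let current = null;

  // Keep an entry only if both required lines were found for it.
  const flush = () => {
    if (current && current.says && current.means) entries.push(current);
  };

  for (const line of markdown.split(/\r?\n/)) {
    const termMatch = line.match(/^###\s+(.+)/);
    if (termMatch) {
      flush(); // close out the previous entry before starting a new one
      current = { term: termMatch[1].trim(), says: null, means: null };
      continue;
    }
    if (!current) continue; // ignore text before the first H3 heading

    const saysMatch = line.match(/\*\*What people say:\*\*\s*"?(.+?)"?\s*$/);
    const meansMatch = line.match(/\*\*What it actually means:\*\*\s*(.+)/);
    if (saysMatch) current.says = saysMatch[1];
    if (meansMatch) current.means = meansMatch[1].trim();
  }
  flush(); // the last entry has no following heading to trigger a flush
  return entries;
}

// Made-up sample input: the second entry lacks "What it actually means".
const sample = [
  '### Prompt Injection',
  '- **What people say:** "Hacking the AI with words"',
  '- **What it actually means:** An attack where malicious text overrides instructions.',
  '### Half-Written Term',
  '- **What people say:** "Something vague"',
].join('\n');

console.log(parseGlossarySketch(sample));
```

In this sketch only Prompt Injection is returned. The second entry is dropped without an error, which mirrors the behavior the article describes for incomplete entries.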
Required Entry Structure
Each glossary entry must follow this exact markdown pattern:
- Header: An H3 heading (### Term Name) that captures the term identifier
- Popular definition: A bullet line starting with **What people say:** describing common (often incorrect) usage
- Technical definition: A bullet line starting with **What it actually means:** providing the precise technical explanation
- Etymology (optional): A bullet line starting with **Why it's called that:** explaining the naming origin
Omitting either required bullet line (What people say or What it actually means) causes the parser to discard the entry entirely, and the term will not appear in site/data.js or llms.txt.
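Because discarded entries produce no error, it can help to check a draft before committing. The following is a rough sketch of such a check, reusing the regexes quoted from site/build.js. The function findIncompleteEntries and the sample terms are hypothetical and do not exist in the repository.

```javascript
// Hypothetical pre-commit check; not part of the repository.
// The regexes are the ones quoted from site/build.js in this article.
const TERM_RE = /^###\s+(.+)/;
const REQUIRED = {
  'What people say': /\*\*What people say:\*\*\s*"?(.+?)"?\s*$/,
  'What it actually means': /\*\*What it actually means:\*\*\s*(.+)/,
};

// Returns one { term, missing } record per entry that lacks a required line.
function findIncompleteEntries(markdown) {
  const seen = []; // one { term, found } record per H3 heading, in file order
  for (const line of markdown.split(/\r?\n/)) {
    const termMatch = line.match(TERM_RE);
    if (termMatch) {
      seen.push({ term: termMatch[1].trim(), found: new Set() });
      continue;
    }
    const current = seen[seen.length - 1];
    if (!current) continue; // text before the first heading belongs to no entry
    for (const [label, re] of Object.entries(REQUIRED)) {
      if (re.test(line)) current.found.add(label);
    }
  }
  return seen
    .map(({ term, found }) => ({
      term,
      missing: Object.keys(REQUIRED).filter((label) => !found.has(label)),
    }))
    .filter((entry) => entry.missing.length > 0);
}

// Made-up draft: "Embedding" lacks one line, "Agent" forgot the bold markers.
const draft = [
  '### Tokenizer',
  '- **What people say:** "It splits words"',
  '- **What it actually means:** Maps text to integer token IDs.',
  '### Embedding',
  '- **What it actually means:** A dense vector representation of an input.',
  '### Agent',
  '- What people say: "A bot"',
].join('\n');

console.log(findIncompleteEntries(draft));
```

Here Tokenizer passes, Embedding is reported as missing "What people say", and Agent is reported as missing both labels because its only line is not bold.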
Step-by-Step Guide to Adding Terms
Follow this workflow to add entries that pass validation and appear across the curriculum:
- Open glossary/terms.md and locate the appropriate alphabetical section (e.g., the P section for "Prompt Injection")
- Insert the entry using this exact template:
### Prompt Injection
- **What people say:** "Hacking the AI with words"
- **What it actually means:** An attack where malicious text in the input overrides the system prompt or instructions. Direct injection: user types "Ignore previous instructions." Indirect injection: a retrieved document contains hidden instructions. The LLM equivalent of SQL injection. No complete solution exists — defense is layers of input validation, output filtering, and privilege separation.
- **Why it's called that:** The term mirrors "SQL injection" because the attacker injects unauthorized commands into the prompt text.
- Save and commit using conventional commit format:
git add glossary/terms.md
git commit -m "feat(glossary): add Prompt Injection term"
git push
The CI pipeline automatically runs site/build.js, re-parses the glossary, and updates both the site data and knowledge base files.
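If you add terms often, a small helper can generate the template so the bold labels are never mistyped. This is only a sketch: formatGlossaryEntry and its field names (term, says, means, why) are hypothetical and not part of the repository. It collapses each field to a single line on the assumption that the parser reads one line at a time, as the regexes suggest.

```javascript
// Hypothetical helper; the function and field names are assumptions.
// Collapse whitespace so each field fits on the single line the regexes expect.
const oneLine = (text) => String(text ?? '').replace(/\s+/g, ' ').trim();

function formatGlossaryEntry({ term, says, means, why }) {
  const t = oneLine(term);
  const s = oneLine(says);
  const m = oneLine(means);
  const w = oneLine(why);
  if (!t || !s || !m) {
    throw new Error('term, says, and means are required');
  }
  const lines = [
    `### ${t}`,
    `- **What people say:** "${s}"`,
    `- **What it actually means:** ${m}`,
  ];
  if (w) lines.push(`- **Why it's called that:** ${w}`); // optional etymology line
  return lines.join('\n');
}

console.log(formatGlossaryEntry({
  term: 'Prompt Injection',
  says: 'Hacking the AI with words',
  means: 'An attack where malicious text in the input overrides the system prompt or instructions.',
  why: 'The term mirrors "SQL injection" because the attacker injects unauthorized commands into the prompt text.',
}));
```

The printed block can be pasted into the appropriate alphabetical section of glossary/terms.md. Throwing on a missing required field surfaces the problem locally instead of letting the build skip the entry.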
Build Pipeline Integration
When you push changes to glossary/terms.md, the following automation executes:
- Validation: parseGlossary() scans each line, extracting only entries that match all required regex patterns
- JSON Generation: Valid terms are structured into site/data.js for the website frontend
- Knowledge Base Update: The llms.txt file receives an updated searchable list for AI agent consumption
- Silent Failure: Invalid entries (missing required lines) are skipped without breaking the build, though they will not appear in output
The AGENTS.md file explicitly defines this contract, mandating that all new terminology reside in glossary/terms.md rather than being duplicated across lesson documentation.
Summary
- Add new terms to the centralized glossary/terms.md file following the exact H3-plus-bullets structure required by the parser in site/build.js
- Include both What people say and What it actually means bullet lines or the entry will be ignored
- Place entries in alphabetical sections to maintain file organization
- Pushed commits trigger automatic regeneration of site/data.js and llms.txt via the parseGlossary() function
- Never duplicate glossary definitions in individual lesson files; use the single source of truth pattern
Frequently Asked Questions
What happens if I omit the "What people say" line from a glossary entry?
The parseGlossary() function requires both regex matches for saysMatch and meansMatch to retain an entry. If either bold labeled line is missing, the parser skips that term during the build process, and it will not appear in the generated site/data.js or llms.txt output, even if the H3 header is present.
Can I add glossary terms anywhere in the terms.md file?
While the parser technically scans the entire file regardless of position, you should insert new terms within the appropriate alphabetical section (e.g., under the P section for "Prompt Engineering"). This maintains human-readable organization and simplifies manual editing, though the build process extracts entries based on pattern matching rather than section headers.
How do I verify my glossary entry was processed correctly?
After committing changes, check the generated files to confirm inclusion. Valid entries appear in site/data.js as JSON objects and in site/llms.txt as formatted text. If your term is missing from these generated files despite being in glossary/terms.md, verify that your markdown exactly matches the required bold-line patterns without extra whitespace or formatting errors that would break the regex match.
Why does the glossary use a centralized file instead of distributed definitions?
The AGENTS.md specification mandates a single source of truth in glossary/terms.md to ensure consistency across the curriculum. This prevents definition drift between lessons and enables the automated build pipeline to generate standardized knowledge bases for both human readers (via the website) and AI agents (via llms.txt) from one authority file, as implemented in rohitg00/ai-engineering-from-scratch.