How `glossary/terms.md` Powers Consistent Terminology Across Lessons in AI Engineering from Scratch
glossary/terms.md serves as the single source of truth for all canonical definitions in the rohitg00/ai-engineering-from-scratch curriculum, parsed at build time by site/build.js into a GLOSSARY constant that every lesson page consumes for unified terminology.
In the rohitg00/ai-engineering-from-scratch repository, glossary/terms.md is the centralized dictionary that synchronizes technical language across 435 lessons. Rather than rewriting explanations in every module, authors reference terms from this file, ensuring concepts like Agent and Attention carry identical, version-controlled definitions wherever they appear. This architecture is maintained through automated build pipelines and strict contribution rules that prevent drift.
The Canonical Source in glossary/terms.md
The file at glossary/terms.md contains every canonical term used throughout the curriculum, one markdown entry per term. Each entry follows a strict template: an H3 heading followed by three labeled bullets that separate colloquial usage from precise meaning and from the name's origin:
```markdown
### <Term>
- **What people say:** "<common-talk description>"
- **What it actually means:** <precise technical definition>
- **Why it's called that:** <historical / paper reference>
```
Early entries in the file define foundational concepts such as Agent, Attention, and Adam using this exact structure, making glossary/terms.md the authoritative starting point for how terminology is introduced to learners.
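Because the build depends on this exact structure, a malformed entry is easy to catch mechanically. The following is a hypothetical lint sketch, not code from the repository: the function name `missingTemplateParts` is invented for illustration, and it only checks that the H3 heading and the three labeled bullets are present.

```javascript
// Hypothetical lint check (not part of the repo): reports which parts of the
// glossary template are missing from one entry's markdown text.
const REQUIRED_LABELS = [
  "What people say",
  "What it actually means",
  "Why it's called that",
];

function missingTemplateParts(entryText) {
  const lines = entryText.split("\n").map((line) => line.trim());
  const problems = [];

  // The entry must start a section with "### <Term>".
  if (!lines.some((line) => /^###\s+\S/.test(line))) {
    problems.push("H3 term heading");
  }

  // Each bullet must look like: - **<label>:** <text>
  for (const label of REQUIRED_LABELS) {
    const prefix = `- **${label}:**`;
    if (!lines.some((line) => line.startsWith(prefix))) {
      problems.push(label);
    }
  }
  return problems;
}

// The Agent entry from this article, deliberately missing its third bullet.
const entry = [
  "### Agent",
  '- **What people say:** "An autonomous AI that thinks and acts on its own"',
  "- **What it actually means:** A while loop where an LLM decides what tool to call next, executes it, sees the result, and repeats",
].join("\n");

console.log(JSON.stringify(missingTemplateParts(entry))); // ["Why it's called that"]
```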
Build-Time Extraction in site/build.js
When the site is generated, the build script at site/build.js reads glossary/terms.md and executes a function named parseGlossary. This parser walks the file line-by-line and converts each term into a structured JavaScript object with three fields:
```javascript
{
  term: "Agent",
  says: "An autonomous AI that thinks and acts on its own",
  means: "A while loop where an LLM decides what tool to call next, executes it, sees the result, and repeats"
}
```
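A line-by-line pass like the one described can be sketched as follows. This is a hypothetical reconstruction, not the actual parseGlossary in site/build.js: the name `parseGlossaryMarkdown` is invented, and it produces only the three fields shown in the object above, ignoring the "Why it's called that" bullet.

```javascript
// Hypothetical reconstruction of a parseGlossary-style pass; the real
// function in site/build.js may differ in naming and edge-case handling.
function parseGlossaryMarkdown(markdown) {
  const entries = [];
  let current = null;

  for (const rawLine of markdown.split(/\r?\n/)) {
    const line = rawLine.trim();

    // "### <Term>" starts a new entry.
    const heading = line.match(/^###\s+(.+)$/);
    if (heading) {
      current = { term: heading[1].trim(), says: "", means: "" };
      entries.push(current);
      continue;
    }
    if (!current) continue; // skip any preamble before the first term

    const says = line.match(/^- \*\*What people say:\*\*\s*(.*)$/);
    if (says) {
      // Drop the quotes the template puts around the common-talk phrase.
      current.says = says[1].replace(/^"(.*)"$/, "$1");
      continue;
    }

    const means = line.match(/^- \*\*What it actually means:\*\*\s*(.*)$/);
    if (means) current.means = means[1];
    // The "Why it's called that" bullet is ignored in this sketch.
  }
  return entries;
}

const sample = [
  "# Glossary",
  "### Agent",
  '- **What people say:** "An autonomous AI that thinks and acts on its own"',
  "- **What it actually means:** A while loop where an LLM decides what tool to call next, executes it, sees the result, and repeats",
  "- **Why it's called that:** <historical / paper reference>",
].join("\n");

console.log(parseGlossaryMarkdown(sample)[0].term); // Agent
```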
The resulting array is stored as the GLOSSARY constant inside the auto-generated file site/data.js. Because this happens at build time, the front-end receives a static, queryable data structure rather than parsing raw markdown in the browser.
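Emitting that array as a static module can be as simple as serializing it to JSON. This is a hedged sketch: the header comment and exact layout of the real generated site/data.js are assumptions, and `renderDataModule` is a hypothetical name. It returns the module text rather than writing a file, so it has no side effects.

```javascript
// Hypothetical sketch: turn the parsed glossary array into the source text
// of a generated ES module. The real site/data.js format may differ.
function renderDataModule(glossary) {
  // JSON.stringify escapes quotes and newlines, so every definition
  // becomes a valid JavaScript string literal inside the array.
  const body = JSON.stringify(glossary, null, 2);
  return (
    "// AUTO-GENERATED from glossary/terms.md. Do not edit by hand.\n" +
    `export const GLOSSARY = ${body};\n`
  );
}

const source = renderDataModule([
  {
    term: "Agent",
    says: "An autonomous AI that thinks and acts on its own",
    means: "A while loop where an LLM decides what tool to call next, executes it, sees the result, and repeats",
  },
]);

console.log(source);
```

A build step would then write `source` to disk (for example with Node's `fs.writeFileSync`), which keeps all markdown parsing out of the browser.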
Cross-Lesson Consistency and Policy Enforcement
Lesson authors do not rewrite definitions. Instead, they reference a canonical term by name, and the build process guarantees that glossary/terms.md remains the only source of truth. This design delivers two critical outcomes:
- Uniform wording: Every lesson that mentions a canonical term shares the exact same explanation.
- Instant propagation: Updating a definition in glossary/terms.md and rebuilding automatically refreshes tooltips, links, and glossary page entries across all 435 lessons.
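Referencing terms by name only works if every referenced name actually exists in the glossary. A build could verify that with a check like the one sketched here. It is hypothetical: `findUnknownTerms` is an invented name, and it assumes lessons mark terms with the `data-term` attribute used in the tooltip example in this article.

```javascript
// Hypothetical consistency check: list data-term references in a lesson's
// HTML that have no matching GLOSSARY entry.
function findUnknownTerms(lessonHtml, glossary) {
  const known = new Set(glossary.map((entry) => entry.term));
  const unknown = new Set(); // a Set reports each missing term only once

  for (const match of lessonHtml.matchAll(/data-term="([^"]+)"/g)) {
    if (!known.has(match[1])) unknown.add(match[1]);
  }
  return [...unknown];
}

// Minimal stand-in glossary; only the term field matters for this check.
const glossary = [{ term: "Agent" }, { term: "Attention" }];
const html =
  '<span class="term" data-term="Agent">Agent</span> and ' +
  '<span class="term" data-term="KV Cache">KV cache</span>';

console.log(JSON.stringify(findUnknownTerms(html, glossary))); // ["KV Cache"]
```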
The repository’s contribution guidelines in AGENTS.md explicitly codify this workflow:
“When introducing a term used by more than one lesson, add it to glossary/terms.md.”
This policy prevents duplicated or diverging explanations and keeps the curriculum coherent as it scales.
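Centralizing terms only prevents divergence if glossary/terms.md itself holds one entry per term. A hypothetical lint for that is sketched here; `findDuplicateTerms` is an invented name, and comparing headings case-insensitively is an assumption about what counts as the same term.

```javascript
// Hypothetical lint for glossary/terms.md: report "### " headings whose
// term name (compared case-insensitively) already appeared earlier.
function findDuplicateTerms(markdown) {
  const seen = new Set();
  const duplicates = [];

  for (const line of markdown.split(/\r?\n/)) {
    const heading = line.trim().match(/^###\s+(.+)$/);
    if (!heading) continue;

    const name = heading[1].trim();
    const key = name.toLowerCase();
    if (seen.has(key)) duplicates.push(name); // second definition of a term
    else seen.add(key);
  }
  return duplicates;
}

console.log(JSON.stringify(findDuplicateTerms("### Agent\n### Attention\n### agent"))); // ["agent"]
```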
Runtime Discovery and llms.txt Generation
Beyond powering the static site, the GLOSSARY constant in site/data.js is also consumed by the writeLlms script, which generates llms.txt. That file includes a count of glossary terms as part of a machine-readable summary that external agents can read to discover what the curriculum covers. A single parsed output therefore serves both human readers and automated tooling.
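Because GLOSSARY is already a plain array, deriving a summary from it takes only a few lines. This is a hedged sketch: the real llms.txt layout produced by writeLlms is not documented here, so the wording and the hypothetical `glossarySummary` function are assumptions. It shows only the term count plus one line per term name.

```javascript
// Hypothetical sketch of a writeLlms-style step: summarize the glossary as
// plain text. The actual llms.txt format used by the repo may differ.
function glossarySummary(glossary) {
  const lines = [`Glossary terms: ${glossary.length}`];
  for (const entry of glossary) {
    lines.push(`- ${entry.term}`);
  }
  return lines.join("\n");
}

const summary = glossarySummary([
  { term: "Agent" },
  { term: "Attention" },
  { term: "Adam" },
]);

console.log(summary);
```

Running this prints `Glossary terms: 3` followed by one bullet per term.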
Working with the Glossary: Code Examples
Adding a New Term
To introduce a concept such as KV Cache, an author appends the standard template directly to glossary/terms.md:
```markdown
### KV Cache
- **What people say:** "Makes inference faster"
- **What it actually means:** During autoregressive generation, caching the key and value matrices from previous tokens so you don't recompute them at each step.
- **Why it's called that:** The cache stores the K (key) and V (value) tensors for reuse.
```
Committing this change and running npm run build (or waiting for CI) triggers parseGlossary in site/build.js and regenerates site/data.js with the new entry.
Accessing Parsed Data in a Client Script
Any front-end module can import the compiled definitions from the generated data file:
```javascript
// Assumes site/data.js is an ES module that exports GLOSSARY
import { GLOSSARY } from './data.js';

// Find the definition for "Agent"
const agentEntry = GLOSSARY.find(t => t.term === 'Agent');
console.log(agentEntry?.means);
// → A while loop where an LLM decides what tool to call next, executes it, sees the result, and repeats
```
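Calling `find()` for every term on a page rescans the array each time. A small, hypothetical helper can index the array once instead. `indexGlossary` is an invented name, and lowercasing the keys is an assumption so that "agent" and "Agent" resolve to the same entry. If two entries share a key, the later one wins, which is why a duplicate check on terms.md is useful.

```javascript
// Hypothetical helper: build a Map from lowercase term name to entry so
// each lookup is a single hash access instead of a linear scan.
function indexGlossary(glossary) {
  return new Map(glossary.map((entry) => [entry.term.toLowerCase(), entry]));
}

// Stand-in data with the same shape as the generated GLOSSARY array.
const GLOSSARY = [
  {
    term: "Agent",
    says: "An autonomous AI that thinks and acts on its own",
    means: "A while loop where an LLM decides what tool to call next, executes it, sees the result, and repeats",
  },
];

const byTerm = indexGlossary(GLOSSARY);
console.log(byTerm.get("agent")?.says); // An autonomous AI that thinks and acts on its own
```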
Rendering a Tooltip Inside a Lesson
A lesson page can annotate a term with a data-term attribute and hydrate a tooltip from the canonical store:
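```html
<span class="term" data-term="Agent">Agent</span>

<script type="module">
  // A module script is needed to import GLOSSARY; adjust the path to
  // wherever data.js is served relative to the lesson page.
  import { GLOSSARY } from './data.js';

  // Attach the canonical "says" and "means" text to every annotated term
  document.querySelectorAll('.term').forEach(span => {
    const entry = GLOSSARY.find(t => t.term === span.dataset.term);
    if (entry) {
      span.title = `${entry.says}\n\n${entry.means}`;
    }
  });
</script>
```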
When the page loads, the script queries GLOSSARY and sets a tooltip that shows the colloquial description followed by the precise meaning, so the text always matches the master definition in glossary/terms.md.
Key Files in the Glossary Pipeline
Several files cooperate to turn the markdown master list into a curriculum-wide terminology layer:
- glossary/terms.md: The master list of all curriculum terms in strict markdown format.
- site/build.js: Executes parseGlossary to read terms.md and write the compiled dataset.
- site/data.js (generated): Exports the GLOSSARY array consumed by the front-end and by writeLlms.
- AGENTS.md: Contribution guidelines that mandate adding shared terms to glossary/terms.md.
- site/glossary.html: Renders the searchable glossary page using the imported GLOSSARY constant.
Summary
- glossary/terms.md stores every canonical definition for the AI Engineering from Scratch curriculum using a rigid three-part markdown template.
- At build time, site/build.js runs parseGlossary to convert the markdown into the GLOSSARY JavaScript array inside site/data.js.
- Lesson pages consume this constant for tooltips, links, and the searchable glossary.html page, ensuring identical wording across all 435 lessons.
- The AGENTS.md contribution policy forces authors to centralize new terms, preventing explanation drift.
- The same GLOSSARY structure feeds the writeLlms script for machine-readable llms.txt generation.
Frequently Asked Questions
What is the purpose of glossary/terms.md in AI Engineering from Scratch?
glossary/terms.md is the single source of truth for technical definitions used across the entire curriculum. It stores each term in a standardized three-section format so that concepts like Agent or KV Cache are explained identically in every lesson.
How does updating glossary/terms.md affect existing lessons?
Because site/build.js reparses the file into the GLOSSARY constant during each build, any edit to a definition automatically propagates to all lesson tooltips, glossary links, and the searchable glossary page. Authors never need to manually update individual lesson files.
What is the required format for new glossary entries?
Every term must follow the template established in glossary/terms.md: an H3 heading for the term name, followed by three bullet lines labeled What people say, What it actually means, and Why it's called that. This strict structure allows parseGlossary to split the entry into machine-readable fields.
Where is the parsed glossary data consumed besides the website?
In addition to rendering the glossary page and lesson tooltips, the generated GLOSSARY array in site/data.js is used by the writeLlms script to populate llms.txt with term counts and metadata, enabling external agents to discover curriculum concepts programmatically.