How `glossary/terms.md` Powers Consistent Terminology Across Lessons in AI Engineering from Scratch

June 7, 2026 · rohitg00/ai-engineering-from-scratch

glossary/terms.md serves as the single source of truth for all canonical definitions in the rohitg00/ai-engineering-from-scratch curriculum, parsed at build time by site/build.js into a GLOSSARY constant that every lesson page consumes for unified terminology.

In the rohitg00/ai-engineering-from-scratch repository, glossary/terms.md is the central dictionary that keeps technical language consistent across 435 lessons. Rather than rewriting explanations in every module, authors reference terms from this file, so concepts like Agent and Attention carry identical, version-controlled definitions wherever they appear. Two mechanisms keep it that way: a build step that compiles the file into data every page reads, and a contribution rule that sends shared terms to this file instead of to individual lessons.

The Canonical Source in `glossary/terms.md`

The file at glossary/terms.md holds one entry for every canonical term used throughout the curriculum. Each entry follows a strict template: an H3 heading with the term name, then three labeled bullets that separate colloquial usage from precise meaning and origin:


### <Term>

- **What people say:** "<common-talk description>"
- **What it actually means:** <precise technical definition>
- **Why it's called that:** <historical / paper reference>

Early entries in the file define foundational concepts such as Agent, Attention, and Adam using this exact structure, making glossary/terms.md the authoritative starting point for how terminology is introduced to learners.

Build-Time Extraction in `site/build.js`

When the site is generated, the build script site/build.js reads glossary/terms.md and runs a function named parseGlossary. The parser walks the file line by line and converts each entry into a JavaScript object with three fields:

{
  term: "Agent",
  says: "An autonomous AI that thinks and acts on its own",
  means: "A while loop where an LLM decides what tool to call next, executes it, sees the result, and repeats"
}
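To make the parsing step concrete, here is a minimal sketch of a parser in the same spirit. It is not the code in site/build.js: the names parseGlossarySketch and stripQuotes are hypothetical, and the sketch assumes every entry starts with an `### ` heading and uses the exact bullet labels from the template. Like the object above, it keeps only term, says, and means, and skips the "Why it's called that" line.

```javascript
// Minimal parseGlossary-style sketch (hypothetical; not the code in site/build.js).
// Assumes each entry is an "### <Term>" heading followed by the template's labeled bullets.
const SAYS = /^- \*\*What people say:\*\*\s*(.*)$/;
const MEANS = /^- \*\*What it actually means:\*\*\s*(.*)$/;

// Remove one pair of surrounding straight double quotes, if present.
function stripQuotes(text) {
  const t = text.trim();
  return t.length >= 2 && t.startsWith('"') && t.endsWith('"') ? t.slice(1, -1) : t;
}

function parseGlossarySketch(markdown) {
  const entries = [];
  let current = null;
  for (const rawLine of markdown.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("### ")) {
      // Each H3 heading opens a new entry.
      current = { term: line.slice(4).trim(), says: "", means: "" };
      entries.push(current);
      continue;
    }
    if (current === null) continue; // skip any preamble before the first heading
    const says = line.match(SAYS);
    if (says) {
      current.says = stripQuotes(says[1]);
      continue;
    }
    const means = line.match(MEANS);
    if (means) current.means = stripQuotes(means[1]);
    // Other lines, including "Why it's called that", are ignored in this sketch.
  }
  return entries;
}

const sample = [
  "### Agent",
  "",
  '- **What people say:** "An autonomous AI that thinks and acts on its own"',
  "- **What it actually means:** A while loop where an LLM decides what tool to call next, executes it, sees the result, and repeats",
].join("\n");
console.log(parseGlossarySketch(sample));
```

A line-oriented parser like this only works because the template is rigid; a missing or relabeled bullet would leave that field empty rather than raise an error.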

The resulting array is stored as the GLOSSARY constant inside the auto-generated file site/data.js. Because this happens at build time, the front-end receives a static, queryable data structure rather than parsing raw markdown in the browser.
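The hand-off from the build script to the browser can be pictured as serializing that array into a module. The sketch below assumes site/data.js is an ES module that exports GLOSSARY, which matches the import example later in this article; renderDataModule is a hypothetical name, and the real generated file may be laid out differently.

```javascript
// Hypothetical sketch: turn parsed entries into the source text of a generated data module.
// Assumes an ES-module layout; the real site/data.js may be structured differently.
function renderDataModule(glossary) {
  // JSON.stringify escapes quotes, backslashes, and newlines inside definitions,
  // so the emitted array literal is valid JavaScript.
  const body = JSON.stringify(glossary, null, 2);
  return `// Auto-generated at build time; edit glossary/terms.md instead.\nexport const GLOSSARY = ${body};\n`;
}

const moduleText = renderDataModule([
  {
    term: "Agent",
    says: "An autonomous AI that thinks and acts on its own",
    means: "A while loop where an LLM decides what tool to call next, executes it, sees the result, and repeats",
  },
]);
console.log(moduleText);
```

A build script would then write moduleText to disk (for example with Node's fs.writeFileSync), which is what lets the browser skip markdown parsing entirely.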

Cross-Lesson Consistency and Policy Enforcement

Lesson authors do not rewrite definitions. They reference a canonical term by name, and because the build reads definitions only from glossary/terms.md, that file stays the single source of truth. This design has two practical outcomes:

Uniform wording: every lesson that mentions a glossary term shows exactly the same explanation.
Instant propagation: editing a definition in glossary/terms.md and rebuilding refreshes tooltips, links, and glossary page entries across all 435 lessons.
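Referencing a term by name amounts to a lookup against one shared data set. This hypothetical helper pair (buildTermIndex and lookupTerm are illustrative names, and the case-insensitive matching is an assumption) shows why the wording cannot diverge: every lesson that asks for the same term gets back the very same entry object.

```javascript
// Hypothetical helpers: index GLOSSARY entries by name for fast, shared lookups.
// Case-insensitive matching is an assumption made for this sketch.
function buildTermIndex(glossary) {
  const index = new Map();
  for (const entry of glossary) {
    index.set(entry.term.trim().toLowerCase(), entry);
  }
  return index;
}

// Returns the canonical entry, or null when the term is not in the glossary.
function lookupTerm(index, name) {
  return index.get(name.trim().toLowerCase()) ?? null;
}

const index = buildTermIndex([
  {
    term: "Agent",
    says: "An autonomous AI that thinks and acts on its own",
    means: "A while loop where an LLM decides what tool to call next, executes it, sees the result, and repeats",
  },
  {
    term: "KV Cache",
    says: "Makes inference faster",
    means: "During autoregressive generation, caching the key and value matrices from previous tokens so you don't recompute them at each step.",
  },
]);
const fromLessonA = lookupTerm(index, "Agent");
const fromLessonB = lookupTerm(index, "agent");
console.log(fromLessonA === fromLessonB); // true: both lessons share one definition object
```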

The repository’s contribution guidelines in AGENTS.md explicitly codify this workflow:

“When introducing a term used by more than one lesson, add it to glossary/terms.md.”

This policy prevents duplicated or diverging explanations and keeps the curriculum coherent as it scales.
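A contribution rule is easier to keep when a machine checks it. As a hypothetical example (nothing here says the repository runs such a check), a build step could flag any term that is defined twice, which is exactly the duplication the policy is meant to prevent. findDuplicateTerms is an illustrative name.

```javascript
// Hypothetical lint step: report terms defined more than once in the parsed glossary.
// Terms are compared after trimming and lower-casing (an assumption of this sketch).
function findDuplicateTerms(glossary) {
  const counts = new Map(); // normalized term -> number of entries
  for (const { term } of glossary) {
    const key = term.trim().toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts].filter(([, n]) => n > 1).map(([key]) => key);
}

const dupes = findDuplicateTerms([{ term: "Agent" }, { term: "Adam" }, { term: "agent" }]);
if (dupes.length > 0) {
  console.error(`Duplicate glossary terms: ${dupes.join(", ")}`);
}
```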

Runtime Discovery and `llms.txt` Generation

Beyond powering the static site, the GLOSSARY constant in site/data.js is also read by the writeLlms script when it generates llms.txt. That script embeds the number of glossary terms in the file, producing a machine-readable summary that external tools and LLM agents can fetch to discover what the curriculum covers. The same parsed output therefore serves both human readers and automated tooling.
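The count can be derived straight from the array, so it is always current after a build. The sketch below shows one way a writeLlms-style step might phrase it; the function name and the output wording are assumptions, since the exact text written to llms.txt is not reproduced here.

```javascript
// Hypothetical sketch of the glossary line a writeLlms-style step could emit.
// The wording is illustrative; only the idea of embedding the term count comes from the source.
function renderGlossaryCountLine(glossary) {
  const n = glossary.length;
  return `Glossary: ${n} ${n === 1 ? "term" : "terms"} (source: glossary/terms.md)`;
}

console.log(renderGlossaryCountLine([{ term: "Agent" }, { term: "KV Cache" }]));
// prints "Glossary: 2 terms (source: glossary/terms.md)"
```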

Working with the Glossary: Code Examples

Adding a New Term

To introduce a concept such as KV Cache, an author appends the standard template directly to glossary/terms.md:


### KV Cache

- **What people say:** "Makes inference faster"
- **What it actually means:** During autoregressive generation, caching the key and value matrices from previous tokens so you don't recompute them at each step.
- **Why it's called that:** The cache stores the K (key) and V (value) tensors for reuse.

Running npm run build locally (or letting CI build the site after the change is committed) invokes parseGlossary in site/build.js and regenerates site/data.js with the new entry.
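Before running the build, an author could check that a new entry matches the template. This is a hypothetical helper, not a script the repository is known to ship; missingTemplateParts is an illustrative name, and it simply looks for the `### ` heading and the three labeled bullets, using the same labels as the template.

```javascript
// Hypothetical template check for a single glossary entry.
const REQUIRED_LABELS = ["What people say", "What it actually means", "Why it's called that"];

// Returns the names of missing parts; an empty array means the entry matches the template.
function missingTemplateParts(entryMarkdown) {
  const lines = entryMarkdown.split(/\r?\n/).map((line) => line.trim());
  const missing = [];
  if (!lines.some((line) => /^### \S/.test(line))) missing.push("### heading");
  for (const label of REQUIRED_LABELS) {
    const prefix = `- **${label}:**`;
    if (!lines.some((line) => line.startsWith(prefix))) missing.push(label);
  }
  return missing;
}

const kvCache = [
  "### KV Cache",
  "",
  '- **What people say:** "Makes inference faster"',
  "- **What it actually means:** During autoregressive generation, caching the key and value matrices from previous tokens so you don't recompute them at each step.",
  "- **Why it's called that:** The cache stores the K (key) and V (value) tensors for reuse.",
].join("\n");
console.log(missingTemplateParts(kvCache)); // prints []
```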

Accessing Parsed Data in a Client Script

Any front-end module can import the compiled definitions from the generated data file:

// Assumes site/data.js is an ES module that exports GLOSSARY
import { GLOSSARY } from './data.js';

// Find the definition for "Agent" (optional chaining avoids a crash if the term is missing)
const agentEntry = GLOSSARY.find(t => t.term === 'Agent');
console.log(agentEntry?.means);
// → "A while loop where an LLM decides what tool to call next, executes it, sees the result, and repeats"

A lesson page can annotate a term with a data-term attribute and hydrate a tooltip from the canonical store:

<span class="term" data-term="Agent">Agent</span>

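<script type="module">
  // Load the build-generated definitions (assumes data.js exports GLOSSARY)
  import { GLOSSARY } from './data.js';
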
  const termSpans = document.querySelectorAll('.term');
  termSpans.forEach(span => {
    const term = span.dataset.term;
    const entry = GLOSSARY.find(t => t.term === term);
    if (entry) {
      span.title = `${entry.says}\n\n${entry.means}`;
    }
  });
</script>

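When the page loads, the script looks up each annotated term in GLOSSARY and sets the span's title to a two-part tooltip (the common description, a blank line, then the precise meaning), so the text matches the master definition in glossary/terms.md as of the last build.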
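Because the tooltip script silently skips spans whose data-term has no match, a misspelled term would go unnoticed. A hypothetical build-time check like the one below could flag such references using the same exact-match rule as the script; findUnknownTermRefs is an illustrative name, and its regex assumes double-quoted attribute values without HTML entities.

```javascript
// Hypothetical check: list data-term values in lesson HTML that have no GLOSSARY entry.
// Uses the same exact, case-sensitive match as the tooltip script.
// Assumes double-quoted attribute values and does not decode HTML entities.
function findUnknownTermRefs(html, glossary) {
  const known = new Set(glossary.map((entry) => entry.term));
  const unknown = new Set();
  for (const match of html.matchAll(/data-term="([^"]+)"/g)) {
    if (!known.has(match[1])) unknown.add(match[1]);
  }
  return [...unknown];
}

const lessonHtml =
  '<p><span class="term" data-term="Agent">Agent</span> and ' +
  '<span class="term" data-term="Agnet">Agnet</span></p>';
console.log(findUnknownTermRefs(lessonHtml, [{ term: "Agent" }])); // prints [ 'Agnet' ]
```

A regex is enough for attributes the build itself emits; for arbitrary hand-written HTML, a real HTML parser would be the safer choice.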

Key Files in the Glossary Pipeline

Several files cooperate to turn the markdown master list into a curriculum-wide terminology layer:

glossary/terms.md — The master list of all curriculum terms in strict markdown format.
site/build.js — Executes parseGlossary to read terms.md and write the compiled dataset.
site/data.js (generated) — Exports the GLOSSARY array consumed by the front-end and by writeLlms.
AGENTS.md — Contribution guidelines that mandate adding shared terms to glossary/terms.md.
site/glossary.html — Renders the searchable glossary page using the imported GLOSSARY constant.

Summary

glossary/terms.md stores every canonical definition for the AI Engineering from Scratch curriculum using a rigid three-part markdown template.
At build time, site/build.js runs parseGlossary to convert the markdown into the GLOSSARY JavaScript array inside site/data.js.
The GLOSSARY constant feeds lesson tooltips and links as well as the searchable site/glossary.html page, so all 435 lessons use identical wording.
The AGENTS.md contribution policy requires authors to add shared terms to glossary/terms.md, preventing explanation drift.
The same GLOSSARY structure feeds the writeLlms script for machine-readable llms.txt generation.

Frequently Asked Questions

What is the purpose of `glossary/terms.md` in AI Engineering from Scratch?

glossary/terms.md is the single source of truth for technical definitions used across the entire curriculum. It stores each term in a standardized three-section format so that concepts like Agent or KV Cache are explained identically in every lesson.

How does updating `glossary/terms.md` affect existing lessons?

Because site/build.js reparses the file into the GLOSSARY constant on every build, an edited definition propagates to all lesson tooltips, glossary links, and the searchable glossary page as soon as the site is rebuilt. Authors never need to update individual lesson files by hand.

What is the required format for new glossary entries?

Every term must follow the template established in glossary/terms.md: an H3 heading for the term name, followed by three bullet lines labeled What people say, What it actually means, and Why it's called that. This strict structure allows parseGlossary to split the entry into machine-readable fields.

Where is the parsed glossary data consumed besides the website?

In addition to rendering the glossary page and lesson tooltips, the generated GLOSSARY array in site/data.js is read by the writeLlms script, which embeds a count of glossary terms in llms.txt so that external agents can discover the curriculum's concepts programmatically.
