# How `glossary/terms.md` Powers Consistent Terminology Across Lessons in AI Engineering from Scratch

> Learn how `glossary/terms.md` ensures consistent terminology in AI Engineering from Scratch. Discover how `site/build.js` creates a `GLOSSARY` constant for unified lesson definitions.

- Repository: [rohitg00/ai-engineering-from-scratch](https://github.com/rohitg00/ai-engineering-from-scratch) by Rohit Ghumare
- Tags: internals
- Published: 2026-06-07

---

**[`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) serves as the single source of truth for all canonical definitions in the `rohitg00/ai-engineering-from-scratch` curriculum, parsed at build time by [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js) into a `GLOSSARY` constant that every lesson page consumes for unified terminology.**

In the [`rohitg00/ai-engineering-from-scratch`](https://github.com/rohitg00/ai-engineering-from-scratch) repository, [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) is the centralized dictionary that keeps technical language consistent across 435 lessons. Rather than rewriting explanations in every module, authors reference terms from this file, so concepts like **Agent** and **Attention** carry identical, version-controlled definitions wherever they appear. Two mechanisms keep the definitions from drifting: a build step that parses the file into shared data, and a contribution rule that requires shared terms to be defined there.

## The Canonical Source in [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md)

The file at [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) contains an entry for every canonical term used throughout the curriculum. Each entry follows a strict template (an H3 heading followed by three labeled bullets) that separates colloquial usage from precise meaning:

```markdown
### <Term>

- **What people say:** "<common-talk description>"
- **What it actually means:** <precise technical definition>
- **Why it's called that:** <historical / paper reference>
```

Early entries in the file define foundational concepts such as **Agent**, **Attention**, and **Adam** using this exact structure, making [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) the authoritative starting point for how terminology is introduced to learners.

## Build-Time Extraction in [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js)

When the site is generated, the build script at [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js) reads [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) and executes a function named `parseGlossary`. This parser walks the file line-by-line and converts each term into a structured JavaScript object with three fields:

```js
// Shape of one parsed entry produced by parseGlossary
const entry = {
  term: "Agent",
  says: "An autonomous AI that thinks and acts on its own",
  means: "A while loop where an LLM decides what tool to call next, executes it, sees the result, and repeats"
};
```
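The source of `parseGlossary` is not reproduced here. As a minimal sketch, assuming each entry starts with a `### ` heading and uses the bold bullet labels from the template, a line-by-line parser that yields the same `{ term, says, means }` shape could look like this (`parseGlossarySketch` and `unquote` are hypothetical names):

```js
// Hypothetical sketch, not the repository's actual parseGlossary.
// Assumes each entry begins with "### <Term>" and uses the bold bullet
// labels from the glossary template.
function unquote(text) {
  // Drop one pair of surrounding double quotes, if present.
  return text.replace(/^"(.*)"$/, "$1");
}

function parseGlossarySketch(markdown) {
  const entries = [];
  let current = null;

  for (const rawLine of markdown.split(/\r?\n/)) {
    const line = rawLine.trim();

    // A level-3 heading starts a new entry.
    const heading = line.match(/^###\s+(.+)$/);
    if (heading) {
      current = { term: heading[1].trim(), says: "", means: "" };
      entries.push(current);
      continue;
    }
    if (current === null) continue; // ignore text before the first entry

    const says = line.match(/^- \*\*What people say:\*\*\s*(.*)$/);
    if (says) {
      current.says = unquote(says[1]);
      continue;
    }

    const means = line.match(/^- \*\*What it actually means:\*\*\s*(.*)$/);
    if (means) {
      current.means = unquote(means[1]);
    }
    // "Why it's called that" is skipped, matching the three-field shape.
  }
  return entries;
}

const sampleMarkdown = [
  "### Agent",
  "",
  '- **What people say:** "An autonomous AI that thinks and acts on its own"',
  "- **What it actually means:** A while loop where an LLM decides what tool to call next, executes it, sees the result, and repeats",
].join("\n");

console.log(parseGlossarySketch(sampleMarkdown));
```

Keying on the exact labels is what makes the rigid template matter: a bullet with different wording is simply not picked up.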

The resulting array is stored as the `GLOSSARY` constant inside the auto-generated file [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js). Because this happens at build time, the front-end receives a static, queryable data structure rather than parsing raw markdown in the browser.
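How `site/build.js` writes `site/data.js` is not shown in the source. A minimal sketch, assuming the generated file is plain JavaScript that declares `GLOSSARY` as a constant, is to serialize the parsed array with `JSON.stringify`, whose output is a valid JavaScript array literal (`renderDataJs` is a hypothetical name):

```js
// Hypothetical sketch: produce the source text of a generated data file.
// JSON.stringify output is a valid JavaScript array literal, so the
// browser receives ready-made objects instead of raw markdown.
function renderDataJs(glossary) {
  const header = "// Auto-generated at build time from glossary/terms.md. Do not edit.\n";
  return `${header}const GLOSSARY = ${JSON.stringify(glossary, null, 2)};\n`;
}

const parsedGlossary = [
  {
    term: "Agent",
    says: "An autonomous AI that thinks and acts on its own",
    means: "A while loop where an LLM decides what tool to call next, executes it, sees the result, and repeats",
  },
];

console.log(renderDataJs(parsedGlossary));
```

A build step could then write the string to disk, for example with Node's `fs.writeFileSync("site/data.js", renderDataJs(entries))`. If pages load the file with `import`, the declaration would need to be `export const GLOSSARY = ...` instead.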

## Cross-Lesson Consistency and Policy Enforcement

Lesson authors do not rewrite definitions. Instead, they reference a canonical term by name, and because the build reads definitions only from [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md), that file remains the single source of truth. This design delivers two critical outcomes:

- **Uniform wording** — Every lesson that mentions a canonical term shares the exact same explanation.
- **Propagation on rebuild** — Updating a definition in [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) and rebuilding refreshes tooltips, links, and glossary page entries across all 435 lessons.

The repository’s contribution guidelines in [`AGENTS.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/AGENTS.md) explicitly codify this workflow:

> *“When introducing a term used by more than one lesson, add it to [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md).”*

This policy prevents duplicated or diverging explanations and keeps the curriculum coherent as it scales.
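The rule in `AGENTS.md` is a contribution guideline; the source does not say whether it is checked automatically. As a hedged sketch of how it could be checked, assuming lessons are available as plain text and a list of candidate terms is supplied, a hypothetical `findTermsMissingFromGlossary` helper counts how many lessons mention each candidate and flags those used by more than one lesson that lack a glossary entry (the lesson ids are made up for the example):

```js
// Hypothetical check for the AGENTS.md rule: a term used by more than one
// lesson should have an entry in glossary/terms.md.
//   lessons:    array of { id, text }
//   candidates: term names to check
//   glossary:   parsed entries, each with a `term` field
function findTermsMissingFromGlossary(lessons, candidates, glossary) {
  const defined = new Set(glossary.map((entry) => entry.term.toLowerCase()));
  const missing = [];

  for (const candidate of candidates) {
    const needle = candidate.toLowerCase();
    if (defined.has(needle)) continue; // already canonical

    // Lessons that mention the term (case-insensitive substring match).
    const users = lessons
      .filter((lesson) => lesson.text.toLowerCase().includes(needle))
      .map((lesson) => lesson.id);

    if (users.length > 1) {
      missing.push({ term: candidate, lessons: users });
    }
  }
  return missing;
}

const sampleLessons = [
  { id: "lesson-a", text: "We add a KV cache to speed up decoding." },
  { id: "lesson-b", text: "The KV cache grows with sequence length." },
  { id: "lesson-c", text: "An agent calls tools in a loop." },
];

console.log(findTermsMissingFromGlossary(sampleLessons, ["KV Cache", "Agent"], [{ term: "Agent" }]));
```

Substring matching is deliberately naive (it would count "Adam" inside "Adamant"); a real check would more likely match whole words or explicit term markers.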

## Machine-Readable Discovery and [`llms.txt`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/llms.txt) Generation

Beyond powering the static site, the `GLOSSARY` constant in [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) is consumed by the `writeLlms` script to generate [`llms.txt`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/llms.txt). That script embeds a count of glossary terms in a machine-readable summary that external agents and tools can read to discover what the curriculum covers. As a result, [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) serves both human readers and automated tooling through a single parsed output.
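The exact layout of `llms.txt` is not described beyond the embedded term count. A minimal sketch, with a hypothetical `renderGlossaryCount` function and an assumed line format, shows the count derived directly from the parsed array so it cannot fall out of step with `glossary/terms.md`:

```js
// Hypothetical sketch: the glossary line of an llms.txt summary.
// The wording of the line is an assumption; only the idea of embedding a
// term count comes from the source.
function renderGlossaryCount(glossary) {
  // Count distinct names so an accidentally duplicated heading is not counted twice.
  const distinct = new Set(glossary.map((entry) => entry.term)).size;
  return `Glossary: ${distinct} terms (source: glossary/terms.md)`;
}

console.log(renderGlossaryCount([{ term: "Adam" }, { term: "Agent" }, { term: "Attention" }]));
// → Glossary: 3 terms (source: glossary/terms.md)
```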

## Working with the Glossary: Code Examples

### Adding a New Term

To introduce a concept such as **KV Cache**, an author appends the standard template directly to [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md):

```markdown
### KV Cache

- **What people say:** "Makes inference faster"
- **What it actually means:** During autoregressive generation, caching the key and value matrices from previous tokens so you don't recompute them at each step.
- **Why it's called that:** The cache stores the K (key) and V (value) tensors for reuse.
```

Committing this change and running `npm run build` (or waiting for CI) triggers `parseGlossary` in [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js) and regenerates [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) with the new entry.
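If `parseGlossary` keys on the bullet labels, as its three-field output suggests, a malformed entry could end up with empty fields rather than failing loudly. A hedged sketch of a check an author could run before committing; the `findMalformedEntries` name and its rules are assumptions, not part of the repository:

```js
// Hypothetical lint for glossary/terms.md: every "### <Term>" entry must
// contain all three template bullets.
const REQUIRED_LABELS = [
  "- **What people say:**",
  "- **What it actually means:**",
  "- **Why it's called that:**",
];

function findMalformedEntries(markdown) {
  const problems = [];
  // Split just before each line that starts with "### ", keeping the heading.
  const chunks = markdown
    .split(/^(?=### )/m)
    .filter((chunk) => chunk.startsWith("### "));

  for (const chunk of chunks) {
    const [headingLine, ...body] = chunk.split(/\r?\n/);
    const term = headingLine.slice(4).trim();
    const lines = body.map((line) => line.trim());
    const missing = REQUIRED_LABELS.filter(
      (label) => !lines.some((line) => line.startsWith(label))
    );
    if (missing.length > 0) {
      problems.push({ term, missing });
    }
  }
  return problems;
}

const draftEntry = [
  "### KV Cache",
  "",
  '- **What people say:** "Makes inference faster"',
  "- **What it actually means:** During autoregressive generation, caching the key and value matrices from previous tokens so you don't recompute them at each step.",
].join("\n");

console.log(findMalformedEntries(draftEntry));
```

Here the draft is reported because its **Why it's called that** bullet is missing; an entry with all three bullets produces an empty list.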

### Accessing Parsed Data in a Client Script

If the generated data file is loaded as an ES module, any front-end module can import the compiled definitions directly:

```js
// Assumes site/data.js is loaded as an ES module that exports GLOSSARY
import { GLOSSARY } from './data.js';

// Find the definition for "Agent"; optional chaining avoids a crash if it is missing
const agentEntry = GLOSSARY.find(t => t.term === 'Agent');
console.log(agentEntry?.means);
// → "A while loop where an LLM decides what tool to call next, executes it, sees the result, and repeats"
```
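`GLOSSARY.find` scans the whole array on every lookup. For pages that resolve many terms, an index built once can help; this sketch (the `buildTermIndex` and `lookupTerm` names and the case-insensitive matching are assumptions) keys entries by lowercased term:

```js
// Hypothetical helper: index entries by lowercased term so repeated lookups
// are constant-time and tolerate capitalization differences.
// If two entries share a name, the later one wins.
function buildTermIndex(glossary) {
  const index = new Map();
  for (const entry of glossary) {
    index.set(entry.term.toLowerCase(), entry);
  }
  return index;
}

function lookupTerm(index, name) {
  // Returns undefined when the term has no glossary entry.
  return index.get(name.trim().toLowerCase());
}

const termIndex = buildTermIndex([
  {
    term: "Agent",
    says: "An autonomous AI that thinks and acts on its own",
    means: "A while loop where an LLM decides what tool to call next, executes it, sees the result, and repeats",
  },
]);

console.log(lookupTerm(termIndex, "agent")?.means);
```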

### Rendering a Tooltip Inside a Lesson

A lesson page can annotate a term with a `data-term` attribute and hydrate a tooltip from the canonical store:

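```html
<span class="term" data-term="Agent">Agent</span>

<script>
  // Assumes GLOSSARY is already defined globally, e.g. by loading
  // site/data.js in an earlier script tag.
  const termSpans = document.querySelectorAll('.term');
  termSpans.forEach(span => {
    const term = span.dataset.term;
    const entry = GLOSSARY.find(t => t.term === term);
    if (entry) {
      // Native tooltip: colloquial description, blank line, precise definition
      span.title = `${entry.says}\n\n${entry.means}`;
    }
  });
</script>
```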

When the page loads, the script looks up each annotated term in `GLOSSARY` and sets a native tooltip that shows the colloquial description followed by the precise definition, so the text stays synchronized with the master definition in [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md).

## Key Files in the Glossary Pipeline

Several files cooperate to turn the markdown master list into a curriculum-wide terminology layer:

- **[`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md)** — The master list of all curriculum terms in strict markdown format.
- **[`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js)** — Executes `parseGlossary` to read [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) and write the compiled dataset.
- **[`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js)** *(generated)* — Exports the `GLOSSARY` array consumed by the front-end and by `writeLlms`.
- **[`AGENTS.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/AGENTS.md)** — Contribution guidelines that mandate adding shared terms to [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md).
- **[`site/glossary.html`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/glossary.html)** — Renders the searchable glossary page using the imported `GLOSSARY` constant.
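The source does not show how `site/glossary.html` implements search. A minimal sketch, assuming a case-insensitive substring filter over the three fields of each parsed entry (`searchGlossary` is a hypothetical name):

```js
// Hypothetical client-side filter for a searchable glossary page:
// case-insensitive substring match on the term, the colloquial description,
// and the precise definition.
function searchGlossary(glossary, query) {
  const q = query.trim().toLowerCase();
  if (q === "") return glossary.slice(); // empty query lists every entry

  return glossary.filter((entry) =>
    [entry.term, entry.says, entry.means].some((field) =>
      (field || "").toLowerCase().includes(q)
    )
  );
}

const glossaryEntries = [
  {
    term: "Agent",
    says: "An autonomous AI that thinks and acts on its own",
    means: "A while loop where an LLM decides what tool to call next, executes it, sees the result, and repeats",
  },
  {
    term: "KV Cache",
    says: "Makes inference faster",
    means: "During autoregressive generation, caching the key and value matrices from previous tokens so you don't recompute them at each step.",
  },
];

console.log(searchGlossary(glossaryEntries, "loop").map((entry) => entry.term));
```

Searching the colloquial field as well as the definition lets a reader find an entry using the informal phrase they already know.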

## Summary

- [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) stores every canonical definition for the AI Engineering from Scratch curriculum using a rigid three-part markdown template.
- At build time, [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js) runs `parseGlossary` to convert the markdown into the `GLOSSARY` JavaScript array inside [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js).
- Lesson pages consume this constant for tooltips, links, and the searchable [`site/glossary.html`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/glossary.html) page, ensuring identical wording across all 435 lessons.
- The [`AGENTS.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/AGENTS.md) contribution policy forces authors to centralize new terms, preventing explanation drift.
- The same `GLOSSARY` structure feeds the `writeLlms` script for machine-readable [`llms.txt`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/llms.txt) generation.

## Frequently Asked Questions

### What is the purpose of [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) in AI Engineering from Scratch?

[`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) is the single source of truth for technical definitions used across the entire curriculum. It stores each term in a standardized three-section format so that concepts like **Agent** or **Attention** are explained identically in every lesson.

### How does updating [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) affect existing lessons?

Because [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js) reparses the file into the `GLOSSARY` constant during each build, any edit to a definition automatically propagates to all lesson tooltips, glossary links, and the searchable glossary page. Authors never need to manually update individual lesson files.

### What is the required format for new glossary entries?

Every term must follow the template established in [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md): an H3 heading for the term name, followed by three bullet lines labeled **What people say**, **What it actually means**, and **Why it's called that**. This strict structure allows `parseGlossary` to split the entry into machine-readable fields.

### Where is the parsed glossary data consumed besides the website?

In addition to rendering the glossary page and lesson tooltips, the generated `GLOSSARY` array in [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) is used by the `writeLlms` script to embed a count of glossary terms in [`llms.txt`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/llms.txt), enabling external agents to discover curriculum concepts programmatically.