# How to Contribute Glossary Terms to the AI Engineering From-Scratch Repository

> Learn how to contribute glossary terms to the AI Engineering From-Scratch repository. Fork, add your term to terms.md, validate the index, and submit a pull request.

- Repository: [Rohit Ghumare/ai-engineering-from-scratch](https://github.com/rohitg00/ai-engineering-from-scratch)
- Tags: how-to-guide
- Published: 2026-06-08

---

**To contribute a glossary term, fork the repository, append a structured definition to [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md), run `node site/build.js` to validate the index, and submit a pull request following the conventional commit format `feat(glossary): add <term>`.**

The `rohitg00/ai-engineering-from-scratch` repository maintains a canonical glossary of AI Engineering terminology that powers the curriculum's cross-referencing system. Contributing a new term follows a lightweight, automated workflow designed to keep the 277-term lexicon consistent and searchable. Whether you're clarifying jargon like "Prompt Engineering" or "RAG," the process centers on editing a single markdown file and passing automated CI validation.

## Step-by-Step Contribution Workflow

### Fork and Branch the Repository

Start by forking the repository and creating a feature branch using the naming convention `add-glossary-<term>`. Clone your fork locally and switch to the new branch to isolate your changes.

```bash
# Replace <your-username> with the GitHub account that owns your fork
git clone https://github.com/<your-username>/ai-engineering-from-scratch.git
cd ai-engineering-from-scratch
git checkout -b add-glossary-<term>
```
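If you want a predictable branch name for a multi-word term, a small helper can derive one. This is a minimal sketch: the `add-glossary-` prefix comes from the convention above, while the function name `branchNameFor` and the lowercase, hyphenated slug are assumptions rather than rules the repository documents.

```javascript
// Hypothetical helper: builds a branch name from a glossary term.
// Assumption: lowercasing and hyphenating the term is acceptable to maintainers.
function branchNameFor(term) {
  const slug = term
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse spaces and punctuation into single hyphens
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
  if (slug === "") {
    throw new Error("term must contain at least one letter or digit");
  }
  return `add-glossary-${slug}`;
}

console.log(branchNameFor("Prompt Engineering")); // add-glossary-prompt-engineering
```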

### Append the Term to glossary/terms.md

Open [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) and add your term using the established three-part structure. Each term requires an H3 heading followed by specific bullet points explaining common usage, technical definition, and etymology.

```markdown
### Prompt-Engineering

- **What people say:** "Crafting the best prompt for the model"
- **What it actually means:** The systematic design of input text (system prompt, few-shot examples, chain-of-thought instructions, etc.) to steer an LLM toward a desired behaviour while minimizing hallucination or bias.
- **Why it's called that:** The term borrows from software engineering: just as engineers write code, we "engineer" prompts to obtain reliable outputs.
```

Place the block wherever the file's existing ordering puts it: in alphabetical position if the terms are sorted, otherwise at the end of the file.
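Before running the real build, you can sanity-check a draft entry against the three-part structure. The sketch below is not part of the repository and does not reproduce what `site/build.js` does. The helper name `validateEntry` and its return shape are hypothetical, and only the heading level and the three bullet labels are taken from this guide.

```javascript
// Hypothetical helper: checks one glossary entry against the three-part structure.
// The labels come from this guide; the function name and return shape are assumptions.
const REQUIRED_LABELS = [
  "What people say:",
  "What it actually means:",
  "Why it's called that:",
];

function validateEntry(markdown) {
  const lines = markdown.split("\n").map((line) => line.trim()).filter(Boolean);
  const problems = [];

  // The first non-blank line must be an H3 heading with a non-empty term name.
  if (!/^### \S/.test(lines[0] ?? "")) {
    problems.push("missing H3 heading");
  }

  // Each required bullet must appear as "- **Label:** text" with some text after it.
  for (const label of REQUIRED_LABELS) {
    const prefix = `- **${label}**`;
    const bullet = lines.find((line) => line.startsWith(prefix));
    if (!bullet) {
      problems.push(`missing bullet: ${label}`);
    } else if (bullet.slice(prefix.length).trim() === "") {
      problems.push(`empty bullet: ${label}`);
    }
  }
  return problems;
}

const entry = [
  "### Prompt-Engineering",
  "",
  '- **What people say:** "Crafting the best prompt for the model"',
  "- **What it actually means:** The systematic design of input text to steer an LLM.",
  "- **Why it's called that:** The term borrows from software engineering.",
].join("\n");

console.log(validateEntry(entry)); // an empty array means no problems were found
```

An empty result only shows that the draft matches the shape described here. The build script and the CI checks remain the authoritative validation.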

### Update Cross-References in Lesson Documentation

According to the automation contract in [`AGENTS.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/AGENTS.md), you should reference the new term in any relevant lesson files (typically under [`docs/en.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/docs/en.md)) using wiki-style links like `[[TermName]]`. This ensures the curriculum remains internally consistent and leverages the glossary for definitions.
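To catch links that point at terms the glossary does not define yet, you could compare the `[[TermName]]` links in a lesson with the H3 headings in the glossary. The following is a sketch under stated assumptions: all three function names are hypothetical, and it assumes a link resolves only when its text matches a glossary heading exactly, which the repository may or may not require.

```javascript
// Hypothetical helpers; none of these names come from the repository.

// Collect the unique terms referenced as [[TermName]] in a lesson's markdown.
function findWikiLinks(lessonText) {
  const terms = new Set();
  for (const match of lessonText.matchAll(/\[\[([^\[\]]+)\]\]/g)) {
    terms.add(match[1].trim());
  }
  return [...terms];
}

// Collect the H3 headings ("### Term") from the glossary markdown.
function listGlossaryTerms(glossaryMarkdown) {
  return [...glossaryMarkdown.matchAll(/^### (.+)$/gm)].map((m) => m[1].trim());
}

// Assumption: a link resolves only when its text equals a heading exactly.
function unresolvedLinks(lessonText, glossaryMarkdown) {
  const known = new Set(listGlossaryTerms(glossaryMarkdown));
  return findWikiLinks(lessonText).filter((term) => !known.has(term));
}

const glossary = "### Prompt-Engineering\n\n- **What people say:** ...\n\n### RAG\n";
const lesson = "Combine [[Prompt-Engineering]] with [[RAG]] and [[LoRA]].";
console.log(unresolvedLinks(lesson, glossary)); // [ 'LoRA' ]
```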

### Validate with the Site Builder

Before committing, run the static site generator to verify your term is indexed correctly. The [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js) script parses [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) and regenerates [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js), which serves as the JSON data source for the web UI.

```bash
node site/build.js
```

Confirm that the script exits without errors and that the regenerated [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) includes your new term. Never edit [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) by hand, as CI regenerates it automatically after each merge.

### Commit and Submit Your Pull Request

Follow the **one-lesson-per-commit** policy (a glossary update counts as one logical change). Use the conventional commit message format `feat(glossary): add <term>`. Push your branch and open a PR that includes a concise description of the term and any lesson files touched.
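A quick check of the commit subject can catch formatting slips before you push. This sketch tests only the `feat(glossary): add <term>` pattern quoted above. The function name `isGlossaryCommitMessage` is hypothetical, and the repository's own tooling may enforce stricter or different rules.

```javascript
// Hypothetical helper: checks the first line of a commit message against
// the "feat(glossary): add <term>" format described in this guide.
function isGlossaryCommitMessage(message) {
  const subject = message.split("\n")[0];
  // Require the exact prefix followed by a non-empty term.
  return /^feat\(glossary\): add \S.*$/.test(subject);
}

console.log(isGlossaryCommitMessage("feat(glossary): add Prompt-Engineering")); // true
console.log(isGlossaryCommitMessage("Add Prompt-Engineering to glossary")); // false
```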

## CI Automation and Quality Gates

Once you open the pull request, three automated checks must pass before merge:

- **audit**: Validates lesson structure and markdown syntax
- **readme-counts-sync**: Ensures documentation statistics are current
- **site-rebuild**: Regenerates [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) to confirm the glossary index is valid

These checks are defined in [`.github/workflows/curriculum.yml`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/.github/workflows/curriculum.yml).

## Key Files in the Contribution Pipeline

- **[`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md)**: The central canonical store for all 277+ term definitions.
- **[`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js)**: The Node.js parser that transforms glossary markdown into structured JSON.
- **[`AGENTS.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/AGENTS.md)**: The automation contract specifying that cross-lesson terms must be added to the glossary surface.
- **[`CONTRIBUTING.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/CONTRIBUTING.md)**: The comprehensive guide to fork, branch, and PR workflows.
- **[`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js)**: The auto-generated JSON output consumed by the site frontend.

## Summary

- Fork the repository and create a branch named `add-glossary-<term>`
- Append new terms to [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) using the "### Term" format with three descriptive bullets
- Run `node site/build.js` locally to validate the index generation
- Commit using `feat(glossary): add <term>` and open a PR
- Ensure CI passes `audit`, `readme-counts-sync`, and `site-rebuild` checks

## Frequently Asked Questions

### What file should I edit to add a new glossary term?

You must edit [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) directly. This file serves as the single source of truth for the curriculum's canonical glossary. Do not edit [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js), as it is auto-generated by the CI pipeline.

### How do I format a new glossary entry to match the existing style?

Use an H3 heading for the term name, followed by three bullet points: "What people say," "What it actually means," and "Why it's called that." This structure ensures consistency across all 277 terms and enables proper parsing by [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js).

### What CI checks validate my glossary contribution?

Your pull request must pass the `audit`, `readme-counts-sync`, and `site-rebuild` jobs defined in [`.github/workflows/curriculum.yml`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/.github/workflows/curriculum.yml). The `site-rebuild` check specifically verifies that `node site/build.js` executes successfully and that the generated [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) contains your new term.

### Should I manually update site/data.js when adding a term?

No. The [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) file is automatically regenerated by the `site-rebuild` CI job after your PR is merged. Manually editing it will cause conflicts and fail the build validation.