# How `data.js` Is Generated from `README.md` and `ROADMAP.md` in AI Engineering from Scratch

> Learn how `site/data.js` is generated in AI Engineering from Scratch. Discover how `README.md` and `ROADMAP.md` are parsed and transformed into structured JavaScript objects for the project.

- Repository: [Rohit Ghumare/ai-engineering-from-scratch](https://github.com/rohitg00/ai-engineering-from-scratch)
- Tags: internals
- Published: 2026-06-08

---

**The [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js) script generates [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) by parsing [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) and [`ROADMAP.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md) into structured JavaScript objects, cross-referencing completion status, and enriching each lesson with metadata pulled from its [`docs/en.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/docs/en.md) file.**

In the `rohitg00/ai-engineering-from-scratch` repository, the curriculum website is powered by a fully automated build pipeline. Rather than maintaining JSON or JavaScript data files by hand, the project generates [`data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/data.js) from [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) and [`ROADMAP.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md) every time the source changes. This keeps the human-readable markdown files as the single source of truth while producing a structured payload that the front-end consumes directly.

## The Role of [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js)

The entire pipeline lives in [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js). This file is the single entry point that turns the human-written curriculum files into the JavaScript data model consumed by the website; the markdown files themselves remain the source of truth. The process runs automatically on every push via GitHub Actions and can be invoked locally with `node site/build.js`.

## Step 1: Reading the Source Files

At the start of the build, the script loads the three core markdown files into memory.

The relevant lines read:

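```js
const fs = require('fs'); // Node's built-in file system module

// README_PATH, ROADMAP_PATH, and GLOSSARY_PATH are assumed to be defined earlier in the script.
const readme   = fs.readFileSync(README_PATH, 'utf8');   // public overview and lesson tables
const roadmap  = fs.readFileSync(ROADMAP_PATH, 'utf8');  // phase and lesson status matrix
const glossary = fs.readFileSync(GLOSSARY_PATH, 'utf8'); // glossary definitions
```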

The script reads [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) for the public overview and lesson tables, [`ROADMAP.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md) for the phase and lesson status matrix, and [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) for the glossary definitions.

## Step 2: Parsing [`ROADMAP.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md) with `parseRoadmap`

To determine completion status, the build script calls **`parseRoadmap`**, which scans each line of [`ROADMAP.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md) for phase headers and individual lesson rows.

As implemented in the file at lines **30-61**, the function identifies patterns such as `## Phase 0 … — ✅` for phases and `| 01 | Dev Environment | ✅ |` for lessons. It then builds a nested map with the structure:

```
{ Phase → { phaseStatus, lessons: { lessonName → status } } }
```

This status map becomes the authority for whether a lesson is marked complete, in-progress, or planned.
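The line-scanning approach described above can be sketched as follows. This is a minimal illustration, not the code from `site/build.js`: the function name `parseRoadmapSketch`, the regular expressions, the `"Phase N"` map keys, and the 🚧 emoji for in-progress work are all assumptions (only ✅ appears in the patterns quoted above), and the sample rows are made up.

```js
// Hypothetical emoji-to-status table; only the check mark is confirmed by the article.
const STATUS_BY_EMOJI = { '✅': 'complete', '🚧': 'in-progress' };

function statusFrom(text) {
  for (const [emoji, status] of Object.entries(STATUS_BY_EMOJI)) {
    if (text.includes(emoji)) return status;
  }
  return 'planned'; // anything unrecognised is treated as planned
}

function parseRoadmapSketch(markdown) {
  const phases = {};
  let current = null; // key of the phase whose lesson rows we are reading

  for (const line of markdown.split('\n')) {
    // Phase header, e.g. "## Phase 0: Setup — ✅"
    const phaseMatch = line.match(/^##\s+(Phase\s+\d+)\b(.*)$/);
    if (phaseMatch) {
      current = phaseMatch[1];
      phases[current] = { phaseStatus: statusFrom(phaseMatch[2]), lessons: {} };
      continue;
    }
    // Lesson row, e.g. "| 01 | Dev Environment | ✅ |"
    const rowMatch = line.match(/^\|\s*\d+\s*\|\s*([^|]+?)\s*\|\s*([^|]*?)\s*\|/);
    if (rowMatch && current) {
      phases[current].lessons[rowMatch[1]] = statusFrom(rowMatch[2]);
    }
  }
  return phases;
}

// Made-up sample input in the shape the article describes.
const roadmapSample = [
  '## Phase 0: Setup — ✅',
  '| # | Lesson | Status |',
  '|---|--------|--------|',
  '| 01 | Dev Environment | ✅ |',
  '| 02 | Git Basics | 🚧 |',
].join('\n');

const roadmapStatus = parseRoadmapSketch(roadmapSample);
console.log(JSON.stringify(roadmapStatus, null, 2));
```

Because the header and separator rows of the table do not start with a numeric cell, the row pattern skips them without any special casing.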

## Step 3: Parsing [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) with `parseReadme`

Next, the function **`parseReadme`** walks [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) line by line to discover phases and lessons. As implemented at lines **63-130**, it locates phase headings (supporting both the legacy table-based format and the newer `<details>` blocks) and then parses the lesson tables that follow them.

For every lesson row, `parseReadme` extracts:

- The **lesson name** and optional link.
- The **lesson type** (e.g., "Build", "Learn").
- The **language list** via emoji-to-language conversion.
- The **GitHub URL** to the lesson source, if a link is present.

Crucially, it cross-references the status map produced by `parseRoadmap` to attach a `status` field to each lesson. If no match is found, the status falls back to `"planned"`.
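A single lesson row could be handled roughly as shown below. This is a sketch under stated assumptions rather than the project's actual parser: the column order (number, name, type, languages), the helper name `parseLessonRow`, the emoji table, the `REPO_BASE` prefix, and the sample rows and paths are all hypothetical.

```js
// Hypothetical emoji-to-language table; the real mapping lives in site/build.js.
const LANGUAGE_BY_EMOJI = { '🐍': 'Python', '🦀': 'Rust' };

// Assumed prefix for turning a relative lesson link into a GitHub URL.
const REPO_BASE = 'https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/';

function parseLessonRow(line, phaseName, statusMap) {
  // Split "| a | b | c |" into trimmed cells, dropping the empty edge cells.
  const cells = line.split('|').slice(1, -1).map((cell) => cell.trim());
  if (cells.length < 4 || !/^\d+$/.test(cells[0])) return null; // not a lesson row

  const [, nameCell, type, langCell] = cells;

  // The name cell is either plain text or a markdown link "[name](path)".
  const link = nameCell.match(/^\[(.+?)\]\((.+?)\)$/);
  const name = link ? link[1] : nameCell;
  const url = link ? REPO_BASE + link[2] : null;

  // Keep every language whose emoji appears in the language cell.
  const languages = Object.keys(LANGUAGE_BY_EMOJI)
    .filter((emoji) => langCell.includes(emoji))
    .map((emoji) => LANGUAGE_BY_EMOJI[emoji]);

  // Cross-reference the roadmap status map; fall back to "planned".
  const phase = statusMap[phaseName];
  const status = (phase && phase.lessons[name]) || 'planned';

  return { name, type, languages, url, status };
}

// Status map in the shape produced by the roadmap parsing step.
const statusMap = {
  'Phase 0': { phaseStatus: 'complete', lessons: { 'Dev Environment': 'complete' } },
};

const linkedLesson = parseLessonRow(
  '| 01 | [Dev Environment](phases/00-setup/01-dev-environment/) | Build | 🐍 |',
  'Phase 0',
  statusMap
);
const unlinkedLesson = parseLessonRow('| 02 | Git Basics | Learn | 🐍 🦀 |', 'Phase 0', statusMap);

console.log(linkedLesson);
console.log(unlinkedLesson);
```

The second row has no roadmap entry and no link, so it illustrates both fallbacks: a `"planned"` status and a `null` URL.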

## Step 4: Enriching Lessons with `extractLessonMeta`

After the curriculum structure is known, the build loop at lines **22-30** of the `build` function calls **`extractLessonMeta`** for every lesson that has a source URL. This helper opens the lesson's [`docs/en.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/docs/en.md) file and pulls two pieces of metadata:

- The **first blockquote**, which is used as a one-sentence summary.
- **All H3 headings**, which are concatenated into a keyword string.

These values are added to the lesson object as `summary` and `keywords`, making the generated [`data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/data.js) searchable without manual indexing.
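The two extraction rules can be illustrated with a small, self-contained function. It is only a sketch: the name `extractLessonMetaSketch`, the choice of a single space as the keyword separator, and the sample lesson text are assumptions, and the real helper reads the file from disk rather than taking a string.

```js
function extractLessonMetaSketch(markdown) {
  const lines = markdown.split('\n');

  // The first non-empty blockquote line ("> ...") becomes the summary.
  const quote = lines.find((line) => /^>\s*\S/.test(line));
  const summary = quote ? quote.replace(/^>\s*/, '').trim() : '';

  // Every H3 heading ("### ...") contributes to the keyword string.
  // Requiring whitespace after "###" keeps H4 and deeper headings out.
  const keywords = lines
    .filter((line) => /^###\s+\S/.test(line))
    .map((line) => line.replace(/^###\s+/, '').trim())
    .join(' ');

  return { summary, keywords };
}

// Made-up lesson document for illustration.
const lessonDoc = [
  '# Dev Environment',
  '',
  '> Set up the tools used throughout the course.',
  '',
  '### Install Python',
  'Some explanatory text.',
  '### Configure the Editor',
  '#### Not an H3',
].join('\n');

const lessonMeta = extractLessonMetaSketch(lessonDoc);
console.log(lessonMeta);
```

A production version would probably also skip headings that appear inside fenced code blocks, which this sketch does not attempt.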

## Step 5: Assembling the Final JavaScript Payload

Once parsing and enrichment are complete, the script constructs three constants:

```js
const PHASES    = …   // full list of phases & lessons
const GLOSSARY  = …   // parsed glossary terms
const ARTIFACTS = …   // discovered outputs
```

As implemented at lines **49-58** of [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js), the script stringifies these objects with indentation and writes them to [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js). The file is a valid JavaScript module that exports `PHASES`, `GLOSSARY`, and `ARTIFACTS` for the front-end to import.
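One way to picture the stringify-and-write step is a function that renders the module text from three in-memory objects. The sketch below makes several assumptions: the function name `renderDataModule`, the header comment, the `module.exports` footer, and the shapes of the sample phase and glossary entries are all hypothetical, since the article does not show the exact output format.

```js
function renderDataModule(phases, glossary, artifacts) {
  return [
    '// AUTO-GENERATED: do not edit by hand.',
    // JSON.stringify with an indent of 2 yields readable, valid JS literals.
    `const PHASES = ${JSON.stringify(phases, null, 2)};`,
    `const GLOSSARY = ${JSON.stringify(glossary, null, 2)};`,
    `const ARTIFACTS = ${JSON.stringify(artifacts, null, 2)};`,
    // Assumed export footer so the file can be loaded with require().
    "if (typeof module !== 'undefined') module.exports = { PHASES, GLOSSARY, ARTIFACTS };",
    '',
  ].join('\n');
}

const moduleSource = renderDataModule(
  [{ name: 'Phase 0', lessons: [{ name: 'Dev Environment', status: 'complete' }] }],
  [{ term: 'Token', definition: 'A unit of text processed by a model.' }],
  []
);

// Evaluate the generated text in isolation to confirm it is valid JavaScript.
const fakeModule = { exports: {} };
new Function('module', moduleSource)(fakeModule);
console.log(fakeModule.exports.PHASES[0].lessons[0].status);
```

In a real build script the returned string would then be passed to `fs.writeFileSync` with the output path; the sketch stops short of touching the disk so it can run anywhere.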

The build also performs secondary tasks, such as updating README badges, site statistics, a sitemap, and an [`llms.txt`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/llms.txt) file, but the core data consumed by the website is the auto-generated `PHASES`, `GLOSSARY`, and `ARTIFACTS` payload.

## How to Run the Generator Locally

You can trigger the pipeline manually from the repository root.

```bash
node site/build.js
```

After execution, the console logs each stage:

```
📖 Reading source files...
🔍 Parsing ROADMAP.md...
🔍 Parsing README.md...
🔍 Parsing glossary/terms.md...
🔍 Discovering outputs + Phase 14 missions...
📚 Extracting lesson summaries + keywords from docs/en.md...
✅ Generated site/data.js
```

To inspect the output in a Node.js REPL or another script:

```js
const { PHASES, GLOSSARY, ARTIFACTS } = require('./site/data.js');

console.log(PHASES.length);          // number of phases
console.log(PHASES[0].lessons[0]);   // first lesson object
console.log(GLOSSARY[0]);            // first glossary entry
```
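Going one step further, the status field attached to each lesson makes simple progress reports easy to write. The following sketch tallies lessons by status; it assumes each phase exposes a `lessons` array whose items carry a `status` string, as the inspection snippet above suggests, and it uses made-up sample data instead of loading the generated file so that it runs on its own. The function name `countByStatus` is hypothetical.

```js
function countByStatus(phases) {
  const counts = {};
  for (const phase of phases) {
    for (const lesson of phase.lessons) {
      // Start each status at zero the first time it is seen.
      counts[lesson.status] = (counts[lesson.status] || 0) + 1;
    }
  }
  return counts;
}

// Made-up stand-in for the PHASES export.
const samplePhases = [
  {
    name: 'Phase 0',
    lessons: [
      { name: 'Dev Environment', status: 'complete' },
      { name: 'Git Basics', status: 'planned' },
    ],
  },
  { name: 'Phase 1', lessons: [{ name: 'Linear Algebra', status: 'complete' }] },
];

const statusCounts = countByStatus(samplePhases);
console.log(statusCounts);
```

To run it against real data, replace `samplePhases` with the `PHASES` value obtained from `site/data.js`.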

## Summary

- The **[`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js)** script is the sole build entry point that generates [`data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/data.js) from [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) and [`ROADMAP.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md) in the `rohitg00/ai-engineering-from-scratch` project.
- **`parseRoadmap`** (lines 30-61) extracts completion status from [`ROADMAP.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md) into a nested map.
- **`parseReadme`** (lines 63-130) discovers phases and lessons from [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) and attaches the roadmap status to each entry.
- **`extractLessonMeta`** enriches lessons with summaries and keywords by reading individual [`docs/en.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/docs/en.md) files.
- The code at lines 49-58 writes the final `PHASES`, `GLOSSARY`, and `ARTIFACTS` constants to [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js), producing a front-end-ready JavaScript module on every push.

## Frequently Asked Questions

### What triggers the generation of [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js)?

The generation runs automatically on every push via GitHub Actions, but you can also invoke it locally by running `node site/build.js` from the repository root.

### Which files does [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js) read besides [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) and [`ROADMAP.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md)?

In addition to the two primary curriculum files, the script reads [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) for domain definitions and scans each lesson's `outputs/` folder to discover reusable artifacts.

### How does a lesson receive its completion status in [`data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/data.js)?

The `parseReadme` function cross-references the nested status map built by `parseRoadmap`. Each lesson is matched by name; if no match is found in [`ROADMAP.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md), the status defaults to `"planned"`.

### Can I use the generated [`data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/data.js) outside of the website?

Yes. Because [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) is a standard JavaScript module that exports `PHASES`, `GLOSSARY`, and `ARTIFACTS`, you can require or import it into any Node.js script or compatible bundler for custom reporting or integrations.