# How site/build.js Generates the data.js File in ai-engineering-from-scratch

> Learn how site/build.js in ai-engineering-from-scratch transforms the markdown curriculum into a structured JavaScript module, and how its content enrichment and export process works.

- Repository: [Rohit Ghumare/ai-engineering-from-scratch](https://github.com/rohitg00/ai-engineering-from-scratch)
- Tags: internals
- Published: 2026-06-07

---

**The site/build.js script transforms the repository's markdown curriculum into a structured JavaScript module by parsing README.md, ROADMAP.md, and glossary/terms.md, enriching content with metadata from lesson directories, and exporting three constants to site/data.js for client-side consumption.**

The `rohitg00/ai-engineering-from-scratch` repository automates its website data generation through a Node.js build pipeline. Understanding how site/build.js generates the data.js file allows contributors to modify curriculum structures while ensuring the frontend remains synchronized with the source markdown.

## Step 1: Loading Curriculum Source Files

The build process begins by reading three primary markdown sources: two at the repository root and one under `glossary/`. According to the source code in [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js) [lines 4–9 and 20–28](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js#L4-L28), the script uses `fs.readFileSync()` to load:

- [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) – Contains the human-readable phase and lesson tables
- [`ROADMAP.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md) – Tracks lesson completion status via emoji indicators
- [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) – Stores definitions for curriculum terminology
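
The loading step can be sketched as follows. This is a minimal illustration rather than the repository's code: `loadSources` and its `rootDir` parameter are hypothetical names, and only the three file paths and the use of `fs.readFileSync()` come from the description above.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: read the three markdown sources relative to a repo root.
function loadSources(rootDir) {
  const read = (relPath) =>
    fs.readFileSync(path.join(rootDir, relPath), "utf8");
  return {
    readme: read("README.md"),
    roadmap: read("ROADMAP.md"),
    glossary: read(path.join("glossary", "terms.md")),
  };
}
```

Reading synchronously keeps a short build script linear: each later step can assume the full text is already in memory.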

## Step 2: Parsing Markdown Structures

After loading the raw content, [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js) employs three specialized parser functions to extract structured data from the markdown sources.

### Extracting Lesson Status from ROADMAP.md

The `parseRoadmap()` function ([lines 30–61](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js#L30-L61)) scans [`ROADMAP.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md) and extracts lesson-status emojis, returning a plain JavaScript object that maps lessons to their completion states.
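
To make the idea concrete, here is a small sketch of an emoji-to-status parser. It is not the repository's `parseRoadmap()`: the table row shape, the emoji legend, the `in-progress` status, and the name `parseRoadmapSketch` are all assumptions made for illustration. Only the `planned` and `complete` status names appear in the article.

```javascript
// Hypothetical emoji legend; the real ROADMAP.md may use different symbols.
const STATUS_BY_EMOJI = new Map([
  ["✅", "complete"],
  ["🚧", "in-progress"], // assumed status name
  ["⬚", "planned"],
]);

function parseRoadmapSketch(markdown) {
  const statuses = {};
  for (const line of markdown.split("\n")) {
    // Assumed row shape: | 01 | Lesson name | ✅ |
    const cells = line.split("|").map((c) => c.trim()).filter(Boolean);
    if (cells.length < 3) continue;
    const [, name, marker] = cells;
    const status = STATUS_BY_EMOJI.get(marker);
    // Header and separator rows have no known marker and are skipped.
    if (status) statuses[name.toLowerCase()] = status;
  }
  return statuses;
}
```

Keying the map by lowercased lesson name is one way to let a README parser look up statuses without depending on row order.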

### Processing Phase Tables in README.md

The `parseReadme()` function ([lines 63–131](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js#L63-L131)) walks through the phase tables in [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) to pull lesson names, GitHub links, content types, and language badges. Along the way it converts emoji badges to human-readable strings, for example `🐍` to "Python" ([lines 48–66](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js#L48-L66)), and matches each lesson against the roadmap status map. It also applies a guard: if a lesson has a valid URL but its roadmap status is `planned`, the script forces the status to `complete` ([lines 96–99](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js#L96-L99)), since a linked lesson already has content.
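
The row handling and the status guard can be sketched like this. The column layout, the `parseLessonRow` name, and every badge other than `🐍` are assumptions for illustration; the actual table format in `README.md` may differ.

```javascript
// Only the 🐍 → "Python" pair comes from the article; the rest is illustrative.
const LANGUAGE_BY_BADGE = new Map([
  ["🐍", "Python"],
  ["🦀", "Rust"], // hypothetical badge
]);

// Assumed row shape: | 01 | [Lesson name](url) | Type | 🐍 |
function parseLessonRow(row, roadmapStatus) {
  const cells = row.split("|").map((c) => c.trim()).filter(Boolean);
  if (cells.length < 4) return null;
  const [, nameCell, type, badges] = cells;

  // A linked name looks like [Name](url); an unlinked one is plain text.
  const link = nameCell.match(/^\[(.+?)\]\((.+?)\)$/);
  const name = link ? link[1] : nameCell;
  const url = link ? link[2] : null;

  // Iterate by code point so each emoji is looked up as one unit.
  const languages = Array.from(badges)
    .map((ch) => LANGUAGE_BY_BADGE.get(ch))
    .filter(Boolean);

  let status = roadmapStatus[name.toLowerCase()] || "planned";
  // Guard: a lesson that already has a URL cannot remain "planned".
  if (url && status === "planned") status = "complete";

  return { name, url, type, languages, status };
}
```

With this guard, a stale roadmap entry cannot hide a lesson that is already linked from the README.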

### Building the Glossary Index

The `parseGlossary()` function ([lines 71–90](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js#L71-L90)) processes [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) to extract term definitions, returning a plain object that becomes the `GLOSSARY` export.
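
As an illustration of turning a glossary file into a plain object, the sketch below assumes each term is a `###` heading followed by one or more definition lines. That layout and the name `parseGlossarySketch` are assumptions; the real `glossary/terms.md` format is not described here.

```javascript
function parseGlossarySketch(markdown) {
  const glossary = {};
  let term = null;
  let lines = [];

  // Store the definition collected so far under the current term.
  const flush = () => {
    if (term) glossary[term] = lines.join(" ").trim();
  };

  for (const raw of markdown.split("\n")) {
    const heading = raw.match(/^###\s+(.+)$/);
    if (heading) {
      flush();
      term = heading[1].trim();
      lines = [];
    } else if (term && raw.trim()) {
      lines.push(raw.trim());
    }
  }
  flush();
  return glossary;
}
```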

## Step 3: Artifact Discovery and Metadata Enrichment

Before writing the output, the script enriches lesson objects with metadata from the filesystem and lesson documentation.

### Scanning Output Directories

The `discoverArtifacts()` function ([lines 112–198](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js#L112-L198)) walks the `phases/*/*/outputs/` directories to gather deliverables. It collects files prefixed with `skill-`, `prompt-`, or `agent-`, along with phase-14 mission files, parsing each file's frontmatter using `parseFrontmatter()` to extract structured metadata.
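
The two building blocks of this step, filtering by filename prefix and reading frontmatter, might look like the sketch below. It is not the repository's `parseFrontmatter()`: it assumes flat `key: value` pairs between two `---` lines, the helper names are hypothetical, and phase-14 mission files are not handled.

```javascript
const ARTIFACT_PREFIXES = ["skill-", "prompt-", "agent-"];

// Return "skill", "prompt", or "agent" for a matching filename, else null.
function artifactKind(fileName) {
  const prefix = ARTIFACT_PREFIXES.find((p) => fileName.startsWith(p));
  return prefix ? prefix.slice(0, -1) : null;
}

// Sketch of a frontmatter reader under the flat key/value assumption.
function parseFrontmatterSketch(text) {
  const match = text.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!match) return {};
  const meta = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    // Split on the first colon only, so values may contain colons.
    meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return meta;
}
```

A directory walk would call `artifactKind()` on each filename under an `outputs/` folder and pass the contents of matching files to the frontmatter reader.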

### Extracting Lesson Summaries and Keywords

The `extractLessonMeta()` function ([lines 47–68](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js#L47-L68)) reads each lesson's [`docs/en.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/docs/en.md) file (when present) to extract a one-line summary and harvest all `###` headings as searchable keywords. This metadata enables client-side search functionality in the generated website.
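
A minimal sketch of this extraction follows. It assumes the summary is the first non-empty line that is not a heading, which may not match the rule the real function uses; `extractLessonMetaSketch` is a hypothetical name.

```javascript
function extractLessonMetaSketch(markdown) {
  const lines = markdown.split("\n").map((l) => l.trim());

  // Assumption: the summary is the first non-empty, non-heading line.
  const summary = lines.find((l) => l && !l.startsWith("#")) || "";

  // Every level-3 heading becomes a searchable keyword.
  const keywords = lines
    .filter((l) => l.startsWith("### "))
    .map((l) => l.slice(4).trim());

  return { summary, keywords };
}
```

Note that this sketch returns `keywords` as an array of strings; lines inside fenced code blocks that happen to start with `### ` would also be picked up, which a fuller parser would need to exclude.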

The `lessonPath()` utility ([lines 23–28](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js#L23-L28)) transforms GitHub URLs into site-compatible paths by stripping the base URL, yielding paths formatted as `/lesson.html?path=…` for frontend routing.
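
The transformation can be pictured with the sketch below. The base URL constant, the trailing-slash handling, and the name `lessonPathSketch` are assumptions; the prefix that `site/build.js` actually strips, and whether it encodes the remainder, may differ.

```javascript
// Assumed base URL; the prefix the real script strips may differ.
const GITHUB_BASE =
  "https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/";

function lessonPathSketch(url) {
  if (!url || !url.startsWith(GITHUB_BASE)) return null;
  // Drop the base URL and any trailing slashes, keeping the repo-relative path.
  const relative = url.slice(GITHUB_BASE.length).replace(/\/+$/, "");
  return `/lesson.html?path=${relative}`;
}
```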

## Step 4: Emitting the JavaScript Module

The final stage assembles the collected data into a JavaScript module. In [lines 49–59](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js#L49-L59), the script constructs a string containing three exported constants: `PHASES`, `GLOSSARY`, and `ARTIFACTS`. This string includes a header comment with a generation timestamp. The script then writes this content to [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) ([lines 60–62](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js#L60-L62)), making the curriculum data available for import by the website's client code.
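
The assembly step amounts to serializing three values into source text. The sketch below shows one way to do that; the header wording, the `export const` form, and the function name `renderDataModule` are assumptions, not the script's actual output format.

```javascript
// Build the text of a data module from already-parsed values.
function renderDataModule({ phases, glossary, artifacts }, generatedAt = new Date()) {
  const toJs = (value) => JSON.stringify(value, null, 2);
  return [
    // Hypothetical header wording; only the presence of a timestamp is documented.
    `// Generated by site/build.js on ${generatedAt.toISOString()}. Do not edit by hand.`,
    `export const PHASES = ${toJs(phases)};`,
    `export const GLOSSARY = ${toJs(glossary)};`,
    `export const ARTIFACTS = ${toJs(artifacts)};`,
    "",
  ].join("\n");
}
```

The returned string could then be written with `fs.writeFileSync()`. Because JSON is valid JavaScript expression syntax, `JSON.stringify()` is enough to produce loadable literals for plain data.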

## Running the Build Locally

To regenerate [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) outside of GitHub Actions:

```bash
cd /path/to/ai-engineering-from-scratch
node site/build.js
```

This command prints progress messages and creates the [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) file containing the exported data structures.

To inspect the generated module:

```javascript
// Import in a Node script (ES module) or in the browser
import { PHASES, GLOSSARY, ARTIFACTS } from './site/data.js';

// Example: list all completed lessons in Phase 3
// (assumes each phase object has a numeric `id` and a `lessons` array)
const phase3 = PHASES.find(p => p.id === 3);
const completed = (phase3?.lessons ?? [])
  .filter(l => l.status === 'complete')
  .map(l => l.name);
console.log('Completed Phase-3 lessons:', completed);
```

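You can also use the exported data to implement search:

```javascript
import { PHASES } from './site/data.js';

// Return lessons whose summary or keywords contain the search term.
// `keywords` is treated as an array of heading strings; a plain string also works.
function searchLessons(keyword) {
  const lower = keyword.toLowerCase();
  return PHASES.flatMap(p =>
    p.lessons.filter(l => {
      const haystack = [l.summary, ...[l.keywords ?? []].flat()];
      return haystack.some(text => text?.toLowerCase().includes(lower));
    })
  );
}
```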

## Summary

- **site/build.js** is the single generator of curriculum data in the rohitg00/ai-engineering-from-scratch repository; the markdown files remain the source of truth.
- The script parses [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md), [`ROADMAP.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md), and [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) to construct lesson structures, status maps, and term definitions.
- **Artifact discovery** scans `phases/*/*/outputs/` directories to collect skill, prompt, and agent files, while **metadata extraction** harvests summaries from lesson documentation.
- Generated output consists of three exported constants (`PHASES`, `GLOSSARY`, and `ARTIFACTS`) written to [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) with a generation timestamp.
- Rerunning the build on unchanged sources reproduces the same data, with only the generation timestamp differing, and GitHub Actions runs it automatically on every push so the website always reflects the current curriculum state.

## Frequently Asked Questions

### What input files does site/build.js require to generate data.js?

The script requires three markdown files: [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) (for phase and lesson tables) and [`ROADMAP.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md) (for completion status emojis) at the repository root, plus [`glossary/terms.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/glossary/terms.md) (for term definitions). It also dynamically scans the `phases/` directory tree to discover lesson artifacts and metadata.

### How does site/build.js determine if a lesson is complete?

The script cross-references the roadmap status with the presence of lesson content. According to [lines 96–99](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js#L96-L99), if a lesson has a valid GitHub URL (indicating content exists) but the roadmap marks it as `planned`, [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js) overrides the status to `complete`.

### Can I run site/build.js locally without GitHub Actions?

Yes. You can execute `node site/build.js` from the repository root on any system with Node.js installed. This regenerates [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) using the current state of the markdown sources and directory structure, making it useful for local development and testing.

### What data structures does the generated site/data.js export?

The generated file exports three constants: `PHASES` (an array of phase objects containing lessons with metadata), `GLOSSARY` (an object mapping terms to definitions), and `ARTIFACTS` (an array of discovered skill, prompt, and agent files with parsed frontmatter). These constants are used by the website's client-side JavaScript to render the curriculum interface.
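
For orientation, the shapes described above might look roughly like the following. Every value here is invented, and the artifact field names (`kind`, `file`, `meta`) are hypothetical; only `id`, `lessons`, `name`, `status`, `summary`, and `keywords` appear elsewhere in this article.

```javascript
// Hypothetical samples of the exported shapes; all values are invented.
const PHASES_EXAMPLE = [
  {
    id: 3,
    lessons: [
      {
        name: "Example lesson",
        status: "complete",
        summary: "One-line summary taken from the lesson docs.",
        keywords: ["First heading", "Second heading"],
      },
    ],
  },
];

// Terms map directly to their definitions.
const GLOSSARY_EXAMPLE = { "Example term": "Example definition." };

// Assumed artifact fields: kind, file, and parsed frontmatter under `meta`.
const ARTIFACTS_EXAMPLE = [
  { kind: "skill", file: "skill-example.md", meta: { name: "Example skill" } },
];
```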