# How site/data.js URL Generation Works in AI Engineering From Scratch

> Explore how the site/data.js URL generation works in AI Engineering From Scratch. Discover how lesson links are parsed and converted into absolute URLs by the build script.

- Repository: [Rohit Ghumare/ai-engineering-from-scratch](https://github.com/rohitg00/ai-engineering-from-scratch)
- Tags: internals
- Published: 2026-06-07

---

**The [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) URLs are generated by the [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js) script, which parses lesson links from [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md), extracts their relative paths, and prefixes them with the GitHub repository base URL to create absolute links.**

In the `rohitg00/ai-engineering-from-scratch` repository, the web UI consumes a JavaScript array of lessons exported by [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js). This file is not maintained manually; instead, a build script automatically constructs the `url` properties for each lesson by transforming relative Markdown paths into full GitHub URLs. This ensures the lesson links remain synchronized with the actual repository structure.

## The Build Pipeline Architecture

According to the source code, the URL generation process is contained entirely within [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js). This script executes during every push via GitHub Actions, reading the master lesson table from [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) and outputting a structured JavaScript module to [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js). The pipeline combines data from [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) (lesson locations) and [`ROADMAP.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md) (lesson status) to construct the final lesson objects.
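As a rough illustration of how the two sources could be combined, the sketch below attaches a status from ROADMAP.md to each lesson parsed from README.md. The `mergeLessons` function, the name-keyed `Map`, and the `'unknown'` fallback are hypothetical. The source only states that README.md supplies lesson locations and ROADMAP.md supplies lesson status.

```javascript
// Hypothetical sketch: attach a status from ROADMAP.md to each lesson parsed
// from README.md. Field names and the 'unknown' fallback are assumptions.
function mergeLessons(readmeLessons, roadmapStatus) {
  // roadmapStatus is assumed to be a Map of lesson name -> status string
  return readmeLessons.map((lesson) => ({
    ...lesson,
    status: roadmapStatus.has(lesson.name) ? roadmapStatus.get(lesson.name) : 'unknown',
  }));
}

const lessons = mergeLessons(
  [{
    name: 'Dev Environment',
    url: 'https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/phases/00-setup-and-tooling/01-dev-environment/',
  }],
  new Map([['Dev Environment', 'complete']]),
);
console.log(lessons[0].status); // complete
```

Keying on the lesson name is the simplest join, but it assumes names are spelled identically in both files.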

## Step-by-Step URL Generation Logic

The transformation from Markdown link to absolute URL occurs in three distinct phases inside the `parseReadme` function.

### Parsing Lesson Links from README.md

The build script scans each row of the lesson table in [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) using a regular expression to identify Markdown links. When a lesson entry contains a link like `[Dev Environment](phases/00-setup-and-tooling/01-dev-environment/)`, the script captures both the display text and the relative path.

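```javascript
// Excerpt from parseReadme in site/build.js (around lines 65-78).
// lessonCol holds the raw text of the lesson column for one table row;
// GITHUB_BASE is the constant defined at the top of the file.
const linkMatch = lessonCol.match(/\[(.+?)\]\((.+?)\)/);
let lessonName, url;
if (linkMatch) {
  lessonName = linkMatch[1];          // display text, e.g. "Dev Environment"
  const relativePath = linkMatch[2];  // e.g. "phases/00-setup-and-tooling/01-dev-environment/"
  // Strip a leading slash, then build the absolute GitHub URL
  url = GITHUB_BASE + relativePath.replace(/^\//, '');
}
```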
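The excerpt assumes `lessonCol` already holds the lesson cell of a row. A self-contained sketch of how a row could be split into cells before the same regex runs is shown below. The `splitTableRow` helper and the assumed column order (number, lesson, type, language, taken from the example table row format used in this article) are illustrative, not copied from site/build.js.

```javascript
// Hypothetical helper: split one Markdown table row into trimmed cells.
function splitTableRow(row) {
  return row
    .trim()
    .replace(/^\|/, '') // drop the leading pipe
    .replace(/\|$/, '') // drop the trailing pipe
    .split('|')
    .map((cell) => cell.trim());
}

const GITHUB_BASE = 'https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/';
const row = '| 01 | [Dev Environment](phases/00-setup-and-tooling/01-dev-environment/) | Build | Python |';

// Assumed column order: number, lesson link, type, language
const [, lessonCol] = splitTableRow(row);

// Same regex and normalization as the excerpt
const linkMatch = lessonCol.match(/\[(.+?)\]\((.+?)\)/);
const url = linkMatch ? GITHUB_BASE + linkMatch[2].replace(/^\//, '') : null;
console.log(url);
// https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/phases/00-setup-and-tooling/01-dev-environment/
```

Splitting on `|` is the simplest approach, but it would mis-split a cell that itself contains a pipe character.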

### Normalizing Relative Paths

Before concatenation, the script sanitizes the extracted path by removing any leading forward slash using `.replace(/^\//, '')`. This normalization ensures consistent path joining regardless of whether the Markdown author included a leading slash in the link.
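A quick check of that behavior, using a hypothetical `toAbsoluteUrl` wrapper around the same expression:

```javascript
const GITHUB_BASE = 'https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/';

// Hypothetical wrapper around the same expression used in parseReadme
function toAbsoluteUrl(relativePath) {
  return GITHUB_BASE + relativePath.replace(/^\//, ''); // removes at most one leading "/"
}

const withSlash = toAbsoluteUrl('/phases/00-setup-and-tooling/01-dev-environment/');
const withoutSlash = toAbsoluteUrl('phases/00-setup-and-tooling/01-dev-environment/');
console.log(withSlash === withoutSlash); // true
```

The regex removes at most one leading slash, so a path written as `//phases/...` keeps one slash and a path written as `./phases/...` passes through unchanged.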

### Assembling Absolute GitHub URLs

The script then prepends a constant base URL, defined at the top of [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js), to the sanitized relative path:

```javascript
const GITHUB_BASE = 'https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/';
```

The resulting `url` string becomes the `url` property in the final lesson object stored in the `PHASES` array.
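For comparison, the same join could be written with the WHATWG `URL` API, which is built into Node.js. This is an alternative shown only to explain the design choice; the source describes plain string concatenation, not `new URL`.

```javascript
const GITHUB_BASE = 'https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/';

// Alternative to string concatenation: WHATWG URL resolution (global in Node.js).
// A relative path is resolved against the base's final "/" segment.
const resolved = new URL('phases/00-setup-and-tooling/01-dev-environment/', GITHUB_BASE).href;
console.log(resolved);
// https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/phases/00-setup-and-tooling/01-dev-environment/

// A leading slash makes the path root-relative, discarding the repository prefix.
const rootRelative = new URL('/phases/00-setup-and-tooling/01-dev-environment/', GITHUB_BASE).href;
console.log(rootRelative);
// https://github.com/phases/00-setup-and-tooling/01-dev-environment/
```

With URL resolution, a leading slash would silently drop the repository prefix. Stripping the slash and concatenating avoids that pitfall. The trailing slash on `GITHUB_BASE` also matters here: without it, `new URL` would replace the final `main` segment instead of appending to it.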

## Core Implementation in site/build.js

The critical logic resides around lines 65-78 of [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js). This section handles the regex extraction and URL construction for every lesson row detected in the source table. The script iterates through all matches, generating a complete URL for each valid lesson link before writing the aggregated data to [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js).
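A condensed, hypothetical version of that loop is sketched below. It scans every table row in a README string, applies the same regex and slash normalization, and serializes the results. The `collectLessons` and `renderDataModule` names and the `const PHASES = ...` output shape are assumptions. The real script also fills in fields such as status, type, and language.

```javascript
// Hypothetical sketch of the aggregation loop; output format is assumed.
const GITHUB_BASE = 'https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/';
const LINK_RE = /\[(.+?)\]\((.+?)\)/;

function collectLessons(readmeText) {
  const lessons = [];
  for (const line of readmeText.split('\n')) {
    if (!line.trim().startsWith('|')) continue; // only table rows
    const match = line.match(LINK_RE);
    if (!match) continue;                       // skip header and separator rows
    lessons.push({
      name: match[1],
      url: GITHUB_BASE + match[2].replace(/^\//, ''),
    });
  }
  return lessons;
}

function renderDataModule(lessons) {
  // Pretty-printed JSON yields lines like  "url": "https://..."
  return `const PHASES = ${JSON.stringify(lessons, null, 2)};\n`;
}

const readme = [
  '| # | Lesson | Type | Lang |',
  '|---|--------|------|------|',
  '| 01 | [Dev Environment](phases/00-setup-and-tooling/01-dev-environment/) | Build | Python |',
].join('\n');

console.log(renderDataModule(collectLessons(readme)));
```

Unlike the excerpt, this sketch matches the first link anywhere in the row rather than only in the lesson column. That works for the example table but would pick the wrong link if another column contained one first.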

## Working with Generated URLs

Because the generation is automated, modifying the URL structure requires understanding how the source data flows through the build process.

### Adding a New Lesson

To generate a URL for a new lesson, simply add a Markdown link to the lesson table in [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md):

```markdown
| 23 | [My New Lesson](phases/05-new-phase/23-my-new-lesson/) | Build | Python |
```

The next CI run will automatically parse this entry and produce the corresponding absolute URL in [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js).
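Because the build only concatenates strings, it will happily generate a URL for a directory that does not exist. A hypothetical local check, not part of site/build.js, could confirm the linked path is a real directory before pushing. `lessonDirExists` is an illustrative name, and it resolves paths against the repository root just as `GITHUB_BASE` does.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical pre-push check: does the lesson link point at a real directory?
function lessonDirExists(repoRoot, row) {
  const match = row.match(/\[(.+?)\]\((.+?)\)/);
  if (!match) return false; // no Markdown link in this row
  const dir = path.join(repoRoot, match[2].replace(/^\//, ''));
  return fs.existsSync(dir) && fs.statSync(dir).isDirectory();
}

const row = '| 23 | [My New Lesson](phases/05-new-phase/23-my-new-lesson/) | Build | Python |';
console.log(lessonDirExists(process.cwd(), row));
```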

### Verifying the Output

After the build completes, each lesson entry in [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) contains a fully qualified GitHub URL:

```json
{
  "name": "Dev Environment",
  "status": "complete",
  "type": "Build",
  "lang": "Python",
  "url": "https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/phases/00-setup-and-tooling/01-dev-environment/",
  "summary": "...",
  "keywords": "..."
}
```
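To check that output programmatically, a sketch like the following could flag lessons whose `url` breaks the expected pattern. `findBadUrls` is a hypothetical helper that takes an already-loaded lesson array; how that array is loaded depends on the module format of site/data.js, which this sketch does not assume.

```javascript
const GITHUB_BASE = 'https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/';

// Hypothetical sanity check: return lessons whose url does not follow
// the GITHUB_BASE + normalized relative path pattern.
function findBadUrls(lessons) {
  return lessons.filter((lesson) => {
    if (typeof lesson.url !== 'string' || !lesson.url.startsWith(GITHUB_BASE)) return true;
    const rest = lesson.url.slice(GITHUB_BASE.length);
    // An empty remainder or a leftover leading slash means normalization failed
    return rest === '' || rest.startsWith('/');
  });
}

const sample = [
  { name: 'Dev Environment', url: GITHUB_BASE + 'phases/00-setup-and-tooling/01-dev-environment/' },
  { name: 'Broken', url: GITHUB_BASE + '/phases/oops/' },
];
console.log(findBadUrls(sample).map((l) => l.name)); // [ 'Broken' ]
```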

You can validate the generation locally by running the build script and inspecting the first few entries:

```bash
node site/build.js   # regenerates site/data.js

grep '"url":' site/data.js | head
```

These commands regenerate the file and print the first ten lines containing a `url` key, so you can confirm that each value follows the `GITHUB_BASE + relativePath` pattern.
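Where `grep` is unavailable (for example on Windows without a Unix shell), a Node sketch can do the same extraction. `extractUrls` is a hypothetical helper and, like the grep pattern, it assumes the generator writes double-quoted `"url"` keys.

```javascript
const fs = require('fs');

// Hypothetical Node equivalent of: grep '"url":' site/data.js | head
function extractUrls(source) {
  const urls = [];
  for (const match of source.matchAll(/"url":\s*"([^"]+)"/g)) {
    urls.push(match[1]);
  }
  return urls;
}

if (fs.existsSync('site/data.js')) {
  // slice(0, 10) mirrors head's default of ten lines
  extractUrls(fs.readFileSync('site/data.js', 'utf8'))
    .slice(0, 10)
    .forEach((u) => console.log(u));
}
```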

## Summary

- **[`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js)** automatically generates [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) by parsing [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) and [`ROADMAP.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md).
- URLs are constructed by extracting relative paths from Markdown lesson links with the regex `/\[(.+?)\]\((.+?)\)/`.
- The **GitHub base URL** (`https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/`) is prepended to normalized relative paths.
- The build runs on every push via GitHub Actions, ensuring URLs stay synchronized with repository changes.
- Manual verification is possible by running `node site/build.js` and inspecting the `url` fields in the generated output.

## Frequently Asked Questions

### What file generates site/data.js?

The **[`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js)** script generates [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js). It parses the repository's [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) for lesson structures and [`ROADMAP.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md) for status metadata, then outputs a JavaScript module containing the complete lesson array with absolute GitHub URLs.

### Where does the base URL come from?

The base URL is defined as a constant named **`GITHUB_BASE`** at the top of [`site/build.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/build.js). It is hardcoded to `https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/` and serves as the prefix for all relative paths extracted from the README lesson links.

### How do I add a new lesson URL?

Add a new row to the lesson table in **[`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md)** containing a Markdown link with the relative path to the lesson directory. The build script will automatically detect the link format `[Lesson Name](relative/path/)`, extract the path, and generate the full GitHub URL in [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) during the next CI run.

### Does the URL generation run automatically?

Yes. The build runs automatically via **GitHub Actions** on every push to the repository, so changes to lesson locations, names, or additions in [`README.md`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/README.md) are reflected in the [`site/data.js`](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/site/data.js) URLs on the next run, without manual intervention.