How site/data.js URL Generation Works in AI Engineering From Scratch
The site/data.js URLs are generated by the site/build.js script, which parses lesson links from README.md, extracts their relative paths, and prefixes them with the GitHub repository base URL to create absolute links.
In the rohitg00/ai-engineering-from-scratch repository, the web UI consumes a JavaScript array of lessons exported by site/data.js. This file is not maintained manually; instead, a build script automatically constructs the url properties for each lesson by transforming relative Markdown paths into full GitHub URLs. This ensures the lesson links remain synchronized with the actual repository structure.
The Build Pipeline Architecture
According to the source code, the URL generation process is contained entirely within site/build.js. This script executes during every push via GitHub Actions, reading the master lesson table from README.md and outputting a structured JavaScript module to site/data.js. The pipeline combines data from README.md (lesson locations) and ROADMAP.md (lesson status) to construct the final lesson objects.
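The combining step can be pictured with a minimal sketch. Everything here is an assumption for illustration: the input shapes, the mergeStatus helper name, matching by lesson name, and the 'unknown' fallback are hypothetical and are not taken from site/build.js.

```javascript
// Hypothetical inputs: these shapes are assumptions, not the real parser output.
const lessonsFromReadme = [
  {
    name: 'Dev Environment',
    url: 'https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/phases/00-setup-and-tooling/01-dev-environment/',
  },
  {
    name: 'Example Lesson Without Status',
    url: 'https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/phases/99-example/',
  },
];

// Assumed to be a lookup already parsed from ROADMAP.md, keyed by lesson name.
const statusFromRoadmap = new Map([['Dev Environment', 'complete']]);

// Hypothetical helper: attaches a status to each lesson, using a fallback
// when the roadmap lookup has no entry for that lesson.
function mergeStatus(lessons, statusByName, fallback = 'unknown') {
  return lessons.map((lesson) => ({
    ...lesson,
    status: statusByName.get(lesson.name) ?? fallback,
  }));
}

const merged = mergeStatus(lessonsFromReadme, statusFromRoadmap);
console.log(merged.map((lesson) => `${lesson.name}: ${lesson.status}`));
```

The point of the sketch is only that location data and status data come from different files and meet in a single lesson object; the real script may join them differently.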
Step-by-Step URL Generation Logic
The transformation from Markdown link to absolute URL occurs in three distinct phases inside the parseReadme function.
Parsing Lesson Links from README.md
The build script scans each row of the lesson table in README.md using a regular expression to identify Markdown links. When a lesson entry contains a link like [Dev Environment](phases/00-setup-and-tooling/01-dev-environment/), the script captures both the display text and the relative path.
// Inside parseReadme – line 65-78 region
const linkMatch = lessonCol.match(/\[(.+?)\]\((.+?)\)/);
let lessonName, url;
if (linkMatch) {
lessonName = linkMatch[1];
const relativePath = linkMatch[2];
// Build the absolute GitHub URL
url = GITHUB_BASE + relativePath.replace(/^\//, '');
}
Normalizing Relative Paths
Before concatenation, the script sanitizes the extracted path by removing any leading forward slash using .replace(/^\//, ''). This normalization ensures consistent path joining regardless of whether the Markdown author included a leading slash in the link.
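A short sketch of why the normalization matters: with and without a leading slash, the joined URL comes out the same. The base URL is the GITHUB_BASE constant from site/build.js; toAbsoluteUrl is a hypothetical helper name used only for this example.

```javascript
// Base URL as defined in site/build.js.
const GITHUB_BASE = 'https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/';

// Hypothetical helper (not a function in site/build.js):
// strips a single leading slash, then joins the path onto the base.
function toAbsoluteUrl(relativePath) {
  return GITHUB_BASE + relativePath.replace(/^\//, '');
}

const withoutSlash = toAbsoluteUrl('phases/00-setup-and-tooling/01-dev-environment/');
const withSlash = toAbsoluteUrl('/phases/00-setup-and-tooling/01-dev-environment/');

// Without the replace, the second form would contain "main//phases/...".
console.log(withoutSlash === withSlash); // true
```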
Assembling Absolute GitHub URLs
The script appends the sanitized relative path to a constant base URL defined at the top of site/build.js:
const GITHUB_BASE = 'https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/';
The resulting url string becomes the url property in the final lesson object stored in the PHASES array.
Core Implementation in site/build.js
The critical logic resides around lines 65-78 of site/build.js. This section handles the regex extraction and URL construction for every lesson row detected in the source table. The script iterates through all matches, generating a complete URL for each valid lesson link before writing the aggregated data to site/data.js.
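The per-row iteration can be sketched as follows. This is an approximation, not the code from site/build.js: parseLessonRows is a hypothetical name, the column order (number, lesson link, type, language) is assumed from the example row format used in this article, the header labels in the sample are invented, and skipping rows without a link is a choice made for the sketch.

```javascript
const GITHUB_BASE = 'https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/';

// Hypothetical helper: turns Markdown table text into lesson objects.
// Assumed column order: number | lesson link | type | language.
function parseLessonRows(markdown) {
  const lessons = [];
  for (const line of markdown.split('\n')) {
    // Only table rows are of interest; prose and blank lines are ignored.
    if (!line.trim().startsWith('|')) continue;
    // Split on pipes and drop the empty cells outside the outer pipes.
    const cells = line.trim().split('|').slice(1, -1).map((cell) => cell.trim());
    if (cells.length < 4) continue;
    const linkMatch = cells[1].match(/\[(.+?)\]\((.+?)\)/);
    // Header and separator rows contain no link, so this sketch skips them.
    if (!linkMatch) continue;
    lessons.push({
      name: linkMatch[1],
      type: cells[2],
      lang: cells[3],
      url: GITHUB_BASE + linkMatch[2].replace(/^\//, ''),
    });
  }
  return lessons;
}

// Sample table; the header labels are made up for the example.
const sample = [
  '| # | Lesson | Type | Lang |',
  '|---|--------|------|------|',
  '| 01 | [Dev Environment](phases/00-setup-and-tooling/01-dev-environment/) | Build | Python |',
].join('\n');

console.log(parseLessonRows(sample));
```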
Working with Generated URLs
Because the generation is automated, modifying the URL structure requires understanding how the source data flows through the build process.
Adding a New Lesson
To generate a URL for a new lesson, simply add a Markdown link to the lesson table in README.md:
| 23 | [My New Lesson](phases/05-new-phase/23-my-new-lesson/) | Build | Python |
The next CI run will automatically parse this entry and produce the corresponding absolute URL in site/data.js.
Verifying the Output
After the build completes, each lesson entry in site/data.js contains a fully qualified GitHub URL:
{
"name": "Dev Environment",
"status": "complete",
"type": "Build",
"lang": "Python",
"url": "https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/phases/00-setup-and-tooling/01-dev-environment/",
"summary": "...",
"keywords": "..."
}
You can validate the generation locally by running the build script and inspecting the first few entries:
node site/build.js # regenerates site/data.js
grep '"url":' site/data.js | head
These commands regenerate the file and print the first few generated URLs, letting you confirm that the pattern GITHUB_BASE + relative_path was applied correctly.
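For a stricter check than grep, a small script can flag any url value that does not start with the expected base. This is a sketch under assumptions: findUnexpectedUrls is a hypothetical helper, it relies on the "url": "..." text format shown in the sample entry above, and the sample string stands in for the real file contents.

```javascript
const GITHUB_BASE = 'https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/';

// Hypothetical checker: returns every "url" value that lacks the expected prefix.
function findUnexpectedUrls(dataJsSource) {
  const urls = [...dataJsSource.matchAll(/"url":\s*"([^"]*)"/g)].map((m) => m[1]);
  return urls.filter((url) => !url.startsWith(GITHUB_BASE));
}

// Stand-in for the file contents; in practice you might pass
// fs.readFileSync('site/data.js', 'utf8') instead.
const sampleSource = `
  { "name": "Dev Environment", "url": "${GITHUB_BASE}phases/00-setup-and-tooling/01-dev-environment/" },
  { "name": "Broken", "url": "phases/99-example/" }
`;

console.log(findUnexpectedUrls(sampleSource)); // [ 'phases/99-example/' ]
```

An empty array means every URL carries the base prefix.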
Summary
- site/build.js automatically generates site/data.js by parsing README.md and ROADMAP.md.
- URLs are constructed by extracting relative paths from Markdown lesson links using the regex /\[(.+?)\]\((.+?)\)/.
- The GitHub base URL (https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/) is prepended to normalized relative paths.
- The build runs on every push via GitHub Actions, ensuring URLs stay synchronized with repository changes.
- Manual verification is possible by running node site/build.js and inspecting the url fields in the generated output.
Frequently Asked Questions
What file generates site/data.js?
The site/build.js script generates site/data.js. It parses the repository's README.md for lesson structures and ROADMAP.md for status metadata, then outputs a JavaScript module containing the complete lesson array with absolute GitHub URLs.
Where does the base URL come from?
The base URL is defined as a constant named GITHUB_BASE at the top of site/build.js. It is hardcoded to https://github.com/rohitg00/ai-engineering-from-scratch/tree/main/ and serves as the prefix for all relative paths extracted from the README lesson links.
How do I add a new lesson URL?
Add a new row to the lesson table in README.md containing a Markdown link with the relative path to the lesson directory. The build script will automatically detect the link format [Lesson Name](relative/path/), extract the path, and generate the full GitHub URL in site/data.js during the next CI run.
Does the URL generation run automatically?
Yes. The build process executes automatically via GitHub Actions on every push to the repository. Any changes to lesson locations, names, or additions in README.md are therefore reflected in the site/data.js URLs after the next CI run, without manual intervention.