AI Engineering Best Practices Demonstrated in the ai-engineering-from-scratch Repository
This repository encodes ten production-grade best practices for AI engineering, including modular lesson architecture, dual implementation strategies, zero-dependency policies, and automated artifact generation.
The ai-engineering-from-scratch repository by Rohit Ghumare serves as both a comprehensive curriculum and a reference implementation of professional AI engineering standards. Unlike typical educational resources that prioritize theory over practice, this codebase demonstrates how to build, test, document, and ship AI systems using reproducible, auditable workflows. Below is a detailed breakdown of the architectural decisions, conventions, and automation strategies that make this repository a blueprint for scalable AI development.
Modular Lesson Architecture with Strict Directory Conventions
Every concept in the curriculum lives in a self-contained folder following a rigid three-part structure: code/, docs/, and outputs/. As documented in the README.md at lines 84-92, this layout guarantees that every lesson is reproducible, discoverable, and portable.
The code/ directory contains the algorithmic implementations, docs/ stores the explanatory material including front-matter metadata, and outputs/ houses reusable artifacts such as prompts, skills, and MCP servers. This separation of concerns allows automated tooling to parse the curriculum programmatically while keeping human-readable documentation adjacent to executable code.
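The three-folder convention is simple enough to check mechanically. The following is a minimal sketch of such a check, not the repository's own tooling; the function name `missing_lesson_dirs` and the constant `REQUIRED_DIRS` are hypothetical and introduced only for illustration.

```python
from pathlib import Path

# The three subfolders named in the text above.
REQUIRED_DIRS = ("code", "docs", "outputs")


def missing_lesson_dirs(lesson_dir):
    """Return the required subfolders that are absent from lesson_dir."""
    lesson_dir = Path(lesson_dir)
    return [name for name in REQUIRED_DIRS if not (lesson_dir / name).is_dir()]
```

Under these assumptions, a compliant lesson folder yields an empty list, and any missing subfolder is reported by name.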
The "Build-It / Use-It" Dual Implementation Strategy
A core pedagogical principle in this repository requires every algorithm to be implemented twice: first from raw mathematical foundations, then using production-grade libraries. This pattern, illustrated in the six-phase pipeline diagram, forces deep understanding of underlying mechanics before developers rely on black-box frameworks.
For example, a lesson on backpropagation would include a pure NumPy implementation in phases/03-deep-learning-core/03-backpropagation/code/backpropagation.py alongside a PyTorch equivalent. This approach ensures that engineers understand gradient flow at the tensor level while still being proficient with optimized industrial tools.
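The same two-step pattern can be shown at a much smaller scale without NumPy or PyTorch. The sketch below is not taken from the repository: it uses Euclidean distance as a stand-in algorithm, with a hand-written "build-it" version and a "use-it" version that delegates to the standard library's `math.dist` (available in Python 3.8 and later). Both function names are hypothetical.

```python
import math


def euclidean_distance_build_it(a, b):
    """Build-it: compute the L2 distance term by term."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for x, y in zip(a, b):
        diff = x - y
        total += diff * diff
    return math.sqrt(total)


def euclidean_distance_use_it(a, b):
    """Use-it: delegate to the standard library (Python 3.8+)."""
    return math.dist(a, b)


if __name__ == "__main__":
    p, q = (0.0, 0.0), (3.0, 4.0)
    # Both versions should agree up to floating-point tolerance.
    assert math.isclose(
        euclidean_distance_build_it(p, q), euclidean_distance_use_it(p, q)
    )
```

Comparing the two results is what gives the pairing its value: the library call becomes a reference against which the hand-written version is checked.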
Zero-Dependency First Principles and Explicit Allowlists
The repository enforces a strict zero-dependency policy wherever possible, permitting only standard library modules or explicitly allowed packages documented in [AGENTS.md at lines 67-73](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/AGENTS.md#L67-L73). This constraint keeps examples lightweight, portable across environments, and easy to audit for security vulnerabilities.
When external dependencies are necessary, they must be declared in the lesson metadata and justified pedagogically. This practice mirrors production environments where dependency minimization reduces supply chain attack surfaces and simplifies deployment to containerized or edge environments.
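An allowlist like this can be audited statically. The following sketch shows one possible approach using the standard `ast` module; it is an illustration only, and the contents of `ALLOWED_MODULES` are invented for the example (the actual list is whatever AGENTS.md specifies).

```python
import ast

# Hypothetical allowlist for illustration; the real list lives in AGENTS.md.
ALLOWED_MODULES = {"math", "random", "json", "numpy"}


def disallowed_imports(source, allowed=ALLOWED_MODULES):
    """Return top-level module names imported by source that are not allowed."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import utils".
            found.add(node.module.split(".")[0])
    return found - set(allowed)
```

Because the check parses the source rather than executing it, it can run on untrusted contributions without importing anything.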
Rigorous Per-Lesson Testing Contracts
Functional correctness is enforced through a mandatory testing structure in which each code/ folder ships with a tests/ suite that must exit with code 0. The testing contract in AGENTS.md specifies that every lesson must include unit tests validating both the manual implementation and the library-based version.
To run validation locally, navigate to any lesson directory and execute the appropriate test runner:
cd phases/03-deep-learning-core/03-backpropagation
python -m unittest discover -v
This requirement serves as living documentation while preventing regressions as the curriculum evolves or dependencies update.
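A test module that satisfies this kind of contract might look like the sketch below. It is not copied from the repository: `mean_build_it` and `TestMean` are hypothetical names, and the arithmetic mean stands in for a real lesson algorithm, with `statistics.mean` playing the role of the library-based version.

```python
import statistics
import unittest


def mean_build_it(values):
    """Hand-written arithmetic mean (the manual implementation)."""
    if not values:
        raise ValueError("mean of empty sequence")
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    def test_matches_library(self):
        # The manual version must agree with the library version.
        data = [1.0, 2.0, 3.0, 4.0]
        self.assertAlmostEqual(mean_build_it(data), statistics.mean(data))

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            mean_build_it([])
```

Saved under a name matching unittest's default discovery pattern (for example a hypothetical test_mean.py), this module is picked up by `python -m unittest discover -v`, which exits with a non-zero code if any test fails.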
Automated CI Validation and Curriculum Integrity
The repository maintains quality through GitHub Actions defined in [.github/workflows/curriculum.yml](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/.github/workflows/curriculum.yml), which triggers on every pull request. This pipeline executes scripts/audit_lessons.py to validate lesson structure, metadata compliance, and test coverage, while scripts/check_readme_counts.py automatically synchronizes the README badge with the actual lesson count.
This automation prevents "drift" between documentation and code, enforces contribution standards, and ensures the public-facing site remains synchronized with the repository state. The CI pipeline also rebuilds the static site using site/build.js, which parses the declarative metadata from each lesson's docs/en.md front-matter.
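To make the drift check concrete, here is a rough sketch of how a lesson count could be compared against a README badge. It does not reproduce scripts/check_readme_counts.py: the badge format in `BADGE_PATTERN` and all function names are assumptions, and the sketch only detects a mismatch rather than rewriting the badge.

```python
import re
from pathlib import Path

# Hypothetical badge format, e.g. ".../badge/lessons-120-blue";
# the real README may use a different one.
BADGE_PATTERN = re.compile(r"lessons-(\d+)")


def count_lessons(repo_root):
    """Count directories two levels below phases/: phases/<phase>/<lesson>/."""
    return sum(1 for p in Path(repo_root).glob("phases/*/*") if p.is_dir())


def readme_count(readme_text):
    """Extract the advertised lesson count, or None if no badge is found."""
    match = BADGE_PATTERN.search(readme_text)
    return int(match.group(1)) if match else None


def counts_in_sync(repo_root, readme_text):
    """True when the README badge matches the number of lesson folders."""
    return readme_count(readme_text) == count_lessons(repo_root)
```

A CI job built on this idea would fail the pull request whenever `counts_in_sync` returns False.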
Conventional Commits and Atomic Lesson Boundaries
Version control hygiene is enforced through strict commit conventions documented in [AGENTS.md at lines 12-16](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/AGENTS.md#L12-L16). The repository requires one commit per lesson with a concise, structured subject line following the pattern:
git add phases/15-agent-engineering/99-new-lesson/
git commit -m "feat(phase-15/99): add new-lesson"
git push origin my-branch
This atomic commit strategy makes history readable, supports automated changelog generation, and enables git bisect debugging when lessons introduce breaking changes.
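A subject line in this shape is easy to validate with a regular expression. The pattern below is inferred from the single example shown above and is an assumption, not the grammar documented in AGENTS.md; in particular, the set of accepted commit types is invented for illustration.

```python
import re

# Inferred from the one example in the text; not the repository's official grammar.
SUBJECT_PATTERN = re.compile(
    r"^(?P<type>feat|fix|docs|test|chore)"  # hypothetical set of commit types
    r"\(phase-(?P<phase>\d+)/(?P<lesson>\d+)\)"
    r": (?P<summary>\S.*)$"
)


def parse_subject(subject):
    """Return the parsed fields of a commit subject, or None if it does not match."""
    match = SUBJECT_PATTERN.match(subject)
    return match.groupdict() if match else None
```

Parsing the subject into named fields, rather than only accepting or rejecting it, is what makes downstream uses such as changelog grouping by phase straightforward.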
Declarative Metadata for Tooling Integration
Each lesson includes structured front-matter in docs/en.md that declares the lesson type, programming language, prerequisites, and learning objectives. According to the lesson front-matter specification in AGENTS.md, this metadata enables the site/build.js generator to construct a navigable curriculum automatically.
This declarative approach allows the repository to function as a queryable knowledge graph, where learners can filter lessons by prerequisite knowledge or technology stack without manually browsing directories.
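The filtering idea can be sketched with a small front-matter reader. This example assumes a flat `key: value` block delimited by `---` lines and a comma-separated `prerequisites` field; both are assumptions made for illustration, and the real front-matter schema (which may use YAML lists or other keys) is defined in AGENTS.md.

```python
def parse_front_matter(text):
    """Parse flat 'key: value' pairs between the leading '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def lessons_requiring(lessons, prerequisite):
    """Return names of lessons whose 'prerequisites' field lists the given one.

    lessons maps a lesson name to the full text of its docs/en.md file.
    """
    hits = []
    for name, text in lessons.items():
        prereqs = parse_front_matter(text).get("prerequisites", "")
        if prerequisite in [p.strip() for p in prereqs.split(",")]:
            hits.append(name)
    return hits
```

A production parser would more likely rely on a YAML library; the hand-rolled version here keeps the sketch dependency-free.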
Reusable Artifacts and Skill Installation
A distinguishing feature of this repository is its treatment of learning outcomes as production assets. Each lesson produces artifacts (prompts, skills, agents, or MCP servers) stored in the outputs/ directory. These can be installed into development environments using the repository-wide installation script:
python3 scripts/install_skills.py
This script, located at [scripts/install_skills.py](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/scripts/install_skills.py), parses all outputs/*.md files and copies them into user-accessible directories for tools like Claude, Cursor, or other AI coding assistants. This bridges the gap between educational content and practical utility, turning theoretical knowledge into immediately deployable components.
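The copy step described here can be approximated in a few lines. The sketch below is not the contents of scripts/install_skills.py: the function name `install_outputs`, the `target_dir` parameter, and the lesson-name prefix used to avoid filename collisions are all assumptions, and the real script's destination directories are not reproduced.

```python
import shutil
from pathlib import Path


def install_outputs(repo_root, target_dir):
    """Copy every phases/<phase>/<lesson>/outputs/*.md file into target_dir.

    Returns the list of destination paths, in sorted source order.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    installed = []
    for source in sorted(Path(repo_root).glob("phases/*/*/outputs/*.md")):
        # Prefix with the lesson folder name so same-named files do not collide.
        lesson_name = source.parent.parent.name
        destination = target / f"{lesson_name}--{source.name}"
        shutil.copy2(source, destination)
        installed.append(destination)
    return installed
```

Returning the destination paths lets a caller report exactly what was installed, which is useful when the target is a directory shared with other tools.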
Transparent Versioning and Roadmap Communication
Curriculum development is tracked transparently through [ROADMAP.md](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/ROADMAP.md), which documents phase status and lesson completion. The README features a live badge showing the current lesson count, maintained automatically by the CI pipeline.
This visibility provides a single source of truth for contributors and learners regarding curriculum maturity, planned content, and phase completion status. It demonstrates best practices for open-source project management, where roadmap alignment precedes code implementation.
Community Infrastructure and Legal Protection
The repository includes comprehensive community governance files: an MIT license, [CONTRIBUTING.md](https://github.com/rohitg00/ai-engineering-from-scratch/blob/main/CONTRIBUTING.md), and CODE_OF_CONDUCT.md. These documents establish contribution guidelines, define the dependency allowlist policy, and provide legal protection for both maintainers and contributors.
This infrastructure encourages community participation while maintaining quality standards, mirroring the governance models of major open-source AI projects like Hugging Face Transformers or PyTorch.
Practical Workflow Examples
To execute a single lesson implementation directly:
git clone https://github.com/rohitg00/ai-engineering-from-scratch.git
cd ai-engineering-from-scratch
python phases/01-math-foundations/01-linear-algebra-intuition/code/vectors.py
This pattern, documented in the README quick-start, demonstrates the repository's emphasis on immediate executability without complex environment setup.
Summary
The ai-engineering-from-scratch repository demonstrates that educational resources can simultaneously serve as production-grade reference implementations. Key takeaways include:
- Modular architecture with strict code/docs/outputs separation ensures reproducibility
- Dual implementation (raw math + production libraries) builds deep understanding
- Zero-dependency policies and explicit allowlists minimize attack surfaces
- Mandatory testing per lesson enforces functional correctness
- Automated CI validation prevents documentation drift and enforces standards
- Atomic commits with conventional formatting maintain readable history
- Declarative metadata enables automated curriculum generation
- Reusable artifacts bridge learning and production via scripts/install_skills.py
- Transparent roadmapping aligns community expectations with development progress
- Complete governance infrastructure supports sustainable open-source growth
These conventions scale from single lessons to 500-plus lesson curricula while maintaining auditability and portability across environments.
Frequently Asked Questions
What makes the ai-engineering-from-scratch repository different from other AI courses?
Unlike traditional courses that provide only high-level explanations, this repository requires you to implement algorithms from mathematical foundations before using production libraries. According to the source code structure in phases/*/*/code/, every concept includes both a "raw" implementation and a library-based version, ensuring you understand the underlying mechanics rather than just API calls.
How does the repository ensure code quality across hundreds of lessons?
Quality is enforced through automated validation in .github/workflows/curriculum.yml and the scripts/audit_lessons.py tool. Every lesson must include a tests/ directory that passes with exit code 0, and the CI pipeline verifies metadata compliance, structure integrity, and lesson count accuracy on every pull request, as specified in AGENTS.md.
Can I use the lesson outputs in my own projects?
Yes. The repository treats learning outcomes as reusable assets. Each lesson produces artifacts in its outputs/ directory, and you can install these skills, prompts, or MCP servers into your development environment using python3 scripts/install_skills.py. This script parses the markdown outputs and makes them available to AI coding assistants like Claude or Cursor.
What are the contribution requirements for adding new lessons?
Contributions must follow the conventions in AGENTS.md: one atomic commit per lesson with a conventional commit message (e.g., feat(phase-15/99): add new-lesson), inclusion of a tests/ suite that passes validation, and adherence to the zero-dependency policy where possible. The lesson must also include declarative front-matter in docs/en.md describing prerequisites and learning objectives.