Recommended Tools and Frameworks for AI Engineering from Scratch

The AI Engineering from Scratch curriculum requires Python 3.11+ with scientific computing libraries and Node.js 20+ for the TypeScript lessons; Rust is optional and used only for tokenizer implementations. The project follows a minimal-dependency philosophy while still supporting production-grade AI development.

The AI Engineering from Scratch repository by Rohit Ghumare provides a polyglot curriculum for building AI systems from first principles. Unlike framework-heavy tutorials, this project emphasizes understanding core mechanisms through practical implementations in multiple languages. Whether you are implementing neural networks or deploying Model Context Protocol (MCP) servers, installing the tools described here lets you run every lesson, from linear algebra foundations to production-ready agents.

Core Programming Languages and Runtimes

The curriculum adopts a polyglot approach to reinforce AI concepts across different execution environments.

  • Python 3.11+ – The primary language for deep learning, data processing, and agent implementation. All Python lessons assume a modern CPython interpreter with virtual environment support.
  • Node.js 20+ – Required for TypeScript implementations of frameworks and streaming servers, particularly in the deep learning core and agent engineering phases.
  • Rust – Used specifically for high-performance tokenizer implementations in the "LLMs from Scratch" phase, requiring the standard rustc compiler and Cargo toolchain.
  • Julia – Referenced for numerical computing lessons where high-performance scientific computing is emphasized.
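Before starting, it can help to confirm which of these runtimes are already on your PATH. The following is a minimal sketch, not a script shipped with the repository: the `TOOLS` mapping and the function names are hypothetical, and the check only looks for executables by name without validating their versions (apart from the running Python interpreter).

```python
import shutil
import sys

# Hypothetical mapping of curriculum runtimes to the executable names
# they usually install; adjust for your platform if needed.
TOOLS = {
    "Node.js": "node",
    "npm": "npm",
    "Rust compiler": "rustc",
    "Cargo": "cargo",
    "Julia": "julia",
}


def find_tools(tools=TOOLS):
    """Return {label: path or None} for each executable looked up on PATH."""
    return {label: shutil.which(exe) for label, exe in tools.items()}


def python_ok(minimum=(3, 11)):
    """Check the running interpreter against the curriculum's stated minimum."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    status = "ok" if python_ok() else "older than 3.11"
    print(f"Python {sys.version.split()[0]}: {status}")
    for label, path in find_tools().items():
        print(f"{label}: {path or 'not found'}")
```

A missing Rust or Julia entry is not necessarily a problem, since those toolchains are only needed for specific lessons.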

Python Scientific Stack and Deep Learning Libraries

The requirements.txt file at the repository root defines a minimal but complete Python dependency set, which includes:

  • Core scientific libraries: numpy, pandas, matplotlib for data manipulation and visualization
  • Deep learning frameworks: torch and torchvision for tensor operations and neural network training
  • NLP and LLM tooling: transformers, datasets, tokenizers, and accelerate for working with pre-trained models and efficient training pipelines
  • Interactive development: jupyter for notebook-based exploration throughout the curriculum

Install these dependencies using:

python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

pip install -r requirements.txt
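After installation, a quick import check can confirm that the environment matches the list above. This is a hedged sketch rather than part of the repository: the `PACKAGES` list simply mirrors the libraries named in this article (jupyter is left out because it is mainly used as a command-line entry point), and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Import names for the libraries this article lists from requirements.txt.
# This list is an assumption based on the article, not read from the file.
PACKAGES = [
    "numpy", "pandas", "matplotlib",
    "torch", "torchvision",
    "transformers", "datasets", "tokenizers", "accelerate",
]


def missing_packages(names=PACKAGES):
    """Return the names that cannot be located in the active environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Using `find_spec` only locates each package without importing it, so the check stays fast even for heavy libraries such as torch.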

Node.js and TypeScript Tooling

The TypeScript lessons, particularly in phases/03-deep-learning-core/ and agent engineering modules, rely on modern Node.js patterns. The site/build.js script and lesson implementations use:

  • Hono – Lightweight web framework for building API servers
  • Zod – Schema validation for type-safe configuration and data handling
  • WS – WebSocket library for real-time streaming implementations

Install lesson-specific dependencies by navigating to the code directory and running:

cd phases/03-deep-learning-core/10-mini-framework/code
npm install
npx ts-node main.ts

Rust Toolchain for Low-Level Components

For lessons covering tokenization and low-level algorithms, the repository includes Rust implementations that compile to native code. The phases/10-llms-from-scratch/01-tokenizers/code/ directory contains examples requiring:

cd phases/10-llms-from-scratch/01-tokenizers/code
rustc main.rs -O && ./main

This produces optimized binaries for understanding byte-pair encoding and vocabulary generation without Python overhead.
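For readers who want to see the idea before compiling anything, byte-pair encoding can be sketched in a few lines of Python. This is an illustrative toy, not the repository's Rust implementation: the function names are hypothetical, it works on raw UTF-8 bytes, and it assigns new token ids starting at 256, which is a common convention assumed here.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the most common adjacent pair of tokens, or None if there is none."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(tokens, pair, new_token):
    """Replace every non-overlapping occurrence of `pair` with `new_token`."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(new_token)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`."""
    tokens = list(text.encode("utf-8"))
    merges = {}      # (left, right) -> new token id
    next_id = 256    # ids 0..255 are reserved for raw bytes
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        merges[pair] = next_id
        tokens = merge_pair(tokens, pair, next_id)
        next_id += 1
    return tokens, merges


if __name__ == "__main__":
    tokens, merges = train_bpe("low lower lowest", 3)
    print(tokens)
    print(merges)
```

Each merge adds one entry to the vocabulary and shortens the token sequence; for example, `train_bpe("aaaa", 5)` stops after two merges and returns `([257], {(97, 97): 256, (256, 256): 257})`.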

Auxiliary Development Tools

Beyond core languages, the project recommends:

  • Docker – Introduced in Phase 0 ("Docker for AI") for containerizing development environments and ensuring reproducible builds across operating systems
  • Jupyter Notebooks – Integrated throughout for exploratory data analysis and visualization, installed via the jupyter package in requirements.txt

Agent and Protocol Stack

The later phases implement reusable AI components using pure Python from the standard library plus optional dependencies. Key artifacts include:

  • Skills and Agents – Modular components defined in lesson outputs and installed via scripts/install_skills.py
  • MCP (Model Context Protocol) Servers – Lightweight protocol implementations for agent communication, located in the outputs directories under phases/14-agent-engineering/

These components require no additional frameworks beyond the Python packages listed in requirements.txt, adhering to the project's minimal-dependency philosophy while enabling production deployment.
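To illustrate why the standard library can be enough for this layer: MCP messages are JSON-RPC 2.0, so the core of a server is a dispatcher over parsed JSON. The sketch below is an assumption-laden illustration, not code from the repository. It handles only the `tools/list` and `tools/call` methods, skips the initialization handshake and any transport, and the `echo` tool and `handle_message` function are hypothetical.

```python
import json

# Hypothetical tool registry; a real server would define its own tools.
TOOLS = {
    "echo": {
        "description": "Return the input text unchanged (example tool).",
        "inputSchema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
        "handler": lambda args: args["text"],
    }
}


def _error(req_id, code, message):
    """Build a JSON-RPC 2.0 error response."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}
    )


def handle_message(raw):
    """Dispatch one JSON-RPC request string and return the response string."""
    request = json.loads(raw)
    req_id = request.get("id")
    method = request.get("method")
    params = request.get("params", {})

    if method == "tools/list":
        tools = [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]
        result = {"tools": tools}
    elif method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _error(req_id, -32602, "Unknown tool")
        text = tool["handler"](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return _error(req_id, -32601, "Method not found")

    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})


if __name__ == "__main__":
    call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
            "params": {"name": "echo", "arguments": {"text": "hello"}}}
    print(handle_message(json.dumps(call)))
```

A production server would also need the protocol's lifecycle messages, input validation, and a transport such as stdio, all of which are omitted here.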

Running Your First Lesson

To verify your environment across all supported languages:

# Clone the repository
git clone https://github.com/rohitg00/ai-engineering-from-scratch.git
cd ai-engineering-from-scratch

# Python: Linear algebra foundations
python phases/01-math-foundations/01-linear-algebra-intuition/code/vectors.py

# TypeScript: Mini-framework demonstration
# (the subshell returns you to the repository root afterwards)
(cd phases/03-deep-learning-core/10-mini-framework/code && npm install && npx ts-node main.ts)

# Rust: Tokenization basics
(cd phases/10-llms-from-scratch/01-tokenizers/code && rustc main.rs -O && ./main)

Key Repository Files

Understanding the repository structure helps navigate the tool requirements:

  • README.md – Curriculum overview and phase structure
  • requirements.txt – Canonical Python dependency list including torch, transformers, and datasets
  • site/build.js – Node.js script for static site generation using modern web tooling
  • scripts/install_skills.py – Installer for reusable lesson artifacts (agents, skills, MCP servers)
  • phases/*/*/code/ – Runnable implementations for each lesson across all supported languages (for example, phases/01-math-foundations/01-linear-algebra-intuition/code/)
  • phases/*/docs/en.md – Narrative documentation explaining concepts and learning objectives
  • phases/*/outputs/ – Generated artifacts including prompts, skills, and MCP server definitions
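To get an overview of which lessons ship runnable code, the layout can be walked with pathlib. This sketch assumes the phases/<phase>/<lesson>/code nesting seen in the example commands in this article; the function names are hypothetical and the repository may contain lessons that do not follow this pattern.

```python
from pathlib import Path


def lesson_code_dirs(repo_root):
    """Return sorted code directories, assuming phases/<phase>/<lesson>/code."""
    root = Path(repo_root)
    return sorted(p for p in root.glob("phases/*/*/code") if p.is_dir())


def group_by_phase(code_dirs):
    """Group lesson names by their phase directory name."""
    grouped = {}
    for code_dir in code_dirs:
        phase, lesson = code_dir.parts[-3], code_dir.parts[-2]
        grouped.setdefault(phase, []).append(lesson)
    return grouped


if __name__ == "__main__":
    # Run from the repository root after cloning.
    for phase, lessons in group_by_phase(lesson_code_dirs(".")).items():
        print(f"{phase}: {len(lessons)} lesson(s) with code")
```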

Summary

  • Python 3.11+ with packages from requirements.txt (numpy, pandas, torch, transformers, etc.) forms the primary development environment
  • Node.js 20+ with npm packages (hono, zod, ws) enables TypeScript lessons and framework implementations
  • Rust toolchain is required only for tokenizer and low-level performance-critical lessons
  • Docker support in Phase 0 provides containerized workflows for reproducible AI development
  • MCP servers and agent skills run on pure Python with minimal external dependencies, emphasizing production-ready simplicity

Frequently Asked Questions

Do I need to install all languages to use the project?

No. While the curriculum supports Python, TypeScript, Rust, and Julia, you can follow individual phases using only the language specified for that lesson. Python is the only required language for the majority of the AI/ML content; TypeScript and Rust are optional for specific implementation deep-dives.

Can I use a different Python version than 3.11?

The project is tested against Python 3.11+, as specified in the repository requirements. Earlier versions may work for basic lessons but could cause compatibility issues with newer torch or transformers features used in the deep learning and LLM phases.

What is the purpose of the install_skills.py script?

Located in scripts/install_skills.py, this utility installs reusable artifacts generated by lessons—including skills, agents, and MCP servers—into your local environment. It enables you to compose and reuse AI components built throughout the curriculum without manual copy-pasting.

Is Docker mandatory for running the lessons?

No. Docker is introduced in Phase 0 as an optional tool for containerizing development environments. You can run all Python, TypeScript, and Rust lessons directly on your host system provided you have the required language runtimes installed.
