Recommended Tools and Frameworks for AI Engineering from Scratch
The AI Engineering from Scratch curriculum requires Python 3.11+ with scientific computing libraries, Node.js 20+ for TypeScript lessons, and optionally Rust for tokenizer implementations, following a minimal-dependency philosophy while supporting production-grade AI development.
The AI Engineering from Scratch repository by Rohit Ghumare provides a comprehensive, polyglot curriculum for building AI systems from first principles. Unlike framework-heavy tutorials, this project emphasizes understanding core mechanisms through practical implementations in multiple languages. Whether you are implementing neural networks or deploying Model Context Protocol (MCP) servers, installing the right tools and frameworks up front lets you run every lesson, from linear algebra foundations to production-ready agents.
Core Programming Languages and Runtimes
The curriculum adopts a polyglot approach to reinforce AI concepts across different execution environments.
- Python 3.11+ – The primary language for deep learning, data processing, and agent implementation. All Python lessons assume a modern CPython interpreter with virtual environment support.
- Node.js 20+ – Required for TypeScript implementations of frameworks and streaming servers, particularly in the deep learning core and agent engineering phases.
- Rust – Used specifically for high-performance tokenizer implementations in the "LLMs from Scratch" phase, requiring the standard rustc compiler and Cargo toolchain.
- Julia – Referenced for numerical computing lessons where high-performance scientific computing is emphasized.
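A quick way to confirm that the runtimes listed above are present is a small standard-library script. The following is a minimal sketch, not part of the repository: the function names (parse_node_major, tool_version, check_environment) are hypothetical, and the minimum versions are simply the ones quoted in this article.

```python
import shutil
import subprocess
import sys
from typing import Optional

# Minimum versions quoted in this article (assumption: the repository may change them).
MIN_PYTHON = (3, 11)
MIN_NODE_MAJOR = 20


def parse_node_major(version_text: str) -> int:
    """Extract the major version from `node --version` output such as 'v20.11.1'."""
    return int(version_text.strip().lstrip("v").split(".")[0])


def tool_version(command: str) -> Optional[str]:
    """Return the first line of `<command> --version`, or None if the tool is not on PATH."""
    if shutil.which(command) is None:
        return None
    try:
        result = subprocess.run(
            [command, "--version"], capture_output=True, text=True, check=False
        )
    except OSError:
        return None
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


def check_environment() -> dict:
    """Build a small status report for Python, Node.js, rustc, and Cargo."""
    report = {"python": "ok" if sys.version_info >= MIN_PYTHON else "too old"}

    node = tool_version("node")
    if node is None:
        report["node"] = "missing"
    else:
        try:
            report["node"] = "ok" if parse_node_major(node) >= MIN_NODE_MAJOR else "too old"
        except ValueError:
            report["node"] = "unrecognized version"

    # Rust is optional in the curriculum, so only presence is reported here.
    for tool in ("rustc", "cargo"):
        report[tool] = "missing" if tool_version(tool) is None else "found"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

A "missing" entry for rustc or cargo only matters if you plan to run the Rust tokenizer lessons.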
Python Scientific Stack and Deep Learning Libraries
Located at the repository root, requirements.txt defines the minimal but complete Python dependency set. According to the source code, this includes:
- Core scientific libraries: numpy, pandas, matplotlib for data manipulation and visualization
- Deep learning frameworks: torch and torchvision for tensor operations and neural network training
- NLP and LLM tooling: transformers, datasets, tokenizers, and accelerate for working with pre-trained models and efficient training pipelines
- Interactive development: jupyter for notebook-based exploration throughout the curriculum
Install these dependencies using:
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
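After installing, it can help to confirm which of the listed packages actually landed in the virtual environment. The sketch below uses importlib.metadata from the standard library. The package list is copied from this article rather than read from requirements.txt, and installed_versions is a hypothetical helper name.

```python
from importlib import metadata

# Package names as listed in this article; requirements.txt remains the authoritative list.
EXPECTED_PACKAGES = [
    "numpy", "pandas", "matplotlib",
    "torch", "torchvision",
    "transformers", "datasets", "tokenizers", "accelerate",
    "jupyter",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(EXPECTED_PACKAGES).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it with the virtual environment activated so that it inspects .venv rather than the system interpreter.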
Node.js and TypeScript Tooling
The TypeScript lessons, particularly in phases/03-deep-learning-core/ and agent engineering modules, rely on modern Node.js patterns. The site/build.js script and lesson implementations use:
- Hono – Lightweight web framework for building API servers
- Zod – Schema validation for type-safe configuration and data handling
- WS – WebSocket library for real-time streaming implementations
Install lesson-specific dependencies by navigating to the code directory and running:
cd phases/03-deep-learning-core/10-mini-framework/code
npm install
npx ts-node main.ts
Rust Toolchain for Low-Level Components
For lessons covering tokenization and low-level algorithms, the repository includes Rust implementations that compile to native code. The phases/10-llms-from-scratch/01-tokenizers/code/ directory contains examples requiring:
cd phases/10-llms-from-scratch/01-tokenizers/code
rustc main.rs -O && ./main
This produces optimized binaries for understanding byte-pair encoding and vocabulary generation without Python overhead.
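For readers who want the idea before opening the Rust code, byte-pair encoding can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the repository's implementation: it starts from raw UTF-8 bytes, assigns new token ids from 256 upward, breaks frequency ties by first occurrence, and stops when no pair repeats. The names train_bpe, merge_pair, and decode are hypothetical.

```python
from collections import Counter


def count_pairs(tokens):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(tokens, tokens[1:]))


def merge_pair(tokens, pair, new_token):
    """Replace every non-overlapping occurrence of `pair` (left to right) with `new_token`."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(new_token)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges; return the encoded text and the merge table."""
    tokens = list(text.encode("utf-8"))  # base vocabulary: the 256 byte values
    merges = {}
    next_id = 256
    for _ in range(num_merges):
        pairs = count_pairs(tokens)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties resolve to the first pair seen
        if pairs[best] < 2:
            break  # nothing repeats, so further merges would not compress
        merges[best] = next_id
        tokens = merge_pair(tokens, best, next_id)
        next_id += 1
    return tokens, merges


def decode(tokens, merges):
    """Expand merged ids back into bytes and decode as UTF-8."""
    expansions = {new_id: pair for pair, new_id in merges.items()}
    stack = list(reversed(tokens))
    out = []
    while stack:
        token = stack.pop()
        if token in expansions:
            left, right = expansions[token]
            stack.append(right)
            stack.append(left)
        else:
            out.append(token)
    return bytes(out).decode("utf-8")


if __name__ == "__main__":
    encoded, table = train_bpe("aaabdaaabac", num_merges=3)
    print(encoded, table)
    print(decode(encoded, table))
```

With three merges, the 11-byte string "aaabdaaabac" shrinks to 5 tokens, and decode restores the original text. The learned merge table is the "vocabulary" that a tokenizer ships with.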
Auxiliary Development Tools
Beyond core languages, the project recommends:
- Docker – Introduced in Phase 0 ("Docker for AI") for containerizing development environments and ensuring reproducible builds across operating systems
- Jupyter Notebooks – Integrated throughout for exploratory data analysis and visualization, installed via the jupyter package in requirements.txt
Agent and Protocol Stack
The later phases implement reusable AI components using pure Python from the standard library plus optional dependencies. Key artifacts include:
- Skills and Agents – Modular components defined in lesson outputs and installed via scripts/install_skills.py
- MCP (Model Context Protocol) Servers – Lightweight protocol implementations for agent communication, located in phases/14-agent-engineering/outputs
These components require no additional frameworks beyond the Python packages listed in requirements.txt, adhering to the project's minimal-dependency philosophy while enabling production deployment.
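To make "pure Python" concrete: MCP messages are JSON-RPC 2.0, so the core of a server is a dispatcher over JSON text. The sketch below is a loose illustration, not the repository's server and not a complete MCP implementation. It omits the initialization handshake and transport, the echo tool and the handle_request function are hypothetical, and the tools/list and tools/call method names and result shapes follow the MCP specification only as far as this simplified example needs.

```python
import json

# Hypothetical tool registry: one tool that returns its input unchanged.
TOOLS = {
    "echo": {
        "description": "Return the input text unchanged.",
        "handler": lambda arguments: str(arguments.get("text", "")),
    }
}


def _error(request_id, code, message):
    """Build a JSON-RPC 2.0 error response."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}
    )


def handle_request(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request string and return the response string."""
    try:
        request = json.loads(raw)
    except json.JSONDecodeError:
        return _error(None, -32700, "Parse error")
    if not isinstance(request, dict):
        return _error(None, -32600, "Invalid Request")

    request_id = request.get("id")
    method = request.get("method")
    params = request.get("params") or {}

    if method == "tools/list":
        result = {
            "tools": [
                {"name": name, "description": tool["description"]}
                for name, tool in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _error(request_id, -32602, "Unknown tool")
        text = tool["handler"](params.get("arguments") or {})
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return _error(request_id, -32601, "Method not found")

    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})


if __name__ == "__main__":
    call = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": "echo", "arguments": {"text": "hello"}},
    }
    print(handle_request(json.dumps(call)))
```

Everything here comes from the standard library, which is the point of the minimal-dependency approach. A real server would add a transport (for example stdin/stdout), input schemas for each tool, and the protocol's initialization step.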
Running Your First Lesson
To verify your environment across all supported languages:
# Clone the repository
git clone https://github.com/rohitg00/ai-engineering-from-scratch.git
cd ai-engineering-from-scratch
# Python: Linear algebra foundations
python phases/01-math-foundations/01-linear-algebra-intuition/code/vectors.py
# TypeScript: Mini-framework demonstration (the subshell keeps you at the repository root)
(cd phases/03-deep-learning-core/10-mini-framework/code && npm install && npx ts-node main.ts)
# Rust: Tokenization basics
(cd phases/10-llms-from-scratch/01-tokenizers/code && rustc main.rs -O && ./main)
Key Repository Files
Understanding the repository structure helps navigate the tool requirements:
- README.md – Curriculum overview and phase structure
- requirements.txt – Canonical Python dependency list including torch, transformers, and datasets
- site/build.js – Node.js script for static site generation using modern web tooling
- scripts/install_skills.py – Installer for reusable lesson artifacts (agents, skills, MCP servers)
- phases/*/code/ – Runnable implementations for each lesson across all supported languages
- phases/*/docs/en.md – Narrative documentation explaining concepts and learning objectives
- phases/*/outputs/ – Generated artifacts including prompts, skills, and MCP server definitions
Summary
- Python 3.11+ with packages from requirements.txt (numpy, pandas, torch, transformers, etc.) forms the primary development environment
- Node.js 20+ with npm packages (hono, zod, ws) enables TypeScript lessons and framework implementations
- Rust toolchain is required only for tokenizer and low-level performance-critical lessons
- Docker support in Phase 0 provides containerized workflows for reproducible AI development
- MCP servers and agent skills run on pure Python with minimal external dependencies, emphasizing production-ready simplicity
Frequently Asked Questions
Do I need to install all languages to use the project?
No. While the curriculum supports Python, TypeScript, Rust, and Julia, you can follow individual phases using only the language specified for that lesson. Python is the only required language for the majority of the AI/ML content; TypeScript and Rust are optional for specific implementation deep-dives.
Can I use a different Python version than 3.11?
The project is tested against Python 3.11+, as specified in the repository requirements. Earlier versions may work for basic lessons but could cause compatibility issues with newer torch or transformers features used in the deep learning and LLM phases.
What is the purpose of the install_skills.py script?
Located in scripts/install_skills.py, this utility installs reusable artifacts generated by lessons—including skills, agents, and MCP servers—into your local environment. It enables you to compose and reuse AI components built throughout the curriculum without manual copy-pasting.
Is Docker mandatory for running the lessons?
No. Docker is introduced in Phase 0 as an optional tool for containerizing development environments. You can run all Python, TypeScript, and Rust lessons directly on your host system provided you have the required language runtimes installed.