parameter-golf

Train the smallest LM you can that fits in 16MB. Best model wins!

21 articles
How to Configure Flash Attention with Grouped Query Attention (GQA) in Parameter-Golf

Learn to configure Flash Attention with Grouped Query Attention (GQA) in Parameter-Golf for faster training. Enable Flash Attention and set num_kv_heads below num_heads.

how-to-guide
Apr 17, 2026
Logit Softcap Transformation in OpenAI Parameter-Golf: PyTorch and MLX Implementation

Discover the logit softcap transformation in OpenAI parameter-golf. Learn how the PyTorch and MLX implementations bound logits to prevent instability while preserving the prediction distribution.

internals
Apr 17, 2026
How Gradient Accumulation Works in an 8xH100 Distributed Setup

Discover how gradient accumulation optimizes an 8xH100 distributed setup. Learn how micro-batches and GPU processing maintain a constant effective batch size for superior performance.

deep-dive
Apr 17, 2026
How Tied Embeddings Are Implemented and Initialized Efficiently in Parameter-Golf

Learn how tied embeddings are efficiently implemented and initialized in parameter-golf. Reusing the input embedding weights for the language modeling head means one matrix serves both roles, halving the memory those two layers would otherwise need.

deep-dive
Apr 17, 2026
How to Configure bfloat16 Mixed Precision Training in Parameter-Golf

Configure bfloat16 mixed precision training in OpenAI parameter-golf. Learn how to speed up training with PyTorch autocast or MLX backend settings.

how-to-guide
Apr 17, 2026
How to Configure Partial Rotary Position Embedding (RoPE) in OpenAI Parameter Golf

Learn to configure partial Rotary Position Embedding (RoPE) in OpenAI Parameter Golf. Reduce computation by applying RoPE to a subset of dimensions using rope_dims for improved efficiency.

how-to-guide
Apr 17, 2026
LeakyReLU Squared Activation Function Implementation in OpenAI Parameter-Golf

Discover how the LeakyReLU squared activation function is implemented in OpenAI's parameter-golf repository. Learn the efficient two-step inline operation.

internals
Apr 17, 2026
How LZMA Code Compression Reduces Submission Artifact Size in Parameter Golf

Discover how LZMA code compression slashes submission artifact size by 39% for OpenAI parameter golf. Learn about high-ratio entropy coding and quantized weight streams enabling smaller wrappers.

deep-dive
Apr 17, 2026
Exponential Moving Average (EMA) Implementation for Small Models in Parameter-Golf

Learn how Exponential Moving Average (EMA) is implemented for small models in Parameter-Golf, using a lightweight dictionary-based system.

internals
Apr 17, 2026
How the Learning Rate Warmdown Schedule Works in OpenAI Parameter Golf

Learn how the learning rate warmdown schedule in OpenAI parameter golf stabilizes convergence with linear decay. Understand how it relates to the final training steps, wall-clock time, and budget fraction.

deep-dive
Apr 17, 2026
Optimal Hyperparameters for Training Under 10 Minutes: The Complete Parameter Golf Record

Discover optimal hyperparameters for training under 10 minutes. Achieve 1.0810 bits-per-byte using an 11-layer transformer, MuonEq-R, and int6 quantization on 8xH100 GPUs.

deep-dive
Apr 17, 2026
How Skip Weights Enable Depth Recurrence in Parameter-Golf's GPT Architecture

Discover how skip weights enable depth recurrence in Parameter-Golf's GPT architecture. Learn how learned scalars modulate encoder activations for dynamic revisiting of depth representations.

internals
Apr 17, 2026
