parameter-golf

Logit Softcap Transformation in OpenAI Parameter-Golf: PyTorch and MLX Implementation

Learn how the logit softcap transformation in OpenAI parameter-golf bounds logits to a fixed range in both the PyTorch and MLX implementations, helping prevent numerical instability while preserving the relative order of the predictions.
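As a rough illustration of the idea, the sketch below applies a tanh-based softcap to a plain list of logits. It assumes the common `cap * tanh(logit / cap)` form; the function name `softcap` and the cap value of 30.0 are illustrative assumptions, not values taken from the repository.

```python
import math

def softcap(logits, cap=30.0):
    """Squash each logit into (-cap, cap) using cap * tanh(logit / cap).

    `cap=30.0` is an assumed, illustrative value.
    """
    return [cap * math.tanh(x / cap) for x in logits]

raw = [-500.0, -2.0, 0.0, 3.0, 80.0]   # already sorted in ascending order
capped = softcap(raw)

# Large magnitudes are bounded, small values pass through almost unchanged,
# and because tanh is monotonic the ordering of the logits is preserved.
for before, after in zip(raw, capped):
    print(f"{before:8.1f} -> {after:8.3f}")
```

Because the mapping is monotonic, the top prediction is the same before and after capping; only the size of the gaps between logits changes.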

How Gradient Accumulation Works in an 8xH100 Distributed Setup

Learn how gradient accumulation works in an 8xH100 distributed setup: each GPU processes micro-batches and accumulates their gradients so that the effective batch size stays constant.
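The arithmetic behind a constant effective batch size can be sketched as follows. This assumes a rule in which the number of accumulation steps is the reference GPU count (8) divided by the number of GPUs actually available; the function names and the micro-batch size of 65,536 tokens are hypothetical.

```python
def grad_accum_steps(world_size, reference_gpus=8):
    """Micro-batches each GPU accumulates before one optimizer step.

    Assumed rule: fewer GPUs means proportionally more accumulation steps.
    """
    if world_size <= 0 or reference_gpus % world_size != 0:
        raise ValueError("world_size must evenly divide reference_gpus")
    return reference_gpus // world_size

def effective_batch_tokens(micro_batch_tokens, world_size):
    """Tokens contributing to a single optimizer step across all GPUs."""
    return micro_batch_tokens * world_size * grad_accum_steps(world_size)

for gpus in (1, 2, 4, 8):
    steps = grad_accum_steps(gpus)
    total = effective_batch_tokens(65_536, gpus)  # illustrative micro-batch size
    print(f"{gpus} GPU(s): {steps} accumulation step(s), {total} tokens per update")
```

Under this rule a single GPU takes eight micro-batches per update while eight GPUs take one each, so the optimizer sees the same number of tokens per step either way.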

How Tied Embeddings Are Implemented and Initialized Efficiently in Parameter-Golf

Learn how tied embeddings are efficiently implemented and initialized in parameter-golf. This technique avoids storing a separate output projection matrix by reusing the input embedding weights for the language modeling head.

How to Configure bfloat16 Mixed Precision Training in Parameter-Golf

Configure bfloat16 mixed precision training in OpenAI parameter-golf. Learn how to speed up training with PyTorch autocast or the MLX backend settings.

how-to-guide

How to Configure Partial Rotary Position Embedding (RoPE) in OpenAI Parameter Golf

Learn to configure partial Rotary Position Embedding (RoPE) in OpenAI Parameter Golf. Reduce computation by applying RoPE to a subset of dimensions using rope_dims for improved efficiency.
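A minimal sketch of the partial-rotation idea, in plain Python on a single head vector: only the first `rope_dims` components are rotated and the rest pass through untouched. The half-split pairing convention, the base of 10000, and the function name `partial_rope` are assumptions made for illustration; only the `rope_dims` name comes from the description above.

```python
import math

def partial_rope(vec, position, rope_dims, base=10000.0):
    """Rotate only the first `rope_dims` components of `vec`.

    Assumed convention: component i is paired with component i + rope_dims // 2.
    """
    if rope_dims % 2 != 0 or rope_dims > len(vec):
        raise ValueError("rope_dims must be even and no larger than len(vec)")
    half = rope_dims // 2
    out = list(vec)  # components beyond rope_dims are copied unchanged
    for i in range(half):
        theta = position * base ** (-2.0 * i / rope_dims)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * cos_t - x2 * sin_t
        out[i + half] = x1 * sin_t + x2 * cos_t
    return out

head = [1.0, 0.0, 0.0, 1.0, 5.0, 6.0, 7.0, 8.0]
rotated = partial_rope(head, position=3, rope_dims=4)
print(rotated)
```

With `rope_dims=4` on an 8-dimensional head, half of the trigonometric work is skipped and the last four components carry no positional rotation.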

how-to-guide

LeakyReLU Squared Activation Function Implementation in OpenAI Parameter-Golf

Discover how the LeakyReLU squared activation function is implemented in OpenAI's parameter-golf repository. Learn the efficient two-step inline operation.
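The two steps can be sketched on a scalar as a leaky ReLU followed by a square. The negative slope of 0.5 and the function name are illustrative assumptions, not values confirmed from the repository.

```python
def leaky_relu_squared(x, negative_slope=0.5):
    """Step 1: leaky ReLU. Step 2: square the result.

    `negative_slope=0.5` is an assumed, illustrative value.
    """
    y = x if x >= 0 else negative_slope * x  # step 1: leaky ReLU
    return y * y                             # step 2: square

for value in (-2.0, -0.5, 0.0, 1.5):
    print(value, "->", leaky_relu_squared(value))
```

Unlike a plain squared ReLU, negative inputs still produce a nonzero output and gradient, scaled down by the slope.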

How LZMA Code Compression Reduces Submission Artifact Size in Parameter Golf

Discover how LZMA code compression slashes submission artifact size by 39% for OpenAI parameter golf. Learn about high-ratio entropy coding and quantized weight streams enabling smaller wrappers.
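As a general illustration of compressing source code with LZMA, the sketch below uses Python's standard `lzma` module on a synthetic, repetitive code string. The helper names and the sample source are hypothetical, and the ratio it prints says nothing about the figure quoted above.

```python
import lzma

def pack_source(source: str) -> bytes:
    """Compress source text with LZMA at the highest standard preset."""
    return lzma.compress(source.encode("utf-8"), preset=9 | lzma.PRESET_EXTREME)

def unpack_source(blob: bytes) -> str:
    """Recover the original source text from the compressed bytes."""
    return lzma.decompress(blob).decode("utf-8")

# Synthetic stand-in for a training script (hypothetical content).
source = "".join(f"def layer_{i}(x):\n    return x * {i} + 1\n" for i in range(200))
blob = pack_source(source)

original_size = len(source.encode("utf-8"))
print(f"original: {original_size} bytes, compressed: {len(blob)} bytes")
```

A submission wrapper would then only need to decompress the blob and execute the recovered text, which is why the compressed form can stand in for the full script.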

Exponential Moving Average (EMA) Implementation for Small Models in Parameter-Golf

Learn how the Exponential Moving Average (EMA) is implemented for small models in Parameter-Golf, using a lightweight dictionary-based system.
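A dictionary-based EMA can be sketched with plain floats standing in for parameter tensors. The function name `ema_update`, the parameter names, and the decay of 0.9 are assumptions chosen for illustration.

```python
def ema_update(ema_state, params, decay=0.9):
    """Blend the current parameters into the running average, in place.

    Both arguments map parameter names to values (floats here, tensors in practice).
    """
    for name, value in params.items():
        ema_state[name] = decay * ema_state[name] + (1.0 - decay) * value
    return ema_state

params = {"w": 1.0, "b": -2.0}   # hypothetical model parameters
ema = dict(params)               # start the average from the initial weights

params["w"], params["b"] = 0.0, 0.0   # pretend an optimizer step moved them
ema_update(ema, params)
print(ema)
```

Keeping the average in a name-keyed dictionary means it can be swapped in for evaluation by copying values back under the same keys.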

How the Learning Rate Warmdown Schedule Works in OpenAI Parameter Golf

Learn how the learning rate warmdown schedule in OpenAI parameter golf stabilizes convergence with linear decay. Understand how it relates to the final training steps, wall-clock time, and budget fraction.
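A step-based version of a linear warmdown can be sketched as a multiplier on the base learning rate. This covers only the step-count case; the wall-clock and budget-fraction variants are not shown. The function name and the step counts are hypothetical.

```python
def lr_scale(step, total_steps, warmdown_steps):
    """Learning-rate multiplier: 1.0 until the warmdown window, then linear to 0."""
    if warmdown_steps <= 0:
        return 1.0  # no warmdown configured
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return 1.0
    return max(total_steps - step, 0) / warmdown_steps

for step in (0, 800, 900, 1000):
    print(step, lr_scale(step, total_steps=1000, warmdown_steps=200))
```

The multiplier stays at 1.0 for the first 800 steps in this example, then falls in a straight line to zero at the final step.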

Optimal Hyperparameters for Training Under 10 Minutes: The Complete Parameter Golf Record

Discover optimal hyperparameters for training under 10 minutes. Achieve 1.0810 bits-per-byte using an 11-layer transformer, MuonEq-R, and int6 quantization on 8xH100 GPUs.

How Skip Weights Enable Depth Recurrence in Parameter-Golf's GPT Architecture

Discover how skip weights enable depth recurrence in Parameter-Golf's GPT architecture. Learn how learned scalars modulate encoder activations for dynamic revisiting of depth representations.