
LeakyReLU Squared Activation Function Implementation in OpenAI Parameter-Golf

April 17, 2026 openai/parameter-golf

The LeakyReLU squared activation function in the parameter-golf repository is implemented as an inline two-step operation: applying F.leaky_relu with a negative slope of 0.5 followed by an element-wise .square() operation.

The OpenAI parameter-golf repository explores parameter-efficient neural architectures through constrained training experiments. The LeakyReLU squared activation function appears consistently across multiple model configurations, providing a smooth, non-negative quadratic output while preserving gradient flow for negative inputs.

Implementation Pattern

The implementation is not encapsulated as a dedicated layer module. Instead, it is built inline where the MLP’s linear projection is applied, following a strict two-stage pipeline:

Leaky ReLU: Applied with negative_slope=0.5 to preserve gradient information for negative values
Element-wise squaring: The .square() operation ensures non-negative outputs and adds quadratic curvature

In train_gpt_decode.py at line 461, the activation appears as:

return self.proj(F.leaky_relu(self.fc(x), negative_slope=0.5).square())

This same pattern repeats across multiple experiment files in the records/track_10min_16mb/ directory:

train_gpt_human.py at line 429 for the GPTQ-Embeddings experiment
train_gpt.py at line 545 for the Vocab-4096 MLP-mult 4 configuration
train_gpt.py at line 331 for the Mini-Depth Recurrence model (using a configurable self.neg_slope parameter)

Technical Breakdown

Mathematical Formulation

The LeakyReLU squared activation function computes:


f(x) = max(0.5 * x, x)^2

Where:

For positive inputs: f(x) = x^2 (standard quadratic)
For negative inputs: f(x) = (0.5 * x)^2 = 0.25 * x^2 (attenuated quadratic)

This formulation ensures non-negative outputs and a gradient that is continuous everywhere. The gradient is nonzero for every negative input and vanishes only at x = 0, unlike standard ReLU, which yields zero gradient for all negative values.
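The formula above can be checked with a scalar reference in plain Python. This is a minimal sketch for illustration, not code from the repository: the function name leaky_relu_squared and the sample inputs are assumptions.

```python
def leaky_relu_squared(x: float, negative_slope: float = 0.5) -> float:
    """Scalar reference: leaky ReLU followed by squaring (hypothetical helper)."""
    leaky = x if x >= 0 else negative_slope * x
    return leaky * leaky


for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    closed_form = max(0.5 * x, x) ** 2  # formula from the text, slope 0.5
    # The step-by-step version and the closed form agree on every sample.
    assert leaky_relu_squared(x) == closed_form
    print(x, leaky_relu_squared(x))
```

For x = -2 the result is 1.0, a quarter of the 4.0 obtained for x = 2, which matches the attenuated quadratic on the negative side.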

Reusable Module Implementation

While the repository uses inline composition for brevity, the pattern can be encapsulated for reuse:

import torch
import torch.nn as nn
import torch.nn.functional as F

class LeakyReLUSquared(nn.Module):
    """Leaky ReLU (default negative slope 0.5), followed by element-wise square."""
    def __init__(self, negative_slope: float = 0.5):
        super().__init__()
        self.negative_slope = negative_slope

    def forward(self, x):
        return F.leaky_relu(x, negative_slope=self.negative_slope).square()

# Example usage inside an MLP block

class MLPBlock(nn.Module):
    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        self.fc = nn.Linear(dim_in, dim_out)
        self.proj = nn.Linear(dim_out, dim_out)
        self.act = LeakyReLUSquared()

    def forward(self, x):
        x = self.fc(x)
        x = self.act(x)
        return self.proj(x)

Running the example:

x = torch.randn(4, 8)
block = MLPBlock(8, 32)
y = block(x)
print(y.shape)  # torch.Size([4, 32])

Summary

The LeakyReLU squared activation function in parameter-golf is implemented as F.leaky_relu(..., negative_slope=0.5).square() inline within MLP blocks
Key source locations include train_gpt_decode.py (line 461), train_gpt_human.py (line 429), and train_gpt.py (lines 331 and 545)
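The activation uses a negative slope of 0.5, so after squaring, negative inputs produce outputs and gradients at 25% of those for positive inputs of the same magnitude
No dedicated layer module exists; the function is composed directly from PyTorch operations, and because the activation has no learnable parameters, wrapping it in a module would not change the parameter count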

Frequently Asked Questions

What is the negative slope value used in the LeakyReLU squared implementation?

The implementation uses a negative slope of 0.5 for the LeakyReLU component. This value appears across the experiment files, including the Hessian SD-Clip, GPTQ-Embeddings, and Vocab-4096 configurations. The Mini-Depth Recurrence experiment at line 331 of train_gpt.py reads the slope from a configurable self.neg_slope parameter, but the standard value remains 0.5.

Why square the output of LeakyReLU instead of using standard ReLU?

Squaring the LeakyReLU output creates a smooth, non-negative quadratic activation that adds curvature on both sides of zero while preserving gradients. Standard ReLU yields zero gradient for negative inputs, whereas the LeakyReLU squared formulation keeps a gradient there at 25% (0.5 squared) of the gradient for a positive input of the same magnitude. Squaring also guarantees non-negative outputs without requiring an absolute value operation.

Where can I find the exact implementation in the source code?

The implementation appears in multiple training scripts within the records/track_10min_16mb/ directory. The most prominent occurrence is in train_gpt_decode.py at line 461, where the activation appears as F.leaky_relu(self.fc(x), negative_slope=0.5).square(). The same pattern appears in train_gpt_human.py at line 429 and in train_gpt.py at lines 331 and 545 across different experimental configurations, with line 331 using self.neg_slope in place of the literal 0.5.

Can I use a different negative slope value with this activation function?

Yes, the implementation supports configurable negative slopes. While the standard experiments use 0.5, the Mini-Depth Recurrence experiment in train_gpt.py at line 331 demonstrates a configurable implementation using self.neg_slope. When modifying the slope, remember that the final gradient scaling for negative inputs will be the square of your chosen slope value (e.g., slope 0.3 yields 0.09 gradient scaling).
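The slope-squared rule can be sanity-checked with a finite-difference gradient in plain Python. This is a rough sketch under stated assumptions: leaky_relu_squared, numerical_grad, and negative_side_ratio are hypothetical helpers written for this check, not functions from the repository or from PyTorch.

```python
def leaky_relu_squared(x: float, negative_slope: float) -> float:
    """Scalar reference: leaky ReLU followed by squaring (hypothetical helper)."""
    leaky = x if x >= 0 else negative_slope * x
    return leaky * leaky


def numerical_grad(f, x: float, eps: float = 1e-6) -> float:
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def negative_side_ratio(negative_slope: float, x: float = 1.5) -> float:
    """Gradient magnitude at -x divided by gradient magnitude at +x."""
    f = lambda v: leaky_relu_squared(v, negative_slope)
    return abs(numerical_grad(f, -x)) / abs(numerical_grad(f, x))


for slope in (0.5, 0.3):
    # The ratio should be close to slope ** 2.
    print(slope, round(negative_side_ratio(slope), 4))
```

The ratios come out close to 0.25 for slope 0.5 and 0.09 for slope 0.3, consistent with the gradient on the negative side being 2 * slope^2 * x compared with 2 * x on the positive side.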
