# How LZMA Code Compression Reduces Submission Artifact Size in Parameter Golf

> Discover how LZMA code compression slashes submission artifact size by 39% for OpenAI parameter golf. Learn about high-ratio entropy coding and quantized weight streams enabling smaller wrappers.

- Repository: [OpenAI/parameter-golf](https://github.com/openai/parameter-golf)
- Tags: deep-dive
- Published: 2026-04-17

---

**LZMA code compression cuts submission artifact size by about 39% relative to an int8 + zlib baseline. It applies dictionary matching and range coding to tightly packed, quantized weight streams, and pairs with a ~16.6 KB self-extracting wrapper that helps entries stay within the 16 MB competition limit.**

The `openai/parameter-golf` repository demonstrates how extreme quantization combined with the Lempel-Ziv-Markov chain Algorithm (LZMA) creates tiny model submissions. The pipeline quantizes neural network weights to binary or ternary values, packs them into dense byte streams, and then applies LZMA's dictionary matching and range coding. The result is a dramatic size reduction that still allows the full tensors to be reconstructed at runtime.

## The Multi-Stage Compression Pipeline

The repository implements a three-stage pipeline in which LZMA serves as the final entropy coding layer. Quantization lowers the information content of the weights, packing removes wasted bits from their representation, and LZMA removes the statistical redundancy that remains.

### Quantization and Packing

Model weights are first quantized to ultra-low-precision formats to minimize information content. The `parameter-golf` codebase supports **1-bit binary** (values in `{-1, +1}`), **3-value ternary** (values in `{-1, 0, +1}`), and **int6/int8** quantizations.
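To make the effect of each format concrete, the arithmetic below estimates the packed size of a model before any entropy coding. It is a back-of-the-envelope sketch: `N_WEIGHTS` is a hypothetical count borrowed from the "106M" in one record directory name, every weight is assumed to use the same format, and `int6` is assumed to be stored at exactly 6 bits per weight.

```python
import math
from fractions import Fraction

# Hypothetical parameter count (assumption, not read from the repository).
N_WEIGHTS = 106_000_000

# Storage cost per weight for each format, in bits.
BITS_PER_WEIGHT = {
    "binary": Fraction(1),      # one bit per weight
    "ternary": Fraction(8, 5),  # five trits share one byte
    "int6": Fraction(6),        # assumes tight 6-bit packing
    "int8": Fraction(8),
}


def packed_bytes(n_weights, bits_per_weight):
    """Bytes needed to store n_weights at the given per-weight cost."""
    return math.ceil(n_weights * bits_per_weight / 8)


sizes = {name: packed_bytes(N_WEIGHTS, bits) for name, bits in BITS_PER_WEIGHT.items()}
for name, size in sizes.items():
    print(f"{name:>8}: {size / 1e6:7.2f} MB before entropy coding")
```

Under these assumptions only the binary layout lands below 16 MB on its own; the other formats would rely on the entropy coder, a smaller model, or a mix of precisions.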

After quantization, the discrete values are tightly packed into byte arrays:

- **Bit-packing** stores each binary weight in a single bit, yielding an 8× density improvement over `int8` storage.
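- **Base-3 packing** maps ternary values to trits (0, 1, 2) and encodes 5 trits per byte, which works because 3^5 = 243 fits within the 256 values of a byte. That costs 1.6 bits per trit, close to the theoretical minimum of log2(3) ≈ 1.585 bits, as implemented in the binary/ternary packing utilities.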
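The two packing schemes above can be sketched in plain Python. This is an illustrative round trip, not the repository's implementation: the function names, the MSB-first bit order, and the least-significant-trit-first layout are all assumptions made for the example.

```python
def pack_bits(signs):
    """Pack binary weights in {-1, +1} into bytes, 8 weights per byte (MSB first)."""
    out = bytearray((len(signs) + 7) // 8)
    for i, s in enumerate(signs):
        if s > 0:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def unpack_bits(data, count):
    """Inverse of pack_bits; count drops the padding bits in the last byte."""
    return [1 if data[i // 8] & (0x80 >> (i % 8)) else -1 for i in range(count)]


def pack_trits(weights):
    """Pack ternary weights in {-1, 0, +1} into bytes, 5 trits per byte."""
    trits = [w + 1 for w in weights]   # map -1, 0, +1 to 0, 1, 2
    trits += [0] * (-len(trits) % 5)   # pad to a multiple of 5
    return bytes(
        sum(t * 3**k for k, t in enumerate(trits[i:i + 5]))
        for i in range(0, len(trits), 5)
    )


def unpack_trits(data, count):
    """Inverse of pack_trits; count drops the padding trits."""
    weights = []
    for byte in data:
        for _ in range(5):
            byte, t = divmod(byte, 3)  # peel off the least significant trit
            weights.append(t - 1)
    return weights[:count]


binary = [1, -1, -1, 1, 1, 1, -1, 1, -1]
ternary = [-1, 0, 1, 1, 0, -1, 1]
assert unpack_bits(pack_bits(binary), len(binary)) == binary
assert unpack_trits(pack_trits(ternary), len(ternary)) == ternary
print(len(pack_bits(binary)), len(pack_trits(ternary)))
```

Both decoders need the original weight count, because the final byte may carry padding. A real format would store that count (or the tensor shape) alongside the packed bytes.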

These packed streams are already 30–40% smaller than a standard `int8` + `zlib` baseline before LZMA is applied, according to the results documented in [`records/track_non_record_16mb/2026-03-24_106M_Binary_Asymmetric_UNet_FP8_15L_8192BPE_YaRN_NeoMuon_Smear/RESULTS.md`](https://github.com/openai/parameter-golf/blob/main/records/track_non_record_16mb/2026-03-24_106M_Binary_Asymmetric_UNet_FP8_15L_8192BPE_YaRN_NeoMuon_Smear/RESULTS.md).

### LZMA Entropy Coding

The packed byte streams are handed to Python's `lzma` module with **preset 9** (the highest compression level). LZMA combines a large sliding-window LZ77 dictionary with range coding, which excels at exploiting the high redundancy and long runs of identical values that remain after quantization and packing.

For ternary models, this final compression stage achieves a **≈39% size reduction compared to the int8 + zlib baseline**, as reported in the same results file. Dictionary matching encodes the repetitive weight patterns that are common in quantized neural networks as back-references instead of literal values.
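A quick way to see how the two compressors behave on a packed stream is to run both on synthetic data. The sketch below is not a reproduction of the reported 39% figure: the weight distribution (80% zeros) is a made-up assumption, and real trained weights will give different numbers.

```python
import lzma
import random
import zlib

rng = random.Random(0)

# Hypothetical ternary weights as trits: 1 (weight 0) is 8x more likely
# than 0 (weight -1) or 2 (weight +1).
trits = rng.choices([0, 1, 2], weights=[1, 8, 1], k=200_000)

# Base-3 packing, 5 trits per byte.
packed = bytes(
    sum(t * 3**k for k, t in enumerate(trits[i:i + 5]))
    for i in range(0, len(trits), 5)
)

zlib_out = zlib.compress(packed, level=9)
lzma_out = lzma.compress(packed, preset=9)

print(f"packed: {len(packed)} bytes")
print(f"zlib:   {len(zlib_out)} bytes")
print(f"lzma:   {len(lzma_out)} bytes")

# Both codecs are lossless, so the packed stream must come back unchanged.
assert zlib.decompress(zlib_out) == packed
assert lzma.decompress(lzma_out) == packed
```

Swapping in a packed stream from a real checkpoint is the only way to learn the ratio for a particular model.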

## Implementing the Self-Extracting Wrapper

The compressed payload is embedded into a minimal Python loader that reconstructs the original tensors at runtime. This wrapper is significantly smaller than the raw weight blob it replaces.

### Offline Compression Script

The following pattern, derived from the repository's training scripts, demonstrates how weights are packed and compressed for submission:

```python
import base64
import lzma

import numpy as np

# Example: ternary weights in {-1, 0, +1}
rng = np.random.default_rng(0)
weights = rng.choice([-1, 0, 1], size=1_000_000).astype(np.int8)

# Base-3 packing: map -1→0, 0→1, +1→2
trits = (weights + 1).astype(np.uint8)

# Pad to a multiple of 5, then pack 5 trits per byte (simplified illustration).
# Each group becomes one base-3 number in [0, 242], which fits in a uint8.
# The decoder needs the original weight count to drop the padding.
pad = (-trits.size) % 5
groups = np.pad(trits, (0, pad)).reshape(-1, 5)
powers = np.array([1, 3, 9, 27, 81], dtype=np.uint8)
packed = (groups * powers).sum(axis=1).astype(np.uint8)

# LZMA compress with preset 9 (highest ratio)
compressed = lzma.compress(packed.tobytes(), preset=9)

# Encode for embedding in the wrapper
payload = base64.b85encode(compressed).decode()
print(f"Payload length: {len(payload)} characters")
```

### Runtime Decompression Loader

The self-extracting wrapper, as implemented in [`records/track_10min_16mb/2026-04-09_SP8192_3LayerRecur_ParResid_QK525_LegalTTT/train_gpt.py`](https://github.com/openai/parameter-golf/blob/main/records/track_10min_16mb/2026-04-09_SP8192_3LayerRecur_ParResid_QK525_LegalTTT/train_gpt.py), decompresses and reconstructs the tensors:

```python
import base64 as B
import lzma as L

import numpy as np
import torch

# The compressed payload embedded as a base85 string (placeholder)
payload = "<payload>"


def load_tensor(payload: str, shape: tuple[int, ...]) -> torch.Tensor:
    # Decompress to raw bytes
    raw = L.decompress(B.b85decode(payload))
    # np.frombuffer returns a read-only view, so copy before handing it to torch
    packed = np.frombuffer(raw, dtype=np.uint8).copy()
    # Reshape to the stored dimensions; unpacking to weight values and
    # assigning to model parameters would follow here
    return torch.from_numpy(packed).view(shape)
```

This wrapper approach saves approximately **43 KB per submission** compared to storing raw weight files, as documented in [`records/track_10min_16mb/2026-04-09_SP8192_3LayerRecur_ParResid_QK525_LegalTTT/README.md`](https://github.com/openai/parameter-golf/blob/main/records/track_10min_16mb/2026-04-09_SP8192_3LayerRecur_ParResid_QK525_LegalTTT/README.md).
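The embed-and-extract cycle can be exercised end to end with the standard library alone. The sketch below is a toy version of the idea: `build_wrapper` is a hypothetical helper, the generated loader only recovers raw bytes, and the blob is dummy data rather than model weights.

```python
import base64
import lzma


def build_wrapper(blob: bytes) -> str:
    """Return Python source that rebuilds `blob` into a variable named `raw`."""
    payload = base64.b85encode(lzma.compress(blob, preset=9)).decode("ascii")
    return (
        "import lzma as L, base64 as B\n"
        f"raw = L.decompress(B.b85decode({payload!r}))\n"
    )


# Dummy, highly repetitive stand-in for a packed weight stream.
blob = bytes(range(256)) * 64

source = build_wrapper(blob)

# Run the generated loader in a fresh namespace and check the round trip.
namespace = {}
exec(source, namespace)
assert namespace["raw"] == blob

print(f"blob: {len(blob)} bytes, wrapper source: {len(source)} characters")
```

Base85 turns every 4 bytes into 5 ASCII characters, so the embedded string is 25% longer than the compressed bytes. That overhead is the price of keeping the payload inside a plain-text source file.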

## Alternative: LZMA2 Filters for Archive Bundling

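For submissions requiring multiple files, the repository also demonstrates LZMA2 compression of the entire submission bundle. Python's `tarfile.open` accepts a `preset` argument for `w:xz` but not a custom `filters` chain, so an explicit LZMA2 filter is configured through `lzma.open` and the tar stream is written into it:

```python
import lzma
import tarfile

# Explicit LZMA2 filter chain at the highest preset
filters = [{"id": lzma.FILTER_LZMA2, "preset": 9}]

with lzma.open("submission.tar.xz", "wb",
               format=lzma.FORMAT_XZ, filters=filters) as xz:
    with tarfile.open(fileobj=xz, mode="w", format=tarfile.PAX_FORMAT) as tar:
        tar.add("model_weights.bin")
        tar.add("inference_code.py")
```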

This pattern appears in [`records/track_10min_16mb/2026-04-03_MuonEqR_DepthRecurrence_WD090_AllInt6/train_gpt.py`](https://github.com/openai/parameter-golf/blob/main/records/track_10min_16mb/2026-04-03_MuonEqR_DepthRecurrence_WD090_AllInt6/train_gpt.py), showing how the competition entries leverage modern LZMA2 filtering for archive-level compression.

## Summary

- **LZMA code compression** in `openai/parameter-golf` serves as the final entropy coding stage after aggressive quantization and bit-packing.
- The algorithm achieves **≈39% size reduction** over int8+zlib baselines by exploiting redundancies in ternary and binary weight streams using dictionary compression and range coding.
- A **~16.6 KB self-extracting Python wrapper** decompresses the payload at runtime using `lzma.decompress()` and `base64.b85decode()`, saving approximately **43 KB per submission**.
- The pipeline automatically evaluates **bit-mask + LZMA** versus **base-3 + LZMA** for each model architecture and keeps whichever produces the smaller artifact.

## Frequently Asked Questions

### How does LZMA achieve better compression than zlib for quantized neural networks?

LZMA combines a **large sliding-window dictionary** (up to 4 GB in theory; 64 MiB at preset 9) with **range coding**, whereas zlib uses DEFLATE with a 32 KB window and Huffman coding. Packed quantized weights exhibit **long runs of identical values** and **repeated patterns**. LZMA's larger dictionary can match repeats that lie far beyond DEFLATE's window, and range coding can spend less than one bit on a highly probable symbol, which Huffman coding cannot. These properties underlie the **≈39% reduction** over the int8 + zlib baseline reported in the parameter-golf results.
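The window-size difference is easy to isolate with a contrived input: a block of random bytes followed by an exact copy of itself 64 KiB later. This is a deliberately extreme sketch, not representative of weight data. The copy sits outside DEFLATE's 32 KB window but well inside LZMA's dictionary.

```python
import lzma
import random
import zlib

BLOCK = 64 * 1024  # repeat distance, twice the DEFLATE window

rng = random.Random(0)
block = rng.getrandbits(8 * BLOCK).to_bytes(BLOCK, "little")  # incompressible
data = block + block                                          # exact repeat

zlib_out = zlib.compress(data, level=9)
lzma_out = lzma.compress(data, preset=9)

print(f"input: {len(data)} bytes")
print(f"zlib:  {len(zlib_out)} bytes")  # cannot reference the first copy
print(f"lzma:  {len(lzma_out)} bytes")  # encodes the second copy as matches

assert zlib.decompress(zlib_out) == data
assert lzma.decompress(lzma_out) == data
```

Here zlib has to store both copies almost verbatim, while LZMA stores roughly one of them.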

### What is the size overhead of the self-extracting LZMA wrapper?

The **self-extracting Python loader** that embeds the compressed payload and decompresses it at runtime weighs approximately **16.6 KB**, as documented in [`records/track_10min_16mb/2026-04-09_SP8192_3LayerRecur_ParResid_QK525_LegalTTT/README.md`](https://github.com/openai/parameter-golf/blob/main/records/track_10min_16mb/2026-04-09_SP8192_3LayerRecur_ParResid_QK525_LegalTTT/README.md). The same README reports a saving of **roughly 43 KB per submission** from combining LZMA with base85 embedding.

### Can LZMA compression be used with other packing formats besides binary and ternary?

Yes. LZMA operates on raw bytes, so it is **format-agnostic** with respect to the packed bitstream; how much it helps depends on how much redundancy the packing leaves behind. The code evaluates both **bit-mask packing** (for binary weights) and **base-3 packing** (for ternary weights) before selecting the smaller LZMA-compressed artifact. The repository also shows LZMA2 filters being applied to **tar archives** containing arbitrary model components, so the same compressor covers different serialization formats as long as the quantized data remains redundant.
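The "try both, keep the smaller" selection can be sketched as follows. Everything here is an assumption for illustration: the bit-mask layout (one nonzero-mask bit per weight, then one sign bit per nonzero weight), the function names, and the synthetic weights. The repository's actual encodings may differ.

```python
import lzma
import random


def pack_flags(flags):
    """Pack booleans into bytes, 8 per byte."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def encode_bitmask(weights):
    """Hypothetical layout: nonzero mask plane followed by a sign plane."""
    mask = [w != 0 for w in weights]
    signs = [w > 0 for w in weights if w != 0]
    return pack_flags(mask) + pack_flags(signs)


def encode_base3(weights):
    """Five trits per byte, least significant trit first."""
    trits = [w + 1 for w in weights]
    trits += [0] * (-len(trits) % 5)
    return bytes(
        sum(t * 3**k for k, t in enumerate(trits[i:i + 5]))
        for i in range(0, len(trits), 5)
    )


def smallest_artifact(weights):
    """Compress both encodings with LZMA and return (name, blob) of the smaller."""
    candidates = {
        "bitmask+lzma": lzma.compress(encode_bitmask(weights), preset=9),
        "base3+lzma": lzma.compress(encode_base3(weights), preset=9),
    }
    name = min(candidates, key=lambda k: len(candidates[k]))
    return name, candidates[name]


rng = random.Random(0)
weights = rng.choices([-1, 0, 1], weights=[1, 8, 1], k=100_000)
name, blob = smallest_artifact(weights)
print(name, len(blob))
```

A real format would also record which encoding won, along with the weight count, so the loader knows how to decode the payload.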

### How does the LZMA preset level affect submission size and decompression speed?

The parameter-golf implementation uses **preset 9**, the highest compression level available in Python's `lzma` module. Higher presets increase the **dictionary size** and the **match-search effort**, yielding smaller payloads at the cost of slower compression and more memory on both ends, since the decompressor must allocate the same dictionary. Because submissions are compressed once offline but decompressed at runtime during evaluation, the trade-off favors maximum compression (preset 9) to minimize the final artifact size. Decompression remains fast enough for inference because LZMA decompression is typically much faster than compression and the weight tensors are loaded once at startup.
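The trade-off can be measured directly by timing a few presets on the same input. The sketch below uses a synthetic, zero-heavy byte stream as a stand-in for packed weights (an assumption), so the absolute numbers say little about any real submission. Timings also vary by machine.

```python
import lzma
import random
import time

rng = random.Random(0)
# Synthetic stand-in for a packed weight stream: mostly zeros.
data = bytes(rng.choices([0, 1, 2], weights=[8, 1, 1], k=200_000))

results = {}
for preset in (0, 6, 9):
    start = time.perf_counter()
    blob = lzma.compress(data, preset=preset)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = lzma.decompress(blob)
    decompress_s = time.perf_counter() - start

    assert restored == data  # lossless at every preset
    results[preset] = (len(blob), compress_s, decompress_s)

for preset, (size, c_s, d_s) in results.items():
    print(f"preset {preset}: {size} bytes, "
          f"compress {c_s * 1e3:.1f} ms, decompress {d_s * 1e3:.1f} ms")
```

On small or highly regular inputs the size gap between presets can be negligible, so it is worth running a comparison like this on the actual payload before assuming preset 9 pays off.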