# How the refresh-free Command Selects and Tests OpenRouter Models

> Discover how the refresh-free command selects and tests OpenRouter models. It filters free models, health-checks candidates via API, and saves the best ones to your local config.

- Repository: [Peter Steinberger/summarize](https://github.com/steipete/summarize)
- Tags: internals
- Published: 2026-02-19

---

**The `refresh-free` command downloads the OpenRouter model catalog, filters for `:free` models, health-checks candidates through rate-limit-aware API calls, and persists the fastest, most capable models to your local configuration.**

The `steipete/summarize` repository provides a CLI tool for text summarization using large language models. The **refresh-free command** automates the discovery and validation of free-tier OpenRouter models, so you have working, responsive options configured without testing them by hand.

## Step 1: Fetching the OpenRouter Model Catalog

The routine begins by retrieving the complete model listing from OpenRouter’s public endpoint. In [`src/refresh-free.ts`](https://github.com/steipete/summarize/blob/main/src/refresh-free.ts), the command issues a request to `https://openrouter.ai/api/v1/models` using the injected `fetchImpl` (see **[refresh-free.ts:37-45]**):

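```typescript
// `…` marks request options elided from the source excerpt.
const response = await fetchImpl("https://openrouter.ai/api/v1/models", …);
// `payload` is the parsed JSON body of `response` (parsing step not shown).
const entries = (Array.isArray(payload.data) ? payload.data : []) as unknown[];
```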

The parsed payload's `data` field is checked with `Array.isArray`. If it is not an array, an empty list is used instead, and the result is cast to `unknown[]` for further processing.
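A minimal sketch of this step, assuming a `fetch`-compatible function is injected so tests can supply a fake. The `FetchLike` type, the `fetchCatalogEntries` name, and the HTTP status check are illustrative assumptions, not taken from the source:

```typescript
// Hypothetical minimal fetch signature; the global `fetch` satisfies it.
type FetchLike = (
  url: string,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const OPENROUTER_MODELS_URL = "https://openrouter.ai/api/v1/models";

// Illustrative helper: download the catalog and return its raw entries.
async function fetchCatalogEntries(fetchImpl: FetchLike): Promise<unknown[]> {
  const response = await fetchImpl(OPENROUTER_MODELS_URL);
  // Assumption: a non-2xx response aborts the refresh.
  if (!response.ok) {
    throw new Error(`OpenRouter /models failed with HTTP ${response.status}`);
  }
  const payload = (await response.json()) as { data?: unknown } | null;
  // Tolerate a malformed payload: anything other than an array becomes [].
  return Array.isArray(payload?.data) ? (payload.data as unknown[]) : [];
}
```

Injecting the fetch function keeps the network boundary swappable, which is what makes the catalog step testable without hitting OpenRouter.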

## Step 2: Normalizing and Filtering Model Entries

Each downloaded entry is normalized into a lightweight `OpenRouterModelEntry` structure. This object captures the model id, context length, maximum output tokens, supported modalities, and an inferred parameter size extracted via `inferParamBFromIdOrName` (see **[refresh-free.ts:54-100]**).
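The sketch below shows one way such a normalization could look. It assumes OpenRouter's catalog fields `created` (Unix seconds), `context_length`, `top_provider.max_completion_tokens`, and `architecture.input_modalities`. The `ModelEntry` shape and the `inferParamB` regex are hypothetical stand-ins for `OpenRouterModelEntry` and `inferParamBFromIdOrName`, whose real logic may differ:

```typescript
// Hypothetical normalized shape; the real OpenRouterModelEntry may differ.
interface ModelEntry {
  id: string;
  createdAtMs: number | null;
  contextLength: number | null;
  maxOutputTokens: number | null;
  inputModalities: string[];
  paramsB: number | null; // inferred size in billions of parameters
}

// Illustrative heuristic: pull "8b", "70B", "1.5b" out of an id or display name.
function inferParamB(text: string): number | null {
  const m = /(?:^|[^a-z0-9.])(\d+(?:\.\d+)?)b(?![a-z0-9])/i.exec(text);
  return m ? Number(m[1]) : null;
}

function asNumber(v: unknown): number | null {
  return typeof v === "number" && Number.isFinite(v) ? v : null;
}

function normalizeEntry(raw: unknown): ModelEntry | null {
  if (typeof raw !== "object" || raw === null) return null;
  const r = raw as Record<string, any>;
  if (typeof r.id !== "string") return null; // entries without an id are skipped
  const created = asNumber(r.created); // assumed: Unix seconds
  const modalities = r.architecture?.input_modalities;
  return {
    id: r.id,
    createdAtMs: created === null ? null : created * 1000,
    contextLength: asNumber(r.context_length),
    maxOutputTokens: asNumber(r.top_provider?.max_completion_tokens),
    inputModalities: Array.isArray(modalities)
      ? modalities.filter((m: unknown) => typeof m === "string")
      : [],
    // Try the id first, then fall back to the display name.
    paramsB: inferParamB(r.id) ?? (typeof r.name === "string" ? inferParamB(r.name) : null),
  };
}
```

Parsing the size from the id is inherently heuristic: an id such as `openrouter/auto` carries no size, so the sketch returns `null` rather than guessing.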

The command then applies three filters to the normalized list:

- **Free-tier restriction** – Only models with ids ending in `:free` are retained.
- **Maximum age** – If `--max-age-days` is provided, models older than the threshold are discarded.
- **Minimum parameter size** – If `--min-params` is specified (e.g., `8b`), models with inferred sizes below the limit are removed.

These filters are implemented in **[refresh-free.ts:104-115]**.
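The three filters can be sketched as a single predicate. How the real code treats models with an unknown creation date or an uninferable size is not stated, so dropping them when a limit is set is an assumption here, as are the `filterCandidates` and `parseMinParams` names:

```typescript
interface FilterableModel {
  id: string;
  createdAtMs: number | null;
  paramsB: number | null;
}

interface FilterOptions {
  maxAgeDays?: number; // mirrors --max-age-days
  minParamB?: number; // mirrors --min-params, in billions
  nowMs?: number; // injectable clock for tests
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical parser for values like "8b" or "1.5B".
function parseMinParams(flag: string): number {
  const m = /^(\d+(?:\.\d+)?)b$/i.exec(flag.trim());
  if (!m) throw new Error(`Invalid --min-params value: ${flag}`);
  return Number(m[1]);
}

function filterCandidates<T extends FilterableModel>(models: T[], opts: FilterOptions = {}): T[] {
  const now = opts.nowMs ?? Date.now();
  return models.filter((m) => {
    // Free-tier restriction.
    if (!m.id.endsWith(":free")) return false;
    if (opts.maxAgeDays !== undefined) {
      // Assumption: unknown creation dates are dropped when an age limit is set.
      if (m.createdAtMs === null || now - m.createdAtMs > opts.maxAgeDays * DAY_MS) return false;
    }
    if (opts.minParamB !== undefined) {
      // Assumption: uninferable sizes are dropped when a minimum is set.
      if (m.paramsB === null || m.paramsB < opts.minParamB) return false;
    }
    return true;
  });
}
```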

## Step 3: Ranking Candidates by Capability

Before health-checking, the remaining candidates are ranked with a *smart* sort heuristic that prefers newer creation dates, larger context windows, higher output-token limits, and larger inferred parameter counts. Ranking first means the most capable models are tested first, which raises the odds that high-quality candidates survive the subsequent latency tests (see **[refresh-free.ts:57-71]**).
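As an illustration only, a lexicographic comparator over the four preferences listed above might look like this. The key order, the treatment of missing values, and the `compareSmart` name are assumptions; the actual heuristic may weight the criteria differently:

```typescript
interface RankableModel {
  id: string;
  createdAtMs: number | null;
  contextLength: number | null;
  maxOutputTokens: number | null;
  paramsB: number | null;
}

// Descending on each key in turn; missing values sort last.
function compareSmart(a: RankableModel, b: RankableModel): number {
  const keys: (keyof Omit<RankableModel, "id">)[] = [
    "createdAtMs",
    "contextLength",
    "maxOutputTokens",
    "paramsB",
  ];
  for (const key of keys) {
    const av = a[key] ?? -1;
    const bv = b[key] ?? -1;
    if (av !== bv) return bv - av;
  }
  return a.id.localeCompare(b.id); // stable tie-break by id
}
```

With a strictly lexicographic order the first key dominates, so later keys only break ties; a weighted score would be the alternative if all criteria should matter at once.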

## Step 4: Health-Checking Models with Rate-Limit Awareness

The core validation logic performs an initial health-check (Pass 1) on **all** filtered candidates. For each model, the command invokes `generateTextWithModelId` with a minimal "OK" prompt to measure responsiveness.
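A single timed health check could be sketched as follows. The real `generateTextWithModelId` signature is not shown here, so the `Generate` callback type, the prompt text, and the 30-second timeout are hypothetical placeholders:

```typescript
// Hypothetical stand-in for a call like generateTextWithModelId.
type Generate = (modelId: string, prompt: string) => Promise<string>;

interface CheckResult {
  modelId: string;
  ok: boolean;
  latencyMs: number;
  error?: unknown;
}

async function healthCheck(modelId: string, generate: Generate, timeoutMs = 30_000): Promise<CheckResult> {
  const started = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error(`timeout after ${timeoutMs} ms`)), timeoutMs);
    });
    // Whichever settles first wins: the model reply or the timeout.
    const text = await Promise.race([generate(modelId, "Reply with OK."), timeout]);
    if (text.trim() === "") throw new Error("empty response");
    return { modelId, ok: true, latencyMs: Date.now() - started };
  } catch (error) {
    return { modelId, ok: false, latencyMs: Date.now() - started, error };
  } finally {
    clearTimeout(timer); // avoid a dangling timer after a fast reply
  }
}
```

Returning a result object instead of throwing lets a batch of checks run to completion even when individual models fail.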

Concurrency is controlled by `--concurrency` (default 4) using `mapWithConcurrency` to avoid overwhelming the API (see **[refresh-free.ts:88-151]** and **[refresh-free.ts:75-86]**).
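A bounded-concurrency map of this kind is commonly written as a small worker pool. The sketch below is a generic version for illustration, not the repository's `mapWithConcurrency` itself; it keeps at most `limit` calls in flight and returns results in input order:

```typescript
// Generic worker-pool map: at most `limit` calls of `fn` run at once.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // shared cursor; safe because JS runs one callback at a time
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  };
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Each worker pulls the next index as soon as it finishes, so one slow model does not hold up the others, unlike fixed-size batches.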

Failures are classified via `classifyFailure` into specific categories: `empty`, `rateLimitMin`, `rateLimitDay`, `noProviders`, `timeout`, `providerError`, or `other`.
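To make the categories concrete, here is a hypothetical message-based classifier. The matched substrings (for example OpenRouter's "free-models-per-day" wording and "No endpoints found") are assumptions; the real `classifyFailure` may inspect status codes or error objects instead:

```typescript
type FailureKind =
  | "empty"
  | "rateLimitMin"
  | "rateLimitDay"
  | "noProviders"
  | "timeout"
  | "providerError"
  | "other";

// Hypothetical classifier keyed on error-message text.
function classifyFailureSketch(error: unknown): FailureKind {
  const message = (error instanceof Error ? error.message : String(error)).toLowerCase();
  if (message.includes("empty")) return "empty";
  if (message.includes("rate limit") || message.includes("429")) {
    // Assumption: daily limits mention "per-day"; everything else is per-minute.
    return message.includes("per-day") || message.includes("per day") ? "rateLimitDay" : "rateLimitMin";
  }
  if (message.includes("no endpoints") || message.includes("no providers")) return "noProviders";
  if (message.includes("timeout") || message.includes("timed out")) return "timeout";
  if (message.includes("provider")) return "providerError";
  return "other";
}
```

Separating per-minute from per-day limits matters because only the former is worth waiting out; a daily limit will not clear within a run.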

When a `rateLimitMin` error occurs, the routine enforces a global cooldown of `COOLDOWN_MS` (65,000 ms, or 65 seconds). All subsequent requests wait for the cooldown to expire via `waitForCooldown`, and the affected model is retried. If the retry succeeds, the model is marked healthy; otherwise, the failure is recorded (see **[refresh-free.ts:25-38]**).
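A minimal sketch of a shared cooldown gate, modeled on the described `waitForCooldown` behavior. Apart from `COOLDOWN_MS` and `waitForCooldown`, the names (`CooldownGate`, `checkWithRetry`, `Attempt`) and the single-retry policy are illustrative assumptions:

```typescript
const COOLDOWN_MS = 65_000; // value stated for refresh-free

// One gate shared by all workers, so a rate limit pauses everyone.
class CooldownGate {
  private until = 0;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(cooldownMs: number, now: () => number = Date.now) {
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Called when any request hits a per-minute rate limit.
  trigger(): void {
    this.until = Math.max(this.until, this.now() + this.cooldownMs);
  }

  // Every request awaits this before calling the API. Loops because another
  // worker may extend the cooldown while this one is sleeping.
  async waitForCooldown(): Promise<void> {
    for (;;) {
      const remaining = this.until - this.now();
      if (remaining <= 0) return;
      await new Promise<void>((resolve) => setTimeout(resolve, remaining));
    }
  }
}

type Attempt = { ok: true } | { ok: false; kind: string };

async function checkWithRetry(gate: CooldownGate, attempt: () => Promise<Attempt>): Promise<Attempt> {
  await gate.waitForCooldown();
  const first = await attempt();
  if (first.ok || first.kind !== "rateLimitMin") return first;
  gate.trigger(); // pause all workers, not just this one
  await gate.waitForCooldown();
  return attempt(); // single retry; its result is final
}

const sharedGate = new CooldownGate(COOLDOWN_MS);
```

A global gate rather than per-model backoff fits the situation: OpenRouter's free-tier limit applies to the account, so continuing to fire requests from other workers would only burn more attempts.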

## Step 5: Selecting the Optimal Model Shortlist

After Pass 1, the command selects a short list of candidates (`MAX_CANDIDATES`, default 10) using two heuristics:

- **Smart first** – Orders by context size, output tokens, parameter count, success count, and latency.
- **Fast first** – Orders by success count then latency.

The algorithm populates the final list with the first `SMART` (default 3) models from the smart-sorted list, then fills remaining slots with fast-sorted models until `MAX_CANDIDATES` is reached (see **[refresh-free.ts:121-156]**).
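The blend of the two orderings can be sketched as below. Dropping models with zero successes and skipping duplicates during the fast fill are assumptions; `Pass1Result`, `byKeys`, and `selectShortlist` are hypothetical names:

```typescript
interface Pass1Result {
  id: string;
  contextLength: number;
  maxOutputTokens: number;
  paramsB: number;
  successCount: number;
  latencyMs: number; // lower is better
}

// Build a comparator that applies ascending keys in order.
function byKeys(...keys: ((r: Pass1Result) => number)[]) {
  return (a: Pass1Result, b: Pass1Result): number => {
    for (const key of keys) {
      const diff = key(a) - key(b);
      if (diff !== 0) return diff;
    }
    return 0;
  };
}

// Smart: bigger context, output, params, more successes, then lower latency.
const smartOrder = byKeys(
  (r) => -r.contextLength,
  (r) => -r.maxOutputTokens,
  (r) => -r.paramsB,
  (r) => -r.successCount,
  (r) => r.latencyMs,
);
// Fast: more successes, then lower latency.
const fastOrder = byKeys((r) => -r.successCount, (r) => r.latencyMs);

function selectShortlist(results: Pass1Result[], smart = 3, maxCandidates = 10): string[] {
  const healthy = results.filter((r) => r.successCount > 0); // assumption
  const picked: string[] = [];
  for (const r of [...healthy].sort(smartOrder).slice(0, smart)) picked.push(r.id);
  for (const r of [...healthy].sort(fastOrder)) {
    if (picked.length >= maxCandidates) break;
    if (!picked.includes(r.id)) picked.push(r.id); // skip models already chosen
  }
  return picked.slice(0, maxCandidates);
}
```

Reserving a few slots for capability and filling the rest by speed keeps at least some strong models in the list even when they are slower than small, quick ones.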

## Step 6: Refining Timing with Pass 2

If `--runs` is greater than 1, the command performs a second pass (Pass 2) on the selected shortlist. Each model is exercised additional times (`EXTRA_RUNS`) to compute a stable median latency and accurate success count. This refinement step distinguishes between transient failures and consistently slow models (see **[refresh-free.ts:164-188]**).
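The median is a natural choice here because a single cold-start outlier barely moves it, whereas it would skew a mean. A small sketch of the aggregation, where `RunOutcome` and `summarizeRuns` are hypothetical names:

```typescript
// Median of the successful latencies; null when no run succeeded.
function medianLatency(samplesMs: number[]): number | null {
  if (samplesMs.length === 0) return null;
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

interface RunOutcome {
  ok: boolean;
  latencyMs: number;
}

// Combine Pass 1 and Pass 2 runs for one model (hypothetical aggregation).
function summarizeRuns(runs: RunOutcome[]): { successCount: number; medianMs: number | null } {
  const ok = runs.filter((r) => r.ok);
  return { successCount: ok.length, medianMs: medianLatency(ok.map((r) => r.latencyMs)) };
}
```

For example, successful runs of 800 ms and 600 ms with one failed run give `successCount: 2` and `medianMs: 700`.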

## Step 7: Persisting Configuration

Once testing completes, the results are written to the user configuration file at `~/.summarize/config.json`. The selected model ids are stored in `models.free.rules[0].candidates`. If `--set-default` is provided, the top-level `model` field is also updated to `"free"`.

The write operation uses an atomic pattern: data is first written to a temporary file `.tmp-<pid>` and then renamed to the final destination to prevent configuration corruption (see **[refresh-free.ts:210-255]**).
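The config update and the write-then-rename pattern can be sketched with Node's `fs` module. The exact temporary filename (`config.json.tmp-<pid>` here), the preservation of unrelated config fields, and the helper names are assumptions for illustration:

```typescript
import { existsSync, mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { dirname, join } from "node:path";

type Config = Record<string, any>;

function configPath(home: string = homedir()): string {
  return join(home, ".summarize", "config.json");
}

// Set models.free.rules[0].candidates, keeping other fields intact (assumed).
function applyFreeModels(config: Config, candidates: string[], setDefault: boolean): Config {
  const next: Config = { ...config, models: { ...(config.models ?? {}) } };
  const free = { ...(next.models.free ?? {}) };
  const rules = Array.isArray(free.rules) ? [...free.rules] : [];
  rules[0] = { ...(rules[0] ?? {}), candidates };
  next.models.free = { ...free, rules };
  if (setDefault) next.model = "free";
  return next;
}

// Write a sibling temp file, then rename it over the target. On POSIX file
// systems rename() within one directory replaces the target atomically, so
// readers see either the old or the new file, never a half-written one.
function writeJsonAtomic(path: string, value: unknown): void {
  mkdirSync(dirname(path), { recursive: true });
  const tmp = `${path}.tmp-${process.pid}`;
  writeFileSync(tmp, JSON.stringify(value, null, 2) + "\n", "utf8");
  renameSync(tmp, path);
}

// A missing file starts empty; a malformed one throws instead of being overwritten.
function readJsonOrEmpty(path: string): Config {
  if (!existsSync(path)) return {};
  return JSON.parse(readFileSync(path, "utf8")) as Config;
}

function saveFreeModels(candidates: string[], setDefault: boolean, home?: string): string {
  const path = configPath(home);
  writeJsonAtomic(path, applyFreeModels(readJsonOrEmpty(path), candidates, setDefault));
  return path;
}
```

Including the process id in the temp name keeps two concurrent runs from writing to the same temporary file.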

Finally, the command prints a concise report showing latency deltas, context sizes, token limits, modalities, and inferred parameter sizes for the selected models (see **[refresh-free.ts:258-273]**).

## Using the refresh-free Command

You can invoke the routine directly from the terminal or integrate it programmatically into your own tooling.

### CLI Usage

Run the command with flags to control selection criteria and testing depth:

```bash
# Scan OpenRouter, reserve 5 shortlist slots for the most capable free models,
# test each model once (no refinement pass), require at least 8B inferred parameters,
# keep only models created in the last 90 days, and set "free" as the default model
summarize refresh-free \
  --runs 1 \
  --smart 5 \
  --min-params 8b \
  --max-age-days 90 \
  --set-default \
  --verbose
```

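- `--runs 1` performs a single test per model and skips the refinement pass.
- `--smart 5` raises the number of capability-ranked models in the final shortlist from the default 3 to 5.
- `--min-params 8b` drops models whose inferred size is below 8B parameters.
- `--max-age-days 90` drops models created more than 90 days ago.
- `--set-default` sets the top-level `model` field in the config to `"free"`.
- `--verbose` emits detailed failure reasons and rate-limit back-off messages.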

### Programmatic Usage

Import the `refreshFree` function to embed the logic in your own applications:

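```ts
import { refreshFree } from "./src/refresh-free.js";

await refreshFree({
  env: process.env,
  fetchImpl: fetch,                // native fetch or a polyfill
  stdout: process.stdout,
  stderr: process.stderr,
  verbose: true,
  options: {
    runs: 2,          // one initial test + one extra run for timing refinement
    smart: 4,         // capability-ranked slots in the shortlist (like --smart 4)
    minParamB: 12,    // minimum inferred size in billions (like --min-params 12b)
    maxAgeDays: 120,  // like --max-age-days 120
    setDefault: false, // leave the top-level "model" field unchanged
    concurrency: 3,   // parallel health checks in Pass 1 (like --concurrency 3)
  },
});
```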

This call performs exactly the same steps as the CLI, writing the updated `~/.summarize/config.json` file when it finishes.

## Key Implementation Files

The **refresh-free command** spans several modules across the `steipete/summarize` codebase:

| File | Purpose |
|------|---------|
| **[src/refresh-free.ts](https://github.com/steipete/summarize/blob/main/src/refresh-free.ts)** | Core implementation of the workflow—fetching, filtering, testing, selecting, and persisting models. |
| **[src/run/help.ts](https://github.com/steipete/summarize/blob/main/src/run/help.ts)** | Supplies the user-facing help text describing the command’s purpose and options. |
| **[src/llm/generate-text.ts](https://github.com/steipete/summarize/blob/main/src/llm/generate-text.ts)** | Low-level wrapper that invokes OpenRouter (`generateTextWithModelId`) used during health checks. |
| **[src/daemon/server.ts](https://github.com/steipete/summarize/blob/main/src/daemon/server.ts)** | Exposes the functionality over HTTP (`POST /v1/refresh-free`) for the Chrome extension and other clients. |
| **[tests/refresh-free.test.ts](https://github.com/steipete/summarize/blob/main/tests/refresh-free.test.ts)** | Test suite validating selection logic, rate-limit handling, and config writing. |

## Summary

- The **refresh-free command** automates discovery of OpenRouter `:free` models by downloading the public catalog at `https://openrouter.ai/api/v1/models`.
- It keeps only `:free` models, optionally drops models by age (`--max-age-days`) and inferred parameter size (`--min-params`, via `inferParamBFromIdOrName`), then ranks the rest by capability: creation date, context window, output-token limit, and parameter count.
- Health-checking occurs in two passes: an initial concurrency-limited test (`--concurrency`, default 4) with a 65-second cooldown (`COOLDOWN_MS`) after per-minute rate limits, followed, when `--runs` is greater than 1, by a refinement pass over the shortlist to stabilize latency measurements.
- The final selection blends "smart" (capability-weighted) and "fast" (latency-weighted) heuristics to populate `MAX_CANDIDATES` (default 10) slots.
- Results are atomically written to `~/.summarize/config.json`; the optional `--set-default` flag also makes `"free"` the default model.

## Frequently Asked Questions

### What is the difference between Pass 1 and Pass 2 in the refresh-free command?

Pass 1 performs an initial health-check on all filtered candidates using `generateTextWithModelId` with limited concurrency to measure basic responsiveness and classify failures. Pass 2 occurs only if `--runs` is greater than 1, exercising the selected shortlist additional times to compute a stable median latency and distinguish transient failures from consistently slow models.

### How does the refresh-free command handle OpenRouter rate limits?

When a `rateLimitMin` error occurs during health-checking, the command enforces a global cooldown of `COOLDOWN_MS` (65,000 milliseconds). Subsequent requests wait until this cooldown expires via `waitForCooldown` before retrying. If the retry succeeds, the model is marked as healthy; otherwise, the failure is recorded and classified.

### What criteria does the refresh-free command use to rank OpenRouter models?

The command uses a "smart" sort heuristic that prefers models with newer creation dates, larger context windows, higher maximum output token limits, and greater inferred parameter counts. This ranking occurs before health-checking to ensure the most capable models are tested first, maximizing the quality of the final shortlist.

### Can I use the refresh-free command programmatically instead of via CLI?

Yes. Import the `refreshFree` function from [`src/refresh-free.ts`](https://github.com/steipete/summarize/blob/main/src/refresh-free.ts) (the import specifier uses the `.js` extension, as in `./src/refresh-free.js`) and invoke it with an options object containing `env`, `fetchImpl`, `stdout`, `stderr`, and configuration flags like `runs`, `smart`, and `minParamB`. This performs the same workflow as the CLI, atomically writing the updated configuration to `~/.summarize/config.json` upon completion.