How the refresh-free Command Selects and Tests OpenRouter Models

The refresh-free command downloads the OpenRouter model catalog, filters for :free models, health-checks candidates through rate-limit-aware API calls, and persists the fastest, most capable models to your local configuration.

The steipete/summarize repository provides a CLI tool for text summarization using large language models. The refresh-free command automates the discovery and validation of free-tier OpenRouter models, ensuring you always have working, high-performance options configured without manual testing.

Step 1: Fetching the OpenRouter Model Catalog

The routine begins by retrieving the complete model listing from OpenRouter’s public endpoint. In src/refresh-free.ts, the command issues a request to https://openrouter.ai/api/v1/models using the injected fetchImpl (see [refresh-free.ts:37-45]):

const response = await fetchImpl("https://openrouter.ai/api/v1/models", …);
const entries = (Array.isArray(payload.data) ? payload.data : []) as unknown[];

The code checks that payload.data is an array, falls back to an empty array if it is not, and treats each entry as unknown until it is normalized in the next step.
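The fetch-and-guard step can be sketched as a small standalone helper. This is a minimal sketch, not the repository's code: the FetchLike type, the fetchCatalogEntries name, the Accept header, and the HTTP status check are assumptions added for illustration. Only the URL and the Array.isArray guard come from the excerpt above.

```typescript
// Hypothetical minimal shape of an injectable fetch (an assumption for this sketch).
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const MODELS_URL = "https://openrouter.ai/api/v1/models";

// Downloads the catalog and returns its raw entries as `unknown[]`.
async function fetchCatalogEntries(fetchImpl: FetchLike): Promise<unknown[]> {
  const response = await fetchImpl(MODELS_URL, {
    headers: { Accept: "application/json" },
  });
  if (!response.ok) {
    // Assumption: a non-2xx response is treated as a hard failure.
    throw new Error(`OpenRouter catalog request failed with HTTP ${response.status}`);
  }
  const payload = (await response.json()) as { data?: unknown } | null;
  // Same guard as in the excerpt: anything that is not an array becomes an empty list.
  return Array.isArray(payload?.data) ? payload.data : [];
}
```

Injecting the fetch function, as the command does with fetchImpl, lets tests pass a stub that returns canned JSON instead of calling the network.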

Step 2: Normalizing and Filtering Model Entries

Each downloaded entry is normalized into a lightweight OpenRouterModelEntry structure. This object captures the model id, context length, maximum output tokens, supported modalities, and an inferred parameter size extracted via inferParamBFromIdOrName (see [refresh-free.ts:54-100]).

The command then applies three filters to the normalized list:

  • Free-tier restriction – Only models with ids ending in :free are retained.
  • Maximum age – If --max-age-days is provided, models older than the threshold are discarded.
  • Minimum parameter size – If --min-params is specified (e.g., 8b), models with inferred sizes below the limit are removed.

These filters are implemented in [refresh-free.ts:104-115].
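The three filters above can be sketched as a single predicate. This is an illustrative sketch under stated assumptions: the ModelEntry fields, the guessParamB regex, and the choice to keep entries whose age or size is unknown are all hypothetical, and the real inferParamBFromIdOrName may work differently.

```typescript
// Hypothetical normalized entry; field names are assumptions for this sketch.
interface ModelEntry {
  id: string;
  createdAtMs: number | null; // creation time in epoch milliseconds, if known
  paramB: number | null; // inferred parameter count in billions, if known
}

interface FilterOptions {
  maxAgeDays?: number;
  minParamB?: number;
  nowMs?: number; // injectable clock for deterministic tests
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical stand-in for size inference: reads "8b" or "235b" out of an id or name.
function guessParamB(idOrName: string): number | null {
  const match = /(\d+(?:\.\d+)?)b(?![a-z0-9])/i.exec(idOrName);
  return match ? Number(match[1]) : null;
}

function filterCandidates(models: ModelEntry[], opts: FilterOptions = {}): ModelEntry[] {
  const now = opts.nowMs ?? Date.now();
  return models.filter((m) => {
    // Free-tier restriction: the id must end in ":free".
    if (!m.id.endsWith(":free")) return false;
    // Maximum age (assumption: entries with unknown age are kept).
    if (opts.maxAgeDays !== undefined && m.createdAtMs !== null) {
      if (now - m.createdAtMs > opts.maxAgeDays * DAY_MS) return false;
    }
    // Minimum parameter size (assumption: entries with unknown size are kept).
    if (opts.minParamB !== undefined && m.paramB !== null) {
      if (m.paramB < opts.minParamB) return false;
    }
    return true;
  });
}
```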

Step 3: Ranking Candidates by Capability

Before health-checking, the remaining candidates are ranked using a smart sort heuristic. The algorithm prefers models with newer creation dates, larger context windows, higher maximum output token limits, and larger inferred parameter counts. Ranking first means the most capable models are tested first, which improves the chance that high-quality candidates make it through the subsequent health checks (see [refresh-free.ts:57-71]).
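One way to express such a ranking is a chained comparator. The sketch below is an assumption about how the four preferences might be combined: it applies them strictly in the order listed (date, context, output tokens, parameters), whereas the actual heuristic may weight them differently. The RankableModel type and function names are hypothetical.

```typescript
// Hypothetical shape; all fields are assumed to be known numbers here.
interface RankableModel {
  id: string;
  createdAtMs: number;
  contextLength: number;
  maxOutputTokens: number;
  paramB: number;
}

// Higher values sort first; each later key only breaks ties in the earlier ones.
function compareByCapability(a: RankableModel, b: RankableModel): number {
  return (
    b.createdAtMs - a.createdAtMs ||
    b.contextLength - a.contextLength ||
    b.maxOutputTokens - a.maxOutputTokens ||
    b.paramB - a.paramB ||
    a.id.localeCompare(b.id) // stable, deterministic tie-breaker
  );
}

// Returns a new array so the caller's list is left untouched.
function rankByCapability(models: RankableModel[]): RankableModel[] {
  return [...models].sort(compareByCapability);
}
```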

Step 4: Health-Checking Models with Rate-Limit Awareness

The core validation logic performs an initial health-check (Pass 1) on all filtered candidates. For each model, the command invokes generateTextWithModelId with a minimal "OK" prompt to measure responsiveness.

Concurrency is controlled by --concurrency (default 4) using mapWithConcurrency to avoid overwhelming the API (see [refresh-free.ts:88-151] and [refresh-free.ts:75-86]).
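A concurrency-limited map is commonly built as a small worker pool. The following is a generic sketch of that pattern, not the repository's mapWithConcurrency; the name mapWithConcurrencySketch and its signature are assumptions.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight at once.
// Results keep the order of the input array.
async function mapWithConcurrencySketch<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  async function runWorker(): Promise<void> {
    // Claiming an index is synchronous, so two workers never take the same item.
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => runWorker()));
  return results;
}
```

In this sketch a rejected worker call rejects the whole map. A health-check runner would typically catch errors inside the worker and return a failure record instead, so that one bad model does not abort the pass.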

Failures are classified via classifyFailure into specific categories: empty, rateLimitMin, rateLimitDay, noProviders, timeout, providerError, or other.
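A classifier of this kind usually maps an error's message or status onto a fixed set of labels. The sketch below uses the seven category names from the text, but the matching rules are purely illustrative assumptions; the substrings checked here are not taken from the repository and the real classifyFailure may inspect different signals.

```typescript
type FailureKind =
  | "empty"
  | "rateLimitMin"
  | "rateLimitDay"
  | "noProviders"
  | "timeout"
  | "providerError"
  | "other";

// Hypothetical classifier: matches on lowercase substrings of the error message.
function classifyFailureSketch(error: unknown): FailureKind {
  const message = (error instanceof Error ? error.message : String(error)).toLowerCase();
  if (message.includes("empty")) return "empty";
  // Check the daily limit before the generic rate-limit match.
  if (message.includes("per-day")) return "rateLimitDay";
  if (message.includes("per-min") || message.includes("rate limit")) return "rateLimitMin";
  if (message.includes("no endpoints") || message.includes("no providers")) return "noProviders";
  if (message.includes("timed out") || message.includes("timeout")) return "timeout";
  if (message.includes("provider returned error")) return "providerError";
  return "other";
}
```

Separating per-minute from per-day limits matters because only the former is worth waiting out within a single run.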

When a rateLimitMin error occurs, the routine enforces a global cooldown of COOLDOWN_MS = 65000 milliseconds. Subsequent requests, including the retry of the rate-limited model, wait until this cooldown expires via waitForCooldown. If the retry succeeds, the model is marked as healthy; otherwise, the failure is recorded (see [refresh-free.ts:25-38]).
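The cooldown-and-retry behavior can be sketched as a shared gate that every request passes through. This is a minimal sketch: COOLDOWN_MS and the waitForCooldown name come from the text, while createCooldownGate, checkWithRetry, the injectable clock, and the single-retry policy are assumptions made for illustration.

```typescript
const COOLDOWN_MS = 65_000;

// Hypothetical shared gate. The clock and sleep functions are injectable for tests.
function createCooldownGate(
  now: () => number = Date.now,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
) {
  let cooldownUntilMs = 0;
  return {
    // Start (or extend) the global cooldown.
    trigger(durationMs: number = COOLDOWN_MS): void {
      cooldownUntilMs = Math.max(cooldownUntilMs, now() + durationMs);
    },
    // Resolve only once no cooldown is active; re-check in case it was extended.
    async waitForCooldown(): Promise<void> {
      let remaining = cooldownUntilMs - now();
      while (remaining > 0) {
        await sleep(remaining);
        remaining = cooldownUntilMs - now();
      }
    },
  };
}

// Assumption: a per-minute rate limit earns exactly one retry after the cooldown.
async function checkWithRetry(
  gate: ReturnType<typeof createCooldownGate>,
  attempt: () => Promise<void>,
  isMinuteRateLimit: (error: unknown) => boolean,
): Promise<"healthy" | "failed"> {
  await gate.waitForCooldown();
  try {
    await attempt();
    return "healthy";
  } catch (error) {
    if (!isMinuteRateLimit(error)) return "failed";
    gate.trigger();
    await gate.waitForCooldown();
    try {
      await attempt();
      return "healthy";
    } catch {
      return "failed";
    }
  }
}
```

Because the gate is shared, one rate-limited request pauses every concurrent worker rather than letting each of them hit the limit in turn.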

Step 5: Selecting the Optimal Model Shortlist

After Pass 1, the command selects a short list of candidates (MAX_CANDIDATES, default 10) using two heuristics:

  • Smart first – Orders by context size, output tokens, parameter count, success count, and latency.
  • Fast first – Orders by success count then latency.

The algorithm populates the final list with the first SMART (default 3) models from the smart-sorted list, then fills remaining slots with fast-sorted models until MAX_CANDIDATES is reached (see [refresh-free.ts:121-156]).
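The blend of the two orderings can be sketched as follows. This is an illustrative sketch: the TestedModel fields and the selectShortlist name are hypothetical, and skipping models already picked by the smart pass is an assumption about how duplicates are avoided.

```typescript
// Hypothetical record of a model that passed its health check.
interface TestedModel {
  id: string;
  contextLength: number;
  maxOutputTokens: number;
  paramB: number;
  successCount: number;
  medianLatencyMs: number;
}

function selectShortlist(
  healthy: TestedModel[],
  smartCount = 3, // corresponds to SMART in the text
  maxCandidates = 10, // corresponds to MAX_CANDIDATES in the text
): string[] {
  // Smart first: capability keys, then reliability, then speed.
  const smartFirst = [...healthy].sort(
    (a, b) =>
      b.contextLength - a.contextLength ||
      b.maxOutputTokens - a.maxOutputTokens ||
      b.paramB - a.paramB ||
      b.successCount - a.successCount ||
      a.medianLatencyMs - b.medianLatencyMs,
  );
  // Fast first: reliability, then speed.
  const fastFirst = [...healthy].sort(
    (a, b) => b.successCount - a.successCount || a.medianLatencyMs - b.medianLatencyMs,
  );

  // A Set keeps insertion order and silently ignores ids that were already picked.
  const picked = new Set<string>();
  for (const model of smartFirst) {
    if (picked.size >= Math.min(smartCount, maxCandidates)) break;
    picked.add(model.id);
  }
  for (const model of fastFirst) {
    if (picked.size >= maxCandidates) break;
    picked.add(model.id);
  }
  return Array.from(picked);
}
```

Reserving the first slots for capability-ranked models keeps a large but slower model on the list even when several small models respond faster.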

Step 6: Refining Timing with Pass 2

If --runs is greater than 1, the command performs a second pass (Pass 2) on the selected shortlist. Each model is exercised additional times (EXTRA_RUNS) to compute a stable median latency and accurate success count. This refinement step distinguishes between transient failures and consistently slow models (see [refresh-free.ts:164-188]).
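The median is used because a single slow outlier barely moves it, unlike the mean. A minimal sketch of the aggregation, assuming a hypothetical RunResult record and that only successful runs contribute latency samples:

```typescript
// Hypothetical per-run record for one model.
interface RunResult {
  ok: boolean;
  latencyMs: number;
}

// Median of a list of numbers; returns null when there are no samples.
function medianLatencyMs(samples: readonly number[]): number | null {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Odd count: the middle value. Even count: the average of the two middle values.
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Assumption: failed runs count against successCount but add no latency sample.
function summarizeRuns(runs: readonly RunResult[]): {
  successCount: number;
  medianLatencyMs: number | null;
} {
  const successes = runs.filter((run) => run.ok);
  return {
    successCount: successes.length,
    medianLatencyMs: medianLatencyMs(successes.map((run) => run.latencyMs)),
  };
}
```

For example, latencies of 300, 100, and 200 ms give a median of 200 ms, and a fourth run of 5000 ms would only raise it to 250 ms.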

Step 7: Persisting Configuration

Once testing completes, the results are written to the user configuration file at ~/.summarize/config.json. The selected model ids are stored in models.free.rules[0].candidates. If --set-default is provided, the top-level model field is also updated to "free".

The write operation uses an atomic pattern: data is first written to a temporary file .tmp-<pid> and then renamed to the final destination to prevent configuration corruption (see [refresh-free.ts:210-255]).
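The write-then-rename pattern can be sketched with Node's fs promises API. This is a sketch under assumptions: the writeJsonAtomically name, the exact temporary file name (the target path plus a .tmp-<pid> suffix), and the JSON formatting are illustrative choices rather than the repository's code.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Writes JSON to a sibling temp file, then renames it over the target.
// A rename within one directory replaces the file in a single step on POSIX
// filesystems, so readers see either the old file or the complete new one.
async function writeJsonAtomically(filePath: string, value: unknown): Promise<void> {
  await fs.mkdir(path.dirname(filePath), { recursive: true });
  // Assumption: the temp file sits next to the target and is suffixed with the pid.
  const tempPath = `${filePath}.tmp-${process.pid}`;
  await fs.writeFile(tempPath, `${JSON.stringify(value, null, 2)}\n`, "utf8");
  await fs.rename(tempPath, filePath);
}

// Example payload following the layout described above
// (top-level `model`, plus `models.free.rules[0].candidates`).
// The model ids are placeholders, not real test results.
const exampleConfig = {
  model: "free",
  models: {
    free: {
      rules: [{ candidates: ["vendor/model-a:free", "vendor/model-b:free"] }],
    },
  },
};
```

Calling writeJsonAtomically(configPath, exampleConfig) with a path of your choice would produce a file in that layout. Keeping the temp file in the same directory as the target matters, because a rename across filesystems is not atomic.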

Finally, the command prints a concise report showing latency deltas, context sizes, token limits, modalities, and inferred parameter sizes for the selected models (see [refresh-free.ts:258-273]).

Using the refresh-free Command

You can invoke the routine directly from the terminal or integrate it programmatically into your own tooling.

CLI Usage

Run the command with flags to control selection criteria and testing depth:


# Scan OpenRouter, test each model once, reserve 5 shortlist slots for capability-ranked models,
# keep only models with at least 8B inferred parameters created in the last 90 days,
# and set "free" as the default model
summarize refresh-free \
  --runs 1 \
  --smart 5 \
  --min-params 8b \
  --max-age-days 90 \
  --set-default \
  --verbose

  • --runs 1 performs only a single test per model, skipping the refinement pass.
  • --smart 5 increases the number of capability-ranked models in the final shortlist.
  • --verbose emits detailed failure reasons and rate-limit back-off messages.

Programmatic Usage

Import the refreshFree function to embed the logic in your own applications:

import { refreshFree } from "./src/refresh-free.js";

await refreshFree({
  env: process.env,
  fetchImpl: fetch,                // native fetch or a polyfill
  stdout: process.stdout,
  stderr: process.stderr,
  verbose: true,
  options: {
    runs: 2,          // one initial test + one extra run for timing refinement
    smart: 4,
    minParamB: 12,
    maxAgeDays: 120,
    setDefault: false,
    concurrency: 3,
  },
});

This call performs exactly the same steps as the CLI, writing the updated ~/.summarize/config.json file when it finishes.

Key Implementation Files

The refresh-free command spans several modules across the steipete/summarize codebase:

| File | Purpose |
| --- | --- |
| src/refresh-free.ts | Core implementation of the workflow: fetching, filtering, testing, selecting, and persisting models. |
| src/run/help.ts | Supplies the user-facing help text describing the command's purpose and options. |
| src/llm/generate-text.ts | Low-level wrapper that invokes OpenRouter (generateTextWithModelId), used during health checks. |
| src/daemon/server.ts | Exposes the functionality over HTTP (POST /v1/refresh-free) for the Chrome extension and other clients. |
| tests/refresh-free.test.ts | Test suite validating selection logic, rate-limit handling, and config writing. |

Summary

  • The refresh-free command automates discovery of OpenRouter :free models by downloading the public catalog at https://openrouter.ai/api/v1/models.
  • It keeps only ids ending in :free, optionally filters by age (--max-age-days) and inferred parameter size (--min-params, via inferParamBFromIdOrName), then ranks the remainder by capability: creation date, context window, output token limit, and parameter count.
  • Health-checking occurs in up to two passes: an initial concurrency-limited test (--concurrency, default 4) with a 65-second cooldown (COOLDOWN_MS) for per-minute rate limits, followed by a refinement pass that runs only when --runs is greater than 1 and stabilizes the latency measurements.
  • The final selection blends "smart" (capability-weighted) and "fast" (latency-weighted) heuristics to populate MAX_CANDIDATES (default 10) slots.
  • Results are atomically written to ~/.summarize/config.json, with optional --set-default to enable the "free" model tier by default.

Frequently Asked Questions

What is the difference between Pass 1 and Pass 2 in the refresh-free command?

Pass 1 performs an initial health-check on all filtered candidates using generateTextWithModelId with limited concurrency to measure basic responsiveness and classify failures. Pass 2 occurs only if --runs is greater than 1, exercising the selected shortlist additional times to compute a stable median latency and distinguish transient failures from consistently slow models.

How does the refresh-free command handle OpenRouter rate limits?

When a rateLimitMin error occurs during health-checking, the command enforces a global cooldown of COOLDOWN_MS (65,000 milliseconds). Subsequent requests wait until this cooldown expires via waitForCooldown before retrying. If the retry succeeds, the model is marked as healthy; otherwise, the failure is recorded and classified.

What criteria does the refresh-free command use to rank OpenRouter models?

The command uses a "smart" sort heuristic that prefers models with newer creation dates, larger context windows, higher maximum output token limits, and greater inferred parameter counts. This ranking occurs before health-checking to ensure the most capable models are tested first, maximizing the quality of the final shortlist.

Can I use the refresh-free command programmatically instead of via CLI?

Yes. You can import the refreshFree function from src/refresh-free.js and invoke it with an options object containing env, fetchImpl, stdout, stderr, and configuration flags like runs, smart, and minParamB. This performs the identical workflow as the CLI, atomically writing the updated configuration to ~/.summarize/config.json upon completion.
