
How the Summarize CLI Handles URLs, Files, and STDIN

February 19, 2026 · steipete/summarize

The summarize CLI classifies each input string as a URL, a local file, or a STDIN stream using the resolveInputTarget function in src/content/asset.ts. It then dispatches to one of three processing flows (runUrlFlow, runAssetFlow, or runStdinFlow), each of which normalizes the content before handing it to the LLM summary engine.

The steipete/summarize repository provides a single command-line interface for content from several kinds of sources. Whether the input is a web article, a local PDF, or a piped audio stream, the CLI classifies it, routes it to the matching flow, and ends in the same summarization pipeline.

Input Classification Logic

Before any content processing begins, the CLI must determine what type of resource the user provided. This happens in two stages: argument resolution and target classification.

The resolveInputTarget Implementation

The core classification logic resides in src/content/asset.ts. The resolveInputTarget function applies a sequential detection strategy:

// src/content/asset.ts
import { existsSync } from "node:fs";
import path from "node:path";
export function resolveInputTarget(raw: string): InputTarget {
  const normalized = raw.trim();
  if (!normalized) throw new Error("Missing input");

  // 1️⃣ File on disk?
  const asPath = path.resolve(normalized);
  if (existsSync(asPath)) return { kind: "file", filePath: asPath };

  // 2️⃣ Explicit STDIN marker?
  if (normalized === "-") return { kind: "stdin" };

  // 3️⃣ Otherwise treat as URL (after a few sanity checks)
  …
  return { kind: "url", url: normalizedUrlInput };
}

This function checks the local file system with existsSync first and only then looks for the literal "-" STDIN marker. If neither matches, the input is normalized and treated as a URL. The returned InputTarget object carries a kind discriminator that drives the routing decisions that follow. Because the existence check runs first, a file literally named - in the working directory would be read as a file rather than as STDIN.
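The ordering can be exercised in isolation. The following is a minimal sketch, not the repository's code: classifyInputSketch, InputTargetSketch, and tryParseUrl are hypothetical names, the existence check is injected so the order can be tested without touching the disk, and the URL step (prepend https:// when no scheme is given, then require http(s) and a dotted hostname) is an assumption standing in for the elided sanity checks.

```typescript
import { existsSync } from "node:fs";
import * as path from "node:path";

// Discriminated union mirroring the three kinds shown in the excerpt.
type InputTargetSketch =
  | { kind: "file"; filePath: string }
  | { kind: "stdin" }
  | { kind: "url"; url: string };

// Returns null instead of throwing so the caller can produce its own error.
function tryParseUrl(s: string): URL | null {
  try {
    return new URL(s);
  } catch {
    return null;
  }
}

// Hypothetical sketch: `exists` is injectable so the ordering is testable.
export function classifyInputSketch(
  raw: string,
  exists: (p: string) => boolean = existsSync,
): InputTargetSketch {
  const normalized = raw.trim();
  if (!normalized) throw new Error("Missing input");

  // 1. An existing path wins, even over "-".
  const asPath = path.resolve(normalized);
  if (exists(asPath)) return { kind: "file", filePath: asPath };

  // 2. The conventional STDIN marker.
  if (normalized === "-") return { kind: "stdin" };

  // 3. Assumed URL check (not the repo's elided logic): add a scheme if
  //    missing, then require http(s) and a hostname containing a dot.
  const candidate = /^[a-z][a-z0-9+.-]*:\/\//i.test(normalized)
    ? normalized
    : `https://${normalized}`;
  const parsed = tryParseUrl(candidate);
  if (!parsed || !/^https?:$/.test(parsed.protocol) || !parsed.hostname.includes(".")) {
    throw new Error(`Not an existing file, "-", or URL: ${normalized}`);
  }
  return { kind: "url", url: parsed.href };
}
```

With the default existsSync, a mistyped local filename is not reported as "file not found"; it falls through to the URL step instead, which is the trade-off of checking the file system first.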

CLI Entry Point and Dispatch

The main runner in src/run/runner.ts consumes the classified target and branches accordingly:

// src/run/runner.ts
export async function runCli(argv, { env, fetch, stdout, stderr }) {
  // program, cliFlagPresent and cliProviderArgRaw come from earlier argument parsing (not shown).
  const { inputTarget, url } = resolveRunInput({
    program, cliFlagPresent, cliProviderArgRaw, stdout,
  });

  switch (inputTarget.kind) {
    case "url":
      return runUrlFlow({ url, env, fetch, stdout, stderr });
    case "file":
      return runAssetFlow({ filePath: inputTarget.filePath, env, fetch, stdout, stderr });
    case "stdin":
      return runStdinFlow({ stdin: process.stdin, env, fetch, stdout, stderr });
  }
}

Each flow handler receives the environment variables, a fetch implementation, and the stdout/stderr streams as explicit parameters, so all three flows share the same calling convention.
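The excerpt's switch has no default branch, so nothing forces a fourth input kind to be handled. A common TypeScript idiom for discriminated-union dispatch is an exhaustiveness check through the never type. This sketch is illustrative only: Target, Handlers, dispatch, and assertNever are hypothetical names, with the handlers standing in for runUrlFlow, runAssetFlow, and runStdinFlow.

```typescript
// Same three kinds as InputTarget in the excerpt.
type Target =
  | { kind: "url"; url: string }
  | { kind: "file"; filePath: string }
  | { kind: "stdin" };

// Hypothetical handler table; R is whatever the flows return.
interface Handlers<R> {
  url: (url: string) => R;
  file: (filePath: string) => R;
  stdin: () => R;
}

// Only type-checks when every kind has been handled above it.
function assertNever(x: never): never {
  throw new Error(`Unhandled input kind: ${JSON.stringify(x)}`);
}

export function dispatch<R>(target: Target, h: Handlers<R>): R {
  switch (target.kind) {
    case "url":
      return h.url(target.url);
    case "file":
      return h.file(target.filePath);
    case "stdin":
      return h.stdin();
    default:
      // target is narrowed to never here.
      return assertNever(target);
  }
}
```

If a new member such as { kind: "clipboard" } were added to Target without a matching case, target would no longer narrow to never in the default branch and the assertNever call would fail to compile.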

URL Input Handling

When resolveInputTarget identifies a URL, control passes to src/run/flows/url/flow.ts.

The URL Flow Pipeline

The runUrlFlow function first classifies whether the URL points to a downloadable asset or a webpage requiring extraction:

// src/run/flows/url/flow.ts
export async function runUrlFlow({ url, env, fetch, stdout, stderr }) {
  const kind = await classifyUrl({ url, fetchImpl: fetch, timeoutMs: 10_000 });
  if (kind.kind === "asset") {
    // treat as a downloadable asset → use asset pipeline
    return runAssetFlow({ url, env, fetch, stdout, stderr });
  }
  // website → fetch HTML, extract text, then summarise
  const html = await fetch(url).then(r => r.text());
  const markdown = await extractMarkdownFromHtml(html);
  return summarizeMarkdown({ markdown, env, stdout, stderr });
}

Direct assets (PDFs, images, audio files) are routed to the asset pipeline with their remote URL preserved. Standard webpages undergo HTML fetching and markdown extraction via extractMarkdownFromHtml before summarization.
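The internals of classifyUrl are not shown. One plausible way to make the asset-versus-webpage decision is to read the Content-Type header of a HEAD response. The sketch below assumes that approach; classifyUrlSketch, kindFromContentType, the FetchLike type, and the set of asset MIME types are all hypothetical.

```typescript
// Hypothetical result shape for the asset-vs-website decision.
export type UrlKind = { kind: "asset"; mimeType: string } | { kind: "website" };

// Minimal structural type so the global fetch (or a stub) can be injected.
type FetchLike = (
  url: string,
  init: { method: string; signal: AbortSignal },
) => Promise<{ headers: { get(name: string): string | null } }>;

// Assumed asset set for illustration: media families plus PDF.
const ASSET_PREFIXES = ["image/", "audio/", "video/"];
const ASSET_TYPES = new Set(["application/pdf"]);

export function kindFromContentType(header: string | null): UrlKind {
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const [raw = ""] = (header ?? "").split(";");
  const mimeType = raw.trim().toLowerCase();
  if (ASSET_TYPES.has(mimeType) || ASSET_PREFIXES.some((p) => mimeType.startsWith(p))) {
    return { kind: "asset", mimeType };
  }
  // HTML, plain text, and unknown or missing types go to page extraction.
  return { kind: "website" };
}

export async function classifyUrlSketch(
  url: string,
  fetchImpl: FetchLike,
  timeoutMs = 10_000,
): Promise<UrlKind> {
  // HEAD avoids downloading the body; the controller enforces the timeout.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { method: "HEAD", signal: controller.signal });
    return kindFromContentType(res.headers.get("content-type"));
  } finally {
    clearTimeout(timer);
  }
}
```

A call would look like await classifyUrlSketch("https://example.com/report.pdf", fetch). Some servers reject HEAD or report an inaccurate Content-Type, so a production version would likely need a GET fallback.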

File Input Handling

Local files follow the runAssetFlow path defined in src/run/flows/asset/input.ts.

Asset Loading and Transcription

The asset flow distinguishes between image content requiring vision-model prompts and documents needing transcription:

// src/run/flows/asset/input.ts
export async function runAssetFlow({ filePath, url, env, fetch, stdout, stderr }) {
  const { sourceLabel, attachment } = filePath
    ? await loadLocalAsset({ filePath })
    : await loadRemoteAsset({ url, fetchImpl: fetch, timeoutMs: 10_000 });

  // Only image assets can be sent as "prompt messages".
  // For generic files we fall back to a "transcribe-then-summarise" pipeline.
  if (attachment.kind === "image") {
    const messages = buildAssetPromptMessages({ promptText: "Summarize this image", attachment });
    return runModelMessages({ messages, env, stdout, stderr });
  }

  // Non-image files → transcript via ONNX or Whisper, then summarise.
  const transcript = await transcribeFile(attachment);
  return summarizeMarkdown({ markdown: transcript, env, stdout, stderr });
}

Images skip text transcription and go directly to the LLM as multimodal prompts built by buildAssetPromptMessages. All other files go through transcribeFile to produce text before summarization. For audio and video this is the ONNX or Whisper transcription code in packages/core/src/transcription/onnx-cli.ts; since Whisper is a speech-to-text model, PDFs presumably reach text by a different route that this excerpt does not show.
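loadLocalAsset is not shown either. Assuming it checks the file size before reading and derives a MIME type that decides the image branch, a minimal synchronous sketch could look like this. loadLocalAssetSketch, mimeTypeFor, AttachmentSketch, the extension table, and the 50 MiB default limit are illustrative assumptions, not the repository's values.

```typescript
import { readFileSync, statSync } from "node:fs";
import * as path from "node:path";

// Hypothetical attachment shape; the repository's real type is not shown.
export interface AttachmentSketch {
  kind: "image" | "file";
  mimeType: string;
  bytes: Buffer;
}

// Assumed extension → MIME table, for illustration only.
const MIME_BY_EXT: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".gif": "image/gif",
  ".webp": "image/webp",
  ".pdf": "application/pdf",
  ".mp3": "audio/mpeg",
  ".wav": "audio/wav",
  ".mp4": "video/mp4",
};

export function mimeTypeFor(filePath: string): string {
  return MIME_BY_EXT[path.extname(filePath).toLowerCase()] ?? "application/octet-stream";
}

export function loadLocalAssetSketch(
  filePath: string,
  maxBytes = 50 * 1024 * 1024, // assumed 50 MiB limit
): AttachmentSketch {
  // Reject oversized files from metadata alone, before reading any bytes.
  const { size } = statSync(filePath);
  if (size > maxBytes) {
    throw new Error(`${filePath} is ${size} bytes; limit is ${maxBytes}`);
  }
  const mimeType = mimeTypeFor(filePath);
  return {
    // Only image/* takes the vision-prompt branch; everything else is "file".
    kind: mimeType.startsWith("image/") ? "image" : "file",
    mimeType,
    bytes: readFileSync(filePath),
  };
}
```

Checking statSync before readFileSync means an oversized file is rejected without being loaded into memory. Extension-based MIME detection is the simplest option; sniffing magic bytes would also cover files without a meaningful extension.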

STDIN Stream Processing

The STDIN flow handles piped data through runStdinFlow in the same input.ts module.

Binary vs Text Detection

runStdinFlow reads STDIN to completion into a single buffer and then classifies the bytes with a content heuristic:

// src/run/flows/asset/input.ts
export async function runStdinFlow({ stdin, env, fetch, stdout, stderr }) {
  const chunks: Buffer[] = [];
  for await (const chunk of stdin) chunks.push(Buffer.from(chunk));
  const data = Buffer.concat(chunks);

  // Heuristic: if data looks like UTF-8 text → treat as markdown.
  // Otherwise assume binary and route through the asset pipeline.
  if (isProbablyText(data)) {
    const markdown = data.toString("utf-8");
    return summarizeMarkdown({ markdown, env, stdout, stderr });
  } else {
    // write to a temp file so `loadLocalAsset` can handle size limits
    const tmpPath = await writeTempFile(data);
    return runAssetFlow({ filePath: tmpPath, env, fetch, stdout, stderr });
  }
}

Textual input is converted directly to a markdown string and summarized immediately. Binary data (audio, images, PDFs) is written to a temporary file, then processed through runAssetFlow to leverage existing asset handling logic including size validation and MIME type detection.
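Neither isProbablyText nor writeTempFile is shown. A common heuristic, sketched here under that assumption, treats data as text when a leading sample contains no NUL bytes and decodes as strict UTF-8. isProbablyTextSketch, writeTempFileSketch, the file name stdin.bin, and the 8 KiB sample size are hypothetical.

```typescript
import { mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import * as path from "node:path";
import { TextDecoder } from "node:util";

// Hypothetical heuristic: "text" means no NUL bytes and valid UTF-8
// within the first `sampleBytes` bytes.
export function isProbablyTextSketch(data: Uint8Array, sampleBytes = 8192): boolean {
  const sample = data.subarray(0, sampleBytes);
  // NUL bytes are common in binary formats and rare in text.
  if (sample.includes(0)) return false;
  try {
    // fatal: true throws on invalid sequences; stream: true tolerates a
    // multibyte character truncated at the end of the sample.
    new TextDecoder("utf-8", { fatal: true }).decode(sample, { stream: true });
    return true;
  } catch {
    return false;
  }
}

// Hypothetical stand-in for writeTempFile: one private directory per call.
export function writeTempFileSketch(data: Uint8Array, name = "stdin.bin"): string {
  const dir = mkdtempSync(path.join(tmpdir(), "summarize-"));
  const filePath = path.join(dir, name);
  writeFileSync(filePath, data);
  return filePath;
}
```

Decoding with stream: true keeps a multibyte character cut off at the sample boundary from being misread as binary. The temp directory is not removed here; a caller would typically delete it with fs.rmSync(dir, { recursive: true, force: true }) once summarization finishes.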

Practical Usage Examples

The following commands demonstrate the automatic input type detection:

- Summarize a webpage: summarize https://example.com/article
  - Detected as URL → runUrlFlow → HTML extraction → summarization
- Summarize a local PDF: summarize ./report.pdf
  - Detected as file → runAssetFlow → transcription → summarization
- Summarize markdown from STDIN: cat notes.md | summarize -
  - Detected as STDIN → runStdinFlow → text detection → direct summarization
- Summarize piped audio: ffmpeg -i video.mp4 -f wav - | summarize -
  - Detected as STDIN → binary detection → temp file → transcription → summarization
- Using an explicit input flag: summarize --input-file -
  - The flag value goes through the same resolution as a positional argument, so "-" triggers the STDIN flow

Summary

- Input classification occurs in src/content/asset.ts via resolveInputTarget, which checks for an existing file, then for the "-" STDIN marker, and otherwise treats the input as a URL.
- URL processing in src/run/flows/url/flow.ts distinguishes downloadable assets from webpages and extracts text from webpages before summarization.
- File handling through runAssetFlow sends images to vision models as multimodal prompts and passes other files to transcribeFile (ONNX/Whisper transcription for audio and video).
- STDIN ingestion buffers the stream, applies a text/binary heuristic, and either summarizes text directly or writes binary data to a temporary file for the asset pipeline.
- All flows converge on src/run/summary-engine.ts, which generates the LLM response regardless of input source.

Frequently Asked Questions

How does the summarize CLI distinguish between a URL and a file path?

resolveInputTarget (src/content/asset.ts) first checks whether the string names an existing path, using existsSync. If it does, the input is treated as a file. Otherwise, unless the string is the "-" STDIN marker, the CLI treats it as a URL. A relative path like ./report.pdf therefore takes precedence over URL interpretation as long as the file exists.

Can I pipe binary data like audio or video to the summarize CLI?

Yes. When using summarize -, the runStdinFlow function buffers the entire STDIN stream and applies an isProbablyText heuristic. Binary data is written to a temporary file and routed through runAssetFlow, where it undergoes transcription via ONNX or Whisper models in packages/core/src/transcription/onnx-cli.ts before summarization.

What happens when I use "-" as the input argument?

The literal string "-" maps to the STDIN input kind in resolveInputTarget, provided no file named - exists in the working directory (the file check runs first). This triggers runStdinFlow in src/run/flows/asset/input.ts, which reads from process.stdin instead of opening a file or fetching a URL, so summarize can sit at the end of a shell pipeline.

How does the URL flow handle downloadable assets versus web pages?

The runUrlFlow function calls classifyUrl to inspect the URL's content type. If the URL points to a direct asset (PDF, image, audio), execution redirects to runAssetFlow with the remote URL. If it detects an HTML page, the flow fetches the content, extracts clean markdown using extractMarkdownFromHtml, and passes that text to the summarization engine.
