How the Summarize CLI Handles URLs, Files, and STDIN
The summarize CLI automatically classifies each input string as a URL, a local file, or a STDIN stream using the resolveInputTarget function in src/content/asset.ts. It then dispatches to one of three specialized processing flows (runUrlFlow, runAssetFlow, or runStdinFlow), each of which normalizes the content before passing it to the LLM summary engine.
The steipete/summarize repository provides a single command-line interface that ingests content from very different sources. Whether the input is a web article, a local PDF, or a piped audio stream, the CLI uses the same classification and routing architecture, and every input type ends up in the same summarization pipeline.
Input Classification Logic
Before any content processing begins, the CLI must determine what type of resource the user provided. This happens in two stages: argument resolution and target classification.
The resolveInputTarget Implementation
The core classification logic resides in src/content/asset.ts. The resolveInputTarget function applies a sequential detection strategy:
// src/content/asset.ts
export function resolveInputTarget(raw: string): InputTarget {
  const normalized = raw.trim();
  if (!normalized) throw new Error("Missing input");
  // 1️⃣ File on disk?
  const asPath = path.resolve(normalized);
  if (existsSync(asPath)) return { kind: "file", filePath: asPath };
  // 2️⃣ Explicit STDIN marker?
  if (normalized === "-") return { kind: "stdin" };
  // 3️⃣ Otherwise treat as URL (after a few sanity checks)
  …
  return { kind: "url", url: normalizedUrlInput };
}
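This function checks the local file system with fs.existsSync first and only then looks for the literal "-" STDIN marker. If neither condition matches, the input is normalized and treated as a URL. The returned InputTarget object carries a kind discriminator that drives the subsequent routing decisions. One consequence of this ordering: if a file literally named - exists in the current working directory, it is classified as a file rather than as STDIN.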
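To make the precedence concrete, here is a minimal, self-contained TypeScript sketch of the same three-step order. It is not the repository's code: classifyInput, tryParseUrl, and the injected exists predicate are hypothetical. Because the excerpt elides the real URL sanity checks, the sketch simply assumes a parseable http(s) URL is required.

```typescript
import * as path from "node:path";

// Mirrors the three `kind` values returned in the resolveInputTarget excerpt.
type InputTarget =
  | { kind: "file"; filePath: string }
  | { kind: "stdin" }
  | { kind: "url"; url: string };

// Returns a parsed URL, or null if the string is not a valid absolute URL.
function tryParseUrl(s: string): URL | null {
  try {
    return new URL(s);
  } catch {
    return null;
  }
}

// Hypothetical: `exists` is injected so the ordering can be tested without
// touching the disk; the excerpt calls existsSync directly.
export function classifyInput(raw: string, exists: (p: string) => boolean): InputTarget {
  const normalized = raw.trim();
  if (!normalized) throw new Error("Missing input");
  // 1. An existing path wins, even over "-".
  const asPath = path.resolve(normalized);
  if (exists(asPath)) return { kind: "file", filePath: asPath };
  // 2. The conventional STDIN marker.
  if (normalized === "-") return { kind: "stdin" };
  // 3. Assumed stand-in for the elided sanity checks: require an http(s) URL.
  const parsed = tryParseUrl(normalized);
  if (!parsed) throw new Error(`Not an existing file, "-", or URL: ${normalized}`);
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported URL scheme: ${parsed.protocol}`);
  }
  return { kind: "url", url: parsed.href };
}
```

Injecting exists keeps the function pure, which is what lets the file-before-STDIN-before-URL order be checked without creating any files.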
CLI Entry Point and Dispatch
The main runner in src/run/runner.ts consumes the classified target and branches accordingly:
// src/run/runner.ts
export async function runCli(argv, { env, fetch, stdout, stderr }) {
  // Simplified excerpt: program, cliFlagPresent and cliProviderArgRaw are defined outside this snippet.
  const { inputTarget, url } = resolveRunInput({
    program, cliFlagPresent, cliProviderArgRaw, stdout,
  });
  switch (inputTarget.kind) {
    case "url":
      return runUrlFlow({ url, env, fetch, stdout, stderr });
    case "file":
      return runAssetFlow({ filePath: inputTarget.filePath, env, fetch, stdout, stderr });
    case "stdin":
      return runStdinFlow({ stdin: process.stdin, env, fetch, stdout, stderr });
  }
}
Each flow handler receives the environment variables, a fetch implementation, and the I/O streams, so all three input types run with the same injected dependencies.
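The switch in the excerpt has no default branch. A common TypeScript pattern for this kind of discriminated-union dispatch is an exhaustiveness check with never, so that adding a fourth input kind without a matching flow fails at compile time. The sketch below is illustrative only: dispatch, Flows, and assertNever are hypothetical names, and the handlers stand in for runUrlFlow, runAssetFlow, and runStdinFlow.

```typescript
type InputTarget =
  | { kind: "url"; url: string }
  | { kind: "file"; filePath: string }
  | { kind: "stdin" };

// Hypothetical handler table standing in for the three flow functions.
interface Flows<R> {
  url: (url: string) => R;
  file: (filePath: string) => R;
  stdin: () => R;
}

// Only reachable if InputTarget gains a kind the switch does not handle;
// the `never` parameter turns that omission into a compile-time error.
function assertNever(value: never): never {
  throw new Error(`Unhandled input target: ${JSON.stringify(value)}`);
}

export function dispatch<R>(target: InputTarget, flows: Flows<R>): R {
  switch (target.kind) {
    case "url":
      return flows.url(target.url);
    case "file":
      return flows.file(target.filePath);
    case "stdin":
      return flows.stdin();
    default:
      return assertNever(target);
  }
}
```

Keeping the handlers generic in R also lets the same dispatcher be exercised with plain string labels in tests and with promise-returning flows in the CLI.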
URL Input Handling
When resolveInputTarget identifies a URL, control passes to src/run/flows/url/flow.ts.
The URL Flow Pipeline
The runUrlFlow function first classifies whether the URL points to a downloadable asset or a webpage requiring extraction:
// src/run/flows/url/flow.ts
export async function runUrlFlow({ url, env, fetch, stdout, stderr }) {
  const kind = await classifyUrl({ url, fetchImpl: fetch, timeoutMs: 10_000 });
  if (kind.kind === "asset") {
    // treat as a downloadable asset → use asset pipeline
    return runAssetFlow({ url, env, fetch, stdout, stderr });
  }
  // website → fetch HTML, extract text, then summarise
  const html = await fetch(url).then(r => r.text());
  const markdown = await extractMarkdownFromHtml(html);
  return summarizeMarkdown({ markdown, env, stdout, stderr });
}
Direct assets (PDFs, images, audio files) are routed to the asset pipeline with their remote URL preserved. Standard webpages undergo HTML fetching and markdown extraction via extractMarkdownFromHtml before summarization.
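The excerpt does not show how classifyUrl decides between an asset and a website. One plausible approach, given here purely as an assumption, is a HEAD request bounded by the same 10-second timeout, followed by a look at the Content-Type header. In this sketch kindFromContentType and classifyUrlSketch are hypothetical, and it requires a runtime with global fetch and AbortSignal.timeout (for example Node 18 or later).

```typescript
type UrlKind = { kind: "asset"; mime: string } | { kind: "website" };

// Pure decision step (assumption): HTML or a missing Content-Type means
// "website"; any other MIME type is treated as a downloadable asset.
export function kindFromContentType(header: string | null): UrlKind {
  const mime = (header ?? "").split(";")[0].trim().toLowerCase();
  if (mime === "" || mime === "text/html" || mime === "application/xhtml+xml") {
    return { kind: "website" };
  }
  return { kind: "asset", mime };
}

// Network step: HEAD request with a timeout, then classify the response header.
export async function classifyUrlSketch(
  url: string,
  fetchImpl: typeof fetch = fetch,
  timeoutMs = 10_000,
): Promise<UrlKind> {
  const res = await fetchImpl(url, {
    method: "HEAD",
    redirect: "follow",
    signal: AbortSignal.timeout(timeoutMs),
  });
  return kindFromContentType(res.headers.get("content-type"));
}
```

Some servers reject HEAD or omit Content-Type, so a production version would likely need a fallback such as a GET or a file-extension check; this sketch simply treats a missing header as a website.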
File Input Handling
Local files follow the runAssetFlow path defined in src/run/flows/asset/input.ts.
Asset Loading and Transcription
The asset flow distinguishes between image content requiring vision-model prompts and documents needing transcription:
// src/run/flows/asset/input.ts
export async function runAssetFlow({ filePath, url, env, fetch, stdout, stderr }) {
  const { sourceLabel, attachment } = filePath
    ? await loadLocalAsset({ filePath })
    : await loadRemoteAsset({ url, fetchImpl: fetch, timeoutMs: 10_000 });
  // Only image assets can be sent as "prompt messages".
  // For generic files we fall back to a "transcribe-then-summarise" pipeline.
  if (attachment.kind === "image") {
    const messages = buildAssetPromptMessages({ promptText: "Summarize this image", attachment });
    return runModelMessages({ messages, env, stdout, stderr });
  }
  // Non-image files → transcript via ONNX or Whisper, then summarise.
  const transcript = await transcribeFile(attachment);
  return summarizeMarkdown({ markdown: transcript, env, stdout, stderr });
}
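Images skip text transcription and are sent directly to the LLM as multimodal prompts built by buildAssetPromptMessages. All other files (PDFs, audio, video) go through transcribeFile, which produces markdown text for summarization; the ONNX and Whisper transcription path lives in packages/core/src/transcription/onnx-cli.ts. Since Whisper-style models perform speech-to-text, that path applies to audio and video; the excerpt does not show how text is extracted from a PDF.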
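How loadLocalAsset decides that attachment.kind is "image" is not shown. For inputs without a trustworthy file extension, such as data piped through STDIN, one common technique is to sniff the leading "magic bytes" of the file. The sketch below assumes that technique; sniffMime and routeAsset are hypothetical, and only PNG, JPEG, GIF, WebP, and PDF signatures are recognized.

```typescript
type Sniffed =
  | "image/png" | "image/jpeg" | "image/gif" | "image/webp"
  | "application/pdf" | "unknown";

// True when `buf` contains the byte sequence `sig` starting at `offset`.
function hasSignature(buf: Uint8Array, sig: readonly number[], offset = 0): boolean {
  if (buf.length < offset + sig.length) return false;
  return sig.every((byte, i) => buf[offset + i] === byte);
}

// Converts an ASCII string to its byte values.
const ascii = (s: string): number[] => Array.from(s, (c) => c.charCodeAt(0));

export function sniffMime(buf: Uint8Array): Sniffed {
  if (hasSignature(buf, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
  if (hasSignature(buf, [0xff, 0xd8, 0xff])) return "image/jpeg";
  if (hasSignature(buf, ascii("GIF8"))) return "image/gif"; // GIF87a or GIF89a
  // WebP: "RIFF", a 4-byte little-endian size, then "WEBP" at offset 8.
  if (hasSignature(buf, ascii("RIFF")) && hasSignature(buf, ascii("WEBP"), 8)) return "image/webp";
  if (hasSignature(buf, ascii("%PDF-"))) return "application/pdf";
  return "unknown";
}

// Mirrors the excerpt's branch: images go to a vision prompt, everything else to transcription.
export function routeAsset(buf: Uint8Array): "vision-prompt" | "transcribe" {
  return sniffMime(buf).startsWith("image/") ? "vision-prompt" : "transcribe";
}
```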
STDIN Stream Processing
The STDIN flow handles piped data through runStdinFlow in the same input.ts module.
Binary vs Text Detection
STDIN content is consumed as a stream and classified via content heuristics:
// src/run/flows/asset/input.ts
export async function runStdinFlow({ stdin, env, fetch, stdout, stderr }) {
  const chunks: Buffer[] = [];
  for await (const chunk of stdin) chunks.push(Buffer.from(chunk));
  const data = Buffer.concat(chunks);
  // Heuristic: if data looks like UTF-8 text → treat as markdown.
  // Otherwise assume binary and route through the asset pipeline.
  if (isProbablyText(data)) {
    const markdown = data.toString("utf-8");
    return summarizeMarkdown({ markdown, env, stdout, stderr });
  } else {
    // write to a temp file so `loadLocalAsset` can handle size limits
    const tmpPath = await writeTempFile(data);
    return runAssetFlow({ filePath: tmpPath, env, fetch, stdout, stderr });
  }
}
Textual input is converted directly to a markdown string and summarized immediately. Binary data (audio, images, PDFs) is written to a temporary file, then processed through runAssetFlow to leverage existing asset handling logic including size validation and MIME type detection.
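The excerpt leaves isProbablyText and writeTempFile undefined. The sketch below shows one way to fill them in, under stated assumptions: the text heuristic (no NUL bytes, valid UTF-8, few control characters in the first 8 KiB) and its 10% control-character threshold are illustrative choices, and withTempFile is a hypothetical variant of writeTempFile that also deletes the file when processing finishes, which the excerpt does not show.

```typescript
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import * as path from "node:path";

// Hypothetical heuristic: classify the first `sampleSize` bytes as text or binary.
export function isProbablyTextSketch(data: Uint8Array, sampleSize = 8192): boolean {
  const sample = data.subarray(0, sampleSize);
  // Empty input is trivially valid text; callers should reject it separately.
  if (sample.length === 0) return true;
  // NUL bytes are rare in text but common in binary formats.
  if (sample.includes(0)) return false;
  try {
    // fatal: throw on malformed UTF-8.
    // stream: tolerate a multi-byte character cut off at the sample boundary.
    new TextDecoder("utf-8", { fatal: true }).decode(sample, { stream: true });
  } catch {
    return false;
  }
  // Count C0 control bytes other than tab, LF, form feed and CR.
  let control = 0;
  for (const byte of sample) {
    if (byte < 0x20 && byte !== 0x09 && byte !== 0x0a && byte !== 0x0c && byte !== 0x0d) {
      control++;
    }
  }
  return control / sample.length < 0.1; // arbitrary threshold
}

// Hypothetical alternative to writeTempFile: writes `data` into a private temp
// directory, runs `fn` with the file path, and always removes the directory.
export async function withTempFile<T>(
  data: Uint8Array,
  fn: (filePath: string) => Promise<T>,
): Promise<T> {
  const dir = await mkdtemp(path.join(tmpdir(), "summarize-stdin-"));
  try {
    const filePath = path.join(dir, "stdin.bin");
    await writeFile(filePath, data);
    return await fn(filePath);
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```

Scoping the temporary file to a callback guarantees cleanup even when the asset pipeline throws partway through transcription.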
Practical Usage Examples
The following commands demonstrate the automatic input type detection:
- Summarize a webpage:
  summarize https://example.com/article
  Detected as URL → runUrlFlow → HTML extraction → summarization
- Summarize a local PDF:
  summarize ./report.pdf
  Detected as file → runAssetFlow → transcription → summarization
- Summarize markdown from STDIN:
  cat notes.md | summarize -
  Detected as STDIN → runStdinFlow → text detection → direct summarization
- Summarize piped audio:
  ffmpeg -i video.mp4 -f wav - | summarize -
  Detected as STDIN → binary detection → temp file → transcription → summarization
- Using the explicit input flag:
  summarize --input-file -
  The flag value falls back to the positional-argument logic, triggering the STDIN flow
Summary
- Input classification occurs in src/content/asset.ts via resolveInputTarget, which checks for an existing file, then for the "-" STDIN marker, and otherwise treats the input as a URL.
- URL processing in src/run/flows/url/flow.ts distinguishes between downloadable assets and webpages, extracting text content from webpages before summarization.
- File handling through runAssetFlow routes images to vision models and transcribes non-image files using ONNX/Whisper pipelines.
- STDIN ingestion buffers the stream, applies a text/binary heuristic, and either summarizes directly or routes binary data through the asset pipeline via temporary files.
- Unified output: all flows converge on src/run/summary-engine.ts to generate LLM responses regardless of input source.
Frequently Asked Questions
How does the summarize CLI distinguish between a URL and a file path?
The CLI checks for local file existence first, using fs.existsSync in resolveInputTarget (src/content/asset.ts). If the resolved path exists on disk, the input is treated as a file. Only when that check fails, and the input is not the "-" STDIN marker, does the CLI treat the string as a URL, so an existing path such as ./report.pdf always takes precedence over URL interpretation.
Can I pipe binary data like audio or video to the summarize CLI?
Yes. With summarize -, the runStdinFlow function buffers the entire STDIN stream and applies an isProbablyText heuristic. Data that does not look like text is written to a temporary file and routed through runAssetFlow: images go to the vision-model path, and other files such as audio are transcribed via ONNX or Whisper models in packages/core/src/transcription/onnx-cli.ts before summarization.
What happens when I use "-" as the input argument?
The literal string "-" is explicitly mapped to the STDIN input kind in resolveInputTarget. This triggers runStdinFlow in src/run/flows/asset/input.ts, which reads from process.stdin rather than attempting to open a file or fetch a URL, enabling seamless pipe integration with other command-line tools.
How does the URL flow handle downloadable assets versus web pages?
The runUrlFlow function calls classifyUrl to inspect the URL's content type. If the URL points to a direct asset (PDF, image, audio), execution redirects to runAssetFlow with the remote URL. If it detects an HTML page, the flow fetches the content, extracts clean markdown using extractMarkdownFromHtml, and passes that text to the summarization engine.