Steipete Summarize Markdown-Mode Settings: Complete Guide to off, auto, llm, and readability

The --markdown-mode flag in steipete/summarize controls how HTML content converts to Markdown, offering four distinct strategies: off (disabled), readability (Mozilla Readability extraction, default), auto (intelligent fallback chain), and llm (AI-powered conversion).

The steipete/summarize CLI and core library apply these settings only when processing non-YouTube URLs with --format md. Understanding the markdown-mode options helps you extract clean, structured Markdown from web pages using the conversion pipeline best suited to your environment.

What Are the Markdown-Mode Settings?

The --markdown-mode parameter accepts four string values defined in src/flags.ts: off, auto, llm, and readability. Each setting determines which engine converts raw HTML into Markdown format. According to the source code in src/run/flows/url/markdown.ts, the selection logic runs immediately after fetching web content, routing the HTML through different extraction pipelines based on your configuration.

This setting only affects HTML page processing. YouTube transcripts bypass these converters entirely, as they arrive in a structured format that requires no HTML-to-Markdown transformation.

Detailed Breakdown of Each Markdown-Mode Setting

off

Setting --markdown-mode off explicitly disables HTML-to-Markdown conversion. When combined with --format md, the CLI throws a validation error because Markdown output becomes impossible without a conversion engine.

In src/run/run-settings.ts line 26, the code defaults to "off" when the output format is not Markdown. However, src/run/flows/url/markdown.ts line 46 validates this combination and aborts with the message: "--format md conflicts with --markdown-mode off".
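The conflict check described above can be pictured with a minimal sketch. The function name and type aliases here are hypothetical and are not taken from the repository; only the error text comes from the message quoted above.

```typescript
// Hypothetical types for illustration; the real definitions live in the repository.
type MarkdownMode = "off" | "auto" | "llm" | "readability";
type OutputFormat = "text" | "markdown";

// Assumed helper name. Rejects the one combination the article describes as invalid.
function assertMarkdownModeAllowed(format: OutputFormat, mode: MarkdownMode): void {
  if (format === "markdown" && mode === "off") {
    // Message text as quoted in the article.
    throw new Error("--format md conflicts with --markdown-mode off");
  }
}

// Non-Markdown output with mode "off" passes silently.
assertMarkdownModeAllowed("text", "off");
```

Checking the combination up front means the failure surfaces before any page is fetched or converted.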

readability

readability serves as the default mode when you run --format md without specifying a markdown-mode. This option leverages Mozilla Readability to extract the article's main content from HTML before Markdown conversion.

The implementation in packages/core/src/content/link-preview/content/readability.ts handles the extraction, filtering boilerplate and navigation elements. In src/run/flows/url/markdown.ts lines 44-51, the code selects readability as the effectiveMarkdownMode when the extracted text exceeds MIN_READABILITY_CONTENT_CHARACTERS. This approach works best for blog posts, news articles, and documentation pages with clear semantic structure.

auto

The auto mode implements an intelligent fallback chain that selects the best available converter based on your system configuration. According to src/run/flows/url/markdown.ts lines 60-71, the priority works as follows:

  1. LLM conversion if any API key is configured (OpenAI, Anthropic, etc.)
  2. uvx markitdown if the uvx markitdown CLI is installed locally
  3. Readability as the final fallback

This mode provides maximum compatibility. If you have API keys set, you get high-quality AI conversion. If not, but you have the fast uvx tool installed, you get quick HTML-to-Markdown transformation. Otherwise, you still get reliable Readability extraction.
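The priority order above can be sketched as a small selection function. This is an illustration of the described fallback chain, not the code in src/run/flows/url/markdown.ts; the names AutoConverter, AutoEnvironment, and pickAutoConverter are assumptions made for this example.

```typescript
// Hypothetical names; they only mirror the three steps listed in the article.
type AutoConverter = "llm" | "markitdown" | "readability";

interface AutoEnvironment {
  hasApiKey: boolean;        // assumed flag: some provider API key is configured
  hasUvxMarkitdown: boolean; // assumed flag: the uvx markitdown CLI is available
}

function pickAutoConverter(env: AutoEnvironment): AutoConverter {
  if (env.hasApiKey) return "llm";               // 1. LLM conversion
  if (env.hasUvxMarkitdown) return "markitdown"; // 2. uvx markitdown
  return "readability";                          // 3. final fallback
}

// With no API key but uvx installed, the second step wins.
console.log(pickAutoConverter({ hasApiKey: false, hasUvxMarkitdown: true }));
// prints "markitdown"
```

Because each step returns early, the first available converter always takes precedence over the ones listed after it.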

llm

Setting --markdown-mode llm forces the system to use the configured Large Language Model (specified via --model) for HTML-to-Markdown conversion. This skips the markitdown and Readability converters: the LLM receives the raw HTML (or the Readability-derived article structure) and produces the Markdown itself.

As implemented in src/run/flows/url/markdown.ts lines 66-71, this mode requires a valid API key for your selected provider. If no key exists, the CLI aborts with a clear error message: "--markdown-mode llm requires OPENAI_API_KEY" (or the appropriate provider key). The finish line logic in src/run/finish-line.ts emits "markdown via llm" when this path executes, confirming the conversion method in the output.
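A rough sketch of the key check, assuming the provider's environment variable name is already known to the caller. The helper requireLlmKey and its parameters are hypothetical; only the shape of the error message follows the one quoted above.

```typescript
// Hypothetical helper: returns the key or throws before any network call is made.
function requireLlmKey(
  envVarName: string,
  env: Record<string, string | undefined>,
): string {
  const key = env[envVarName];
  if (!key) {
    // Mirrors the message format quoted in the article.
    throw new Error(`--markdown-mode llm requires ${envVarName}`);
  }
  return key;
}

// Example with an explicit environment object instead of process.env.
try {
  requireLlmKey("OPENAI_API_KEY", {});
} catch (e) {
  console.log((e as Error).message);
  // prints "--markdown-mode llm requires OPENAI_API_KEY"
}
```

Passing the environment in as a plain object keeps the check easy to exercise in isolation.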

How the Default Mode Is Selected

When you omit the --markdown-mode flag, the resolution logic in src/run/run-settings.ts determines the effective setting:

// src/run/run-settings.ts
markdownMode: format === "markdown"
  ? ((markdownMode ?? markdown ?? "readability") as string)
  : "off",

For --format md commands, the system defaults to "readability". For all other output formats (text, JSON, etc.), the mode is forced to "off" because Markdown conversion is irrelevant.
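To make the precedence concrete, here is a self-contained sketch that wraps the same expression in a function. The wrapper name resolveMarkdownMode, the ModeFlags interface, and the ResolvedFormat union are assumptions for illustration; the article does not say what the second markdown option represents, so it is treated here as an opaque optional string.

```typescript
// Hypothetical wrapper types around the expression quoted from run-settings.ts.
type ResolvedFormat = "text" | "json" | "markdown";

interface ModeFlags {
  markdownMode?: string; // value of --markdown-mode, if given
  markdown?: string;     // second option named in the quoted snippet
}

function resolveMarkdownMode(format: ResolvedFormat, flags: ModeFlags): string {
  return format === "markdown"
    ? (flags.markdownMode ?? flags.markdown ?? "readability")
    : "off";
}

console.log(resolveMarkdownMode("markdown", {}));                   // prints "readability"
console.log(resolveMarkdownMode("markdown", { markdownMode: "auto" })); // prints "auto"
console.log(resolveMarkdownMode("text", { markdownMode: "llm" }));  // prints "off"
```

The last call shows that an explicit mode is ignored whenever the output format is not Markdown.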

Error Cases and Limitations

stdin Incompatibility with Readability

The readability mode (including when selected by auto or used as the default) rejects stdin input because Readability requires a base URL to resolve relative links and fetch resources. The test suite in tests/cli.stdin.test.ts lines 94-98 validates this restriction: the CLI throws an error if you attempt to pipe HTML content while using readability-based conversion.
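The restriction can be pictured as a guard on the input source. Everything in this sketch is hypothetical, including the InputSource type, the function name, and the error wording; the article does not quote the actual message.

```typescript
// Hypothetical model of where the HTML came from.
type InputSource =
  | { kind: "url"; url: string }
  | { kind: "stdin" };

// Assumed guard: readability-based conversion needs a URL for context.
function assertReadabilityInput(source: InputSource): void {
  if (source.kind === "stdin") {
    // Illustrative wording only, not the CLI's real message.
    throw new Error("readability conversion requires a URL input, not stdin");
  }
}

// A URL source passes the guard.
assertReadabilityInput({ kind: "url", url: "https://example.com/article" });
```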

Missing API Key Validation

When using --markdown-mode llm or when auto selects the LLM path, the system validates API key presence before execution. As tested in tests/cli.errors.test.ts lines 66-71, missing credentials trigger an immediate error rather than failing during the network request phase.

Usage Examples

Default Readability Extraction

summarize "https://example.com/article" --format md

# Output: markdown via readability

Force LLM Conversion

export OPENAI_API_KEY=sk-...
summarize "https://example.com/article" --format md --markdown-mode llm

# Output: markdown via llm

Auto-Select with Fallback

# Without API key but with uvx installed
summarize "https://example.com/article" --format md --markdown-mode auto

# Output: markdown via uvx markitdown

Programmatic Usage

import { run } from "@steipete/summarize-core/run";

await run({
  url: "https://example.com/article",
  format: "markdown",
  markdownMode: "auto",   // "off" | "auto" | "llm" | "readability"
});

Summary

  • off disables conversion and conflicts with --format md, causing an error.
  • readability (default) uses Mozilla Readability to extract article content before Markdown conversion.
  • auto intelligently selects LLM if available, falls back to uvx markitdown, then Readability.
  • llm forces AI-powered conversion but requires a configured API key.
  • The default resolution logic lives in src/run/run-settings.ts, defaulting to readability for Markdown output.
  • Readability mode cannot process stdin input because it requires URL context for resource resolution.

Frequently Asked Questions

What happens if I use --markdown-mode off with --format md?

The CLI rejects this combination immediately with the error "--format md conflicts with --markdown-mode off". The off setting disables HTML-to-Markdown conversion entirely, making Markdown output impossible for HTML sources.

Does markdown-mode affect YouTube video processing?

No. The --markdown-mode setting only applies to HTML web pages. YouTube transcripts arrive in a structured text format that requires no HTML parsing or Markdown conversion, so these settings are ignored for video URLs.

Why does readability mode fail with stdin input?

Readability requires a base URL to resolve relative links, handle image paths, and fetch additional resources referenced in the HTML. When processing piped stdin content, no URL context exists, so the CLI rejects the operation. The test suite (tests/cli.stdin.test.ts) covers this behavior.

Which mode produces the highest quality Markdown?

llm typically produces the cleanest Markdown because the AI restructures content semantically rather than just converting HTML tags. However, auto provides the best balance of quality and availability, using LLM when configured but falling back to fast local tools or Readability when API keys are unavailable.
