# LiteParse JSON vs Text Output Formats: Complete Technical Comparison

> Compare LiteParse JSON vs Text output formats. Discover structured data with metadata or plain text for readability and pipelining. Choose the best for your workflow.

- Repository: [LlamaIndex/liteparse](https://github.com/run-llama/liteparse)
- Tags: deep-dive
- Published: 2026-05-30

---

**LiteParse provides JSON output for structured data with full metadata including bounding boxes and OCR confidence, while Text output delivers a flattened plain-text string optimized for human readability and simple pipelining.**

The run-llama/liteparse repository offers two distinct output representations for parsed documents. Users select between **JSON** and **Text** formats through the `output_format` field in `LiteParseConfig` or the `--output-format` CLI argument. Both formats process the same underlying `ParsedPage` data structures but serve fundamentally different consumption patterns.

## Structural Data Hierarchy

### JSON Format Architecture

In [`crates/liteparse/src/output/json.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/output/json.rs), the `format_json` function serializes `ParsedResult` into a deeply nested JSON document using `serde_json`. The hierarchy follows `ParsedResult → pages[] → items[]`, where each item contains the extracted text alongside spatial and quality metadata.

### Text Format Architecture

The `format_text` function in [`crates/liteparse/src/output/text.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/output/text.rs) flattens the same `ParsedPage` list into a single string. It walks the parsed structure and joins the `text` fields with `\n` delimiters to preserve natural reading order across lines and pages.

## Metadata Retention and Richness

**JSON output** includes per-item metadata essential for programmatic processing:
- **`bbox`**: Bounding box coordinates for spatial positioning
- **`confidence`**: OCR confidence scores for quality auditing
- **`source`**: Flags distinguishing `native` text from `ocr` results

**Text output** strips all metadata, emitting only the concatenated human-readable content. This makes it ideal for quick inspection or standard Unix toolchains like `grep` and `cat`.

## Configuration and Implementation

The `OutputFormat` enum defined in [`crates/liteparse/src/config.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/config.rs) declares both variants: `OutputFormat::Json` and `OutputFormat::Text`. The CLI dispatcher in [`crates/liteparse/src/main.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/main.rs) matches the configuration variant to invoke the appropriate formatter:

```rust
match config.output_format {
    OutputFormat::Json => json::format_json(&result.pages)?,
    OutputFormat::Text => text::format_text(&result.pages),
}

```

All language bindings (Node.js, Python, WASM) expose these variants as string options `"json"` and `"text"`, returning the formatted result to the respective runtime.

## Practical Usage Examples

### Command Line Interface

Select your format using the `--output-format` flag:

```bash
liteparse input.pdf --output-format json > result.json
liteparse input.pdf --output-format text > result.txt

```

### Node.js

Configure the output format during parser initialization:

```typescript
import { LiteParse } from "liteparse";

const jsonParser = new LiteParse({ output_format: "json" });
const structured = await jsonParser.parse("sample.pdf");  // JSON string

const textParser = new LiteParse({ output_format: "text" });
const plain = await textParser.parse("sample.pdf");     // Plain text

```

### Python

The Python bindings follow the same configuration pattern:

```python
from liteparse import LiteParse

json_result = LiteParse(output_format="json").parse("sample.pdf")
text_result = LiteParse(output_format="text").parse("sample.pdf")

```

### WebAssembly (Browser)

In browser environments using the WASM build:

```javascript
import { LiteParse } from "liteparse-wasm";

const parser = new LiteParse({ output_format: "json" });
const result = await parser.parse(file);  // Returns JSON string

```

## Summary

- **JSON output** in LiteParse provides machine-readable structured data with full metadata hierarchy (`ParsedResult → pages → items`), enabling spatial analysis and OCR quality auditing.
- **Text output** produces a flattened plain-text string by joining extracted text with newline characters, optimized for human reading and simple command-line pipelines.
- Both formats are controlled through the `OutputFormat` enum in [`crates/liteparse/src/config.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/config.rs) and dispatched from [`crates/liteparse/src/main.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/main.rs).
- The JSON formatter lives in [`crates/liteparse/src/output/json.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/output/json.rs) and uses `serde_json`, while the text formatter resides in [`crates/liteparse/src/output/text.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/output/text.rs).

## Frequently Asked Questions

### When should I use JSON over Text output in LiteParse?

Use **JSON** when your application needs to perform spatial analysis, filter low-confidence OCR results, or render visual overlays using bounding box coordinates. Use **Text** for simple content extraction, full-text search, or when piping results to other command-line tools like `grep` or `awk`.

### Does Text output preserve document layout?

Text output preserves natural reading order by inserting line breaks between text items and pages, but it discards spatial metadata. If you need to reconstruct the original visual layout or extract tables with positional awareness, you must use the JSON format which retains `bbox` coordinates for every text fragment.

### Can I convert between JSON and Text formats after parsing?

LiteParse does not provide a built-in conversion utility between formats. Since Text output is a lossy representation that discards metadata, you cannot reconstruct the full JSON structure from a Text file. If you require both representations, parse the document once with `output_format: "json"` and extract the text fields programmatically from the resulting structure.

### Which output format has better performance?

Text output is marginally faster because it avoids JSON serialization overhead and produces smaller payloads. However, both formats process the same underlying `ParsedPage` data structures, so the performance difference is negligible compared to the initial PDF parsing and OCR operations. Choose based on data requirements rather than performance constraints.