# How to Configure LiteParse to Parse Specific Page Ranges Efficiently

> Efficiently configure LiteParse to parse specific page ranges using target_pages. Minimize resource usage by loading only requested pages from the PDFium document backend.

- Repository: [run-llama/liteparse](https://github.com/run-llama/liteparse)
- Tags: how-to-guide
- Published: 2026-05-30

---

**LiteParse restricts document processing to only the pages you specify through the `target_pages` configuration option, which accepts compact range strings like `"1-5,10,15-20"` and minimizes resource usage by loading only requested pages from the PDFium document backend.**

When working with large PDF documents, parsing every page wastes CPU cycles and memory. LiteParse provides a page-filtering mechanism that processes only the page ranges you request and skips everything else. The feature is implemented consistently across the CLI, Node.js, Python, and WASM bindings through a unified configuration structure.

## Understanding the `target_pages` Configuration

The page-range feature centers on the `target_pages` field within the **`LiteParseConfig`** struct. According to the source code in [`crates/liteparse/src/config.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/config.rs) (lines 14‑18), this field stores an optional string that describes which pages to process, alongside a **`max_pages`** safety cap that prevents accidental processing of pathologically large ranges.

The configuration accepts a compact string notation where comma-separated values define individual pages or hyphenated ranges. For example, `"1-3,7,10-12"` expands to pages 1, 2, 3, 7, 10, 11, and 12. The parser validates, sorts, and deduplicates these numbers before any document I/O occurs.
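To make the notation concrete, here is a minimal Python sketch of the expansion described above. It is an illustration only, not the Rust `parse_target_pages` source: the function name `expand_page_ranges` is hypothetical, and the handling of empty entries and reversed ranges (both rejected here) is an assumption.

```python
def expand_page_ranges(spec: str) -> list[int]:
    """Expand a spec such as "1-3,7,10-12" into a sorted, de-duplicated list."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()  # tolerate whitespace around each entry
        if not part:
            # Assumption: an empty entry (e.g. "1,,2") is treated as an error.
            raise ValueError(f"empty entry in page spec: {spec!r}")
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)  # raises on non-numeric text
            if start > end:
                # Assumption: reversed ranges such as "5-2" are rejected.
                raise ValueError(f"reversed range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(expand_page_ranges("1-3,7,10-12"))  # [1, 2, 3, 7, 10, 11, 12]
```

Collecting pages in a set and sorting once at the end gives the sort-and-deduplicate behavior in a single step, so overlapping entries such as `"1-3,2"` cost nothing extra.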

## The Page Range Parsing Pipeline

The implementation follows a four-stage pipeline that ensures efficiency at every step:

1. **Configuration Ingestion** – The CLI argument `--target-pages` (defined in [`crates/liteparse/src/main.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/main.rs) at lines 65‑71) or the corresponding constructor parameter in language bindings populates `LiteParseConfig.target_pages` verbatim.

2. **Range Expansion** – When `LiteParse::parse` executes (lines 91‑100 in [`crates/liteparse/src/parser.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/parser.rs)), it calls **`parse_target_pages`** (lines 66‑96 in [`crates/liteparse/src/config.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/config.rs)). This function splits the string on commas, trims whitespace from each entry, validates the numeric conversion, expands hyphenated ranges into individual `u32` values, then sorts and deduplicates the result into a `Vec<u32>`.

3. **Selective Extraction** – The validated page list passes as `Option<&[u32]>` to the extraction layer in [`crates/liteparse/src/extract.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/extract.rs). This module requests only the specified pages from the PDFium document handle, drastically reducing disk I/O and memory mapping operations.

4. **Conditional OCR** – Because the OCR pipeline runs after page selection, optical character recognition processes **only** the requested pages. This prevents wasted CPU cycles on irrelevant content.
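The last two stages can be pictured with a small, library-independent Python sketch. Everything in it is hypothetical: `process_document`, `load_page`, and `run_ocr` stand in for the extraction and OCR layers, pages are assumed to be numbered from 1, and silently skipping out-of-range page numbers is an assumption rather than documented LiteParse behavior.

```python
from typing import Callable, Optional, Sequence


def process_document(
    page_count: int,
    load_page: Callable[[int], str],
    run_ocr: Callable[[str], str],
    target_pages: Optional[Sequence[int]] = None,
    ocr_enabled: bool = True,
) -> dict[int, str]:
    """Load (and optionally OCR) only the requested pages."""
    # None means "no filter": every page in the document is processed.
    wanted = range(1, page_count + 1) if target_pages is None else target_pages
    results: dict[int, str] = {}
    for number in wanted:
        if not 1 <= number <= page_count:
            continue  # assumption: pages outside the document are skipped
        page = load_page(number)  # the only place a page is touched
        results[number] = run_ocr(page) if ocr_enabled else page
    return results


loaded: list[int] = []


def fake_load(number: int) -> str:
    loaded.append(number)  # record which pages were actually loaded
    return f"page {number}"


out = process_document(100, fake_load, str.upper, target_pages=[2, 3, 8])
print(loaded)  # [2, 3, 8]
```

Because filtering happens before the loop body runs, neither the loader nor the OCR callback ever sees the other 97 pages, which is the property the pipeline above relies on.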

## Practical Code Examples

### CLI Usage

Use the `--target-pages` flag with standard hyphen and comma notation:

```bash
liteparse parse report.pdf \
    --target-pages "1-3,7,10-12" \
    --max-pages 10 \
    --format json \
    --output selected_pages.json
```

The `--max-pages 10` argument provides a hard ceiling that limits total processed pages even if the range string specifies more.

### Node.js and TypeScript

The JavaScript bindings serialize the configuration object to the same Rust core:

```ts
import { LiteParse } from "liteparse";

const parser = new LiteParse({
  target_pages: "2-4,8",   // identical syntax to CLI
  max_pages: 5,            // optional safety guard
  ocr_enabled: false,      // disable OCR for faster text-only extraction
});

await parser.parse("report.pdf", { output: "out.json", format: "json" });
```

The constructor parameters map directly to `LiteParseConfig` fields in [`config.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/config.rs).

### Python API

PyO3 marshals the configuration automatically:

```python
from liteparse import LiteParse

parser = LiteParse(
    target_pages="5,9-11",   # string format matches Rust parser
    max_pages=7,
    ocr_enabled=False,
)

result = parser.parse("report.pdf", format="json")
with open("selected.json", "w") as f:
    f.write(result)
```

### Screenshot Generation

The screenshot command reuses the same `parse_target_pages` logic (lines 42‑48 in [`main.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/main.rs)) to render only specific pages:

```bash
liteparse screenshot report.pdf --target-pages "1,3,5" --output-dir pages/
```

## Performance Optimization Strategies

To maximize efficiency when processing partial documents:

- **Combine `target_pages` with `max_pages`** to guarantee an upper bound on computational work regardless of input string complexity.
- **Disable OCR** using `--no-ocr` or `ocr_enabled: false` when you only need native PDF text, as OCR represents the most expensive operation per page.
- **Prefer contiguous ranges** (e.g., `"1-100"` instead of `"1,2,3,...,100"`) to minimize string parsing overhead, though the deduplication logic in `parse_target_pages` handles both formats correctly.
- **Validate ranges programmatically** before passing them to the constructor to avoid the overhead of parsing invalid strings that will fail at the Rust boundary.
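As one way to validate a range string before it reaches the parser, the sketch below checks only the *shape* of the input with a regular expression. The helper name `is_valid_page_spec` is hypothetical, and the grammar is an assumption drawn from the notation in this guide: ASCII digits, an optional `-N` suffix, and whitespace allowed only around whole entries. The Rust parser may accept or reject edge cases differently.

```python
import re

# One entry: a page number or a hyphenated range, optionally padded with spaces.
_ENTRY = r"\s*[0-9]+(?:-[0-9]+)?\s*"
# A full spec: one or more entries separated by commas.
_PAGE_SPEC = re.compile(rf"{_ENTRY}(?:,{_ENTRY})*")


def is_valid_page_spec(spec: str) -> bool:
    """Return True when spec looks like "1-5,10,15-20" (shape check only)."""
    return _PAGE_SPEC.fullmatch(spec) is not None


for candidate in ["1-5,10,15-20", " 2-4 , 8 ", "1-,3", "a,b", ""]:
    print(repr(candidate), is_valid_page_spec(candidate))
```

A shape check like this catches typos cheaply in application code. It deliberately does not judge semantic questions such as reversed ranges or page numbers beyond the end of the document.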

## Summary

- The **`target_pages`** option in `LiteParseConfig` accepts comma-separated page numbers and hyphenated ranges.
- The **`parse_target_pages`** function (lines 66‑96 in [`config.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/config.rs)) validates, sorts, and deduplicates the input into a `Vec<u32>`.
- The extraction layer in **[`extract.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/extract.rs)** loads only the specified pages from the PDFium document, minimizing I/O.
- OCR and rendering operations apply exclusively to the filtered page set, conserving CPU and memory.
- The **`max_pages`** field provides a safety ceiling against accidental oversized range specifications.

## Frequently Asked Questions

### What string format does `target_pages` accept?

`target_pages` accepts a comma-separated list where each element is either a single page number or a hyphenated range (e.g., `"1-5,10,15-20"`). The `parse_target_pages` function in [`crates/liteparse/src/config.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/config.rs) expands these ranges, trims whitespace, sorts the results in ascending order, and removes duplicates before processing.

### Does OCR run on all pages or only the selected range?

OCR runs **only** on the selected range. The parsing pipeline resolves the `target_pages` list before invoking the OCR engine (if enabled), ensuring that computationally expensive text recognition occurs exclusively on the pages you requested, not the entire document.

### How does `max_pages` interact with `target_pages`?

`max_pages` acts as a hard ceiling on the total number of pages processed, while `target_pages` specifies which pages to include. If your range string expands to 50 pages but `max_pages` is set to 10, LiteParse stops after processing 10 pages. This safety guard, defined in `LiteParseConfig` (lines 14‑15 in [`config.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/config.rs)), prevents resource exhaustion from malicious or accidental oversized inputs.
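The 50-versus-10 case can be sketched in a few lines of Python. The helper `apply_page_cap` is hypothetical, and it assumes the cap keeps the first `max_pages` entries of the already sorted page list; the guide states only that processing stops after the limit is reached.

```python
from typing import Optional


def apply_page_cap(pages: list[int], max_pages: Optional[int]) -> list[int]:
    """Keep at most max_pages entries from an already sorted page list."""
    if max_pages is None:
        return list(pages)  # no cap configured
    # Assumption: the earliest pages win when the list exceeds the cap.
    return list(pages[:max_pages])


selected = list(range(1, 51))          # a range string that expands to 50 pages
capped = apply_page_cap(selected, 10)  # max_pages = 10
print(len(capped), capped[-1])         # 10 10
```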

### Can I use page ranges with the screenshot command?

Yes. The screenshot command implements the same `target_pages` logic found in the parse command. When you pass `--target-pages` to `liteparse screenshot`, the tool only renders the specified pages, as implemented in the command definitions at lines 42‑48 of [`crates/liteparse/src/main.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/main.rs).