# How LiteParse Automatically Converts XLSX to PDF

> Discover how LiteParse automatically converts XLSX to PDF. Learn about LibreOffice integration and efficient PDF generation for your spreadsheet data.

- Repository: [run-llama/liteparse](https://github.com/run-llama/liteparse)
- Tags: how-to-guide
- Published: 2026-05-30

---

**LiteParse automatically converts XLSX files to PDF by detecting the spreadsheet extension, invoking a headless LibreOffice process via the `convert_to_pdf` function, and returning the generated PDF for parsing.**

The run-llama/liteparse library handles spreadsheet documents by converting them to PDF before text extraction. When you pass an XLSX file path to the parser, the Rust core runs a conversion pipeline that launches headless LibreOffice with an isolated temporary user profile. No manual conversion step is needed, and the same behavior is available through the Node.js and Python bindings.

## The XLSX-to-PDF Conversion Pipeline

### Extension Detection and Routing

In [`crates/liteparse/src/conversion.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/conversion.rs), the system maintains the `SPREADSHEET_EXTENSIONS` constant at lines 16-18, which includes `"xlsx"` alongside other spreadsheet formats. When a file enters the system, the `resolve_pdf_input` function (lines 97-112) checks whether the input is already a PDF. For non-PDF files, it delegates immediately to `convert_to_pdf`, initiating the automatic conversion flow.
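The routing decision can be pictured with a small Python sketch. This is not the Rust source: the tuple below is a stand-in for `SPREADSHEET_EXTENSIONS` that lists only the two extensions named in this article, and both the helper names and the case-insensitive comparison are assumptions made for illustration.

```python
from pathlib import Path

# Stand-in for the Rust SPREADSHEET_EXTENSIONS constant (assumption: the real
# list may contain more formats than the two named in this article).
SPREADSHEET_EXTENSIONS = ("xlsx", "xls")


def file_extension(path: str) -> str:
    """Return the extension in lowercase, without the leading dot."""
    return Path(path).suffix.lower().lstrip(".")


def needs_conversion(path: str) -> bool:
    """PDF inputs pass through untouched; everything else is converted."""
    return file_extension(path) != "pdf"


def is_spreadsheet(path: str) -> bool:
    """True when the extension belongs to the spreadsheet group."""
    return file_extension(path) in SPREADSHEET_EXTENSIONS
```

With these helpers, `needs_conversion("report.xlsx")` is true and `is_spreadsheet("report.xlsx")` is true, so the file would be handed to the conversion step.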

### Tool Selection via Extension Groups

Inside `convert_to_pdf` (lines 68-71), the file extension is matched against three distinct extension groups. Because XLSX belongs to `SPREADSHEET_EXTENSIONS`, the function selects `ConversionTool::LibreOffice` as the appropriate converter. This variant routes the document through the office document conversion pipeline specifically designed for spreadsheet formats.
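As a rough illustration of group-based tool selection, the sketch below maps an extension group to a tool. Only the spreadsheet group is modeled, because the other two groups are not named here; the `select_tool` helper, the `EXTENSION_GROUPS` table, and the Python enum are hypothetical and only mirror the idea behind `ConversionTool::LibreOffice`.

```python
from enum import Enum
from pathlib import Path
from typing import Optional


class ConversionTool(Enum):
    # Mirrors the ConversionTool::LibreOffice variant; other variants omitted.
    LIBRE_OFFICE = "libreoffice"


# Assumption: only the extensions named in this article are listed.
SPREADSHEET_EXTENSIONS = frozenset({"xlsx", "xls"})

# Each entry pairs a group of extensions with the tool that handles it.
EXTENSION_GROUPS = (
    (SPREADSHEET_EXTENSIONS, ConversionTool.LIBRE_OFFICE),
)


def select_tool(path: str) -> Optional[ConversionTool]:
    """Return the tool for the first group containing the file's extension."""
    ext = Path(path).suffix.lower().lstrip(".")
    for extensions, tool in EXTENSION_GROUPS:
        if ext in extensions:
            return tool
    return None  # unsupported extension in this sketch
```

Keeping the groups in a table means a new format only needs a new entry, not a new branch.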

### LibreOffice Discovery and Execution

Before conversion begins, `find_libre_office_command` (lines 60-70) searches the system for a LibreOffice executable, checking for `libreoffice`, `soffice`, or known platform-specific installation paths. Once located, `convert_office_document` (lines 31-46) constructs a headless command using a temporary user-profile directory, the `--headless` flag, `--convert-to pdf`, and the output directory. This sandboxed execution converts the XLSX to PDF without launching the GUI.
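The discovery and command-building steps might look like the following Python sketch. It assumes the standard LibreOffice command-line options `--headless`, `--convert-to`, and `--outdir`, plus the `-env:UserInstallation` parameter for the temporary profile; the exact argument order LiteParse uses is not shown in this article, and the platform-specific fallback paths are left out because they are not listed here.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional, Sequence

# Command names mentioned in the article; platform-specific paths are omitted.
CANDIDATE_COMMANDS = ("libreoffice", "soffice")


def find_libre_office_command(
    candidates: Sequence[str] = CANDIDATE_COMMANDS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first candidate that resolves to an executable on PATH."""
    for name in candidates:
        resolved = which(name)
        if resolved:
            return resolved
    return None


def build_convert_command(
    binary: str, input_path: str, out_dir: str, profile_dir: str
) -> List[str]:
    """Build a headless conversion command (argument layout is an assumption)."""
    profile_uri = Path(profile_dir).resolve().as_uri()
    return [
        binary,
        f"-env:UserInstallation={profile_uri}",  # isolated temporary profile
        "--headless",                            # never open the GUI
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(input_path),
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Injecting `which` as a parameter keeps the lookup testable on machines without LibreOffice installed.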

### Result Handling and Cleanup

After LibreOffice writes the output, `find_pdf_in_dir` (lines 50-57) scans the temporary output folder to locate the generated `.pdf` file. Because LibreOffice may rename the file during conversion, the system scans the directory rather than assuming a fixed output name. The discovered PDF path is returned to the caller wrapped in a `PdfInputGuard`, which automatically cleans up temporary directories when parsing completes.
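The scan-then-clean-up pattern can be approximated in Python with a context manager. This is a minimal sketch rather than the Rust implementation: the class below borrows the `PdfInputGuard` name only for readability, `make_output_dir` is a hypothetical helper, and the sorted iteration order is an assumption added to make the result deterministic.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def make_output_dir() -> Path:
    """Create a fresh temporary directory for the converter's output."""
    return Path(tempfile.mkdtemp(prefix="liteparse-sketch-"))


def find_pdf_in_dir(directory: Path) -> Optional[Path]:
    """Return the first .pdf file in the directory, or None if there is none."""
    for entry in sorted(directory.iterdir()):
        if entry.is_file() and entry.suffix.lower() == ".pdf":
            return entry
    return None


class PdfInputGuard:
    """Hold a PDF path and remove its temporary directory on exit."""

    def __init__(self, pdf_path: Path, temp_dir: Optional[Path] = None):
        self.pdf_path = pdf_path
        self._temp_dir = temp_dir  # None when the input was already a PDF

    def __enter__(self) -> "PdfInputGuard":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if self._temp_dir is not None:
            shutil.rmtree(self._temp_dir, ignore_errors=True)
        return False  # never swallow exceptions raised while parsing
```

Used as `with PdfInputGuard(pdf, out_dir) as guard: ...`, the temporary directory is removed even if parsing raises, which is the same guarantee a Rust drop guard gives.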

## Implementation in Language Bindings

Both the Node.js and Python wrappers expose this functionality through simple APIs that accept XLSX paths directly.

For TypeScript or Node.js applications:

```typescript
import { LiteParse } from "liteparse";

(async () => {
  // Input can be a path to an XLSX file
  const parser = new LiteParse({ input: "report.xlsx" });
  const result = await parser.parse();
  console.log(result.json()); // JSON output of the extracted text
})();
```

For Python applications:

```python
from liteparse import LiteParse

parser = LiteParse("report.xlsx")
result = parser.parse()
print(result.to_json())
```

Under the hood, these bindings invoke the same Rust `resolve_pdf_input` → `convert_to_pdf` → LibreOffice flow described above, eliminating the need for manual conversion steps.

## Key Source Files

The conversion logic spans several critical locations in the repository:

- **[`crates/liteparse/src/conversion.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/conversion.rs)**: Contains the central conversion logic, including extension tables (`SPREADSHEET_EXTENSIONS`), tool selection, LibreOffice discovery (`find_libre_office_command`), and the actual conversion functions (`convert_office_document`, `convert_to_pdf`).
- **[`crates/liteparse/src/parser.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/parser.rs)**: Orchestrates input resolution through `resolve_pdf_input` and passes the resulting PDF to the main parser.
- **[`crates/liteparse/src/config.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/config.rs)**: Holds configuration options that can enable or disable conversion features.

## Summary

- LiteParse detects XLSX files using the `SPREADSHEET_EXTENSIONS` constant in [`conversion.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/conversion.rs) (lines 16-18).
- Non-PDF inputs trigger `convert_to_pdf`, which routes spreadsheets to the LibreOffice conversion tool.
- The system automatically discovers the LibreOffice binary via `find_libre_office_command` (lines 60-70).
- Headless conversion occurs through `convert_office_document` (lines 31-46) with sandboxed temporary directories.
- Generated PDFs are located via directory scanning and wrapped in `PdfInputGuard` for automatic cleanup.

## Frequently Asked Questions

### What conversion tool does LiteParse use for XLSX files?

LiteParse uses **LibreOffice** in headless mode for XLSX conversion. When `convert_to_pdf` detects a spreadsheet extension, it selects `ConversionTool::LibreOffice` and invokes the binary discovered by `find_libre_office_command` with the `--convert-to pdf` flag.

### How does LiteParse locate the LibreOffice executable?

The `find_libre_office_command` function in [`crates/liteparse/src/conversion.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/conversion.rs) (lines 60-70) searches for executables named `libreoffice`, `soffice`, or platform-specific installation paths. It returns the first valid command string found on the system.

### Can LiteParse convert other spreadsheet formats besides XLSX?

Yes. The `SPREADSHEET_EXTENSIONS` constant covers other spreadsheet formats as well, such as `xls`. Any extension in this group goes through the same LibreOffice-based PDF conversion as XLSX files.

### Is the temporary PDF file cleaned up after parsing?

Yes. LiteParse wraps the conversion result in a `PdfInputGuard` that manages temporary directories. When parsing completes and the guard goes out of scope, it automatically removes the temporary files created during the conversion.