# How Turso Integrates Full-Text Search (FTS) with Tantivy: A Complete Technical Guide

> Discover how Turso integrates Full-Text Search using Tantivy. Learn about native FTS indexing and data storage via B-Tree for efficient searching.

- Repository: [Turso Database/turso](https://github.com/tursodatabase/turso)
- Tags: deep-dive
- Published: 2026-06-23

---

**Turso implements Full-Text Search through a native FTS index method that embeds the Tantivy search engine while storing all index data inside Turso's own B-Tree storage layer.**

Turso's integration with Tantivy provides SQLite-compatible full-text search semantics and is designed to perform well on large text collections. The implementation bridges the Rust-based Tantivy library with Turso's storage engine through a custom directory implementation, enabling `CREATE INDEX ... USING fts` syntax and specialized SQL functions such as `fts_match`, `fts_score`, and `fts_highlight`.

## Architecture Overview

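The integration follows a three-layer architecture that separates SQL interface concerns from search engine implementation details: index method registration, schema and query handling, and a custom Tantivy directory backed by the B-Tree.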

### Index Method Registration

At the SQL layer, Turso exposes the `fts` index method through the `FtsIndexMethod` struct, which implements the `IndexMethod` trait in [`core/index_method/fts.rs`](https://github.com/tursodatabase/turso/blob/main/core/index_method/fts.rs#L2890-L2896). This registration enables the `CREATE INDEX ... USING fts` syntax and wires up three core SQL functions: `fts_match`, `fts_score`, and `fts_highlight`.

### Schema and Query Handling

When building an index, the `FtsIndexAttachment::new` constructor (lines 2910-2950 in the same file) assembles a Tantivy `Schema` combining row IDs with configurable text fields. The constructor parses the optional `WITH (tokenizer, weights)` clause to configure tokenization and field boost factors, then creates a Tantivy `Index` and `Searcher` for each cursor.

### Directory Implementation: The HybridBTreeDirectory Bridge

The critical integration point is `HybridBTreeDirectory`, which implements Tantivy's `Directory` trait (lines 690-788) to map every index file to Turso's B-Tree storage. Files are transparently split into 1 MiB chunks with hot-caching for metadata, term dictionaries, and fast fields, while large segment files are lazily loaded on demand.
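
The chunking idea can be illustrated with a minimal sketch. It is not Turso's implementation: `ChunkStore`, `write_file_chunked`, and `read_file_chunked` are hypothetical names, and a standard-library `BTreeMap` stands in for the B-Tree. Only the 1 MiB chunk size and the `(path, chunk number)` key shape come from the description above.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Chunk size named in the text: 1 MiB.
const CHUNK_SIZE: usize = 1024 * 1024;

/// Hypothetical stand-in for the B-Tree: an ordered map keyed by
/// (file path, chunk number).
type ChunkStore = BTreeMap<(PathBuf, i64), Vec<u8>>;

/// Split `data` into CHUNK_SIZE pieces and store each under (path, chunk_no).
/// Returns the number of chunks written.
fn write_file_chunked(store: &mut ChunkStore, path: &Path, data: &[u8]) -> usize {
    let mut count = 0;
    for (chunk_no, chunk) in data.chunks(CHUNK_SIZE).enumerate() {
        store.insert((path.to_path_buf(), chunk_no as i64), chunk.to_vec());
        count += 1;
    }
    count
}

/// Reassemble a file by walking its chunks in key order.
fn read_file_chunked(store: &ChunkStore, path: &Path) -> Vec<u8> {
    let start = (path.to_path_buf(), 0i64);
    let end = (path.to_path_buf(), i64::MAX);
    store
        .range(start..=end)
        .flat_map(|(_, chunk)| chunk.iter().copied())
        .collect()
}
```

Because the key sorts by path first and chunk number second, a range scan over one path returns that file's chunks in order, which is what makes an ordered tree a natural fit for this layout.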

## Storage Layer Integration

Turso adapts Tantivy's file-based expectations to its B-Tree architecture through a two-tier caching system and custom file handles.

### Two-Tier Caching Strategy

The storage layer employs distinct caching strategies for different access patterns:

- **Hot cache** (`LruCache<PathBuf>`): Stores entire small files including metadata, term dictionaries, and fast fields (see `HybridBTreeDirectory::hot_cache`, lines 85-92)
- **Chunk cache** (`LruCache<(PathBuf, i64)>`): Stores 1 MiB chunks of large segment files, loaded via `get_chunks_range_blocking` (lines 1350-1385)

The read path checks the hot cache first, then pending writes, and finally assembles the file from B-Tree chunks. Writes flow through `HybridWriter`, which buffers data, updates the in-memory catalog, and queues data for B-Tree flush operations (lines 1008-1040).
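
The lookup order on the read path can be sketched as follows. This is an illustration under stated assumptions: `TieredStore` and `Source` are hypothetical types, and plain `HashMap`s replace the LRU caches and the B-Tree so that the example stays self-contained.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified directory state: plain maps stand in for the
/// LRU hot cache, the pending-write buffer, and the B-Tree chunk storage.
struct TieredStore {
    hot_cache: HashMap<String, Vec<u8>>,
    pending_writes: HashMap<String, Vec<u8>>,
    btree_chunks: HashMap<String, Vec<Vec<u8>>>,
}

/// Which tier served the read (useful for illustration and testing).
#[derive(Debug, PartialEq)]
enum Source {
    HotCache,
    PendingWrite,
    BTree,
}

impl TieredStore {
    /// Check the hot cache, then pending writes, then assemble from chunks.
    fn read(&self, path: &str) -> Option<(Source, Vec<u8>)> {
        if let Some(bytes) = self.hot_cache.get(path) {
            return Some((Source::HotCache, bytes.clone()));
        }
        if let Some(bytes) = self.pending_writes.get(path) {
            return Some((Source::PendingWrite, bytes.clone()));
        }
        // Slowest tier: concatenate the stored chunks into one buffer.
        self.btree_chunks
            .get(path)
            .map(|chunks| (Source::BTree, chunks.concat()))
    }
}
```

Consulting pending writes before the chunk storage lets a reader see data that has been buffered but not yet flushed.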

### File Handle Abstractions

Tantivy reads data through the `FileHandle` trait, which Turso implements with two concrete types:

- **`InMemoryFileHandle`**: Handles hot-cached data and pending writes (lines 74-86)
- **`LazyFileHandle`**: Lazily loads required chunks from the B-Tree on access (lines 24-32)

This abstraction allows Tantivy's search algorithms to operate normally while the actual bytes may reside distributed across Turso's B-Tree chunks.
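
To make the lazy-loading idea concrete, here is a rough sketch of how a byte-range read could map onto chunk fetches. The functions `chunks_for_range` and `read_range` and the `fetch_chunk` callback are hypothetical; the real `LazyFileHandle` implements Tantivy's `FileHandle` trait and may differ in detail.

```rust
use std::ops::Range;

/// Chunk size named in the text: 1 MiB.
const CHUNK_SIZE: usize = 1024 * 1024;

/// Chunk numbers that a byte range touches (an empty range touches none).
fn chunks_for_range(range: &Range<usize>) -> Range<usize> {
    if range.is_empty() {
        return 0..0;
    }
    let first = range.start / CHUNK_SIZE;
    let last = (range.end - 1) / CHUNK_SIZE;
    first..last + 1
}

/// Read `range` by fetching only the chunks it overlaps.
/// `fetch_chunk` is a hypothetical callback standing in for a B-Tree lookup.
fn read_range<F>(range: Range<usize>, mut fetch_chunk: F) -> Vec<u8>
where
    F: FnMut(usize) -> Vec<u8>,
{
    let mut out = Vec::with_capacity(range.len());
    for chunk_no in chunks_for_range(&range) {
        let chunk = fetch_chunk(chunk_no);
        let chunk_start = chunk_no * CHUNK_SIZE;
        // Clamp the requested range to this chunk's bounds.
        let from = range.start.max(chunk_start) - chunk_start;
        let to = range
            .end
            .min(chunk_start + chunk.len())
            .saturating_sub(chunk_start);
        if from < to {
            out.extend_from_slice(&chunk[from..to]);
        }
    }
    out
}
```

A read that straddles a chunk boundary fetches exactly two chunks, regardless of how large the underlying segment file is.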

## SQL Interface and Query Optimization

Turso provides first-class SQL support for full-text search through built-in functions and optimizer integration.

### Built-in FTS Functions

Three SQL functions expose Tantivy's capabilities:

- **`fts_match(column_list, query)`**: Tokenizes the query using the configured Tantivy tokenizer and returns matching rows
- **`fts_score(column_list, query)`**: Returns BM25 relevance scores computed by Tantivy, applying weights from the index `WITH` clause
- **`fts_highlight(text, query, prefix, suffix)`**: Wraps matching tokens in the supplied prefix and suffix tags; it works even without an index, applying the same tokenizer logic
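
The behavior described for `fts_highlight` can be approximated with a small standalone function. This is a sketch only: `highlight` is a hypothetical name, and the tokenization rule (maximal runs of alphanumeric characters, compared case-insensitively) is an assumption standing in for whichever Tantivy tokenizer the index is configured with.

```rust
/// Wrap every token of `text` that matches a query term in `prefix`/`suffix`.
/// Assumption: tokens are maximal runs of alphanumeric characters and are
/// compared case-insensitively; the real tokenizer may behave differently.
fn highlight(text: &str, query: &str, prefix: &str, suffix: &str) -> String {
    let terms: Vec<String> = query
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect();

    let mut out = String::with_capacity(text.len());
    let mut token = String::new();

    // Flush the buffered token, wrapping it when it matches a query term.
    let flush = |token: &mut String, out: &mut String| {
        if token.is_empty() {
            return;
        }
        if terms.contains(&token.to_lowercase()) {
            out.push_str(prefix);
            out.push_str(token.as_str());
            out.push_str(suffix);
        } else {
            out.push_str(token.as_str());
        }
        token.clear();
    };

    for c in text.chars() {
        if c.is_alphanumeric() {
            token.push(c);
        } else {
            // A separator ends the current token and is copied through as-is.
            flush(&mut token, &mut out);
            out.push(c);
        }
    }
    flush(&mut token, &mut out);
    out
}
```

Calling `highlight("A database engine", "database", "<mark>", "</mark>")` in this sketch yields `A <mark>database</mark> engine`; separators and non-matching tokens pass through unchanged.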

### Query Plan Optimization

The SQL optimizer rewrites `column MATCH 'term'` expressions into `fts_match` function calls through `transform_match_to_fts_match` in [`core/translate/optimizer/mod.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/optimizer/mod.rs#L560-L630). This transformation enables the execution engine to recognize FTS patterns and select appropriate index access paths.
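
The shape of such a rewrite can be shown on a toy expression tree. The `Expr` enum and `rewrite_match` function below are hypothetical and far simpler than Turso's actual AST and optimizer pass; they only illustrate replacing a `MATCH` node with an `fts_match` call.

```rust
/// Hypothetical, heavily simplified expression tree; Turso's real AST differs.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    Match { lhs: Box<Expr>, rhs: Box<Expr> },
    And(Box<Expr>, Box<Expr>),
    Call { name: String, args: Vec<Expr> },
}

/// Recursively replace `lhs MATCH rhs` with `fts_match(lhs, rhs)`.
fn rewrite_match(expr: Expr) -> Expr {
    match expr {
        Expr::Match { lhs, rhs } => Expr::Call {
            name: "fts_match".to_string(),
            args: vec![rewrite_match(*lhs), rewrite_match(*rhs)],
        },
        Expr::And(a, b) => Expr::And(
            Box::new(rewrite_match(*a)),
            Box::new(rewrite_match(*b)),
        ),
        Expr::Call { name, args } => Expr::Call {
            name,
            args: args.into_iter().map(rewrite_match).collect(),
        },
        // Columns and literals contain no MATCH nodes; leave them untouched.
        other => other,
    }
}
```

After the rewrite, later planning stages only need to recognize one form, the `fts_match` call, rather than both spellings.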

## Configuration and Tokenization

Turso supports six Tantivy tokenizers configurable at index creation time.

### Supported Tokenizers

The `SUPPORTED_TOKENIZERS` constant (lines 1341-1348) includes:
- `default`: Standard tokenizer splitting on punctuation and whitespace
- `ngram`: 2-3 character n-grams for fuzzy matching
- Four additional specialized tokenizers for specific languages and use cases (not enumerated here)
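
To show what 2-3 character n-grams look like, here is a small standalone sketch. The `ngrams` function is hypothetical and independent of Tantivy; the exact set and order of grams Tantivy emits may differ.

```rust
/// Character n-grams of `text` for every length in `min_n..=max_n`.
/// Works on Unicode scalar values rather than bytes.
fn ngrams(text: &str, min_n: usize, max_n: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut grams = Vec::new();
    for start in 0..chars.len() {
        for n in min_n..=max_n {
            // Skip degenerate lengths and grams that would run past the end.
            if n > 0 && start + n <= chars.len() {
                grams.push(chars[start..start + n].iter().collect());
            }
        }
    }
    grams
}
```

For the input `turso` with lengths 2 to 3, this sketch produces `tu`, `tur`, `ur`, `urs`, `rs`, `rso`, `so`. Because a query such as `urs` is itself one of the indexed grams, partial-word searches can match without a full-token hit.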

### Creating an FTS Index

Configure tokenization and field weights using the `WITH` clause:

```sql
CREATE INDEX idx_articles_fts
    ON articles
    USING fts (title, body)
    WITH (tokenizer = 'default',
          weights   = 'title=2.0,body=1.0');
```
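
The `weights` value is a comma-separated list of `field=boost` pairs. A parser for that format might look like the sketch below; `parse_weights` is a hypothetical helper, and the error handling and whitespace tolerance are assumptions rather than Turso's documented behavior.

```rust
use std::collections::HashMap;

/// Parse a spec such as "title=2.0,body=1.0" into per-field boost factors.
/// Assumption: whitespace around fields and values is ignored.
fn parse_weights(spec: &str) -> Result<HashMap<String, f32>, String> {
    let mut weights = HashMap::new();
    for pair in spec.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        let (field, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("expected field=weight, got '{}'", pair))?;
        let field = field.trim();
        let value = value.trim();
        let boost: f32 = value
            .parse()
            .map_err(|_| format!("invalid weight '{}' for field '{}'", value, field))?;
        weights.insert(field.to_string(), boost);
    }
    Ok(weights)
}
```

With the index definition above, this sketch would yield a boost of 2.0 for `title` and 1.0 for `body`, so a title hit counts twice as much toward the relevance score.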

## Practical SQL Examples

### Basic Search with Relevance Scoring

```sql
SELECT fts_score(title, body, 'database') AS rank,
       id,
       title
FROM   articles
WHERE  fts_match(title, body, 'database')
ORDER  BY rank DESC
LIMIT  10;
```

### Highlighting Search Results

```sql
SELECT id,
       fts_highlight(body, 'database', '<mark>', '</mark>') AS snippet
FROM   articles
WHERE  fts_match(title, body, 'database');
```

### Using N-Gram Tokenization

```sql
CREATE INDEX idx_products_fts
    ON products
    USING fts (name)
    WITH (tokenizer = 'ngram');
```

## Key Source Files

| File | Role |
|------|------|
| [`core/index_method/fts.rs`](https://github.com/tursodatabase/turso/blob/main/core/index_method/fts.rs) | Complete FTS implementation including `HybridBTreeDirectory`, `FtsIndexMethod`, and SQL function implementations |
| [`core/Cargo.toml`](https://github.com/tursodatabase/turso/blob/main/core/Cargo.toml#L72-L75) | Declares Tantivy 0.26.0 as an optional dependency behind the `fts` feature flag |
| [`core/translate/optimizer/mod.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/optimizer/mod.rs#L560-L630) | Contains the `transform_match_to_fts_match` optimizer hook |
| [`docs/manual.md`](https://github.com/tursodatabase/turso/blob/main/docs/manual.md) | High-level user documentation explaining Tantivy-powered FTS |
| `docs/sql-reference/functions/fts.mdx` | Complete SQL function reference |

## Summary

- Turso embeds Tantivy as an **optional dependency** (version 0.26.0) enabled via the `fts` feature flag, avoiding compilation overhead when not needed.
- **`HybridBTreeDirectory`** acts as the bridge between Tantivy's file-based expectations and Turso's B-Tree storage, using two-tier caching and lazy loading to limit how much index data is read into memory.
- The **SQL interface** exposes `fts_match`, `fts_score`, and `fts_highlight` functions, with the optimizer automatically rewriting `MATCH` expressions to use these primitives.
- **Configuration flexibility** allows per-index tokenizer selection (including `default` and `ngram`) and field-specific relevance weighting through the `WITH` clause.
- All index data resides within Turso's storage engine through the chunked file abstraction, maintaining transactional consistency and backup compatibility.

## Frequently Asked Questions

### How does Turso store Tantivy index files without a traditional filesystem?

Turso implements the `tantivy::directory::Directory` trait through `HybridBTreeDirectory`, which maps Tantivy's file operations to Turso's B-Tree storage. Files are transparently split into 1 MiB chunks and distributed across the B-Tree, with hot-caching for small metadata files and lazy loading for large segment files.

### Can I use Full-Text Search without enabling the `fts` feature?

No. Tantivy is compiled as an optional dependency in [`core/Cargo.toml`](https://github.com/tursodatabase/turso/blob/main/core/Cargo.toml) (lines 72-75). The `fts` feature must be enabled at build time to access the `FtsIndexMethod`, SQL functions, and directory implementations.

### What is the performance difference between `fts_match` and the `MATCH` operator?

There is no performance difference because the optimizer transforms `MATCH` syntax into `fts_match` function calls via `transform_match_to_fts_match` in [`core/translate/optimizer/mod.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/optimizer/mod.rs) (lines 560-630). Both approaches utilize the same Tantivy-powered index with identical execution paths.

### Which tokenizers are available for FTS indexes?

Turso supports six Tantivy tokenizers including `default` (whitespace and punctuation splitting) and `ngram` (2-3 character n-grams for partial matching), as defined in the `SUPPORTED_TOKENIZERS` constant in [`core/index_method/fts.rs`](https://github.com/tursodatabase/turso/blob/main/core/index_method/fts.rs) (lines 1341-1348).