How Turso Integrates Full-Text Search (FTS) with Tantivy: A Complete Technical Guide

Turso implements Full-Text Search through a native FTS index method that embeds the Tantivy search engine while storing all index data inside Turso's own B-Tree storage layer.

Turso's integration with Tantivy is designed to offer SQLite-style full-text search from SQL while remaining efficient on large text collections. The implementation bridges the Rust-based Tantivy library and Turso's storage engine through a custom directory implementation, which enables the CREATE INDEX ... USING fts syntax and the specialized SQL functions fts_match, fts_score, and fts_highlight.

Architecture Overview

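The integration follows a three-layer architecture that separates SQL interface concerns from search engine implementation details: index method registration, schema and query handling, and a directory implementation that bridges Tantivy to Turso's storage.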

Index Method Registration

At the SQL layer, Turso exposes the fts index method through the FtsIndexMethod struct, which implements the IndexMethod trait in [core/index_method/fts.rs](https://github.com/tursodatabase/turso/blob/main/core/index_method/fts.rs#L2890-L2896). This registration enables the CREATE INDEX ... USING fts syntax and wires up three core SQL functions: fts_match, fts_score, and fts_highlight.

Schema and Query Handling

When building an index, the FtsIndexAttachment::new constructor (lines 2910-2950 in the same file) assembles a Tantivy Schema combining row IDs with configurable text fields. The constructor parses the optional WITH (tokenizer, weights) clause to configure tokenization and field boost factors, then creates a Tantivy Index and Searcher for each cursor.

Directory Implementation: The HybridBTreeDirectory Bridge

The critical integration point is HybridBTreeDirectory, which implements Tantivy's Directory trait (lines 690-788) to map every index file to Turso's B-Tree storage. Files are transparently split into 1 MiB chunks with hot-caching for metadata, term dictionaries, and fast fields, while large segment files are lazily loaded on demand.

Storage Layer Integration

Turso adapts Tantivy's file-based expectations to its B-Tree architecture through a two-tier caching system and custom file handles.

Two-Tier Caching Strategy

The storage layer employs distinct caching strategies for different access patterns:

  • Hot cache (LruCache<PathBuf>): Stores entire small files including metadata, term dictionaries, and fast fields (see HybridBTreeDirectory::hot_cache, lines 85-92)
  • Chunk cache (LruCache<(PathBuf, i64)>): Stores 1 MiB chunks of large segment files, loaded via get_chunks_range_blocking (lines 1350-1385)

The read path checks the hot cache first, then pending writes, and finally assembles the file from B-Tree chunks. Writes flow through HybridWriter, which buffers data, updates the in-memory catalog, and queues data for B-Tree flush operations (lines 1008-1040).
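
The lookup order described above can be sketched with plain hash maps. This is a minimal illustration, not Turso's code: ToyDirectory, its field names, and the Source enum are hypothetical stand-ins, and the real implementation uses LRU caches and B-Tree cursors rather than in-memory maps.

```rust
use std::collections::HashMap;

/// Where a read was satisfied from (illustrative only).
#[derive(Debug, PartialEq)]
enum Source {
    HotCache,
    PendingWrite,
    BTreeChunks,
}

/// Toy stand-in for the directory state; all field names are hypothetical.
#[derive(Default)]
struct ToyDirectory {
    hot_cache: HashMap<String, Vec<u8>>,
    pending_writes: HashMap<String, Vec<u8>>,
    /// Chunks per file, ordered by chunk index.
    btree_chunks: HashMap<String, Vec<Vec<u8>>>,
}

impl ToyDirectory {
    /// Resolve a file in the order the article describes:
    /// hot cache, then pending writes, then chunks from storage.
    fn read(&self, path: &str) -> Option<(Source, Vec<u8>)> {
        if let Some(bytes) = self.hot_cache.get(path) {
            return Some((Source::HotCache, bytes.clone()));
        }
        if let Some(bytes) = self.pending_writes.get(path) {
            return Some((Source::PendingWrite, bytes.clone()));
        }
        // Fall back to reassembling the file from its stored chunks.
        let chunks = self.btree_chunks.get(path)?;
        Some((Source::BTreeChunks, chunks.concat()))
    }
}
```

Checking pending writes before storage means a file that has been written but not yet flushed is still visible to readers in this sketch.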

File Handle Abstractions

Tantivy reads data through the FileHandle trait, which Turso implements with two concrete types:

  • InMemoryFileHandle: Handles hot-cached data and pending writes (lines 74-86)
  • LazyFileHandle: Lazily loads required chunks from the B-Tree on access (lines 24-32)

This abstraction allows Tantivy's search algorithms to operate normally while the actual bytes may reside distributed across Turso's B-Tree chunks.
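
To make the lazy-loading idea concrete, here is a small sketch of a handle that serves a byte range by touching only the chunks that overlap it. ToyLazyHandle and its methods are hypothetical and do not implement Tantivy's FileHandle trait. The chunk size is a parameter so the behaviour is easy to observe, whereas the article describes 1 MiB chunks.

```rust
use std::cell::Cell;

/// Toy lazy handle: bytes live in fixed-size chunks and are only
/// touched when a range is requested. All names are hypothetical.
struct ToyLazyHandle {
    chunk_size: usize,
    chunks: Vec<Vec<u8>>,
    /// Counts chunk fetches so the laziness is observable.
    fetches: Cell<usize>,
}

impl ToyLazyHandle {
    fn new(data: &[u8], chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk_size must be non-zero");
        ToyLazyHandle {
            chunk_size,
            chunks: data.chunks(chunk_size).map(|c| c.to_vec()).collect(),
            fetches: Cell::new(0),
        }
    }

    /// Read `start..end`, loading only the chunks that overlap it.
    fn read_range(&self, start: usize, end: usize) -> Vec<u8> {
        let mut out = Vec::with_capacity(end.saturating_sub(start));
        if start >= end {
            return out;
        }
        let first = start / self.chunk_size;
        let last = (end - 1) / self.chunk_size;
        for idx in first..=last {
            // Stop at the end of the file if the range overshoots it.
            let Some(chunk) = self.chunks.get(idx) else { break };
            self.fetches.set(self.fetches.get() + 1);
            let base = idx * self.chunk_size;
            let lo = start.saturating_sub(base).min(chunk.len());
            let hi = (end - base).min(chunk.len());
            out.extend_from_slice(&chunk[lo..hi]);
        }
        out
    }
}
```

With a 10-byte file and 4-byte chunks, reading bytes 3..6 touches two chunks and leaves the third untouched.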

SQL Interface and Query Optimization

Turso provides first-class SQL support for full-text search through built-in functions and optimizer integration.

Built-in FTS Functions

Three SQL functions expose Tantivy's capabilities:

  • fts_match(column_list, query): Tokenizes the query using the configured Tantivy tokenizer and returns matching rows
  • fts_score(column_list, query): Returns BM25 relevance scores computed by Tantivy, applying weights from the index WITH clause
  • fts_highlight(text, query, prefix, suffix): Wraps matching tokens with supplied tags, functioning even without an index using the same tokenizer logic
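
As a rough illustration of what highlighting involves, the sketch below wraps every word of a text that matches a query term. It assumes a simple tokenizer that splits on non-alphanumeric characters and compares case-insensitively. The highlight function is hypothetical, and Turso's fts_highlight relies on the configured Tantivy tokenizer instead.

```rust
/// Hypothetical sketch of token highlighting: wraps every word in `text`
/// that case-insensitively equals a word in `query`.
fn highlight(text: &str, query: &str, prefix: &str, suffix: &str) -> String {
    // Tokenize the query into lowercase terms.
    let terms: Vec<String> = query
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect();

    let mut out = String::with_capacity(text.len());
    let mut word = String::new();
    // A trailing `None` flushes the final word.
    for c in text.chars().map(Some).chain(std::iter::once(None)) {
        if let Some(ch) = c {
            if ch.is_alphanumeric() {
                word.push(ch);
                continue;
            }
        }
        if !word.is_empty() {
            if terms.contains(&word.to_lowercase()) {
                out.push_str(prefix);
                out.push_str(&word);
                out.push_str(suffix);
            } else {
                out.push_str(&word);
            }
            word.clear();
        }
        // Copy separators through unchanged.
        if let Some(ch) = c {
            out.push(ch);
        }
    }
    out
}
```

Because this sketch needs only the text and the query, it also shows why highlighting can work without an index.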

Query Plan Optimization

The SQL optimizer rewrites column MATCH 'term' expressions into fts_match function calls through transform_match_to_fts_match in [core/translate/optimizer/mod.rs](https://github.com/tursodatabase/turso/blob/main/core/translate/optimizer/mod.rs#L560-L630). This transformation enables the execution engine to recognize FTS patterns and select appropriate index access paths.
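
The shape of such a rewrite can be shown on a toy expression tree. The Expr enum and rewrite_match function below are hypothetical and much simpler than Turso's actual AST and transform_match_to_fts_match. The sketch only demonstrates the idea of replacing a MATCH node with an equivalent function call.

```rust
/// Toy expression tree; the real Turso AST is richer and named differently.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    Match { column: Box<Expr>, query: Box<Expr> },
    And(Box<Expr>, Box<Expr>),
    Call { name: String, args: Vec<Expr> },
}

/// Recursively replace `col MATCH q` with `fts_match(col, q)`.
fn rewrite_match(expr: Expr) -> Expr {
    match expr {
        Expr::Match { column, query } => Expr::Call {
            name: "fts_match".to_string(),
            args: vec![rewrite_match(*column), rewrite_match(*query)],
        },
        Expr::And(l, r) => Expr::And(
            Box::new(rewrite_match(*l)),
            Box::new(rewrite_match(*r)),
        ),
        Expr::Call { name, args } => Expr::Call {
            name,
            args: args.into_iter().map(rewrite_match).collect(),
        },
        // Columns and literals are left untouched.
        other => other,
    }
}
```

After the rewrite, later planning stages only need to recognize one form, the function call, when deciding whether an FTS index applies.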

Configuration and Tokenization

Turso supports six Tantivy tokenizers configurable at index creation time.

Supported Tokenizers

The SUPPORTED_TOKENIZERS constant (lines 1341-1348) includes:

  • default: Standard tokenizer splitting on punctuation and whitespace
  • ngram: 2-3 character n-grams for partial (substring) matching
  • Additional specialized tokenizers for specific languages and use cases
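
To show what 2-3 character n-grams look like, the following sketch generates them for a single string. The ngrams function is a hypothetical helper written for illustration. It is not Tantivy's NgramTokenizer, and details such as lowercasing or emission order may differ in the real tokenizer.

```rust
/// Emit all character n-grams of `text` with lengths in `min..=max`,
/// grouped by starting position.
fn ngrams(text: &str, min: usize, max: usize) -> Vec<String> {
    // Work on chars so multi-byte characters are never split.
    let chars: Vec<char> = text.chars().collect();
    let mut out = Vec::new();
    for start in 0..chars.len() {
        for len in min..=max {
            if len == 0 || start + len > chars.len() {
                continue;
            }
            out.push(chars[start..start + len].iter().collect());
        }
    }
    out
}
```

For the input "turso" with lengths 2 to 3, this produces tu, tur, ur, urs, rs, rso, so. A query fragment such as "urs" then matches an indexed token directly, which is what makes partial matching possible.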

Creating an FTS Index

Configure tokenization and field weights using the WITH clause:

CREATE INDEX idx_articles_fts
    ON articles
    USING fts (title, body)
    WITH (tokenizer = 'default',
          weights   = 'title=2.0,body=1.0');
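
The weights value is a comma-separated list of field=weight pairs. A parser for that format might look like the sketch below. The parse_weights function, its error type, and its validation rules are assumptions made for illustration, and Turso's own parsing may accept or reject inputs differently.

```rust
use std::collections::HashMap;

/// Parse a weights string such as "title=2.0,body=1.0" into a map.
/// The error type and exact validation rules are assumptions.
fn parse_weights(spec: &str) -> Result<HashMap<String, f32>, String> {
    let mut weights = HashMap::new();
    for pair in spec.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        // Each entry must have the form field=weight.
        let (field, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("expected field=weight, got '{pair}'"))?;
        let weight: f32 = value
            .trim()
            .parse()
            .map_err(|_| format!("invalid weight '{value}' for field '{field}'"))?;
        weights.insert(field.trim().to_string(), weight);
    }
    Ok(weights)
}
```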

Practical SQL Examples

Basic Search with Relevance Scoring

SELECT fts_score(title, body, 'database') AS rank,
       id,
       title
FROM   articles
WHERE  fts_match(title, body, 'database')
ORDER  BY rank DESC
LIMIT  10;
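
For intuition about the rank values in this query, the sketch below computes a standard Okapi BM25 term score. Several things here are assumptions rather than facts about Turso: the function names are hypothetical, the values k1 = 1.2 and b = 0.75 are only the commonly used defaults, and combining per-field scores as a weighted sum is one plausible reading of how the WITH-clause weights apply.

```rust
/// Score contribution of one query term for one document under
/// standard Okapi BM25.
fn bm25_term_score(
    tf: f32,          // term frequency in the document
    doc_len: f32,     // document length in tokens
    avg_doc_len: f32, // average document length in the collection
    n_docs: f32,      // number of documents in the collection
    doc_freq: f32,    // number of documents containing the term
    k1: f32,
    b: f32,
) -> f32 {
    // Rare terms get a larger inverse document frequency.
    let idf = (1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5)).ln();
    // Longer-than-average documents are penalized through `b`.
    let norm = k1 * (1.0 - b + b * doc_len / avg_doc_len);
    idf * tf * (k1 + 1.0) / (tf + norm)
}

/// Hypothetical combination step: sum (score, weight) pairs per field.
fn weighted_score(field_scores: &[(f32, f32)]) -> f32 {
    field_scores.iter().map(|(score, weight)| score * weight).sum()
}
```

Under these assumptions, a match in title with weight 2.0 contributes twice as much to the final rank as an equally strong match in body with weight 1.0.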

Highlighting Search Results

SELECT id,
       fts_highlight(body, 'database', '<mark>', '</mark>') AS snippet
FROM   articles
WHERE  fts_match(title, body, 'database');

Using N-Gram Tokenization

CREATE INDEX idx_products_fts
    ON products
    USING fts (name)
    WITH (tokenizer = 'ngram');

Key Source Files

| File | Role |
| --- | --- |
| [core/index_method/fts.rs](https://github.com/tursodatabase/turso/blob/main/core/index_method/fts.rs) | Complete FTS implementation including HybridBTreeDirectory, FtsIndexMethod, and SQL function implementations |
| [core/Cargo.toml](https://github.com/tursodatabase/turso/blob/main/core/Cargo.toml#L72-L75) | Declares Tantivy 0.26.0 as an optional dependency behind the fts feature flag |
| [core/translate/optimizer/mod.rs](https://github.com/tursodatabase/turso/blob/main/core/translate/optimizer/mod.rs#L560-L630) | Contains the transform_match_to_fts_match optimizer hook |
| docs/manual.md | High-level user documentation explaining Tantivy-powered FTS |
| docs/sql-reference/functions/fts.mdx | Complete SQL function reference |

Summary

  • Turso embeds Tantivy as an optional dependency (version 0.26.0) enabled via the fts feature flag, avoiding compilation overhead when not needed.
  • HybridBTreeDirectory acts as the bridge between Tantivy's file-based expectations and Turso's B-Tree storage, using two-tier caching for frequently read data and lazy loading so that large segment files are read only on demand.
  • The SQL interface exposes fts_match, fts_score, and fts_highlight functions, with the optimizer automatically rewriting MATCH expressions to use these primitives.
  • Configuration flexibility allows per-index tokenizer selection (including default and ngram) and field-specific relevance weighting through the WITH clause.
  • All index data resides within Turso's storage engine through the chunked file abstraction, maintaining transactional consistency and backup compatibility.

Frequently Asked Questions

How does Turso store Tantivy index files without a traditional filesystem?

Turso implements the tantivy::directory::Directory trait through HybridBTreeDirectory, which maps Tantivy's file operations to Turso's B-Tree storage. Files are transparently split into 1 MiB chunks and distributed across the B-Tree, with hot-caching for small metadata files and lazy loading for large segment files.

Can I use Full-Text Search without enabling the fts feature?

No. Tantivy is compiled as an optional dependency in core/Cargo.toml (lines 72-75). The fts feature must be enabled at build time to access the FtsIndexMethod, SQL functions, and directory implementations.

What is the performance difference between fts_match and the MATCH operator?

There is no performance difference because the optimizer transforms MATCH syntax into fts_match function calls via transform_match_to_fts_match in core/translate/optimizer/mod.rs (lines 560-630). Both approaches utilize the same Tantivy-powered index with identical execution paths.

Which tokenizers are available for FTS indexes?

Turso supports six Tantivy tokenizers including default (whitespace and punctuation splitting) and ngram (2-3 character n-grams for partial matching), as defined in the SUPPORTED_TOKENIZERS constant in core/index_method/fts.rs (lines 1341-1348).
