How Turso Integrates Full-Text Search (FTS) with Tantivy: A Complete Technical Guide
Turso implements Full-Text Search through a native FTS index method that embeds the Tantivy search engine while storing all index data inside Turso's own B-Tree storage layer.
Turso's integration with Tantivy provides SQLite-compatible full-text search semantics with superior performance for large text collections. The implementation bridges the Rust-based Tantivy library with Turso's storage engine through a custom directory implementation, enabling CREATE INDEX ... USING fts syntax and specialized SQL functions like fts_match and fts_score.
Architecture Overview
The integration follows a three-layer architecture that separates SQL interface concerns from search engine implementation details.
Index Method Registration
At the SQL layer, Turso exposes the fts index method through the FtsIndexMethod struct, which implements the IndexMethod trait in [core/index_method/fts.rs](https://github.com/tursodatabase/turso/blob/main/core/index_method/fts.rs#L2890-L2896). This registration enables the CREATE INDEX ... USING fts syntax and wires up three core SQL functions: fts_match, fts_score, and fts_highlight.
Schema and Query Handling
When building an index, the FtsIndexAttachment::new constructor (lines 2910-2950 in the same file) assembles a Tantivy Schema combining row IDs with configurable text fields. The constructor parses the optional WITH (tokenizer, weights) clause to configure tokenization and field boost factors, then creates a Tantivy Index and Searcher for each cursor.
Directory Implementation: The HybridBTreeDirectory Bridge
The critical integration point is HybridBTreeDirectory, which implements Tantivy's Directory trait (lines 690-788) to map every index file to Turso's B-Tree storage. Files are transparently split into 1 MiB chunks with hot-caching for metadata, term dictionaries, and fast fields, while large segment files are lazily loaded on demand.
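The chunking idea can be illustrated with a minimal sketch. It assumes a plain in-memory BTreeMap as a stand-in for Turso's B-Tree, keyed by (path, chunk index); ChunkStore, write_file, and read_file are hypothetical names for illustration and are not taken from fts.rs.

```rust
use std::collections::BTreeMap;

/// Chunk size described in the article: 1 MiB.
const CHUNK_SIZE: usize = 1024 * 1024;

/// Hypothetical stand-in for the B-Tree: rows keyed by (file path, chunk index).
type ChunkStore = BTreeMap<(String, i64), Vec<u8>>;

/// Split a file's bytes into fixed-size chunks and store each under its key.
/// Returns the number of chunks written.
fn write_file(store: &mut ChunkStore, path: &str, data: &[u8]) -> usize {
    let mut count = 0;
    for (i, chunk) in data.chunks(CHUNK_SIZE).enumerate() {
        store.insert((path.to_string(), i as i64), chunk.to_vec());
        count += 1;
    }
    count
}

/// Reassemble a file by walking its chunks in key order.
fn read_file(store: &ChunkStore, path: &str) -> Vec<u8> {
    let start = (path.to_string(), 0i64);
    let end = (path.to_string(), i64::MAX);
    let mut out = Vec::new();
    for (_, chunk) in store.range(start..=end) {
        out.extend_from_slice(chunk);
    }
    out
}
```

Because the key sorts by path first and chunk index second, a single ordered range scan returns one file's chunks in the right order, which is the property a B-Tree layout gives for free.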
Storage Layer Integration
Turso adapts Tantivy's file-based expectations to its B-Tree architecture through a two-tier caching system and custom file handles.
Two-Tier Caching Strategy
The storage layer employs distinct caching strategies for different access patterns:
- Hot cache (LruCache<PathBuf>): Stores entire small files including metadata, term dictionaries, and fast fields (see HybridBTreeDirectory::hot_cache, lines 85-92)
- Chunk cache (LruCache<(PathBuf, i64)>): Stores 1 MiB chunks of large segment files, loaded via get_chunks_range_blocking (lines 1350-1385)
The read path checks the hot cache first, then pending writes, and finally assembles the file from B-Tree chunks. Writes flow through HybridWriter, which buffers data, updates the in-memory catalog, and queues data for B-Tree flush operations (lines 1008-1040).
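The lookup order described above can be sketched as follows. This is a simplified model under stated assumptions: plain HashMaps replace the LRU caches and the B-Tree, and the Directory and Source types are hypothetical, not the actual HybridBTreeDirectory fields.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the directory's three lookup tiers.
struct Directory {
    hot_cache: HashMap<String, Vec<u8>>,
    pending_writes: HashMap<String, Vec<u8>>,
    /// Stand-in for B-Tree storage: each file is an ordered list of chunks.
    btree_chunks: HashMap<String, Vec<Vec<u8>>>,
}

/// Records which tier served a read (for illustration only).
#[derive(Debug, PartialEq)]
enum Source {
    HotCache,
    PendingWrite,
    BTree,
}

impl Directory {
    /// Check the hot cache, then pending writes, then assemble from chunks.
    fn read(&self, path: &str) -> Option<(Source, Vec<u8>)> {
        if let Some(bytes) = self.hot_cache.get(path) {
            return Some((Source::HotCache, bytes.clone()));
        }
        if let Some(bytes) = self.pending_writes.get(path) {
            return Some((Source::PendingWrite, bytes.clone()));
        }
        self.btree_chunks
            .get(path)
            .map(|chunks| (Source::BTree, chunks.concat()))
    }
}
```

Checking pending writes before the B-Tree means a reader sees data that has been buffered but not yet flushed, which is presumably why the article lists that tier second.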
File Handle Abstractions
Tantivy reads data through the FileHandle trait, which Turso implements with two concrete types:
- InMemoryFileHandle: Handles hot-cached data and pending writes (lines 74-86)
- LazyFileHandle: Lazily loads required chunks from the B-Tree on access (lines 24-32)
This abstraction allows Tantivy's search algorithms to operate normally while the actual bytes may reside distributed across Turso's B-Tree chunks.
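To make the lazy-loading idea concrete, the sketch below maps a requested byte range to the chunk indices that cover it and fetches only those chunks. It is an illustration under assumptions: LazyHandle and its fetch_chunk callback are hypothetical and do not mirror the real LazyFileHandle or Tantivy's FileHandle trait signature.

```rust
use std::ops::Range;

/// Chunk size described in the article: 1 MiB.
const CHUNK_SIZE: usize = 1024 * 1024;

/// Return the half-open range of chunk indices covering `bytes`.
fn chunks_for_range(bytes: &Range<usize>) -> Range<usize> {
    if bytes.start >= bytes.end {
        return 0..0; // empty read touches no chunks
    }
    let first = bytes.start / CHUNK_SIZE;
    let last = (bytes.end - 1) / CHUNK_SIZE;
    first..last + 1
}

/// Hypothetical lazy handle: `fetch_chunk` stands in for a B-Tree lookup.
struct LazyHandle<F: Fn(usize) -> Vec<u8>> {
    fetch_chunk: F,
}

impl<F: Fn(usize) -> Vec<u8>> LazyHandle<F> {
    /// Read `bytes` by fetching only the chunks that overlap the range.
    fn read_bytes(&self, bytes: Range<usize>) -> Vec<u8> {
        let mut out = Vec::with_capacity(bytes.len());
        for idx in chunks_for_range(&bytes) {
            let chunk = (self.fetch_chunk)(idx);
            let chunk_start = idx * CHUNK_SIZE;
            // Clamp the requested range to this chunk's local coordinates.
            let lo = bytes.start.max(chunk_start) - chunk_start;
            let hi = bytes
                .end
                .min(chunk_start + chunk.len())
                .saturating_sub(chunk_start);
            if lo < hi {
                out.extend_from_slice(&chunk[lo..hi]);
            }
        }
        out
    }
}
```

A read that straddles a chunk boundary touches exactly two chunks, so a small read against a large segment file never forces the whole file into memory.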
SQL Interface and Query Optimization
Turso provides first-class SQL support for full-text search through built-in functions and optimizer integration.
Built-in FTS Functions
Three SQL functions expose Tantivy's capabilities:
- fts_match(column_list, query): Tokenizes the query using the configured Tantivy tokenizer and returns matching rows
- fts_score(column_list, query): Returns BM25 relevance scores computed by Tantivy, applying weights from the index WITH clause
- fts_highlight(text, query, prefix, suffix): Wraps matching tokens with supplied tags, functioning even without an index using the same tokenizer logic
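The highlighting behavior can be approximated in a few lines. The sketch below is an assumption-laden simplification: it treats tokens as maximal alphanumeric runs compared case-insensitively, whereas Turso uses the configured Tantivy tokenizer; highlight and push_token are hypothetical helper names.

```rust
use std::collections::HashSet;

/// Append `token` to `out`, wrapped in prefix/suffix if it is a query term.
fn push_token(out: &mut String, token: &str, terms: &HashSet<String>, prefix: &str, suffix: &str) {
    if token.is_empty() {
        return;
    }
    if terms.contains(&token.to_lowercase()) {
        out.push_str(prefix);
        out.push_str(token);
        out.push_str(suffix);
    } else {
        out.push_str(token);
    }
}

/// Wrap every token of `text` that matches a term of `query`.
/// Simplification: tokens are alphanumeric runs, matched case-insensitively.
fn highlight(text: &str, query: &str, prefix: &str, suffix: &str) -> String {
    let terms: HashSet<String> = query
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect();

    let mut out = String::with_capacity(text.len());
    let mut token = String::new();
    for c in text.chars() {
        if c.is_alphanumeric() {
            token.push(c);
        } else {
            // A separator ends the current token; emit it, then the separator.
            push_token(&mut out, &token, &terms, prefix, suffix);
            token.clear();
            out.push(c);
        }
    }
    push_token(&mut out, &token, &terms, prefix, suffix);
    out
}
```

Since the function only needs the text and the query, it illustrates why highlighting can work without an index: no posting lists are consulted, only tokenization.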
Query Plan Optimization
The SQL optimizer rewrites column MATCH 'term' expressions into fts_match function calls through transform_match_to_fts_match in [core/translate/optimizer/mod.rs](https://github.com/tursodatabase/turso/blob/main/core/translate/optimizer/mod.rs#L560-L630). This transformation enables the execution engine to recognize FTS patterns and select appropriate index access paths.
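As a rough illustration of this kind of rewrite, the sketch below walks a toy expression tree and replaces MATCH nodes with fts_match calls. The Expr enum and rewrite_match function are hypothetical and much smaller than Turso's real AST; they are not the signature of transform_match_to_fts_match.

```rust
/// Toy expression tree (hypothetical; Turso's AST is far richer).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    /// `column MATCH 'query'`
    Match { column: Box<Expr>, query: Box<Expr> },
    FunctionCall { name: String, args: Vec<Expr> },
    And(Box<Expr>, Box<Expr>),
}

/// Recursively rewrite `col MATCH q` into `fts_match(col, q)`.
fn rewrite_match(expr: Expr) -> Expr {
    match expr {
        Expr::Match { column, query } => Expr::FunctionCall {
            name: "fts_match".to_string(),
            args: vec![*column, *query],
        },
        // Descend into conjunctions so nested MATCH terms are also rewritten.
        Expr::And(left, right) => Expr::And(
            Box::new(rewrite_match(*left)),
            Box::new(rewrite_match(*right)),
        ),
        // Everything else is left untouched.
        other => other,
    }
}
```

After such a pass, later planning stages only need to recognize one shape (the function call) rather than two syntaxes.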
Configuration and Tokenization
Turso supports six Tantivy tokenizers configurable at index creation time.
Supported Tokenizers
The SUPPORTED_TOKENIZERS constant (lines 1341-1348) includes:
- default: Standard tokenizer splitting on punctuation and whitespace
- ngram: 2-3 character n-grams for partial matching
- Additional specialized tokenizers for specific languages and use cases
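To show what 2-3 character n-grams look like in practice, here is a minimal sketch of n-gram generation. It is a standalone illustration, not Tantivy's tokenizer: the ngrams function is hypothetical, works on Unicode scalar values, and skips the lowercasing or filtering a real tokenizer pipeline may apply.

```rust
/// Emit all character n-grams of `text` with lengths in `min_n..=max_n`.
/// Assumes `min_n >= 1`; grams are ordered by start position, then length.
fn ngrams(text: &str, min_n: usize, max_n: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut grams: Vec<String> = Vec::new();
    for start in 0..chars.len() {
        for n in min_n..=max_n {
            if start + n > chars.len() {
                break; // longer grams would run past the end of the text
            }
            grams.push(chars[start..start + n].iter().collect::<String>());
        }
    }
    grams
}
```

For example, ngrams("turso", 2, 3) yields tu, tur, ur, urs, rs, rso, so. Because every short substring becomes its own term, a query fragment such as "urs" can match in the middle of a word, at the cost of a larger index.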
Creating an FTS Index
Configure tokenization and field weights using the WITH clause:
CREATE INDEX idx_articles_fts
ON articles
USING fts (title, body)
WITH (tokenizer = 'default',
weights = 'title=2.0,body=1.0');
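The weights value is a comma-separated list of field=weight pairs. A minimal parser for that format might look like the sketch below; parse_weights is a hypothetical helper, and its error handling is an assumption rather than a description of how FtsIndexAttachment::new behaves.

```rust
use std::collections::HashMap;

/// Parse a spec such as "title=2.0,body=1.0" into a field -> weight map.
/// Hypothetical helper; error messages are illustrative only.
fn parse_weights(spec: &str) -> Result<HashMap<String, f32>, String> {
    let mut weights = HashMap::new();
    for pair in spec.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        // Each entry must contain exactly one field name and one number.
        let (field, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("expected field=weight, got '{}'", pair))?;
        let weight: f32 = value
            .trim()
            .parse()
            .map_err(|_| format!("invalid weight '{}' for field '{}'", value.trim(), field.trim()))?;
        weights.insert(field.trim().to_string(), weight);
    }
    Ok(weights)
}
```

With the index definition above, this would produce a weight of 2.0 for title and 1.0 for body, so a title match counts twice as much toward relevance.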
Practical SQL Examples
Basic Search with Relevance Scoring
SELECT fts_score(title, body, 'database') AS rank,
id,
title
FROM articles
WHERE fts_match(title, body, 'database')
ORDER BY rank DESC
LIMIT 10;
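For intuition about what the rank column represents, the sketch below computes a textbook BM25 contribution for a single term in a single field. It rests on assumptions: the constants k1 = 1.2 and b = 0.75 are common defaults, the field weight is modeled as a simple multiplicative boost, and bm25_term_score is a hypothetical function rather than Tantivy's scorer.

```rust
/// Textbook BM25 score for one term in one field, scaled by a field boost.
/// Assumed constants: k1 = 1.2, b = 0.75 (common defaults).
fn bm25_term_score(
    tf: f32,             // occurrences of the term in this document's field
    doc_len: f32,        // number of tokens in this document's field
    avg_doc_len: f32,    // average field length across the collection
    doc_count: f32,      // total number of documents
    docs_with_term: f32, // documents containing the term
    boost: f32,          // field weight, e.g. 2.0 for title
) -> f32 {
    const K1: f32 = 1.2;
    const B: f32 = 0.75;
    // Rare terms get a higher inverse document frequency.
    let idf = (1.0 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5)).ln();
    // Longer-than-average fields are penalized through the length norm.
    let norm = K1 * (1.0 - B + B * doc_len / avg_doc_len);
    boost * idf * tf * (K1 + 1.0) / (tf + norm)
}
```

Under this model, a document's score grows with term frequency but saturates, shrinks as the field gets longer than average, and scales linearly with the field weight.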
Highlighting Search Results
SELECT id,
fts_highlight(body, 'database', '<mark>', '</mark>') AS snippet
FROM articles
WHERE fts_match(title, body, 'database');
Using N-Gram Tokenization
CREATE INDEX idx_products_fts
ON products
USING fts (name)
WITH (tokenizer = 'ngram');
Key Source Files
| File | Role |
|---|---|
| [core/index_method/fts.rs](https://github.com/tursodatabase/turso/blob/main/core/index_method/fts.rs) | Complete FTS implementation including HybridBTreeDirectory, FtsIndexMethod, and SQL function implementations |
| [core/Cargo.toml](https://github.com/tursodatabase/turso/blob/main/core/Cargo.toml#L72-L75) | Declares Tantivy 0.26.0 as an optional dependency behind the fts feature flag |
| [core/translate/optimizer/mod.rs](https://github.com/tursodatabase/turso/blob/main/core/translate/optimizer/mod.rs#L560-L630) | Contains the transform_match_to_fts_match optimizer hook |
| docs/manual.md | High-level user documentation explaining Tantivy-powered FTS |
| docs/sql-reference/functions/fts.mdx | Complete SQL function reference |
Summary
- Turso embeds Tantivy as an optional dependency (version 0.26.0) enabled via the fts feature flag, avoiding compilation overhead when not needed.
- HybridBTreeDirectory acts as the bridge between Tantivy's file-based expectations and Turso's B-Tree storage, implementing two-tier caching and lazy loading for optimal performance.
- The SQL interface exposes fts_match, fts_score, and fts_highlight functions, with the optimizer automatically rewriting MATCH expressions to use these primitives.
- Configuration flexibility allows per-index tokenizer selection (including default and ngram) and field-specific relevance weighting through the WITH clause.
- All index data resides within Turso's storage engine through the chunked file abstraction, maintaining transactional consistency and backup compatibility.
Frequently Asked Questions
How does Turso store Tantivy index files without a traditional filesystem?
Turso implements the tantivy::directory::Directory trait through HybridBTreeDirectory, which maps Tantivy's file operations to Turso's B-Tree storage. Files are transparently split into 1 MiB chunks and distributed across the B-Tree, with hot-caching for small metadata files and lazy loading for large segment files.
Can I use Full-Text Search without enabling the fts feature?
No. Tantivy is compiled as an optional dependency in core/Cargo.toml (lines 72-75). The fts feature must be enabled at build time to access the FtsIndexMethod, SQL functions, and directory implementations.
What is the performance difference between fts_match and the MATCH operator?
There is no performance difference because the optimizer transforms MATCH syntax into fts_match function calls via transform_match_to_fts_match in core/translate/optimizer/mod.rs (lines 560-630). Both approaches utilize the same Tantivy-powered index with identical execution paths.
Which tokenizers are available for FTS indexes?
Turso supports six Tantivy tokenizers including default (whitespace and punctuation splitting) and ngram (2-3 character n-grams for partial matching), as defined in the SUPPORTED_TOKENIZERS constant in core/index_method/fts.rs (lines 1341-1348).