Using Turso for Vector Search and Embedding Similarity
Turso provides native vector storage and similarity search through SQL functions like vector32(), vector_distance_cos(), and an inverted-file index for sparse vectors, enabling embedding-based search directly in your database.
Turso (the Rust-based SQLite rewrite) ships a first-class vector extension that lets you store, manipulate, and search high-dimensional embeddings using standard SQL. The implementation spans from low-level binary serialization in core/vector/vector_types.rs to high-level SQL functions documented in docs/sql-reference/functions/vector.mdx, making it possible to build semantic search applications without external vector databases.
Vector Storage and Type System
Turso stores vectors as compact binary blobs that preserve type information and dimensionality. When you insert a vector literal such as vector32('[0.1, 0.3, 0.5]'), the engine calls Vector::from_vec in core/vector/vector_types.rs to parse the JSON array and encode it into a dense binary format.
Binary Format and Type Identification
The binary layout includes a trailing type byte that identifies whether the vector is dense, sparse, or quantized. The Vector::vector_type method (lines 38-84 of vector_types.rs) matches this byte to determine the dimensionality (dims) and select the appropriate Rust type (Float32Dense, Float32Sparse, etc.). This allows the engine to safely reinterpret the payload during later operations without additional metadata lookups.
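The trailing-type-byte idea can be sketched in a few lines of Python. This is an illustration, not Turso's actual encoding: the tag values, the little-endian byte order, and the `encode`/`decode` helper names are all assumptions. Only the general shape (payload followed by one type byte, with dimensionality derived from the payload length) follows the description above.

```python
import struct

# Hypothetical tag values, chosen for illustration only.
TAG_F32_DENSE = 1
TAG_F64_DENSE = 2

# Map each tag to (struct format character, bytes per component).
_FORMATS = {TAG_F32_DENSE: ("f", 4), TAG_F64_DENSE: ("d", 8)}


def encode(values, tag=TAG_F32_DENSE):
    """Pack components (little-endian assumed), then append the type tag."""
    fmt, _ = _FORMATS[tag]
    payload = struct.pack(f"<{len(values)}{fmt}", *values)
    return payload + bytes([tag])


def decode(blob):
    """Read the trailing tag, derive dims from payload length, unpack."""
    if not blob:
        raise ValueError("empty blob")
    tag = blob[-1]
    if tag not in _FORMATS:
        raise ValueError(f"unknown type tag: {tag}")
    fmt, width = _FORMATS[tag]
    payload = blob[:-1]
    if len(payload) % width != 0:
        raise ValueError("payload length is not a multiple of component width")
    dims = len(payload) // width
    return tag, dims, list(struct.unpack(f"<{dims}{fmt}", payload))
```

Because the tag travels with the blob, a reader needs no separate schema lookup to know how to interpret the payload, which is the property the paragraph above attributes to the real format.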
Supported Vector Types
According to the source code analysis, Turso supports:
- Dense vectors: Float32Dense, Float64Dense, Float8Quantized
- Sparse vectors: Float32Sparse (optimized for the IVF index)
- Quantized vectors: Space-efficient 8-bit representations via vector8()
SQL Vector Functions
The vector extension exposes constructor and utility functions through the SQLite parser, integrated via core/translate/expr/vectors.rs.
Vector Constructors
These functions parse JSON arrays and return binary blobs:
| Function | Description |
|---|---|
| vector32(json) | 32-bit float dense vector |
| vector64(json) | 64-bit float dense vector |
| vector8(json) | 8-bit quantized vector |
| vector32_sparse(json) | Sparse 32-bit float vector |
Example:
INSERT INTO documents (embedding)
VALUES (vector32('[0.12, -0.34, 0.56, 0.78]'));
Distance Metrics
Turso implements multiple similarity functions in core/vector/operations/:
- vector_distance_cos(v1, v2): Cosine distance (1 − cosine similarity), implemented in distance_cos.rs
- vector_distance_l2(v1, v2): Euclidean distance
- vector_distance_jaccard(v1, v2): Jaccard distance for binary/sparse vectors, implemented in distance_jaccard.rs
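For reference, the three metrics can be written out in plain Python. This is a sketch of the textbook definitions, not a port of the Rust code. In particular, the Jaccard function below treats a vector as the set of its non-zero positions, which is an assumption; the real distance_jaccard.rs may weight components differently.

```python
import math


def distance_cos(a, b):
    """Cosine distance: 1 - (a . b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def distance_l2(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_jaccard(a, b):
    """Set-based Jaccard distance over non-zero positions (an assumption)."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    set_a = {i for i, x in enumerate(a) if x != 0}
    set_b = {i for i, y in enumerate(b) if y != 0}
    union = set_a | set_b
    if not union:
        return 0.0  # two all-zero vectors are treated as identical here
    return 1.0 - len(set_a & set_b) / len(union)
```

All three return 0 for identical inputs and grow as vectors diverge, which is why the SQL examples in this article sort by distance in ascending order.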
Utility Functions
Additional helpers include:
- vector_extract(v): Serializes a vector back to JSON for debugging
- vector_concat(v1, v2, ...): Concatenates multiple vectors
- vector_slice(v, start, end): Extracts sub-vectors
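As a mental model, the helpers behave roughly like the list operations below. The Python functions only mirror the SQL names for readability. The zero-based, half-open `[start, end)` indexing in `vector_slice` is an assumption that should be checked against the function reference before relying on it.

```python
import json


def vector_extract(v):
    """Render the components as a JSON array string."""
    return json.dumps(list(v))


def vector_concat(*vectors):
    """Append the components of each vector in argument order."""
    out = []
    for v in vectors:
        out.extend(v)
    return out


def vector_slice(v, start, end):
    """Return components in [start, end); indexing convention is assumed."""
    if not 0 <= start <= end <= len(v):
        raise ValueError("slice bounds out of range")
    return list(v[start:end])
```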
Indexing Sparse Vectors with IVF
A linear scan computes a distance for every stored embedding, so query cost grows with table size. For sparse vectors, Turso addresses this with the toy_vector_sparse_ivf index method located in core/index_method/toy_vector_sparse_ivf.rs.
Index Architecture
The IVF (Inverted File) index accelerates Jaccard similarity searches on sparse vectors only. When you create the index:
CREATE INDEX vec_idx ON documents
USING toy_vector_sparse_ivf (embedding);
The attach method (lines 26-33) registers the index with the schema. During insertion, the cursor extracts non-zero components via Vector::as_f32_sparse and updates two B-Tree structures:
- An inverted index mapping component values to row IDs
- A stats tree storing per-component min/max/count for query pruning
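The two structures can be modelled with ordinary dictionaries. The sketch below is an in-memory stand-in, not the B-Tree layout the index actually uses. The class name, the choice to key postings by component position, and the dictionary shape of the statistics are assumptions made for illustration.

```python
from collections import defaultdict


class SparseInvertedIndex:
    """Toy in-memory stand-in for the inverted index and the stats tree."""

    def __init__(self):
        # component position -> set of row IDs that have a non-zero value there
        self.postings = defaultdict(set)
        # component position -> {"min": ..., "max": ..., "count": ...}
        self.stats = {}

    def insert(self, row_id, sparse):
        """Index one row; `sparse` maps component position -> non-zero value."""
        for pos, value in sparse.items():
            if value == 0:
                continue  # zeros are not stored in a sparse vector
            self.postings[pos].add(row_id)
            entry = self.stats.get(pos)
            if entry is None:
                self.stats[pos] = {"min": value, "max": value, "count": 1}
            else:
                entry["min"] = min(entry["min"], value)
                entry["max"] = max(entry["max"], value)
                entry["count"] += 1
```

The `count` field is what lets a query prefer rare components: a component that appears in few rows yields a short posting list and therefore few candidates.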
Query Pattern Recognition
Turso recognizes specific query patterns to trigger index usage. When you execute:
SELECT vector_distance_jaccard(embedding, vector32_sparse('[1,0,0,0]')) AS distance
FROM documents
ORDER BY distance
LIMIT 10;
The query planner matches this against the registered pattern in VectorSparseInvertedIndexMethod::attach, allowing the engine to use the inverted index rather than a full table scan.
Query Execution and Pruning
The index implementation uses a state machine (CollectComponentsSeek, Seek) that:
- Gathers the most selective components based on stored statistics
- Walks the inverted index to find candidate rows
- Computes a lower bound on Jaccard distance to discard rows that cannot beat the current best plus the configured delta
The algorithm is fully described in the query_start method (starting at line 40) and the state machine implementation (lines 88-210).
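The three steps above can be sketched as a single function. This is a simplified illustration under several assumptions, not the state machine from the source: it tracks only the single best row rather than a top-k list, it uses set-based Jaccard over non-zero positions, it uses the size-ratio bound (similarity can never exceed min(|A|, |B|) / max(|A|, |B|)), and the way `scan_portion` and `delta` are applied is one plausible reading of the parameter names.

```python
import math


def jaccard_distance(a, b):
    """Set-based Jaccard distance between two sets of positions."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def size_lower_bound(query_size, row_size):
    """Distance can be no smaller than 1 - min(sizes) / max(sizes)."""
    if query_size == 0 and row_size == 0:
        return 0.0
    return 1.0 - min(query_size, row_size) / max(query_size, row_size)


def search(rows, query, scan_portion=1.0, delta=0.0):
    """rows: row_id -> set of non-zero positions; query: set of positions.

    Returns (best_row_id, best_distance), or (None, None) without candidates.
    """
    # Stand-in for the inverted index: position -> row IDs.
    postings = {}
    for row_id, comps in rows.items():
        for pos in comps:
            postings.setdefault(pos, set()).add(row_id)

    # Step 1: order query components by selectivity (fewest rows first).
    present = sorted(
        (p for p in query if p in postings),
        key=lambda p: (len(postings[p]), p),
    )
    keep = max(1, math.ceil(len(present) * scan_portion)) if present else 0

    best_id, best = None, None
    seen = set()
    # Step 2: walk the posting lists of the selected components.
    for pos in present[:keep]:
        for row_id in sorted(postings[pos]):
            if row_id in seen:
                continue
            seen.add(row_id)
            # Step 3: skip rows whose bound already exceeds best + delta.
            bound = size_lower_bound(len(query), len(rows[row_id]))
            if best is not None and bound > best + delta:
                continue
            dist = jaccard_distance(query, rows[row_id])
            if best is None or dist < best:
                best_id, best = row_id, dist
    return best_id, best
```

With `scan_portion` below 1.0 the function reads fewer posting lists and may miss the true nearest row, which illustrates the usual speed-versus-recall trade-off of this kind of index.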
Configuration Parameters
The IVF index accepts optional parameters in the constructor (lines 61-77):
- delta: Threshold for pruning candidates
- scan_portion: Fraction of components to scan
- scan_order: Ordering strategy for index traversal
Important limitation: The IVF index only supports sparse vectors (Float32Sparse). Dense vectors require full table scans with vector_distance_cos() or vector_distance_l2().
Practical Implementation Examples
Storing Dense Embeddings
Create a table with a BLOB column and insert vectors using the constructor:
CREATE TABLE articles (
id INTEGER PRIMARY KEY,
title TEXT NOT NULL,
embedding BLOB
);
INSERT INTO articles VALUES (
1,
'Machine Learning Basics',
vector32('[0.12, -0.34, 0.56, 0.78, -0.11, 0.45, -0.23, 0.67]')
);
Source: This pattern appears in the integration tests at tests/integration/query_processing/test_vacuum.rs (lines 3785-3790).
Cosine Similarity Search
Find the 5 most similar articles to a query vector:
SELECT
id,
title,
vector_distance_cos(
embedding,
vector32('[0.1, -0.3, 0.5, 0.8, -0.1, 0.4, -0.2, 0.6]')
) AS distance
FROM articles
ORDER BY distance ASC
LIMIT 5;
This performs a linear scan using the cosine distance implementation from core/vector/operations/distance_cos.rs.
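To see what a linear scan costs, here is the same top-k query written as plain Python over an in-memory list. It is a conceptual model only: the `top_k` helper and the `(id, title, embedding)` row shape are hypothetical and say nothing about how Turso executes the statement internally.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity; both vectors must be non-zero and equal length."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def top_k(rows, query, k=5):
    """Score every row, then keep the k smallest distances (ascending)."""
    scored = (
        (cosine_distance(embedding, query), row_id, title)
        for row_id, title, embedding in rows
    )
    return heapq.nsmallest(k, scored)
```

Every row is scored exactly once regardless of `k`, so the work is proportional to the number of rows times the vector dimensionality; the limit only bounds how many results are kept.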
Sparse Vector Indexing
For sparse embeddings (where most dimensions are zero), use the sparse constructor and create an IVF index:
-- Insert sparse vector
INSERT INTO articles VALUES (
2,
'Database Design',
vector32_sparse('[1, 0, 0, 0, 1, 0, 0, 1]')
);
-- Create IVF index
CREATE INDEX idx_sparse ON articles
USING toy_vector_sparse_ivf (embedding);
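The saving from a sparse representation comes from keeping only the non-zero components together with their positions. The sketch below shows that idea with a hypothetical `(dims, indices, values)` triple; Turso's on-disk layout for Float32Sparse may differ.

```python
def to_sparse(dense):
    """Keep only non-zero components, remembering where they were."""
    indices = [i for i, x in enumerate(dense) if x != 0]
    values = [dense[i] for i in indices]
    return len(dense), indices, values


def to_dense(dims, indices, values):
    """Rebuild the full vector, filling unlisted positions with 0.0."""
    out = [0.0] * dims
    for i, v in zip(indices, values):
        out[i] = v
    return out
```

For the eight-dimensional vector inserted above, only positions 0, 4, and 7 are kept, so three index/value pairs stand in for eight components.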
Accelerated Jaccard Search
Query using the IVF index for fast similarity search:
SELECT
id,
vector_distance_jaccard(
embedding,
vector32_sparse('[1, 0, 0, 0, 1, 0, 0, 0]')
) AS distance
FROM articles
ORDER BY distance
LIMIT 10;
Because the query matches the registered pattern, Turso rewrites the execution plan to read from the inverted index, so it examines only the candidate rows surfaced by the index instead of scanning the whole table.
Inspecting Vector Contents
Debug stored embeddings by converting back to JSON:
SELECT vector_extract(embedding) AS json_array
FROM articles
WHERE id = 1;
This calls Vector::as_f32_slice in vector_types.rs to deserialize the binary blob.
Summary
- Turso stores vectors as binary blobs with type-aware serialization in core/vector/vector_types.rs, supporting dense, sparse, and quantized formats.
- SQL functions provide native vector operations including constructors (vector32, vector64, vector8), distance metrics (vector_distance_cos, vector_distance_l2, vector_distance_jaccard), and utilities (vector_extract, vector_concat).
- Sparse vectors support IVF indexing via toy_vector_sparse_ivf in core/index_method/toy_vector_sparse_ivf.rs, which accelerates Jaccard similarity searches through inverted file structures.
- Dense vectors require linear scans while sparse vectors benefit from index-based pruning using component statistics and configurable delta thresholds.
- Pattern matching triggers index usage when queries follow the specific structure: SELECT vector_distance_jaccard(<col>, ?) ... ORDER BY distance LIMIT ?.
Frequently Asked Questions
Does Turso support index-accelerated search for dense vectors?
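No, the IVF index in core/index_method/toy_vector_sparse_ivf.rs only supports sparse vectors (Float32Sparse). Dense vectors must use linear scan searches with vector_distance_cos() or vector_distance_l2(). For dense embeddings, narrow the candidate set before ranking, for example with a WHERE filter on other columns or with application-level filtering. A LIMIT clause caps the number of rows returned, but on its own it does not avoid computing the distance for every scanned row.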
What is the difference between vector32() and vector32_sparse()?
vector32() creates dense vectors where every dimension is stored sequentially as a 32-bit float, suitable for cosine and L2 distance calculations. vector32_sparse() stores only non-zero components with their indices, enabling efficient Jaccard similarity computation and IVF index support in toy_vector_sparse_ivf.rs.
How does Turso serialize vector embeddings internally?
Vectors are serialized into compact binary blobs via Vector::from_vec in core/vector/vector_types.rs. The format includes the raw component values followed by a type identification byte that distinguishes between dense, sparse, and quantized formats. This allows the engine to validate dimensionality and select appropriate distance algorithms at query time.
Can I use the IVF index with cosine similarity instead of Jaccard?
No, the toy_vector_sparse_ivf implementation specifically targets vector_distance_jaccard queries. The index method recognizes the pattern SELECT vector_distance_jaccard(<col>, ?) ... ORDER BY distance LIMIT ? in its attach method (lines 26-33 of toy_vector_sparse_ivf.rs). Cosine similarity searches on dense vectors do not trigger index usage and perform full table scans.