How Vector Embeddings Work in Turso: Storage, Search, and Indexing
Turso stores vector embeddings as binary blobs with embedded type headers, exposes SQL construction functions like vector32(), and provides scalar distance operators (vector_distance_cos, vector_distance_l2, vector_distance_jaccard, vector_distance_dot) for similarity search, plus an inverted-file (IVF) index for accelerating sparse-vector queries.
Turso, the open-source SQLite-compatible database maintained at tursodatabase/turso, natively supports vector embeddings at the storage engine level. This integration lets you store high-dimensional embeddings—dense, sparse, or quantized—alongside relational data and query them using standard SQL without external services.
Vector Storage Formats and Type System
In core/vector/vector_types.rs, Turso defines five distinct vector encodings. Each type stores raw data plus a one-byte type flag that determines how the engine parses dimensions and executes distance calculations.
| Vector Type | SQL Constructor | Storage Layout | Use Case |
|---|---|---|---|
| Float32Dense | vector32(...) | Raw little-endian f32 values | Standard dense embeddings (OpenAI, Cohere) |
| Float64Dense | vector64(...) | Raw little-endian f64 values | High-precision scientific vectors |
| Float32Sparse | vector32_sparse(...) | idx:u32 + value:f32 pairs + 4-byte length | High-dimensional text/keyword sparse vectors |
| Float1Bit | vector1bit(...) | Packed bits (1 bit per dimension) | Binary hash-style embeddings |
| Float8 (quantized) | vector8(...) | 1-byte quantized values + alpha/shift metadata | Memory-constrained edge deployments |
The binary format guarantees zero-copy deserialization. When you insert a vector via vector32('[0.1, 0.2]'), the function writes the type byte followed by eight raw bytes representing two f32 values.
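As a rough illustration of the layout just described, the Python sketch below packs a dense vector as one type byte followed by little-endian f32 values. The tag value FLOAT32_DENSE_TAG and the helper serialize_vector32 are hypothetical. The real discriminants and byte order are defined in core/vector/vector_types.rs and may differ from this sketch.

```python
import struct

# Hypothetical type tag: the real discriminant values live in
# core/vector/vector_types.rs and are not reproduced here.
FLOAT32_DENSE_TAG = 1


def serialize_vector32(values: list[float]) -> bytes:
    """Encode a dense f32 vector as [type byte][little-endian f32 ...].

    Illustrative sketch of the layout described in the text,
    not Turso's actual serializer.
    """
    body = struct.pack(f"<{len(values)}f", *values)
    return bytes([FLOAT32_DENSE_TAG]) + body


blob = serialize_vector32([0.1, 0.2])
print(len(blob))  # 9: one type byte plus two 4-byte floats
```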
Parsing and Vector Construction
When a SQL function receives a vector argument, the engine dispatches to core/vector/mod.rs. The parse_vector function detects whether the input is TEXT (a JSON array) or a BLOB (already serialized) and returns a Vector struct containing:
- The detected VectorType (discriminant from the first byte)
- Dimensionality derived from byte length / type size
- A reference to the underlying byte buffer
This unified abstraction allows distance functions to operate on any vector type without SQL-layer branching.
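The dispatch described above can be sketched in Python. Everything here is hypothetical: the TYPE_TABLE tag values, the Vector dataclass, and this parse_vector signature. The sketch only mirrors the three fields listed and follows the type-byte-first layout this article describes. The BLOB path returns a memoryview over the payload rather than a copy.

```python
import json
import struct
from dataclasses import dataclass

# Hypothetical tag -> (type name, element size in bytes); the real
# discriminants are defined in core/vector/vector_types.rs.
TYPE_TABLE = {1: ("Float32Dense", 4), 2: ("Float64Dense", 8)}


@dataclass
class Vector:
    vector_type: str
    dims: int
    data: memoryview  # view over the payload bytes, no copy


def parse_vector(value) -> Vector:
    if isinstance(value, str):
        # TEXT input: a JSON array, encoded as f32 the way vector32() would.
        floats = json.loads(value)
        payload = struct.pack(f"<{len(floats)}f", *floats)
        return Vector("Float32Dense", len(floats), memoryview(payload))
    if isinstance(value, (bytes, bytearray)):
        if not value:
            raise ValueError("empty blob")
        tag = value[0]
        if tag not in TYPE_TABLE:
            raise ValueError(f"unknown vector type tag {tag}")
        name, size = TYPE_TABLE[tag]
        payload = memoryview(value)[1:]
        if len(payload) % size != 0:
            raise ValueError("blob length is not a multiple of the element size")
        # Dimensionality = payload length / element size.
        return Vector(name, len(payload) // size, payload)
    raise TypeError("expected TEXT (JSON array) or BLOB")
```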
Distance and Similarity Functions
Turso implements four metric primitives as scalar SQL functions. Each resides in a dedicated file under core/vector/operations/:
- vector_distance_cos (distance_cos.rs): Computes cosine distance (1 − cos θ) using SIMD-accelerated f32/f64 kernels on dense vectors. For Float1Bit data, it reduces to a Hamming-distance calculation (XOR plus popcount).
- vector_distance_l2 (distance_l2.rs): Calculates Euclidean (L2) distance. Dense paths use the simd crate; sparse paths iterate the sorted index/value pairs from VectorSparse.
- vector_distance_jaccard (jaccard.rs): Measures Jaccard distance for Float32Sparse vectors, operating on the sorted lists of non-zero components.
- vector_distance_dot (distance_dot.rs): Returns the negative dot product, so that smaller values mean more similar vectors and the result can be used as a distance metric.
All functions accept two vectors of compatible types and return a DOUBLE. Quantized Float8 vectors are de-quantized on-the-fly using stored alpha and shift parameters before distance calculation.
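For reference, the scalar math behind these functions looks roughly like this in plain Python. This sketch covers the metric definitions only. It has no SIMD and no sparse or Float8 paths, and the function names are illustrative rather than Turso's.

```python
import math


def distance_cos(a, b):
    # Cosine distance: 1 - (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (na * nb)


def distance_l2(a, b):
    # Euclidean distance: square root of summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_dot(a, b):
    # Negated dot product so that "smaller is closer" under ORDER BY ASC.
    return -sum(x * y for x, y in zip(a, b))


def hamming_1bit(a: bytes, b: bytes) -> int:
    # Packed 1-bit vectors: XOR each byte pair and count the set bits.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```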
Index-Accelerated Search with IVF
For high-dimensional sparse data, Turso provides a prototype inverted-file (IVF) index, toy_vector_sparse_ivf, implemented in core/index_method/toy_vector_sparse_ivf.rs. This index accelerates vector_distance_jaccard queries by avoiding full table scans.
Index Structure
- Component B-tree: Each non-zero dimension of a sparse vector becomes a key mapping to (sum, rowid) pairs.
- Statistics B-tree: Tracks per-component counts, minimums, and maximums to enable lower-bound estimation during query pruning.
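The following is a minimal in-memory model of the two structures. It assumes that "sum" is the sum of a row's non-zero values and that the statistics are taken over each component's values. Python dicts of sorted lists stand in for the B-trees, and build_ivf is a hypothetical helper, not code from toy_vector_sparse_ivf.rs.

```python
from collections import defaultdict


def build_ivf(rows):
    """rows: dict rowid -> {component_index: value} (non-zero entries only).

    Returns (postings, stats):
      postings: component -> sorted list of (sum, rowid)   [component B-tree]
      stats:    component -> [count, min, max]             [statistics B-tree]
    """
    postings = defaultdict(list)
    stats = {}
    for rowid, vec in rows.items():
        total = sum(vec.values())  # assumption: "sum" = sum of the row's values
        for comp, val in vec.items():
            postings[comp].append((total, rowid))
            if comp in stats:
                s = stats[comp]
                s[0] += 1
                s[1] = min(s[1], val)
                s[2] = max(s[2], val)
            else:
                stats[comp] = [1, val, val]
    for comp in postings:
        postings[comp].sort()  # B-tree keys are kept in sorted order
    return dict(postings), stats
```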
Creating the index uses CREATE INDEX with a USING clause that names the index method:
```sql
CREATE INDEX idx_sparse ON documents
USING toy_vector_sparse_ivf (embedding);
```
Query Execution
When you issue a Jaccard similarity query:
```sql
SELECT vector_distance_jaccard(embedding, vector32_sparse('[1,0,0,1]')) AS dist
FROM documents
ORDER BY dist
LIMIT 10;
```
Turso rewrites the plan to:
- Collect the query's components (subject to the scan_portion and scan_order parameters).
- Estimate a lower bound on the Jaccard distance using component statistics (see the CollectComponentsSeek state).
- Scan only the inverted-index entries that can potentially beat the current best distance plus delta, skipping irrelevant rows.
This pruning drastically reduces I/O for sparse vectors with millions of dimensions but few non-zero values.
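The pruning idea can be illustrated with a simpler bound than Turso's component-statistics estimate. Assume the weighted Jaccard definition (1 minus the sum of per-component minimums over the sum of maximums) and non-negative weights. Under those assumptions, the distance between two vectors can never fall below 1 − min(sa, sb) / max(sa, sb), where sa and sb are their value sums.

The hypothetical ivf_top_k below works in three steps:
- It visits only rows that share a non-zero component with the query.
- It orders those rows by the sum-based bound.
- It stops once no remaining row can beat the current k-th best.

The delta parameter and the scan_portion/scan_order knobs are not modeled.

```python
import heapq
from collections import defaultdict


def weighted_jaccard(a, b):
    # Assumed definition: 1 - sum(min) / sum(max) over the union of components.
    keys = a.keys() | b.keys()
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return 1.0 - num / den if den else 0.0


def sum_lower_bound(sa, sb):
    # With non-negative weights: sum(min) <= min(sa, sb) and sum(max) >= max(sa, sb).
    hi = max(sa, sb)
    return 1.0 - min(sa, sb) / hi if hi else 0.0


def ivf_top_k(query, rows, k):
    """query: {component: value}; rows: rowid -> {component: value}."""
    postings = defaultdict(set)  # component -> rowids with that component non-zero
    for rid, vec in rows.items():
        for comp in vec:
            postings[comp].add(rid)
    candidates = set().union(*(postings[c] for c in query if c in postings))

    qsum = sum(query.values())
    bounds = {rid: sum_lower_bound(qsum, sum(rows[rid].values())) for rid in candidates}
    best = []  # max-heap of the k best so far, stored as (-distance, rowid)
    for rid in sorted(candidates, key=bounds.__getitem__):
        if len(best) == k and bounds[rid] >= -best[0][0]:
            break  # sorted by bound, so no later candidate can beat the k-th best
        d = weighted_jaccard(query, rows[rid])
        if len(best) < k:
            heapq.heappush(best, (-d, rid))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, rid))
    return sorted((-nd, rid) for nd, rid in best)
```

Rows that share no component with the query have a weighted Jaccard distance of 1, so skipping them is safe as long as at least k rows overlap.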
Practical SQL Examples
Dense Vector Similarity Search
Store and query dense f32 embeddings using cosine distance:
```sql
CREATE TABLE products(
  id INTEGER PRIMARY KEY,
  name TEXT,
  embedding BLOB
);

INSERT INTO products(name, embedding)
VALUES ('laptop', vector32('[0.1, 0.2, 0.3, 0.4]'));

-- Nearest-neighbor search: a smaller cosine distance means more similar
SELECT name,
       vector_distance_cos(embedding, vector32('[0.1, 0.2, 0.35, 0.4]')) AS distance
FROM products
ORDER BY distance
LIMIT 5;
```
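Without an index, a query like this scores every row. The Python sketch below mirrors that brute-force plan: compute the distance, order by it, and keep the first k rows, using heapq.nsmallest. The table contents and the brute_force_knn name are made up for illustration.

```python
import heapq
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def brute_force_knn(table, query, k):
    """table: list of (name, embedding). Scores every row, like a query
    with no index: compute the distance, ORDER BY it, LIMIT k."""
    scored = ((cosine_distance(emb, query), name) for name, emb in table)
    return heapq.nsmallest(k, scored)


table = [("laptop", [0.1, 0.2, 0.3, 0.4]), ("mug", [0.9, 0.0, 0.0, 0.1])]
nearest = brute_force_knn(table, [0.1, 0.2, 0.35, 0.4], 1)
```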
Sparse Vectors with IVF Index
Index and search high-dimensional sparse vectors efficiently:
```sql
CREATE TABLE articles(
  id INTEGER PRIMARY KEY,
  title TEXT,
  embedding BLOB
);

CREATE INDEX doc_idx ON articles
USING toy_vector_sparse_ivf (embedding);

INSERT INTO articles(title, embedding)
VALUES ('AI overview', vector32_sparse('[100, 0, 0, 50]'));

-- Uses the IVF index for pruning
SELECT title,
       vector_distance_jaccard(embedding, vector32_sparse('[100, 0, 0, 45]')) AS dist
FROM articles
ORDER BY dist
LIMIT 3;
```
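To see what this query compares, the sketch below converts the dense-looking literals into non-zero (index, value) pairs and computes a weighted Jaccard distance. It assumes vector_distance_jaccard uses the weighted (min/max) form on non-negative values; jaccard.rs has the exact definition. Under that assumption, the stored and query vectors are at distance 1 − 145/150 ≈ 0.0333. The helper names are hypothetical.

```python
import json


def dense_text_to_sparse(text):
    """'[100, 0, 0, 50]' -> {0: 100.0, 3: 50.0}: keep only non-zero components."""
    return {i: float(v) for i, v in enumerate(json.loads(text)) if v != 0}


def weighted_jaccard_distance(a, b):
    # Assumed definition: 1 - sum(min) / sum(max) over the union of components.
    keys = a.keys() | b.keys()
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return 1.0 - num / den if den else 0.0


stored = dense_text_to_sparse("[100, 0, 0, 50]")
query = dense_text_to_sparse("[100, 0, 0, 45]")
print(stored)  # {0: 100.0, 3: 50.0}
print(round(weighted_jaccard_distance(stored, query), 4))  # 0.0333
```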
Quantized Float8 Vectors
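Reduce storage roughly 4× relative to vector32 by using 8-bit quantization (one byte per dimension instead of four, plus a small amount of alpha/shift metadata):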
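```sql
INSERT INTO products(name, embedding)
VALUES ('phone', vector8('[0.12, -0.34, 0.78, 0.01]'));

-- Distance calculation de-quantizes Float8 values automatically.
-- Restrict to the Float8 row so both arguments share a vector type.
SELECT name,
       vector_distance_cos(embedding, vector8('[0.10, -0.30, 0.80, 0.00]')) AS distance
FROM products
WHERE name = 'phone';
```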
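The sketch below shows one common way to produce 1-byte codes with alpha/shift metadata: min/max affine quantization. Each value is approximated as alpha * q + shift, with q in 0..255. This scheme is an assumption chosen for illustration; the exact formula Turso uses for Float8 lives under core/vector/ and may differ.

```python
def quantize8(values):
    """Affine 8-bit quantization (assumed scheme): value ~= alpha * q + shift."""
    lo, hi = min(values), max(values)
    alpha = (hi - lo) / 255 if hi > lo else 1.0
    shift = lo
    q = bytes(round((v - shift) / alpha) for v in values)
    return q, alpha, shift


def dequantize8(q, alpha, shift):
    # On-the-fly de-quantization before a distance calculation.
    return [alpha * b + shift for b in q]


vals = [0.12, -0.34, 0.78, 0.01]
q, alpha, shift = quantize8(vals)
restored = dequantize8(q, alpha, shift)
# Each value is recovered to within half a quantization step.
assert all(abs(a - b) <= alpha / 2 + 1e-12 for a, b in zip(vals, restored))
```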
Vector Extraction and Inspection
Debug stored blobs or export vectors to JSON:
```sql
SELECT id, vector_extract(embedding) AS json_array
FROM products;
```
Additionally, vector_slice(v, start, end) extracts a sub-vector for dimensionality reduction, and vector_concat(v1, v2) merges two vectors into a combined embedding.
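The slice and concatenation semantics can be sketched on plain Python lists. Two assumptions here are not confirmed by the source: indices are zero-based, and end is exclusive. The helper names reuse the SQL function names purely for readability; they are hypothetical stand-ins, not Turso code.

```python
import json


def vector_slice(v, start, end):
    # Assumption: zero-based indices with an exclusive end, like Python slicing.
    if not 0 <= start <= end <= len(v):
        raise ValueError("slice bounds out of range")
    return v[start:end]


def vector_concat(a, b):
    # Appends the dimensions of b after those of a.
    return list(a) + list(b)


head = vector_slice([0.1, 0.2, 0.3, 0.4], 0, 2)
combined = vector_concat(head, [0.9])
print(json.dumps(combined))  # [0.1, 0.2, 0.9]
```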
Summary
- Binary storage: Turso stores vectors as typed binary blobs, defined in core/vector/vector_types.rs, supporting dense (Float32, Float64), sparse (Float32Sparse), quantized (Float8), and binary (Float1Bit) formats.
- SQL interface: Construction functions (vector32, vector64, vector32_sparse, vector8, vector1bit) in core/vector/mod.rs handle parsing and serialization.
- Distance metrics: Four native functions (vector_distance_cos, vector_distance_l2, vector_distance_jaccard, vector_distance_dot) with SIMD-optimized kernels for dense data and specialized paths for sparse and quantized types.
- IVF indexing: The toy_vector_sparse_ivf index in core/index_method/toy_vector_sparse_ivf.rs provides B-tree-backed inverted-file acceleration for Jaccard distance queries on sparse vectors, using component statistics for aggressive pruning.
Frequently Asked Questions
What vector types does Turso support?
Turso supports Float32Dense, Float64Dense, Float32Sparse, Float1Bit, and Float8 (8-bit quantized). Each type uses a distinct binary layout in core/vector/vector_types.rs and exposes a specific SQL constructor like vector32() or vector32_sparse().
Does Turso use an index for vector search?
Turso provides the toy_vector_sparse_ivf index for sparse vectors, implemented in core/index_method/toy_vector_sparse_ivf.rs. This index accelerates Jaccard distance queries by storing non-zero components in a B-tree and pruning candidates based on statistical lower bounds. Dense vectors currently rely on brute-force distance calculation.
How does Turso calculate distance between different vector types?
Distance functions in core/vector/operations/ dispatch to type-specific kernels. Dense f32/f64 use SIMD instructions where available. Quantized Float8 vectors are de-quantized on-the-fly using stored alpha and shift metadata. Sparse vectors iterate sorted index/value pairs to compute exact Jaccard or cosine metrics.
Can I extract and view the contents of a stored vector blob?
Yes. The vector_extract() SQL function, defined in core/vector/mod.rs, deserializes a binary vector blob back into a human-readable JSON array. This is useful for debugging or exporting embeddings to client applications without manual byte parsing.