# How Vector Embeddings Work in Turso: Storage, Search, and Indexing

> Discover how Turso stores, searches, and indexes vector embeddings. Explore SQL functions and distance operators for efficient similarity search with IVF indexing.

- Repository: [Turso Database/turso](https://github.com/tursodatabase/turso)
- Tags: deep-dive
- Published: 2026-06-23

---

**Turso stores vector embeddings as binary blobs with embedded type headers, exposes SQL construction functions like `vector32()`, and provides scalar distance functions (`vector_distance_cos`, `vector_distance_l2`, `vector_distance_jaccard`, `vector_distance_dot`) for similarity search, plus an inverted-file (IVF) index for accelerating sparse-vector queries.**

Turso, the open-source SQLite-compatible database maintained at `tursodatabase/turso`, natively supports vector embeddings at the storage engine level. This integration lets you store high-dimensional embeddings (dense, sparse, or quantized) alongside relational data and query them with standard SQL, without external services.

## Vector Storage Formats and Type System

In [`core/vector/vector_types.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/vector_types.rs), Turso defines five distinct vector encodings. Each type stores raw data plus a one-byte type flag that determines how the engine parses dimensions and executes distance calculations.

| Vector Type | SQL Constructor | Storage Layout | Use Case |
|-------------|----------------|----------------|----------|
| **Float32Dense** | `vector32(...)` | Raw little-endian `f32` values | Standard dense embeddings (OpenAI, Cohere) |
| **Float64Dense** | `vector64(...)` | Raw little-endian `f64` values | High-precision scientific vectors |
| **Float32Sparse** | `vector32_sparse(...)` | `idx:u32` + `value:f32` pairs + 4-byte length | High-dimensional text/keyword sparse vectors |
| **Float1Bit** | `vector1bit(...)` | Packed bits (1 bit per dimension) | Binary hash-style embeddings |
| **Float8** (quantized) | `vector8(...)` | 1-byte quantized values + alpha/shift metadata | Memory-constrained edge deployments |

The binary format guarantees zero-copy deserialization. When you insert a vector via `vector32('[0.1, 0.2]')`, the function writes the type byte followed by eight raw bytes representing two `f32` values.
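The layout described above can be sketched in a few lines of Python. This is a minimal illustration, not Turso's serializer: the tag value `TYPE_F32_DENSE = 1` is a placeholder chosen for the example, and the tag-first position simply follows the description in this section.

```python
import struct

# Placeholder tag value for illustration only; the real discriminants
# are defined in core/vector/vector_types.rs.
TYPE_F32_DENSE = 1

def encode_f32_dense(values):
    """Pack a one-byte type tag followed by little-endian f32 values."""
    return struct.pack(f"<B{len(values)}f", TYPE_F32_DENSE, *values)

def decode_f32_dense(blob):
    """Check the tag, then reinterpret the remaining bytes as f32 values."""
    if blob[0] != TYPE_F32_DENSE:
        raise ValueError("not a Float32Dense blob")
    count = (len(blob) - 1) // 4  # 4 bytes per f32
    return list(struct.unpack_from(f"<{count}f", blob, 1))

# Two f32 values: 1 tag byte + 8 payload bytes.
blob = encode_f32_dense([0.1, 0.2])
print(len(blob), decode_f32_dense(blob))
```

Because the payload is a flat run of fixed-width values, a reader only needs the tag and the byte length to recover the dimension count.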

## Parsing and Vector Construction

When SQL receives a vector literal, the engine dispatches to [`core/vector/mod.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/mod.rs). The `parse_vector` function detects whether the input is `TEXT` (JSON array) or `BLOB` (pre-serialized) and returns a `Vector` struct containing:

- The detected **VectorType** (discriminant from the first byte)
- Dimensionality derived from byte length / type size
- A reference to the underlying byte buffer

This unified abstraction allows distance functions to operate on any vector type without SQL-layer branching.
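The TEXT-versus-BLOB dispatch can be modeled roughly as follows. The `Vector` dataclass, the tag table, and the `default_type` parameter are all hypothetical names for this sketch; only the overall shape (detect the input kind, derive the dimension from the byte length) comes from the description above.

```python
import json
import struct
from dataclasses import dataclass

# Hypothetical tags and element sizes, for illustration only.
TAG_TO_TYPE = {1: "f32", 2: "f64"}
ELEMENT_SIZE = {"f32": 4, "f64": 8}
STRUCT_CODE = {"f32": "f", "f64": "d"}

@dataclass
class Vector:
    vector_type: str
    dims: int
    data: bytes  # payload without the type tag

def parse_vector(value, default_type="f32"):
    """Accept a JSON-array string (TEXT) or a tagged byte string (BLOB)."""
    if isinstance(value, str):
        numbers = json.loads(value)
        code = STRUCT_CODE[default_type]
        data = struct.pack(f"<{len(numbers)}{code}", *numbers)
        return Vector(default_type, len(numbers), data)
    vector_type = TAG_TO_TYPE[value[0]]
    payload = bytes(value[1:])
    size = ELEMENT_SIZE[vector_type]
    if len(payload) % size != 0:
        raise ValueError("payload length is not a multiple of the element size")
    # Dimensionality = byte length / element size.
    return Vector(vector_type, len(payload) // size, payload)

print(parse_vector("[0.1, 0.2, 0.3]").dims)
```

Callers downstream only see a `Vector`, which is what lets the distance functions stay agnostic about how the value arrived.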

## Distance and Similarity Functions

Turso implements four metric primitives as scalar SQL functions. Each resides in a dedicated file under `core/vector/operations/`:

- **`vector_distance_cos`** ([`distance_cos.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/operations/distance_cos.rs)): Computes cosine distance (1 − cos θ) using SIMD-accelerated `f32`/`f64` kernels on dense vectors. For `Float1Bit` data, it reduces to a Hamming-distance calculation (XOR plus popcount).
- **`vector_distance_l2`** ([`distance_l2.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/operations/distance_l2.rs)): Calculates Euclidean (L2) distance. Dense paths use the `simd` crate; sparse paths iterate sorted index/value pairs from `VectorSparse`.
- **`vector_distance_jaccard`** ([`jaccard.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/operations/jaccard.rs)): Measures Jaccard distance for `Float32Sparse` vectors. Operates on the sorted non-zero component lists.
- **`vector_distance_dot`** ([`distance_dot.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/operations/distance_dot.rs)): Returns the negative dot product, so that smaller values mean closer vectors and the result can be used as a distance metric.

All functions accept two vectors of compatible types and return a `DOUBLE`. Quantized `Float8` vectors are dequantized on the fly using the stored `alpha` and `shift` parameters before the distance is calculated.
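For reference, the dense metrics listed above follow their standard textbook definitions, which can be written out in plain Python. This is a scalar sketch with no SIMD and no claim to match Turso's kernels instruction for instruction; the function names here are local to the example, and zero-length or zero-norm inputs are not handled.

```python
import math

def _check(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")

def distance_cos(a, b):
    """Cosine distance: 1 - (a . b) / (|a| * |b|)."""
    _check(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def distance_l2(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_dot(a, b):
    """Negative dot product, so that smaller means closer."""
    _check(a, b)
    return -sum(x * y for x, y in zip(a, b))

def hamming(a_bits, b_bits):
    """Hamming distance on packed bit vectors: XOR, then count set bits."""
    _check(a_bits, b_bits)
    return sum(bin(x ^ y).count("1") for x, y in zip(a_bits, b_bits))

print(distance_l2([0.0, 0.0], [3.0, 4.0]))
print(hamming(b"\x0f", b"\x00"))
```

All four return smaller values for closer vectors, which is why the SQL examples can sort ascending regardless of the metric.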

## Index-Accelerated Search with IVF

For high-dimensional sparse data, Turso provides the **toy inverted-file (IVF) index** implemented in [`core/index_method/toy_vector_sparse_ivf.rs`](https://github.com/tursodatabase/turso/blob/main/core/index_method/toy_vector_sparse_ivf.rs). This index accelerates `vector_distance_jaccard` queries by avoiding full table scans.

### Index Structure

- **Component B-tree**: Each non-zero dimension of a sparse vector becomes a key mapping to `(sum, rowid)` pairs.
- **Statistics B-tree**: Tracks per-component counts, min, and max to enable lower-bound estimation during query pruning.

Creating the index uses standard SQL syntax:

```sql
CREATE INDEX idx_sparse ON documents
USING toy_vector_sparse_ivf (embedding);
```
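To make the two structures concrete, here is a small in-memory model using Python dictionaries in place of B-trees. Several details are assumptions made for the sketch: `sum` is taken to be the sum of a row's non-zero values, and the per-component `min` and `max` are computed over those sums. The actual definitions live in `toy_vector_sparse_ivf.rs`.

```python
from collections import defaultdict

def build_index(rows):
    """rows: {rowid: {component: value}} -> (postings, stats)."""
    postings = defaultdict(list)  # component -> [(sum, rowid), ...]
    for rowid, vector in rows.items():
        total = sum(vector.values())  # assumed meaning of "sum"
        for component in vector:
            postings[component].append((total, rowid))

    stats = {}
    for component, entries in postings.items():
        entries.sort()  # a B-tree would keep these keys ordered
        sums = [s for s, _ in entries]
        stats[component] = {
            "count": len(entries),
            "min": min(sums),
            "max": max(sums),
        }
    return dict(postings), stats

rows = {
    1: {0: 100.0, 3: 50.0},
    2: {3: 10.0},
}
postings, stats = build_index(rows)
print(postings[3], stats[3])
```

Only non-zero components produce entries, so the index size tracks the number of non-zero values rather than the nominal dimensionality.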

### Query Execution

When you issue a Jaccard similarity query:

```sql
SELECT vector_distance_jaccard(embedding, vector32_sparse('[1,0,0,1]')) AS dist
FROM documents
ORDER BY dist
LIMIT 10;
```

Turso rewrites the plan to:
1. Collect query components (subject to `scan_portion` and `scan_order` parameters)
2. Estimate a lower bound on Jaccard distance using component statistics (see `CollectComponentsSeek` state)
3. Scan only inverted index entries that can potentially beat the current best distance plus `delta`, skipping irrelevant rows

This pruning drastically reduces I/O for sparse vectors with millions of dimensions but few non-zero values.
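The core idea, looking only at rows that share at least one component with the query, can be shown with a simplified sketch. It deliberately omits `scan_portion`, `scan_order`, `delta`, and the lower-bound step, so it is not Turso's algorithm. It also assumes the weighted form of Jaccard distance, 1 − Σ min(aᵢ, bᵢ) / Σ max(aᵢ, bᵢ), over non-negative values.

```python
import heapq

def weighted_jaccard_distance(a, b):
    """1 - sum(min) / sum(max) over the union of components (values >= 0)."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return 1.0 - num / den if den else 0.0

def search(rows, postings, query, k):
    """Top-k by distance, scoring only rows reachable through the postings."""
    candidates = set()
    for component in query:
        candidates.update(postings.get(component, ()))
    scored = [(weighted_jaccard_distance(rows[r], query), r) for r in candidates]
    return heapq.nsmallest(k, scored)

rows = {
    1: {0: 100.0, 3: 50.0},
    2: {1: 5.0},             # shares no component with the query below
    3: {3: 45.0},
}
# Inverted lists: component -> rowids that have a non-zero value there.
postings = {0: [1], 1: [2], 3: [1, 3]}

result = search(rows, postings, {0: 100.0, 3: 45.0}, k=2)
print(result)
```

Row 2 is never scored: with no shared component its distance is exactly 1.0, so it can only matter when fewer than `k` candidates exist.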

## Practical SQL Examples

### Dense Vector Similarity Search

Store dense `f32` embeddings and rank them by cosine distance:

```sql
CREATE TABLE products(
    id INTEGER PRIMARY KEY,
    name TEXT,
    embedding BLOB
);

INSERT INTO products(name, embedding)
VALUES ('laptop', vector32('[0.1, 0.2, 0.3, 0.4]'));

-- Nearest-neighbor search: a smaller cosine distance means a closer match
SELECT name,
       vector_distance_cos(embedding, vector32('[0.1, 0.2, 0.35, 0.4]')) AS distance
FROM products
ORDER BY distance
LIMIT 5;
```

### Sparse Vectors with IVF Index

Index and search high-dimensional sparse vectors efficiently:

```sql
CREATE TABLE articles(
    id INTEGER PRIMARY KEY,
    title TEXT,
    embedding BLOB
);

CREATE INDEX doc_idx ON articles
USING toy_vector_sparse_ivf (embedding);

INSERT INTO articles(title, embedding)
VALUES ('AI overview', vector32_sparse('[100, 0, 0, 50]'));

-- Uses the IVF index for pruning
SELECT title,
       vector_distance_jaccard(embedding, vector32_sparse('[100, 0, 0, 45]')) AS dist
FROM articles
ORDER BY dist
LIMIT 3;
```

### Quantized Float8 Vectors

Cut per-dimension storage roughly 4× relative to `f32` by using 8-bit quantization:

```sql
INSERT INTO products(name, embedding)
VALUES ('phone', vector8('[0.12, -0.34, 0.78, 0.01]'));

-- Distance calculation automatically handles de-quantization
SELECT vector_distance_cos(embedding, vector8('[0.10, -0.30, 0.80, 0.00]'))
FROM products;
```
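The role of `alpha` and `shift` is easiest to see with a generic affine quantizer. The scheme below (`shift` as the minimum value, `alpha` as the step size, reconstruction as `code * alpha + shift`) is an assumption for illustration; Turso's exact formula may differ.

```python
def quantize_f8(values):
    """Map floats to one byte each; returns (codes, alpha, shift)."""
    lo, hi = min(values), max(values)
    shift = lo
    alpha = (hi - lo) / 255.0
    if alpha == 0.0:
        alpha = 1.0  # constant vector: any step size reconstructs it exactly
    codes = bytes(
        min(255, max(0, round((v - shift) / alpha))) for v in values
    )
    return codes, alpha, shift

def dequantize_f8(codes, alpha, shift):
    """Reverse the affine mapping: value is approximately code * alpha + shift."""
    return [c * alpha + shift for c in codes]

values = [0.12, -0.34, 0.78, 0.01]
codes, alpha, shift = quantize_f8(values)
restored = dequantize_f8(codes, alpha, shift)
print(len(codes), restored)
```

Each value costs one byte instead of four, and the reconstruction error per dimension is bounded by half a step (`alpha / 2`), which is the precision traded away for the smaller footprint.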

### Vector Extraction and Inspection

Debug stored blobs or export vectors to JSON:

```sql
SELECT id, vector_extract(embedding) AS json_array
FROM products;
```

Additionally, `vector_slice(v, start, end)` extracts a sub-vector (for example, to keep only the leading dimensions of an embedding), and `vector_concat(v1, v2)` joins two vectors into a single combined embedding.
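In list terms, the two helpers behave like slicing and concatenation. The Python functions below are stand-ins for the SQL functions, and the bounds convention (zero-based `start`, exclusive `end`) is an assumption worth checking against the implementation.

```python
def vector_slice(v, start, end):
    """Return dimensions start..end-1 (zero-based, end exclusive: assumed)."""
    if not 0 <= start <= end <= len(v):
        raise ValueError("slice bounds out of range")
    return v[start:end]

def vector_concat(a, b):
    """Join two vectors; the result has len(a) + len(b) dimensions."""
    return a + b

embedding = [0.1, 0.2, 0.3, 0.4]
head = vector_slice(embedding, 0, 2)
combined = vector_concat(head, [0.9])
print(head, combined)
```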

## Summary

- **Binary storage**: Turso stores vectors as typed binary blobs, with the encodings defined in [`core/vector/vector_types.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/vector_types.rs): dense (`Float32`, `Float64`), sparse (`Float32Sparse`), quantized (`Float8`), and binary (`Float1Bit`).
- **SQL interface**: Construction functions (`vector32`, `vector64`, `vector32_sparse`, `vector8`, `vector1bit`) in [`core/vector/mod.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/mod.rs) handle parsing and serialization.
- **Distance metrics**: Four native functions (`vector_distance_cos`, `vector_distance_l2`, `vector_distance_jaccard`, `vector_distance_dot`) with SIMD-optimized kernels for dense data and specialized paths for sparse and quantized types.
- **IVF indexing**: The `toy_vector_sparse_ivf` index in [`core/index_method/toy_vector_sparse_ivf.rs`](https://github.com/tursodatabase/turso/blob/main/core/index_method/toy_vector_sparse_ivf.rs) provides B-tree backed inverted-file acceleration for Jaccard distance queries on sparse vectors, using component statistics for aggressive pruning.

## Frequently Asked Questions

### What vector types does Turso support?

Turso supports **Float32Dense**, **Float64Dense**, **Float32Sparse**, **Float1Bit**, and **Float8** (8-bit quantized). Each type uses a distinct binary layout in [`core/vector/vector_types.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/vector_types.rs) and exposes a specific SQL constructor like `vector32()` or `vector32_sparse()`.

### Does Turso use an index for vector search?

Turso provides the **`toy_vector_sparse_ivf`** index for sparse vectors, implemented in [`core/index_method/toy_vector_sparse_ivf.rs`](https://github.com/tursodatabase/turso/blob/main/core/index_method/toy_vector_sparse_ivf.rs). This index accelerates Jaccard distance queries by storing non-zero components in a B-tree and pruning candidates based on statistical lower bounds. Dense vectors currently rely on brute-force distance calculation.

### How does Turso calculate distance between different vector types?

Distance functions in `core/vector/operations/` dispatch to type-specific kernels. Dense `f32`/`f64` vectors use SIMD instructions where available. Quantized `Float8` vectors are dequantized on the fly using the stored `alpha` and `shift` metadata. Sparse paths walk the sorted index/value pairs of both vectors to compute exact metrics such as Jaccard distance.

### Can I extract and view the contents of a stored vector blob?

Yes. The **`vector_extract()`** SQL function, defined in [`core/vector/mod.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/mod.rs), deserializes a binary vector blob back into a human-readable JSON array. This is useful for debugging or exporting embeddings to client applications without manual byte parsing.