# Using Turso for Vector Search and Embedding Similarity

> Discover Turso's native vector search capabilities. Use SQL functions for efficient embedding similarity and inverted-file indexing directly in your database for powerful vector search.

- Repository: [Turso Database/turso](https://github.com/tursodatabase/turso)
- Tags: how-to-guide
- Published: 2026-06-22

---

**Turso provides native vector storage and similarity search through SQL functions like `vector32()`, `vector_distance_cos()`, and an inverted-file index for sparse vectors, enabling embedding-based search directly in your database.**

Turso (the Rust-based SQLite rewrite) ships a first-class **vector extension** that lets you store, manipulate, and search high-dimensional embeddings using standard SQL. The implementation spans from low-level binary serialization in [`core/vector/vector_types.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/vector_types.rs) to high-level SQL functions documented in `docs/sql-reference/functions/vector.mdx`, making it possible to build semantic search applications without external vector databases.

## Vector Storage and Type System

Turso stores vectors as compact binary blobs that preserve type information and dimensionality. When you insert a vector literal such as `vector32('[0.1, 0.3, 0.5]')`, the engine calls `Vector::from_vec` in [`core/vector/vector_types.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/vector_types.rs) to parse the JSON array and encode it into a dense binary format.

### Binary Format and Type Identification

The binary layout includes a trailing type byte that identifies whether the vector is dense, sparse, or quantized. The `Vector::vector_type` method (lines 38-84 of [`vector_types.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/vector_types.rs)) matches on this byte to determine the dimensionality (`dims`) and select the appropriate Rust type (`Float32Dense`, `Float32Sparse`, etc.). This lets the engine safely reinterpret the payload in later operations without additional metadata lookups.
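To make the trailing-type-byte idea concrete, here is a minimal Python sketch of such a layout. It is not Turso's actual encoding: the tag values, the little-endian byte order, and the helper names `encode_vector` and `decode_vector` are all assumptions chosen for illustration, and the real constants live in `vector_types.rs`.

```python
import struct

# Hypothetical type tags; the real byte values in vector_types.rs may differ.
TYPE_F32_DENSE = 1
TYPE_F64_DENSE = 2

# tag -> (struct format code, element width in bytes)
_FORMATS = {TYPE_F32_DENSE: ("f", 4), TYPE_F64_DENSE: ("d", 8)}


def encode_vector(values, type_tag=TYPE_F32_DENSE):
    """Pack components little-endian (assumed), then append one trailing type byte."""
    if type_tag not in _FORMATS:
        raise ValueError(f"unknown type tag: {type_tag}")
    code, _ = _FORMATS[type_tag]
    payload = struct.pack(f"<{len(values)}{code}", *values)
    return payload + bytes([type_tag])


def decode_vector(blob):
    """Read the trailing type byte, derive dims from the payload length, unpack."""
    if not blob:
        raise ValueError("empty blob")
    type_tag = blob[-1]
    if type_tag not in _FORMATS:
        raise ValueError(f"unknown type tag: {type_tag}")
    code, width = _FORMATS[type_tag]
    payload = blob[:-1]
    if len(payload) % width != 0:
        raise ValueError("payload length is not a multiple of the element width")
    dims = len(payload) // width
    return type_tag, dims, list(struct.unpack(f"<{dims}{code}", payload))
```

Because the dimensionality falls out of the payload length and the element width, a reader of the blob needs nothing beyond the blob itself, which is the property the paragraph above describes.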

### Supported Vector Types

Based on the types defined in the source, Turso supports:

- **Dense vectors**: `Float32Dense`, `Float64Dense`
- **Sparse vectors**: `Float32Sparse` (the only type the IVF index accepts)
- **Quantized vectors**: `Float8Quantized`, a space-efficient 8-bit representation created with `vector8()`

## SQL Vector Functions

The vector extension exposes constructor and utility functions through the SQLite parser, integrated via [`core/translate/expr/vectors.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/expr/vectors.rs).

### Vector Constructors

These functions parse JSON arrays and return binary blobs:

| Function | Description |
|----------|-------------|
| `vector32(json)` | 32-bit float dense vector |
| `vector64(json)` | 64-bit float dense vector |
| `vector8(json)` | 8-bit quantized vector |
| `vector32_sparse(json)` | Sparse 32-bit float vector |

Example:

```sql
INSERT INTO documents (embedding)
VALUES (vector32('[0.12, -0.34, 0.56, 0.78]'));
```

### Distance Metrics

Turso implements several distance functions in `core/vector/operations/`:

- **`vector_distance_cos(v1, v2)`**: Cosine distance (1 − cosine similarity), implemented in [`distance_cos.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/operations/distance_cos.rs)
- **`vector_distance_l2(v1, v2)`**: Euclidean (L2) distance
- **`vector_distance_jaccard(v1, v2)`**: Jaccard distance for binary/sparse vectors, implemented in [`distance_jaccard.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/operations/distance_jaccard.rs)
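For reference, the three metrics can be written out in a few lines of Python. This is a sketch of the textbook definitions, not a port of the Rust code. In particular, the weighted form of Jaccard used here (sum of minima over sum of maxima, for non-negative components) and the handling of zero vectors are assumptions; Turso's behavior in those edge cases may differ.

```python
import math


def _check_dims(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")


def cosine_distance(a, b):
    """1 - cosine similarity; undefined (raises) if either vector is all zeros."""
    _check_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_distance(a, b):
    """Weighted Jaccard distance for non-negative vectors.

    For 0/1 vectors this equals 1 - |intersection| / |union| of the
    non-zero positions. Two all-zero vectors are treated as identical
    (distance 0.0), which is a convention chosen for this sketch.
    """
    _check_dims(a, b)
    numerator = sum(min(x, y) for x, y in zip(a, b))
    denominator = sum(max(x, y) for x, y in zip(a, b))
    if denominator == 0:
        return 0.0
    return 1.0 - numerator / denominator
```

All three return smaller values for more similar vectors, which is why the queries in this guide sort by distance in ascending order.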

### Utility Functions

Additional helpers include:

- **`vector_extract(v)`**: Serializes a vector back to JSON for debugging
- **`vector_concat(v1, v2, ...)`**: Concatenates multiple vectors
- **`vector_slice(v, start, end)`**: Extracts sub-vectors

## Indexing Sparse Vectors with IVF

A linear scan computes a distance for every row, so query cost grows with the number of stored embeddings. Turso addresses this for sparse vectors with a **toy vector sparse IVF** index located in [`core/index_method/toy_vector_sparse_ivf.rs`](https://github.com/tursodatabase/turso/blob/main/core/index_method/toy_vector_sparse_ivf.rs).

### Index Architecture

The IVF (Inverted File) index accelerates **Jaccard similarity** searches on sparse vectors only. When you create the index:

```sql
CREATE INDEX vec_idx ON documents
USING toy_vector_sparse_ivf (embedding);
```

The `attach` method (lines 26-33 of `toy_vector_sparse_ivf.rs`) registers the index with the schema. During insertion, the cursor extracts non-zero components via `Vector::as_f32_sparse` and updates two B-Tree structures:

1. An **inverted index** mapping component values to row IDs
2. A **stats tree** storing per-component min/max/count for query pruning
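The two structures can be pictured with plain Python dictionaries standing in for the B-Trees. This is an illustrative sketch only: the class name `SparseInvertedIndex` is hypothetical, and it assumes a "component" means a non-zero dimension position, which is one reading of the description above rather than something confirmed from the source.

```python
from collections import defaultdict


class SparseInvertedIndex:
    """In-memory stand-in for the inverted index and the stats tree."""

    def __init__(self):
        # component (dimension position) -> set of row IDs with a non-zero value there
        self.postings = defaultdict(set)
        # component -> {"min": ..., "max": ..., "count": ...} over indexed values
        self.stats = {}

    def insert(self, row_id, values):
        """Index every non-zero component of one row's vector."""
        for component, value in enumerate(values):
            if value == 0:
                continue  # zero components are never stored
            if row_id in self.postings[component]:
                continue  # already indexed; avoid double-counting in stats
            self.postings[component].add(row_id)
            entry = self.stats.get(component)
            if entry is None:
                self.stats[component] = {"min": value, "max": value, "count": 1}
            else:
                entry["min"] = min(entry["min"], value)
                entry["max"] = max(entry["max"], value)
                entry["count"] += 1
```

The `count` field is what makes selectivity-based planning possible: a component that appears in few rows has a short posting list and is cheap to walk.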

### Query Pattern Recognition

Turso recognizes specific query patterns to trigger index usage. When you execute:

```sql
SELECT vector_distance_jaccard(embedding, vector32_sparse('[1,0,0,0]')) AS distance
FROM documents
ORDER BY distance
LIMIT 10;
```

The query planner matches this against the registered pattern in `VectorSparseInvertedIndexMethod::attach`, allowing the engine to use the inverted index rather than a full table scan.

### Query Execution and Pruning

The index implementation uses a state machine (`CollectComponentsSeek`, `Seek`) that:

1. Gathers the most selective components based on stored statistics
2. Walks the inverted index to find candidate rows
3. Computes a lower bound on Jaccard distance to discard rows that cannot beat the current best plus the configured `delta`

The algorithm is implemented in the `query_start` method (starting at line 40) and the state machine (lines 88-210) of `toy_vector_sparse_ivf.rs`.
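The three steps can be sketched in Python for the simplified case of binary sparse vectors represented as sets of non-zero positions. This is a hedged illustration, not the Rust state machine: the function names are hypothetical, the size-based lower bound is one simple bound that happens to be valid for set Jaccard, and the way `delta` loosens the pruning test is an assumption about its role.

```python
import heapq


def jaccard_distance_sets(a, b):
    """Set Jaccard distance: 1 - |a & b| / |a | b| (0.0 if both are empty)."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def size_lower_bound(query_size, row_size):
    """The distance can never be below 1 - min/max of the two set sizes."""
    larger = max(query_size, row_size)
    return 0.0 if larger == 0 else 1.0 - min(query_size, row_size) / larger


def search(rows, query, k, delta=0.0):
    """rows: dict of row_id -> set of non-zero positions. Returns [(distance, row_id)]."""
    if k <= 0:
        return []
    # Build the inverted index: position -> row IDs.
    postings = {}
    for row_id, components in rows.items():
        for component in components:
            postings.setdefault(component, set()).add(row_id)
    # Step 1: visit the most selective query components (shortest lists) first.
    ordered = sorted((c for c in query if c in postings),
                     key=lambda c: (len(postings[c]), c))
    best = []    # max-heap via negated distance, holds the current top-k
    seen = set()
    for component in ordered:
        # Step 2: walk the posting list to find candidate rows.
        for row_id in sorted(postings[component]):
            if row_id in seen:
                continue
            seen.add(row_id)
            # Step 3: skip rows whose lower bound cannot beat the current worst + delta.
            if len(best) == k:
                worst = -best[0][0]
                if size_lower_bound(len(query), len(rows[row_id])) > worst + delta:
                    continue
            distance = jaccard_distance_sets(query, rows[row_id])
            if len(best) < k:
                heapq.heappush(best, (-distance, row_id))
            elif distance < -best[0][0]:
                heapq.heapreplace(best, (-distance, row_id))
    return sorted((-neg, row_id) for neg, row_id in best)
```

Rows that share no component with the query are never visited, so they never appear in the result; in this sketch that is acceptable because their Jaccard distance is the maximum possible, 1.0.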

### Configuration Parameters

The IVF index accepts optional parameters in the constructor (lines 61-77):

- **`delta`**: Threshold for pruning candidates
- **`scan_portion`**: Fraction of components to scan
- **`scan_order`**: Ordering strategy for index traversal

**Important limitation**: The IVF index only supports **sparse vectors** (`Float32Sparse`). Dense vectors require full table scans with `vector_distance_cos()` or `vector_distance_l2()`.

## Practical Implementation Examples

### Storing Dense Embeddings

Create a table with a BLOB column and insert vectors using the constructor:

```sql
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    embedding BLOB
);

INSERT INTO articles VALUES (
    1,
    'Machine Learning Basics',
    vector32('[0.12, -0.34, 0.56, 0.78, -0.11, 0.45, -0.23, 0.67]')
);
```

*Source:* This pattern appears in the integration tests at [`tests/integration/query_processing/test_vacuum.rs`](https://github.com/tursodatabase/turso/blob/main/tests/integration/query_processing/test_vacuum.rs) (lines 3785-3790).

### Cosine Similarity Search

Find the 5 most similar articles to a query vector:

```sql
SELECT
    id,
    title,
    vector_distance_cos(
        embedding,
        vector32('[0.1, -0.3, 0.5, 0.8, -0.1, 0.4, -0.2, 0.6]')
    ) AS distance
FROM articles
ORDER BY distance ASC
LIMIT 5;
```

This performs a linear scan using the cosine distance implementation from [`core/vector/operations/distance_cos.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/operations/distance_cos.rs).
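The work a linear scan performs can be sketched in Python as a brute-force top-k. This is an illustration of the query's semantics under assumed in-memory rows, not how the engine executes it; the function name `top_k_by_cosine` is hypothetical.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k_by_cosine(rows, query, k):
    """rows: iterable of (id, title, embedding) tuples.

    Mirrors ORDER BY distance ASC LIMIT k: every row is scored once,
    and only the k smallest distances are kept.
    """
    scored = ((cosine_distance(embedding, query), row_id, title)
              for row_id, title, embedding in rows)
    return heapq.nsmallest(k, scored)
```

Every row contributes one distance computation regardless of `k`, so the cost scales with the number of rows times the number of dimensions.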

### Sparse Vector Indexing

For sparse embeddings (where most dimensions are zero), use the sparse constructor and create an IVF index:

```sql
-- Insert sparse vector
INSERT INTO articles VALUES (
    2,
    'Database Design',
    vector32_sparse('[1, 0, 0, 0, 1, 0, 0, 1]')
);

-- Create IVF index
CREATE INDEX idx_sparse ON articles
USING toy_vector_sparse_ivf (embedding);
```

### Accelerated Jaccard Search

Query using the IVF index for fast similarity search:

```sql
SELECT
    id,
    vector_distance_jaccard(
        embedding,
        vector32_sparse('[1, 0, 0, 0, 1, 0, 0, 0]')
    ) AS distance
FROM articles
ORDER BY distance
LIMIT 10;
```

Because the query matches the registered pattern, Turso rewrites the execution plan to read from the inverted index, so it examines only candidate rows that share components with the query instead of every row in the table.

### Inspecting Vector Contents

Debug stored embeddings by converting back to JSON:

```sql
SELECT vector_extract(embedding) AS json_array
FROM articles
WHERE id = 1;
```

This calls `Vector::as_f32_slice` in [`vector_types.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/vector_types.rs) to deserialize the binary blob.

## Summary

- **Turso stores vectors as binary blobs** with type-aware serialization in [`core/vector/vector_types.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/vector_types.rs), supporting dense, sparse, and quantized formats.
- **SQL functions provide native vector operations** including constructors (`vector32`, `vector64`, `vector8`, `vector32_sparse`), distance metrics (`vector_distance_cos`, `vector_distance_l2`, `vector_distance_jaccard`), and utilities (`vector_extract`, `vector_concat`, `vector_slice`).
- **Sparse vectors support IVF indexing** via `toy_vector_sparse_ivf` in [`core/index_method/toy_vector_sparse_ivf.rs`](https://github.com/tursodatabase/turso/blob/main/core/index_method/toy_vector_sparse_ivf.rs), which accelerates Jaccard similarity searches through inverted file structures.
- **Dense vectors require linear scans** while sparse vectors benefit from index-based pruning using component statistics and configurable delta thresholds.
- **Pattern matching triggers index usage** when queries follow the specific structure: `SELECT vector_distance_jaccard(<col>, ?) ... ORDER BY distance LIMIT ?`.

## Frequently Asked Questions

### Does Turso support index-accelerated search for dense vectors?

No, the IVF index in [`core/index_method/toy_vector_sparse_ivf.rs`](https://github.com/tursodatabase/turso/blob/main/core/index_method/toy_vector_sparse_ivf.rs) only supports sparse vectors (`Float32Sparse`). Dense vectors must use linear scans with `vector_distance_cos()` or `vector_distance_l2()`. A `LIMIT` clause caps the number of rows returned, but with `ORDER BY distance` the scan still computes a distance for every row it reads. To reduce the work itself, narrow the candidate set first with a `WHERE` clause or application-level filtering.

### What is the difference between `vector32()` and `vector32_sparse()`?

`vector32()` creates dense vectors where every dimension is stored sequentially as a 32-bit float, suitable for cosine and L2 distance calculations. `vector32_sparse()` stores only non-zero components with their indices, enabling efficient Jaccard similarity computation and IVF index support in [`toy_vector_sparse_ivf.rs`](https://github.com/tursodatabase/turso/blob/main/core/index_method/toy_vector_sparse_ivf.rs).
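The difference between the two representations can be shown with a small Python sketch. The triple of dimension count, index list, and value list is an assumed logical shape for illustration; it says nothing about how Turso actually lays out `Float32Sparse` in bytes, and the helper names are hypothetical.

```python
def to_sparse(dense):
    """Keep only non-zero components, remembering where they were."""
    indices = [i for i, value in enumerate(dense) if value != 0]
    values = [dense[i] for i in indices]
    return len(dense), indices, values


def to_dense(dims, indices, values):
    """Rebuild the full vector, filling unlisted positions with 0.0."""
    dense = [0.0] * dims
    for index, value in zip(indices, values):
        dense[index] = value
    return dense
```

For the eight-dimensional example vector `[1, 0, 0, 0, 1, 0, 0, 1]` used earlier, the sparse form keeps three index/value pairs instead of eight values, and the saving grows as vectors get wider and emptier.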

### How does Turso serialize vector embeddings internally?

Vectors are serialized into compact binary blobs via `Vector::from_vec` in [`core/vector/vector_types.rs`](https://github.com/tursodatabase/turso/blob/main/core/vector/vector_types.rs). The format includes the raw component values followed by a type identification byte that distinguishes between dense, sparse, and quantized formats. This allows the engine to validate dimensionality and select appropriate distance algorithms at query time.

### Can I use the IVF index with cosine similarity instead of Jaccard?

No, the `toy_vector_sparse_ivf` implementation specifically targets `vector_distance_jaccard` queries. The index method recognizes the pattern `SELECT vector_distance_jaccard(<col>, ?) ... ORDER BY distance LIMIT ?` in its `attach` method (lines 26-33 of [`toy_vector_sparse_ivf.rs`](https://github.com/tursodatabase/turso/blob/main/core/index_method/toy_vector_sparse_ivf.rs)). Cosine similarity searches on dense vectors do not trigger index usage and perform full table scans.