BM25Scorer and EmbeddingScorer for Relevance Filtering in Headroom: A Complete Comparison

BM25Scorer delivers sub-millisecond exact-token matching with zero dependencies, while EmbeddingScorer captures synonymy and semantic intent via dense vectors. Both share an identical score(item, context) interface, so you can drop them into HybridScorer or swap them without changing call sites.

Headroom (chopratejas/headroom) ships two built-in relevance scorers that target different retrieval needs. Choosing between BM25Scorer and EmbeddingScorer, or blending them, determines whether your relevance filtering pipeline favors raw speed and exact identifiers or conceptual matching when keyword overlap is sparse.

How BM25Scorer Implements Keyword-Based Relevance Filtering

Term-Frequency Scoring with Zero Dependencies

BM25Scorer is a pure-Python implementation of the classical BM25 term-frequency / inverse-document-frequency algorithm. It requires no external packages, making it ideal for environments where dependency weight and network access are constrained. In headroom/relevance/bm25.py, the scorer tokenizes text, computes a floored IDF variant identical to Lucene and Elasticsearch, and normalizes the result to a 0–1 range.

The tokenizer is deliberately simple: it preserves UUIDs and numeric IDs intact and applies a small boost to long tokens (≥ 8 characters). This design means a query containing an exact UUID will score near 1.0 against a matching document, while a query with no token overlap will return a negligible score.
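The tokenizer behavior described above can be sketched roughly as follows. This is an illustration only, not the code in headroom/relevance/bm25.py: the names tokenize and token_weight, the regular expression, and the 1.5 boost value are all assumptions chosen for the example.

```python
import re

# Hypothetical pattern: try a full UUID first, then fall back to word runs.
_UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
_TOKEN_RE = re.compile(rf"{_UUID}|\w+", re.IGNORECASE)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into tokens, keeping UUIDs whole."""
    return [match.group(0).lower() for match in _TOKEN_RE.finditer(text)]


def token_weight(token: str, long_token_boost: float = 1.5) -> float:
    """Return a multiplier favouring tokens of eight or more characters.

    The 1.5 default is an assumed value, not taken from Headroom.
    """
    return long_token_boost if len(token) >= 8 else 1.0


tokens = tokenize("find record 550e8400-e29b-41d4-a716-446655440000")
print(tokens)
print([token_weight(t) for t in tokens])
```

Because the UUID alternative is listed first in the pattern, the identifier survives as one token instead of being split at its hyphens, which is what lets a long-token boost apply to it.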

Implementation Highlights in headroom/relevance/bm25.py

Key details from the source code include:

  • Deterministic matched terms: The scorer surfaces exactly which tokens contributed to the score.
  • Long-token boost: Tokens of eight or more characters receive an additional weight, helping exact identifiers stand out.
  • Speed: Typical scoring completes in less than 1 ms.

Because BM25Scorer relies solely on surface-form tokens, it cannot bridge synonyms. For example, a query containing "error" will not semantically match a document containing "failure" unless that exact token appears.
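To make the synonym gap concrete, here is a minimal BM25 sketch over pre-tokenized documents. It uses the Lucene-style IDF, ln(1 + (N - n + 0.5) / (n + 0.5)), which never goes negative. The function names, the k1 and b defaults, and the raw / (raw + 1) squashing step are assumptions for illustration; Headroom's own normalization may differ.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Return one raw BM25 score per document for the given query tokens."""
    n_docs = len(docs_tokens)
    if n_docs == 0:
        return []
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs_tokens for term in set(doc))

    scores = []
    for doc in docs_tokens:
        term_freq = Counter(doc)
        total = 0.0
        for term in set(query_tokens):
            if term not in term_freq:
                continue  # no surface-form match, no contribution
            n = doc_freq[term]
            idf = math.log(1.0 + (n_docs - n + 0.5) / (n + 0.5))
            tf = term_freq[term]
            length_norm = k1 * (1.0 - b + b * len(doc) / avg_len)
            total += idf * tf * (k1 + 1.0) / (tf + length_norm)
        scores.append(total)
    return scores


def squash(raw: float) -> float:
    """Map a non-negative raw score into [0, 1). Assumed normalization."""
    return raw / (raw + 1.0)


docs = [["error", "timeout"], ["failure", "disk"]]
print([squash(s) for s in bm25_scores(["error"], docs)])
```

The second document scores exactly zero for the query "error" even though "failure" is closely related, because only identical tokens contribute to the sum.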

How EmbeddingScorer Implements Semantic Relevance Filtering

Dense Vector Similarity with fastembed

EmbeddingScorer moves beyond exact tokens by encoding both the stored item and the query into dense vectors using a fastembed ONNX model. It then computes cosine similarity between the two embeddings and clamps the result to [0, 1] so that scores remain comparable with BM25 output.
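The similarity step can be illustrated without any model at all. The sketch below computes cosine similarity between two plain Python vectors and clamps the result to [0, 1]; the function name clamped_cosine and the zero-vector handling are assumptions, and the real scorer operates on fastembed output rather than hand-written lists.

```python
import math


def clamped_cosine(a, b):
    """Cosine similarity of two equal-length vectors, clamped to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for an all-zero vector
    return max(0.0, min(1.0, dot / (norm_a * norm_b)))


print(clamped_cosine([1.0, 0.0], [1.0, 0.0]))   # identical direction
print(clamped_cosine([1.0, 0.0], [-1.0, 0.0]))  # opposite direction, clamped
```

Raw cosine similarity ranges over [-1, 1], so the clamp discards negative values rather than rescaling them. That keeps 0 meaning "unrelated" on the same scale a keyword scorer uses.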

This semantic approach excels when queries paraphrase or generalize the stored text. For instance, a query like "show me the errors" will score highly against an item whose status field reads "failed", even though no words are shared. According to the source in headroom/relevance/embedding.py, the implementation uses the same model as the Rust version, which is intended to keep scores consistent across language bindings.

Caching and Lazy Imports in headroom/relevance/embedding.py

The embedding scorer is built to keep the core package lightweight:

  • Lazy imports: numpy and fastembed are imported only when EmbeddingScorer is instantiated.
  • Embedding cache: Vectors are cached per scorer instance, so repeated calls against the same document set avoid redundant model passes.
  • Model footprint: The ONNX model downloads once (≈ 30 MB) when the scorer is first created.
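The lazy-import and caching pattern in the list above might look something like this. CachedEmbedder and its loader parameter are hypothetical names, not Headroom's API. The only real library call is fastembed's TextEmbedding with its embed method, and the default model it loads here is not necessarily the one Headroom uses.

```python
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def _load_fastembed() -> Callable[[str], Vector]:
    """Import fastembed only when called, so importing this module stays cheap."""
    from fastembed import TextEmbedding  # deferred optional dependency

    model = TextEmbedding()  # library default model; may download on first use
    return lambda text: next(iter(model.embed([text])))


class CachedEmbedder:
    """Hypothetical helper: loads the model on creation, caches per instance."""

    def __init__(self, loader: Callable[[], Callable[[str], Vector]] = _load_fastembed):
        self._embed = loader()  # the heavy import happens here, not at module import
        self._cache: Dict[str, Vector] = {}

    def embed(self, text: str) -> Vector:
        if text not in self._cache:
            self._cache[text] = self._embed(text)  # one model pass per new text
        return self._cache[text]
```

Passing the loader as a parameter is a choice made for this sketch: it lets a test substitute a stub embedder and confirm that repeated texts hit the cache without downloading a model.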

Scoring latency is higher than with BM25, typically a few milliseconds per call, because each uncached text needs an inference pass. Loading the model adds a one-time cost when the scorer is created. Install the optional dependencies with:

pip install "headroom[relevance]"

Head-to-Head: BM25Scorer vs EmbeddingScorer

When deciding which scorer to use for relevance filtering in Headroom, match the tool to the text distribution and query style:

  • BM25Scorer: Best for large corpora dominated by exact identifiers, log lines, trace IDs, or structured fields where token overlap is meaningful. Fully deterministic, explainable, and dependency-free.
  • EmbeddingScorer: Best for short-text queries or natural-language questions where keyword overlap is sparse but semantic meaning is shared. Captures synonymy and paraphrase at the cost of an ONNX model download.

Both classes expose the identical public API: score(item, context) returns a RelevanceScore object containing a numeric score field and a human-readable reason field. This symmetry allows you to substitute one for the other without rewriting downstream logic.
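The value of the shared interface is easiest to see from the caller's side. In the sketch below, RelevanceScore is a stand-in dataclass with the two fields described above, Scorer is a hypothetical typing Protocol, and OverlapScorer is a toy scorer invented for the example. None of these are imported from Headroom.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class RelevanceScore:
    """Stand-in for Headroom's result type: a numeric score plus a reason."""
    score: float
    reason: str


class Scorer(Protocol):
    """Anything with a matching score(item, context) method qualifies."""
    def score(self, item: str, context: str) -> RelevanceScore: ...


class OverlapScorer:
    """Toy scorer: fraction of context words that also appear in the item."""

    def score(self, item: str, context: str) -> RelevanceScore:
        context_words = set(context.lower().split())
        shared = set(item.lower().split()) & context_words
        value = len(shared) / (len(context_words) or 1)
        return RelevanceScore(value, f"matched: {sorted(shared)}")


def keep_relevant(items: List[str], context: str, scorer: Scorer,
                  threshold: float = 0.3) -> List[str]:
    """Filter items by score; works with any object satisfying Scorer."""
    return [item for item in items if scorer.score(item, context).score >= threshold]


print(keep_relevant(["disk failed", "all good"], "failed disk", OverlapScorer()))
```

Since keep_relevant depends only on the method signature, a BM25Scorer or EmbeddingScorer instance could in principle be passed in place of the toy scorer with no other change to the function.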

Combining Both with HybridScorer

When your workload needs both exact-match precision and semantic flexibility, Headroom provides HybridScorer in headroom/relevance/hybrid.py. It accepts an adaptive weight (α) that blends BM25 and embedding scores into a single relevance value. You can tune α toward 0 for keyword-heavy pipelines or toward 1 for semantic search tasks.
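A simple way to picture the blend is a linear interpolation between the two scores. The sketch below assumes a fixed α and the formula (1 - α) · bm25 + α · embedding, which matches the direction described above (α near 0 favors keywords, α near 1 favors semantics). How HybridScorer actually adapts α is not shown here, and the function name blend is hypothetical.

```python
def blend(bm25_score: float, embedding_score: float, alpha: float) -> float:
    """Linearly interpolate two [0, 1] scores; alpha weights the embedding side."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return (1.0 - alpha) * bm25_score + alpha * embedding_score


# Exact-ID lookup: BM25 is confident, the embedding scorer less so.
print(blend(0.9, 0.2, alpha=0.0))  # keyword-only weighting
print(blend(0.9, 0.2, alpha=1.0))  # semantic-only weighting
```

Because both inputs are already on a 0 to 1 scale, the blended value stays in that range for any valid α, so downstream thresholds do not need to change.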

All three scorers are exported from headroom/relevance/__init__.py, so you can import whichever strategy fits the current filtering stage.

Practical Code Example

The snippet below demonstrates how the same item and two distinct queries behave under each scorer:

from headroom.relevance import BM25Scorer, EmbeddingScorer

item = '{"id": "550e8400-e29b-41d4-a716-446655440000", "status": "failed"}'

# Query 1: exact UUID present
query_uuid = "find record 550e8400-e29b-41d4-a716-446655440000"

# Query 2: semantic query with no token overlap
query_semantic = "show me the errors"

# BM25Scorer: zero-dependency, keyword-based
bm25 = BM25Scorer()
print(bm25.score(item, query_uuid))      # → high score; reason includes the UUID
print(bm25.score(item, query_semantic))  # → low score; no query token appears in the item

# EmbeddingScorer: requires fastembed
embed = EmbeddingScorer()
print(embed.score(item, query_uuid))      # → moderate semantic similarity
print(embed.score(item, query_semantic))  # → higher score; captures "failed" ↔ "errors"

Each call returns a RelevanceScore instance. The BM25 result for the UUID query is typically close to 1.0 thanks to the long exact token. For the semantic query, BM25 finds no shared tokens and scores the item near zero, whereas the embedding scorer assigns it a noticeably higher similarity.

Summary

  • BM25Scorer (headroom/relevance/bm25.py) provides sub-millisecond, deterministic keyword scoring with no external dependencies, ideal for UUIDs, numeric IDs, and exact token matches.
  • EmbeddingScorer (headroom/relevance/embedding.py) uses fastembed ONNX dense vectors to compute cosine similarity, enabling synonym and paraphrase detection at the cost of a ~30 MB model and slightly higher latency.
  • Both implement score(item, context) returning a normalized [0, 1] RelevanceScore, so they are interchangeable.
  • HybridScorer (headroom/relevance/hybrid.py) lets you merge both signals with an adaptive weight when exact and semantic relevance are both required.

Frequently Asked Questions

What is the main difference between BM25Scorer and EmbeddingScorer?

BM25Scorer ranks items by exact token frequency using the classical BM25 algorithm, while EmbeddingScorer ranks them by dense vector cosine similarity through a neural embedding model. BM25 is faster and deterministic; EmbeddingScorer understands synonyms and paraphrases.

Do I need to install extra dependencies to use EmbeddingScorer?

Yes. EmbeddingScorer requires the optional fastembed and numpy packages. Install them with pip install "headroom[relevance]". BM25Scorer has no external dependencies beyond the core library.

Can I use BM25Scorer and EmbeddingScorer together?

Yes. Import HybridScorer from headroom.relevance to combine both scorers with an adaptive α weight. This lets you tune the balance between exact keyword matching and semantic similarity within a single relevance filtering pipeline.

Which scorer is faster for large-scale relevance filtering?

BM25Scorer is significantly faster, typically completing in less than one millisecond per call. EmbeddingScorer incurs a few milliseconds of overhead due to ONNX model inference and embedding computation, though it caches vectors per instance to amortize repeated work.
