
Semble: Intelligent Code Search That Slashes Token Usage by 98%

2026-05-18 10:37:02

The Problem with Traditional Code Search for AI Agents

When AI coding assistants like Claude Code tackle large codebases, they often rely on grep to locate relevant code. But grep is a blunt instrument: it matches text line by line with no notion of meaning, so the agent spends large numbers of tokens reading the output and still frequently misses the right matches. The result is wasted compute, slower responses, and incomplete context for the agent. Existing alternatives demand GPU-powered indexing, require API keys, or suffer from poor retrieval quality. Developers need a tool that is fast, accurate, and economical with tokens.

Source: hnrss.org

Introducing Semble: A Token-Efficient Alternative

Semble is an open-source code search engine built specifically for AI agents. Developed by Stephan and Thomas, it addresses the token waste problem head-on. By combining static Model2Vec embeddings (using their custom model, potion-code-16M) with BM25, fused via Reciprocal Rank Fusion (RRF) and reranked using code-aware signals, Semble achieves state-of-the-art retrieval without any transformers. This means everything runs on CPU, making it accessible and inexpensive.

How It Works

The magic lies in the hybrid approach: static embeddings capture semantic meaning without the overhead of running a transformer model, while BM25 provides traditional keyword matching. RRF blends the two rankings, and a lightweight reranking step fine-tunes results based on code-specific heuristics. The entire pipeline is optimized for speed: indexing a repository typically takes ~250 milliseconds, and each query completes in ~1.5 milliseconds on CPU.
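To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in Python. It is not Semble's actual code: the function name, the file names, and the two input rankings are hypothetical, and the constant k=60 is the value conventionally used for RRF, not a setting confirmed for Semble. Each document earns 1 / (k + rank) from every ranking it appears in, and the contributions are summed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    k=60 is the conventional RRF constant (an assumption here, not a
    value taken from Semble).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings for one query: one from embedding similarity,
# one from BM25 keyword matching.
semantic = ["auth.py", "session.py", "tokens.py"]
keyword = ["tokens.py", "auth.py", "README.md"]

fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)  # ['auth.py', 'tokens.py', 'session.py', 'README.md']
```

Files that both rankers place near the top (auth.py, tokens.py) rise above files that only one ranker found. Because RRF uses ranks rather than raw scores, embedding similarities and BM25 scores never need to be normalized onto a common scale.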

Benchmark Performance: Almost Perfect Accuracy

On a benchmark of approximately 1,250 query/document pairs across 63 repositories and 19 programming languages, Semble delivers remarkable results:

These numbers show that Semble nearly matches the retrieval quality of much heavier transformer models while being dramatically faster and token-efficient.

Key Features

Getting Started

Integrating Semble with Claude Code is a one-liner:

claude mcp add semble -s user -- uvx --from "semble[mcp]" semble

For other environments (Cursor, Codex, OpenCode), check the README for detailed instructions.

Why This Matters for AI Agents

Agents work in loops: they ask a question, gather context, then act. Every token spent on grep output or on reading full files adds latency and cost. By slashing token usage by 98%, Semble allows agents to operate faster, handle larger codebases, and stay within budget. Because it runs on CPU with no API keys or GPU required, it works out of the box, which makes it a good fit for local, offline, or air-gapped environments.

Conclusion

Semble proves that you don’t need massive transformer models for high-quality code retrieval. Its hybrid approach offers a practical, efficient solution for AI coding tools. Whether you’re building a custom agent or using Claude Code, Semble can dramatically reduce token consumption while maintaining near-perfect retrieval accuracy. Try it today and see the difference.

For more details, including the full benchmark methodology and model weights, visit the Semble repository and the benchmarks page. The static model is available on Hugging Face.
