Vector search for mortals: a builder's checklist

Pure-vector search works in demos and crumbles in production. The systems that ship are hybrid: lexical for precision, dense for recall, a reranker to settle the argument.
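One way to picture the hybrid step: run the lexical and dense retrievers separately, then fuse their ranked lists before handing candidates to the reranker. The sketch below uses reciprocal rank fusion, which is an assumption on my part (the checklist doesn't prescribe a fusion method), and the document ids and the `k=60` constant are purely illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one ranked list.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a
    tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs from a lexical (e.g. BM25) and a dense retriever.
lexical = ["doc_a", "doc_b", "doc_c"]
dense = ["doc_c", "doc_a", "doc_d"]

# These fused candidates are what the reranker would then re-score.
candidates = reciprocal_rank_fusion([lexical, dense])
```

Rank-based fusion sidesteps the problem that lexical and dense scores live on different scales, which is why it makes a reasonable first baseline before anything learned.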

Chunking is your most underrated lever

If you're chunking by token count alone, you're throwing away structure. Code wants AST-aware splits. Markdown wants heading-aware splits. Tickets and chat want thread-aware splits. The retriever can only return what your chunker preserved.
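As a rough illustration of a heading-aware split for Markdown, here is a minimal sketch. It assumes ATX-style headings (`#` through `######`) and does not handle `#` lines inside fenced code blocks; the `split_by_heading` name and the chunk shape are hypothetical, not taken from any particular library.

```python
import re

# ATX headings only: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_heading(markdown_text):
    """Split Markdown into one chunk per section, keeping the heading path.

    The path (e.g. ["Guide", "Install"]) travels with each chunk so the
    retriever sees where the text sat in the document.
    """
    chunks = []
    path = []  # stack of (level, title) for the current position
    body = []  # lines collected since the last heading

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append({"path": [title for _, title in path], "text": text})
        body.clear()

    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or deeper level before descending.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks


doc = "# Guide\nintro\n## Install\npip\n## Use\nrun\n"
sections = split_by_heading(doc)
```

A token-count cap can still be applied inside each section afterwards; the point is that the split boundaries come from the document's structure first.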

The four reranker tradeoffs

Latency: cross-encoders are accurate and slow. Budget your candidate set accordingly.
Cost: rerank-as-a-service can quietly dominate your bill at scale.
Drift: reranker quality degrades as the corpus shifts. Schedule re-evals.
Explainability: hybrid scores are easier to explain than a black-box reranker. Sometimes that matters more than recall.
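Two of these tradeoffs reduce to small calculations. The sketch below sizes a candidate set from a latency budget and computes recall@k for a scheduled re-eval. Every number is made up for illustration, the function names are hypothetical, and the latency model (fixed overhead plus a constant cost per batch) is a simplifying assumption; measure your own reranker before trusting any of it.

```python
def max_candidates(budget_ms, fixed_overhead_ms, per_batch_ms, batch_size):
    """Largest candidate set a cross-encoder can score within the budget.

    Assumes batches run sequentially at a constant per-batch latency.
    """
    usable_ms = budget_ms - fixed_overhead_ms
    if usable_ms <= 0:
        return 0
    full_batches = int(usable_ms // per_batch_ms)
    return full_batches * batch_size


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the known-relevant docs that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = set(ranked_ids[:k]) & set(relevant_ids)
    return len(hits) / len(relevant_ids)


# Hypothetical budget: 200 ms total, 40 ms spent on retrieval and network,
# 25 ms per batch of 16 query-document pairs.
candidate_budget = max_candidates(200, 40, 25, 16)

# Hypothetical re-eval on one labeled query; track this over time.
recall = recall_at_k(["a", "b", "c", "d"], {"a", "d", "x"}, k=3)
```

Running the recall check on a fixed labeled set at a regular interval turns "schedule re-evals" into a number you can alert on.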
