Hybrid retrieval, structure-aware chunking, and the four reranker tradeoffs nobody warned you about — distilled from shipping a docs search system used by 400 engineers.
Pure-vector search works in demos and crumbles in production. The systems that ship are hybrid: lexical for precision, dense for recall, a reranker to settle the argument.
Chunking is your most underrated lever
If you're chunking by token count alone, you're throwing away structure. Code wants AST-aware splits. Markdown wants heading-aware splits. Tickets and chat want thread-aware splits. The retriever can only return what your chunker preserved.
The four reranker tradeoffs
- Latency: cross-encoders are accurate and slow. Budget your candidate set accordingly.
- Cost: rerank-as-a-service can quietly dominate your bill at scale.
- Drift: reranker quality drifts with corpus drift. Schedule re-evals.
- Explainability: hybrid scores are easier to explain than a black-box reranker. Sometimes that matters more than recall.