resham_acharya.exe000%
Compiling experience...
All Essays
AI/MLBackendDec 2, 202514 min read

Hybrid retrieval, structure-aware chunking, and the four reranker tradeoffs nobody warned you about — distilled from shipping a docs search system used by 400 engineers.

Pure-vector search works in demos and crumbles in production. The systems that ship are hybrid: lexical for precision, dense for recall, a reranker to settle the argument.

Chunking is your most underrated lever

If you're chunking by token count alone, you're throwing away structure. Code wants AST-aware splits. Markdown wants heading-aware splits. Tickets and chat want thread-aware splits. The retriever can only return what your chunker preserved.

The four reranker tradeoffs

  • Latency: cross-encoders are accurate and slow. Budget your candidate set accordingly.
  • Cost: rerank-as-a-service can quietly dominate your bill at scale.
  • Drift: reranker quality drifts with corpus drift. Schedule re-evals.
  • Explainability: hybrid scores are easier to explain than a black-box reranker. Sometimes that matters more than recall.