Hernán Pérez Rodal · Engineering · 3 min read

RAG over 10+ databases: what production taught us

Why vector-only RAG doesn't scale in compliance, how we designed hybrid retrieval across multiple stores, and the architectural decisions that worked in production.

TL;DR: In 2026 the consensus is clear: vector-only RAG doesn’t scale in production. At Darwin we built an agentic compliance system with hybrid retrieval across 10+ databases. The best decisions we made had nothing to do with the model; they were about the data layer.

The problem

Our agentic compliance system has to answer questions like:

“Which mango lots from producer X, processed between May and July, met the CTEs required by FSMA 204, and which ones have evidence gaps?”

A single question that combines:

  • Regulatory knowledge (FSMA 204 CTE/KDE definitions, structured text)
  • Traceability events (CTEs recorded with timestamp, geolocation, supplier, lot)
  • Internal business rules (gap analysis, risk scoring, dynamic logic)
  • Relationships between entities (producer → plant → shipment → retailer)

A single vector store with embeddings of everything mixed together can’t answer that well. The right answer requires joining structured data + semantic retrieval + aggregate computation.
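To make that concrete, here is a sketch of what such a decomposition might look like. The RetrievalPlan shape, table names, and parameters are illustrative, not our actual schema:

```python
# Illustrative decomposition of the mango-lots question (hypothetical schema).
# Each leg targets a different kind of store; no single retriever covers all three.
from dataclasses import dataclass, field

@dataclass
class RetrievalPlan:
    sql: list[str] = field(default_factory=list)      # structured facts
    vector: list[str] = field(default_factory=list)   # semantic context
    compute: list[str] = field(default_factory=list)  # post-retrieval aggregation

plan = RetrievalPlan(
    sql=[
        "SELECT lot_id, cte_type, recorded_at FROM traceability_events "
        "WHERE producer_id = :producer AND recorded_at BETWEEN :may AND :july",
    ],
    vector=[
        "FSMA 204 critical tracking events required for mangoes",
    ],
    compute=[
        "diff(required_ctes, recorded_ctes) per lot -> evidence gaps",
    ],
)
```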

The architecture

Our retrieval stack:

  • Qdrant (vector store) → regulations, doctrine, historical cases (unstructured data)
  • PostgreSQL (relational) → traceability events (CTE/KDE with timestamps, IDs, geo)
  • Firebase/Firestore (document) → per-customer config, UI state
  • Cloud Storage (blob) → original PDFs, audit trails, digital evidence
  • On-chain (Polygon, immutable) → critical attestations, digital signatures

The agent doesn’t know where each thing lives; the orchestrator resolves that.
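A minimal sketch of that resolution layer, assuming a registry keyed by data domain. The fetcher names and domain labels are placeholders, not our real clients:

```python
# Hypothetical source registry: the orchestrator, not the agent, owns this map.
# The five fetchers are stubs standing in for real store clients.
from typing import Any, Callable

def qdrant_search(q: str) -> Any: ...     # semantic retrieval (regulations)
def postgres_query(q: str) -> Any: ...    # structured traceability facts
def firestore_lookup(q: str) -> Any: ...  # per-tenant config
def gcs_fetch(q: str) -> Any: ...         # blobs, audit trails
def polygon_read(q: str) -> Any: ...      # on-chain attestations

SOURCES: dict[str, Callable[[str], Any]] = {
    "regulation": qdrant_search,
    "traceability": postgres_query,
    "customer": firestore_lookup,
    "evidence": gcs_fetch,
    "attestation": polygon_read,
}

def resolve(domain: str, query: str) -> Any:
    """Route a sub-query to the right store; the agent never sees this map."""
    if domain not in SOURCES:
        raise ValueError(f"no store registered for domain {domain!r}")
    return SOURCES[domain](query)
```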

LangGraph as orchestrator

We use LangGraph to route queries in multiple steps:

  1. Classify: what kind of question it is (regulatory / operational / mixed)
  2. Plan: which retrievals are needed (vector + SQL + graph traversal)
  3. Fan-out: execute retrievals in parallel
  4. Synthesize: pass results to the LLM with structured context
  5. Validate: guardrails to prevent hallucinations on numeric data

The key step was #2: giving the LLM a query planner that decides the retrieval strategy before anything is fetched. Without that, the model hallucinates data or pulls in irrelevant context.
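For reference, a stripped-down LangGraph skeleton of the five steps. Node bodies are stubs here; the real classify/plan nodes are LLM-backed:

```python
# Minimal LangGraph skeleton for the five-step flow (stubbed node bodies).
from typing import TypedDict
from langgraph.graph import StateGraph, END

class QueryState(TypedDict):
    question: str
    kind: str            # regulatory / operational / mixed
    plan: list[str]      # which retrievals to run
    results: dict
    answer: str

def classify(state: QueryState) -> dict:
    return {"kind": "mixed"}                                 # 1. classify

def plan(state: QueryState) -> dict:
    return {"plan": ["vector", "sql"]}                       # 2. plan retrievals

def fan_out(state: QueryState) -> dict:
    return {"results": {leg: ... for leg in state["plan"]}}  # 3. parallel fetch

def synthesize(state: QueryState) -> dict:
    return {"answer": "..."}                                 # 4. LLM + structured context

def validate(state: QueryState) -> dict:
    return {}                                                # 5. numeric guardrails

g = StateGraph(QueryState)
for name, fn in [("classify", classify), ("plan", plan),
                 ("fan_out", fan_out), ("synthesize", synthesize),
                 ("validate", validate)]:
    g.add_node(name, fn)
g.set_entry_point("classify")
g.add_edge("classify", "plan")
g.add_edge("plan", "fan_out")
g.add_edge("fan_out", "synthesize")
g.add_edge("synthesize", "validate")
g.add_edge("validate", END)
app = g.compile()
```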

What didn’t work

Vector-only with aggressive chunking: our first attempt. It failed on two fronts:

  • Counting/aggregation queries (how many lots? weekly average?): the LLM made up numbers whenever they weren’t explicitly in context
  • Relational joins (producer X + time window Y + certification Z): impossible without a structured query

The fix wasn’t “better chunking”; it was separating semantic retrieval from structured queries.
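A before/after sketch of that separation, using sqlite3 in place of our Postgres store (table and columns are made up): the aggregate comes from SQL, and the LLM only narrates a number it was handed.

```python
# Aggregations go to the database, never to the model.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traceability_events (lot_id TEXT, producer_id TEXT)")
conn.executemany(
    "INSERT INTO traceability_events VALUES (?, ?)",
    [("L-001", "X"), ("L-002", "X"), ("L-003", "Y")],
)

# The structured query answers "how many lots?" exactly.
(lot_count,) = conn.execute(
    "SELECT COUNT(DISTINCT lot_id) FROM traceability_events WHERE producer_id = 'X'"
).fetchone()

# The LLM receives the number as context instead of being asked to count chunks.
context = f"verified_lot_count={lot_count}"  # -> 2, not a hallucinated guess
```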

What did work

Explicit query routing → the planner decides whether a question requires vector search, SQL, graph traversal, or a mix.
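A toy version of that decision, with keyword heuristics standing in for the LLM-backed planner (the keywords and strategy names are illustrative):

```python
# Standalone sketch of the routing decision. In production this is an
# LLM-backed planner, not keyword rules.
def choose_strategy(question: str) -> set[str]:
    q = question.lower()
    strategies: set[str] = set()
    if any(w in q for w in ("how many", "average", "between", "count")):
        strategies.add("sql")        # aggregations and time windows
    if any(w in q for w in ("fsma", "cte", "require", "regulation")):
        strategies.add("vector")     # regulatory text
    if "producer" in q or "shipment" in q:
        strategies.add("graph")      # entity-relationship traversal
    return strategies or {"vector"}  # safe default
```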

Numeric guardrails → if the LLM’s answer contains numbers, we verify they match what the structured queries returned. If not, we fail fast instead of returning wrong data.
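A sketch of the guardrail, assuming we can collect the numeric values the structured queries returned (the regex and tolerance are illustrative):

```python
# Every number in the answer must appear in the structured results,
# otherwise fail fast.
import re

def check_numbers(answer: str, structured_values: set[float]) -> None:
    found = {float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", answer)}
    unverified = {n for n in found
                  if not any(abs(n - v) < 1e-6 for v in structured_values)}
    if unverified:
        raise ValueError(f"unverified numbers in answer: {sorted(unverified)}")

check_numbers("Producer X shipped 2 compliant lots.", {2.0})   # passes
# check_numbers("Producer X shipped 7 compliant lots.", {2.0}) # raises
```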

Semantic caching at the similar-questions level → cuts LLM costs by ~40% without hurting quality.
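A bare-bones version of the idea, with embeddings passed in, an in-memory list as the cache, and an illustrative 0.92 similarity threshold:

```python
# Reuse an earlier answer when a new question embeds close enough to a cached one.
import math

CACHE: list[tuple[list[float], str]] = []  # (question embedding, answer)
THRESHOLD = 0.92                           # similarity cutoff, tuned offline

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cached_answer(question_emb: list[float]) -> str | None:
    best = max(CACHE, key=lambda e: cosine(e[0], question_emb), default=None)
    if best and cosine(best[0], question_emb) >= THRESHOLD:
        return best[1]  # cache hit: skip the LLM call entirely
    return None
```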

Full-trace observability with OpenTelemetry → every query is tracked end-to-end (planner → retrieval → LLM → guardrails). Critical for debugging.
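What the instrumentation looks like with the standard OpenTelemetry API (exporter setup omitted; span names mirror the pipeline stages, and the stage bodies are stubs):

```python
# End-to-end tracing: one root span per query, one child span per stage.
from opentelemetry import trace

tracer = trace.get_tracer("rag.pipeline")

def answer(question: str) -> str:
    with tracer.start_as_current_span("query") as root:
        root.set_attribute("question.length", len(question))
        with tracer.start_as_current_span("planner"):
            plan = ["vector", "sql"]              # stub
        with tracer.start_as_current_span("retrieval"):
            results = {leg: ... for leg in plan}  # stub
        with tracer.start_as_current_span("llm"):
            draft = "..."                         # stub
        with tracer.start_as_current_span("guardrails"):
            pass                                  # numeric checks (see above)
        return draft
```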

Lessons learned

  1. The bottleneck of RAG in production isn’t retrieval, it’s deciding which retrieval to use
  2. Numeric guardrails save lives when the correctness of an answer drives regulatory decisions
  3. LangGraph beats linear chains for orchestrating conditional retrievals
  4. Multi-store + planner > single vector store with better chunking
  5. LLMs will hallucinate on structured aggregations, no matter how good the model is

What’s next?

The next iteration is to replace some planner rules with a fine-tuned router trained on real production examples. The planner as an LLM is flexible but expensive; distilling its decisions into a smaller model is the logical next step.

If you’re building RAG for regulated domains, my advice is: start with the query planner, not the vector store.


Are you building something similar? Let’s talk; we’re open to sharing our architecture and learning from other cases.
