Article · May 28, 2026 · Updated May 31, 2026

RAG without the platform rewrite

How to add retrieval over your existing data without standing up a separate vector platform or pausing the product roadmap.

Topics: rag architecture

Retrieval-augmented generation (RAG) is often sold as a new platform decision: pick a vector database, build an ingestion pipeline, deploy a separate search service, then wire a chat UI on top.

For most product teams, that is the wrong framing. You already have databases, APIs, search indexes, and authorization. RAG should plug into those boundaries — not replace them.

Integrated RAG vs. separate platform

Separate platform (common pitch):
1. New vector DB
2. Ingestion pipeline
3. Separate search API
4. Detached chat UI

Integrated path (recommended):
Your app + auth (existing session)
Retrieval middleware (SQL, APIs, search you own)
LLM + citations in UI (embedded in product views)

The integrated path reuses auth, data access, and deployment you already operate.

Why the separate-platform pitch is tempting

Vendors bundle vector storage, chunking, and a chat widget because it is easy to demo. For a greenfield project, that can be fine. For an existing product with paying customers, it creates problems:

Duplicate auth — a sidecar search service does not know your tenant model
Stale data — another pipeline to keep in sync with your source of truth
Detached UX — users live in your app; a floating chat widget fights your workflow design
Ops overhead — another system to monitor, secure, and on-call for

Integrated RAG treats retrieval as middleware in your application — same deployment, same identity, same observability.

Not every feature needs retrieval at all. See "When not to use RAG" for a decision guide before you pick a vector database.

Start from the user workflow

Before choosing Pinecone, pgvector, or Elasticsearch, define the feature in product terms:

What question is the user trying to answer?
What data do they already have permission to see?
Where in the UI does the answer need to appear — inline, sidebar, modal, or action suggestion?

The retrieval layer should assemble context from sources your app already trusts: Postgres rows, document metadata, CRM records, ticket history, internal APIs — scoped per user and tenant.

A concrete example

A support copilot embedded in a ticket view should retrieve: the current ticket thread, the customer's plan tier, relevant help articles, and recent similar resolved tickets. It should not search the entire knowledge base without tenant filters or return answers without citations your agents can verify.
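The context for this support copilot could be assembled roughly as follows. This is a minimal sketch over in-memory data, not a definitive implementation: `Ticket`, `Article`, `build_ticket_context`, and all field names are hypothetical stand-ins for whatever your app already exposes, and relevance ranking of articles and similar tickets is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    id: str
    tenant_id: str
    status: str
    text: str


@dataclass(frozen=True)
class Article:
    id: str
    tenant_id: str
    title: str


def build_ticket_context(tenant_id, ticket_id, tickets, articles, plan_tiers):
    """Collect context for one ticket, filtered to a single tenant throughout."""
    # Only tickets owned by the caller's tenant are ever considered.
    own = [t for t in tickets if t.tenant_id == tenant_id]
    current = next((t for t in own if t.id == ticket_id), None)
    if current is None:
        # Unknown ticket or wrong tenant: return nothing rather than guess.
        return None
    return {
        "ticket": current,
        "plan_tier": plan_tiers.get(tenant_id, "unknown"),
        # Assumption: articles are tenant-scoped; no relevance ranking here.
        "articles": [a for a in articles if a.tenant_id == tenant_id],
        "similar_resolved": [
            t for t in own if t.status == "resolved" and t.id != ticket_id
        ],
    }


tickets = [
    Ticket("t1", "acme", "open", "Cannot export report"),
    Ticket("t2", "acme", "resolved", "Export failed on large report"),
    Ticket("t3", "globex", "resolved", "Export timeout"),
]
articles = [
    Article("a1", "acme", "Exporting reports"),
    Article("a2", "globex", "SSO setup"),
]
ctx = build_ticket_context("acme", "t1", tickets, articles, {"acme": "pro"})
```

Asking for ticket `t1` as tenant `globex` returns `None`, because the tenant filter runs before the ticket lookup.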

Middleware owns retrieval

Retrieval belongs on the server — after authentication, before the model call. A typical flow:

RAG retrieval flow (server-side)

1. Authenticate: session / JWT
2. Fetch: DB, APIs, docs
3. Rank & trim: fit context window
4. Prompt + call: with citations
5. Render: answer + sources in UI

Retrieval runs after auth. Never trust the client to assemble context.

The middleware layer should:

Authenticate the request using your existing session or token
Fetch candidate context from stores the user can access
Rank and trim to fit the model's context window — quality over quantity
Attach citations the UI can render — source IDs, links, or snippets
Log what was retrieved for debugging bad answers

Keeping retrieval server-side prevents clients from bypassing permission checks, makes caching straightforward, and gives support a trail when something goes wrong.
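The middleware responsibilities listed above can be sketched in a few functions. Everything here is an assumption made for illustration: the `Chunk` shape, the `session` dictionary, the `fetch` callback, and the character budget standing in for a real token count. Treat it as an outline of the flow rather than a finished design.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("rag.retrieval")


@dataclass(frozen=True)
class Chunk:
    source_id: str
    tenant_id: str
    text: str
    score: float  # relevance score from whatever ranking you already have


def rank_and_trim(chunks, max_chars):
    """Keep the highest-scoring chunks that fit within a character budget."""
    kept, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if used + len(chunk.text) > max_chars:
            continue  # skip chunks that would overflow the budget
        kept.append(chunk)
        used += len(chunk.text)
    return kept


def retrieve(session, fetch, max_chars=2000):
    """Authenticate, fetch, rank and trim, attach citations, and log."""
    if not session or "tenant_id" not in session:
        raise PermissionError("unauthenticated request")
    tenant_id = session["tenant_id"]
    # Tenant filter is enforced here, at fetch time, not in the prompt.
    candidates = [c for c in fetch(tenant_id) if c.tenant_id == tenant_id]
    kept = rank_and_trim(candidates, max_chars)
    citations = [{"source_id": c.source_id, "snippet": c.text[:80]} for c in kept]
    logger.info("tenant=%s retrieved=%s", tenant_id, [c.source_id for c in kept])
    return {"context": "\n\n".join(c.text for c in kept), "citations": citations}


def fake_fetch(tenant_id):
    # Stand-in data source that (incorrectly) returns another tenant's chunk.
    return [
        Chunk("doc-1", "acme", "Refunds are processed within 5 days.", 0.9),
        Chunk("doc-2", "acme", "x" * 5000, 0.8),
        Chunk("doc-3", "globex", "Other tenant's data.", 0.99),
    ]


result = retrieve({"tenant_id": "acme"}, fake_fetch)
```

In this run only `doc-1` survives: `doc-3` is dropped by the tenant filter even though the stub returned it, and `doc-2` is too large for the budget.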

You do not need a greenfield vector stack on day one

Vector search helps at scale, especially for semantic matching over large unstructured corpora. Many integrations start simpler:

Structured retrieval — SQL with filters (tenant_id, status, date range)
API composition — aggregate context from services you already call
Full-text search — Elasticsearch, Postgres tsvector, or your existing search product
Hybrid — metadata filters plus keyword search before adding embeddings
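As an illustration of the structured option, here is a small sketch using Python's built-in `sqlite3` module. The `tickets` table, its columns, and the `find_resolved_tickets` helper are hypothetical; the point is that the tenant, status, and date filters live in the query itself and all user-supplied values are passed as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (id TEXT, tenant_id TEXT, status TEXT, "
    "updated_at TEXT, subject TEXT)"
)
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?, ?, ?)",
    [
        ("t1", "acme", "resolved", "2026-05-01", "Export fails on large reports"),
        ("t2", "acme", "open", "2026-05-20", "Export button missing"),
        ("t3", "globex", "resolved", "2026-05-10", "Export timeout"),
        ("t4", "acme", "resolved", "2026-01-15", "Login loop"),
    ],
)


def find_resolved_tickets(conn, tenant_id, keyword, since):
    """Structured retrieval: tenant, status, and date filters plus a keyword match."""
    # ISO-8601 date strings compare correctly as plain text.
    # LIKE wildcards inside `keyword` are not escaped in this sketch.
    return conn.execute(
        "SELECT id, subject FROM tickets "
        "WHERE tenant_id = ? AND status = 'resolved' "
        "AND updated_at >= ? AND subject LIKE ? "
        "ORDER BY updated_at DESC LIMIT 5",
        (tenant_id, since, f"%{keyword}%"),
    ).fetchall()


rows = find_resolved_tickets(conn, "acme", "Export", "2026-04-01")
```

Only `t1` matches here: `t2` is still open, `t3` belongs to another tenant, and `t4` is both too old and about a different subject.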

Retrieval strategy spectrum

Strategy	Effort	Approach	Best when
Structured	Low	SQL filters, API lookups	Known queries, tabular data
Hybrid	Medium	Full-text + filters	Docs + metadata search
Vector	Higher	Embeddings + rerank	Semantic match at scale

Start with structured retrieval. Move toward hybrid and vector search when it stops working, not before.

When to add embeddings

Consider vectors when:

Users ask questions that do not match document titles or keywords
Your corpus is large enough that brute-force fetch is too slow or expensive
You have eval data showing structured retrieval misses too often

Defer vectors when:

Most queries map to known entities (accounts, orders, projects)
Your content is already well-structured with metadata
Team bandwidth is limited — embeddings add indexing, re-embedding on change, and reranking complexity

Citations are not optional

For B2B products, "the AI said so" is not acceptable. Citations build trust, help users verify answers, and give support a starting point for escalations.

Good citation UX:

Links or IDs back to source records in your product
Snippets that match what was actually sent to the model
Clear distinction when no relevant context was found — refuse or ask clarifying questions instead of guessing
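One way to keep citations honest is to build the prompt and the citation list from the same data, and to skip the model call entirely when nothing was retrieved. The sketch below assumes sources arrive as dictionaries with `id` and `snippet` keys; the function name, prompt wording, and fallback message are illustrative only.

```python
NO_CONTEXT_MESSAGE = (
    "I could not find relevant information for that question. "
    "Could you add more detail?"
)


def build_prompt(question, sources):
    """Return (prompt, citations), or (None, []) when there is nothing to cite."""
    if not sources:
        # No context: the caller shows NO_CONTEXT_MESSAGE instead of calling the model.
        return None, []
    lines, citations = [], []
    for number, source in enumerate(sources, start=1):
        lines.append(f"[{number}] {source['snippet']}")
        # The UI snippet is the same string that goes into the prompt.
        citations.append(
            {"number": number, "source_id": source["id"], "snippet": source["snippet"]}
        )
    prompt = (
        "Answer using only the numbered sources below and cite them like [1].\n"
        "If the sources do not answer the question, say so.\n\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}"
    )
    return prompt, citations


prompt, citations = build_prompt(
    "What is the refund window?",
    [{"id": "kb-12", "snippet": "Refunds are accepted within 30 days."}],
)
```

Because each citation carries the exact snippet that was sent, the UI cannot show a source the model never saw.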

Common failure modes

Failure	Symptom	Mitigation
Wrong tenant context	Cross-customer data leakage	Enforce tenant filter at fetch time, never in prompt alone
Stale documents	Outdated policy answers	Tie retrieval to source version; surface "last updated" in UI
Over-retrieval	Slow responses, high cost	Rank aggressively; cap chunks per source
Under-retrieval	Hallucinated fill-in	Eval retrieval hit rate; expand sources incrementally
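The over-retrieval mitigation in the table (rank aggressively, cap chunks per source) might look like this. The tuple layout and the default limits are assumptions chosen for the example, not recommended values.

```python
from collections import defaultdict


def cap_per_source(chunks, max_per_source=2, max_total=6):
    """Keep the best-scoring chunks, limited per source and overall.

    Each chunk is a (source_id, score, text) tuple.
    """
    taken = defaultdict(int)
    kept = []
    for source_id, score, text in sorted(chunks, key=lambda c: c[1], reverse=True):
        if len(kept) >= max_total:
            break
        if taken[source_id] >= max_per_source:
            continue  # this source has already contributed its share
        taken[source_id] += 1
        kept.append((source_id, score, text))
    return kept


chunks = [
    ("kb", 0.9, "a"),
    ("kb", 0.8, "b"),
    ("kb", 0.7, "c"),
    ("tickets", 0.6, "d"),
]
capped = cap_per_source(chunks)
```

The third `kb` chunk is dropped even though it outscores the `tickets` chunk, so one verbose source cannot crowd out the others.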

Ship a thin vertical slice

The biggest mistake is boiling the ocean: indexing every document, supporting every question type, and launching a standalone chat. Instead:

Thin vertical slice rollout

Phase	Focus	Tasks
Week 1–2	One workflow	Define user question; pick one data source; server retrieval
Week 3–4	Harden	Logging & evals; citation UI; feature flag
Week 5+	Expand	More sources; hybrid search; vectors if needed

Ship one end-to-end path before adding data sources or infrastructure.

Pick one workflow, one primary data source, one UI surface. Get it behind a feature flag with logging and evals. Measure answer quality and latency with real users. Then expand retrieval sources and add semantic search only when the data proves you need it.

Eval questions for your first slice

Does the answer cite the right source 80%+ of the time on a golden set?
What happens when no relevant context exists?
What is p95 latency end-to-end — retrieval plus generation?
What does it cost per successful resolution at current traffic?
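Three of these questions reduce to simple arithmetic over logged data. The sketch below assumes a golden set that maps each question to the source it should cite, plus per-request latencies and a total cost figure you already collect; the function names and data shapes are hypothetical. The percentile uses the nearest-rank method, which is one common definition among several.

```python
def citation_hit_rate(golden, results):
    """Share of golden questions whose answer cited the expected source."""
    if not golden:
        raise ValueError("golden set is empty")
    hits = sum(
        1
        for question, expected in golden.items()
        if expected in results.get(question, [])
    )
    return hits / len(golden)


def p95(values):
    """95th percentile using the nearest-rank method."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def cost_per_resolution(total_cost, resolved_count):
    """Total spend divided by successful resolutions; None when there are none."""
    return total_cost / resolved_count if resolved_count else None


golden = {"q1": "kb-1", "q2": "kb-2", "q3": "kb-3", "q4": "kb-4", "q5": "kb-5"}
results = {
    "q1": ["kb-1"],
    "q2": ["kb-2", "kb-9"],
    "q3": ["kb-7"],  # cited the wrong source
    "q4": ["kb-4"],
    "q5": ["kb-5"],
}
hit_rate = citation_hit_rate(golden, results)        # 4 of 5 questions
latency_p95 = p95([100 * i for i in range(1, 21)])   # 19th of 20 sorted values
unit_cost = cost_per_resolution(12.0, 48)
```

With these sample numbers the hit rate is 0.8, exactly at the 80% threshold, the p95 latency is 1900 ms, and the cost is 0.25 per resolution.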

Operating RAG in production

RAG systems decay as content changes. Plan for:

Re-indexing or refresh when source documents update
Retrieval regression tests when you add new data sources
Dashboards for retrieval latency, chunk count, and empty-result rate
Feedback loops — thumbs down should tag the retrieval set for review
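A small in-process log is enough to illustrate two of these items: the empty-result rate and tagging a retrieval set when a user gives a thumbs down. `RetrievalEvent` and `RetrievalLog` are hypothetical names; in practice this data would more likely live in your existing logging or analytics store.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalEvent:
    request_id: str
    source_ids: list
    latency_ms: float
    needs_review: bool = False


@dataclass
class RetrievalLog:
    events: dict = field(default_factory=dict)

    def record(self, event):
        self.events[event.request_id] = event

    def thumbs_down(self, request_id):
        """Flag the retrieval set behind a disliked answer for review."""
        event = self.events.get(request_id)
        if event is not None:
            event.needs_review = True
        return event

    def empty_result_rate(self):
        """Fraction of requests where retrieval returned no sources."""
        if not self.events:
            return 0.0
        empty = sum(1 for e in self.events.values() if not e.source_ids)
        return empty / len(self.events)

    def review_queue(self):
        return [e.request_id for e in self.events.values() if e.needs_review]


log = RetrievalLog()
log.record(RetrievalEvent("r1", ["kb-1", "kb-2"], 120.0))
log.record(RetrievalEvent("r2", [], 95.0))
log.record(RetrievalEvent("r3", ["kb-3"], 210.0))
log.record(RetrievalEvent("r4", ["kb-1"], 180.0))
log.thumbs_down("r3")
```

Here one of four requests came back empty, and `r3` is queued for review together with the sources it retrieved.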

This is ongoing product operations, not a one-time integration project.

The integration mindset

RAG without the platform rewrite means: use your auth, your data access patterns, your deployment pipeline, and your UI. Add retrieval middleware and citations. Grow complexity only when measured need appears.

Want help scoping RAG for your stack? Get in touch with your auth model, data sources, and target workflow — we will map a thin-slice plan you can ship without pausing the roadmap.
