In-app copilots: how to embed AI in your product without a sidebar chatbot
A practical guide to embedded copilots: context from product state, server-side assembly, RBAC, and UI patterns that fit existing workflows instead of a floating chat widget.
The fastest way to "add AI" to a SaaS product is a floating chat widget in the corner. Users type a question; the model answers from whatever context the frontend could scrape together. Demo in a week.
That pattern breaks in production for predictable reasons: users re-explain what is already on screen, the model sees data the user should not access, support cannot reconstruct what context was sent, and product teams discover the copilot answers questions that were never in scope for the workflow.
An in-app copilot is a different shape. It is assistive AI embedded in the view the user is already working in (ticket detail, CRM record, admin console, onboarding step), with context assembled server-side from product state, permissions, and tenant boundaries. Not a detached sidebar that pretends to know your product.
Copilot vs. chat widget vs. agent
These terms get used interchangeably in sales decks. In integration work they mean different things.
| Pattern | What the user sees | What the system does |
|---|---|---|
| Floating chat widget | Generic chat UI, any page | User types; client sends text + ad hoc context |
| In-app copilot | Assist panel, inline draft, or command on a specific view | Server assembles context from route, selection, and permitted APIs |
| Agent | Often similar UI, but multi-step | Model selects tools, calls APIs in sequence, handles intermediate results |
Many products start with a copilot on one high-value screen and add agent capabilities later, once middleware, tool boundaries, and evals exist. See Build an agent with LangChain for the multi-step orchestration pattern; this guide focuses on the copilot foundation.
A chat widget is a UI choice. A copilot is an integration pattern: context assembly, auth, and workflow scope defined before the first prompt ships.
Why sidebar chatbots fail production review
Browser scrapes DOM → sends blob to model → streams reply into widget
This path fails for reasons that have nothing to do with model quality:
- Context the user already has (ticket subject, account name, form fields) gets re-typed or omitted
- Context the user should not have (another tenant's data, fields hidden by RBAC) can leak if the client assembles prompts unsupervised
- No workflow scope: "ask anything" means vague eval criteria, out-of-scope answers, and no refusal path when context is missing
- No audit trail: support cannot see what was retrieved or suggested when a customer reports a bad answer
- No action boundaries: suggestions that touch data have no confirmation gate tied to your existing auth layer
Production copilots invert the flow: the client sends intent and entity IDs, the server loads scoped context, middleware calls the model, and the UI renders streaming output in place.
Client UI (copilot, search, actions) → Your API (existing auth session) → LLM middleware (auth, rate limits, logging) → Model provider (OpenAI, Anthropic, etc.)
Every model call passes through your stack, not around it.
Context assembly: the core of an in-app copilot
The hardest part of a copilot is not the prompt. It is deciding what goes into the request and enforcing that decision on every call.
What the client should send
Keep the browser payload small and explicit:
- Route or view identifier: ticket-detail, customer-360, invoice-edit
- Entity IDs: ticket ID, account ID, selected row
- User intent: "summarize thread", "draft reply", "suggest next step"
- Optional UI state: active tab, selected text, filter summary (not raw database dumps)
The client should not send privileged record bodies assembled from hidden fields, cached API responses the user should not see, or "everything we could find on the page."
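One way to enforce a small, explicit payload is to validate it on the server against an allowlist of keys, views, and intents. The sketch below is illustrative only: `parseCopilotPayload`, the allowlists, and the 64-character ID limit are assumptions, not part of any particular framework.

```typescript
// Hypothetical allowlists; a real product would define one set per view.
const ALLOWED_VIEWS = ["ticket-detail"] as const;
const ALLOWED_INTENTS = ["summarize", "draft-reply", "suggest-next-step"] as const;
const ALLOWED_KEYS = ["view", "ticketId", "intent"];

type ViewId = (typeof ALLOWED_VIEWS)[number];
type IntentId = (typeof ALLOWED_INTENTS)[number];
type ClientCopilotPayload = { view: ViewId; ticketId: string; intent: IntentId };

type ParseOutcome =
  | { ok: true; value: ClientCopilotPayload }
  | { ok: false; error: string };

// Accepts only identifiers and an intent. Any extra key is rejected, so record
// bodies or scraped page content cannot ride along in the request.
function parseCopilotPayload(input: unknown): ParseOutcome {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return { ok: false, error: "payload must be an object" };
  }
  const body = input as Record<string, unknown>;
  const extraKeys = Object.keys(body).filter((key) => !ALLOWED_KEYS.includes(key));
  if (extraKeys.length > 0) {
    return { ok: false, error: `unexpected keys: ${extraKeys.join(", ")}` };
  }
  const { view, ticketId, intent } = body;
  if (typeof view !== "string" || !(ALLOWED_VIEWS as readonly string[]).includes(view)) {
    return { ok: false, error: "unknown view" };
  }
  // The 64-character cap is an assumed sanity limit for IDs.
  if (typeof ticketId !== "string" || ticketId.length === 0 || ticketId.length > 64) {
    return { ok: false, error: "invalid ticketId" };
  }
  if (typeof intent !== "string" || !(ALLOWED_INTENTS as readonly string[]).includes(intent)) {
    return { ok: false, error: "unknown intent" };
  }
  return { ok: true, value: { view: view as ViewId, ticketId, intent: intent as IntentId } };
}
```

Rejecting unknown keys outright, rather than silently dropping them, tends to surface client code that is still trying to send page content.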
What the server should assemble
A context builder runs after auth, before the model call:
- Validate session, tenant, and role
- Fetch permitted entities from your databases and APIs
- Format fields into a stable prompt structure: labels, truncation rules, PII handling
- Attach system instructions scoped to this workflow
- Call middleware → model → post-process (citations, schema validation, tool calls)
```typescript
// lib/copilot/context.ts (illustrative)
// Session, NotFoundError, getTicketForUser, getThreadForUser,
// ticketCopilotSystemPrompt, and formatTicketContext are assumed to be
// defined elsewhere in the application.
type CopilotRequest = {
  view: "ticket-detail";
  ticketId: string;
  intent: "summarize" | "draft-reply" | "suggest-next-step";
};

export async function buildTicketCopilotContext(
  req: CopilotRequest,
  session: Session,
) {
  // Permission-aware fetch: a ticket the user cannot see behaves like a missing one.
  const ticket = await getTicketForUser(session, req.ticketId);
  if (!ticket) throw new NotFoundError();

  // Bound the thread size and hide internal notes from non-internal roles.
  const thread = await getThreadForUser(session, req.ticketId, {
    maxMessages: 40,
    redactInternalNotes: !session.roles.includes("internal"),
  });

  return {
    system: ticketCopilotSystemPrompt(req.intent),
    messages: [
      {
        role: "user" as const,
        content: formatTicketContext({ ticket, thread, intent: req.intent }),
      },
    ],
  };
}
```
The same pattern applies across views: one builder per workflow boundary, shared middleware underneath.
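The formatting step (labels, truncation rules, PII handling) could look roughly like the sketch below. It is a simplified stand-in for the formatter the builder calls, under stated assumptions: the `TicketFields` and `ThreadMessage` shapes, the character budgets, and the email-only redaction rule are all hypothetical.

```typescript
// Hypothetical shapes; real ticket and message types come from your product.
type TicketFields = { id: string; subject: string; status: string };
type ThreadMessage = { author: "customer" | "agent"; body: string };

const MAX_SUBJECT_CHARS = 200; // assumed budget
const MAX_BODY_CHARS = 500; // assumed per-message budget

// Mask email addresses before they reach the prompt (a deliberately simple rule).
function redactEmails(text: string): string {
  return text.replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, "[email]");
}

// Cut long text at a fixed budget and mark the cut so the model knows it is partial.
function truncate(text: string, maxChars: number): string {
  return text.length <= maxChars ? text : text.slice(0, maxChars) + " [truncated]";
}

// Produce the same labeled layout on every call so prompts stay comparable in evals.
function formatTicketContextSketch(ticket: TicketFields, thread: ThreadMessage[]): string {
  const lines = [
    `Ticket ID: ${ticket.id}`,
    `Subject: ${truncate(redactEmails(ticket.subject), MAX_SUBJECT_CHARS)}`,
    `Status: ${ticket.status}`,
    "Thread (oldest first):",
  ];
  thread.forEach((message, index) => {
    const body = truncate(redactEmails(message.body), MAX_BODY_CHARS);
    lines.push(`${index + 1}. [${message.author}] ${body}`);
  });
  return lines.join("\n");
}
```

A fixed label order also makes logged context easier to compare when support investigates a bad answer.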
Context assembly is not RAG
If the data is already loaded for the screen (the ticket thread, the record on the page, the dashboard the user is viewing), pass it in the prompt. That is context assembly, not retrieval over a document corpus.
Add RAG only when the model needs knowledge not already in the request (support docs, policy PDFs, a large changing corpus), and only after simpler paths fail evals. See When not to use RAG for the decision framework; many copilots ship without vector search on day one.
[Figure: retrieval options on a left-to-right spectrum. Best when, from left to right: known queries and tabular data; docs + metadata search; semantic match at scale.]
Start left. Move right when structured retrieval stops working, not before.
UI patterns that fit the product
The goal is assistive AI that feels native, not a third-party chat product dropped on top of yours.
Inline assist on the current task
Draft reply, summarize thread, explain this field: triggered from the control the user already reached for. Output appears in the textarea, summary block, or tooltip, not in a separate conversation pane.
Best when: single-turn generation, high frequency, obvious user action.
Contextual panel beside the record
A side panel on ticket detail, deal view, or admin record that streams answers about what is on screen. The panel reads route and selection from product state; it does not start blank.
Best when: multi-turn follow-ups within one entity, longer outputs, optional tool calls.
Command surface / palette
Keyboard-driven actions scoped to the current view ("summarize", "extract action items", "draft customer update"), with results applied to the focused field or shown in a transient panel.
Best when: power users, dense admin UIs, many discrete assist actions on one screen.
What to avoid
- Iframe to a generic model UI: no connection to tenant boundaries or product APIs
- Global "Ask AI" with no route context: vague scope, impossible evals, support nightmares
- Client-side prompt construction from the full DOM: brittle, over-broad, bypasses server auth
Match the pattern to one workflow first. Expand to additional views after middleware, logging, and eval baselines exist, not by cloning the widget to every page.
Architecture: middleware first, copilot second
Every copilot request should follow the same path as your other AI features:
- Authenticated API route: POST /api/copilot/assist or view-specific routes
- Context builder: tenant-scoped fetch and formatting
- LLM middleware: rate limits, model routing, logging, streaming
- Model provider
- Post-processing: trim, cite, validate schema, queue tool calls
See LLM middleware explained for the full layer breakdown. Copilots are often the first workflow-bound feature on top of middleware, not a reason to skip it.
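As a rough illustration of what the middleware layer decides before any provider call, the sketch below combines a per-tenant fixed-window rate limit, intent-based model routing, and a call log. Everything here is an assumption: the class name, the model names (placeholders, not real model IDs), and the limits. A production limiter would normally live in shared storage rather than process memory.

```typescript
type AdmitDecision =
  | { allowed: true; model: string }
  | { allowed: false; reason: "rate_limited" };

type CallLogEntry = { tenantId: string; intent: string; allowed: boolean; at: number };

// Hypothetical routing table; the model names are placeholders.
const MODEL_BY_INTENT = new Map<string, string>([
  ["summarize", "small-fast-model"],
  ["draft-reply", "larger-model"],
]);
const DEFAULT_MODEL = "small-fast-model";
const WINDOW_MS = 60000; // assumed window length
const MAX_CALLS_PER_WINDOW = 2; // assumed limit, kept tiny for illustration

class CopilotMiddlewareSketch {
  private windows = new Map<string, { startedAt: number; count: number }>();
  readonly log: CallLogEntry[] = [];

  // Decide whether a call may proceed and which model it should use.
  // `now` is passed in (milliseconds) so the logic is easy to test.
  admit(tenantId: string, intent: string, now: number): AdmitDecision {
    let active = this.windows.get(tenantId);
    if (active === undefined || now - active.startedAt >= WINDOW_MS) {
      active = { startedAt: now, count: 0 };
      this.windows.set(tenantId, active);
    }
    const allowed = active.count < MAX_CALLS_PER_WINDOW;
    if (allowed) active.count += 1;
    // Log every decision, including refusals, so support can reconstruct it later.
    this.log.push({ tenantId, intent, allowed, at: now });
    if (!allowed) return { allowed: false, reason: "rate_limited" };
    return { allowed: true, model: MODEL_BY_INTENT.get(intent) ?? DEFAULT_MODEL };
  }
}
```

Because the copilot route shares this layer with other AI features, limits and logs stay consistent across them.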
Streaming without exposing secrets
The UI subscribes to a server stream (SSE or fetch streaming) and renders tokens into the assist surface. API keys and raw context never leave your backend. Timeouts and partial results should fail cleanly; an infinite spinner on a draft button erodes trust faster than an honest retry message.
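On the client, one way to make streams fail cleanly is to treat the assist surface as a small state machine fed by stream events. The sketch below assumes the server stream has already been decoded into events; the event kinds, state shape, and notice wording are hypothetical.

```typescript
// Hypothetical events, decoded from the server stream before they reach the UI.
type AssistStreamEvent =
  | { kind: "token"; text: string }
  | { kind: "done" }
  | { kind: "error"; retryable: boolean }
  | { kind: "timeout" };

type AssistSurfaceState = {
  phase: "streaming" | "complete" | "failed";
  draft: string; // partial output is kept so the user does not lose it
  notice: string | null; // user-facing message only, never provider details
};

const initialAssistState: AssistSurfaceState = { phase: "streaming", draft: "", notice: null };

// Pure reducer: every event yields a definite state, so the UI never spins forever.
function reduceAssistStream(state: AssistSurfaceState, ev: AssistStreamEvent): AssistSurfaceState {
  if (state.phase !== "streaming") return state; // ignore late events after a terminal state
  switch (ev.kind) {
    case "token":
      return { ...state, draft: state.draft + ev.text };
    case "done":
      return { ...state, phase: "complete" };
    case "timeout":
      return { ...state, phase: "failed", notice: "The draft timed out. Retry?" };
    case "error":
      return {
        ...state,
        phase: "failed",
        notice: ev.retryable ? "Something went wrong. Retry?" : "Drafting is unavailable right now.",
      };
  }
}
```

Keeping the partial draft on failure lets the user decide whether to retry or continue from what already arrived.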
Optional tools without becoming an agent on day one
Some copilots need one or two tools (fetch live account status, look up policy version, check shipment state) without full multi-step agent orchestration. That is fine. Keep the tool surface narrow, re-check permissions on every invocation, and log inputs and outcomes.
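A narrow tool surface can be as simple as a registry that re-checks the caller's permissions on every invocation and logs each attempt. This is a minimal sketch with stubbed data: the tool name, the permission string, and the synchronous `run` signature are assumptions (a real tool would call your API asynchronously).

```typescript
type ToolCaller = { userId: string; tenantId: string; permissions: string[] };
type ToolOutcome = "ok" | "denied" | "unknown_tool";
type ToolLogEntry = { tool: string; userId: string; input: string; outcome: ToolOutcome };
type CopilotTool = {
  requiredPermission: string;
  run: (caller: ToolCaller, input: string) => string;
};

// Hypothetical single-tool surface; the result is stubbed instead of calling an API.
const COPILOT_TOOLS = new Map<string, CopilotTool>([
  [
    "account-status",
    {
      requiredPermission: "accounts:read",
      run: (caller: ToolCaller, accountId: string) =>
        `Account ${accountId} (tenant ${caller.tenantId}): active`,
    },
  ],
]);

const toolLog: ToolLogEntry[] = [];

// The model may request a tool by name, but the server decides whether it runs.
function invokeCopilotTool(caller: ToolCaller, tool: string, input: string): string | null {
  const definition = COPILOT_TOOLS.get(tool);
  let outcome: ToolOutcome;
  let result: string | null = null;
  if (definition === undefined) {
    outcome = "unknown_tool";
  } else if (!caller.permissions.includes(definition.requiredPermission)) {
    outcome = "denied"; // permissions are re-checked on every invocation
  } else {
    result = definition.run(caller, input);
    outcome = "ok";
  }
  toolLog.push({ tool, userId: caller.userId, input, outcome });
  return result;
}
```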
When workflows grow to multi-step sequences with branching, you are moving into agent territory. The security bar is the same; the orchestration layer gets thicker. See Prompt injection and LLM security for SaaS for tool sandboxing and confirmation patterns.
Permissions, confirmation, and audit
A copilot must respect the same RBAC as the rest of your product. If a user cannot view billing notes in the UI, the context builder must not include them in the prompt, even if the model might infer from other fields.
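One way to apply this in a context builder is an allowlist of fields, each with an optional required role, so anything not listed never reaches the prompt. The field names, labels, and roles below are hypothetical.

```typescript
type FieldRule = { field: string; label: string; requiredRole?: string };

// Hypothetical rules for an account view; unlisted fields are never included.
const ACCOUNT_FIELD_RULES: FieldRule[] = [
  { field: "name", label: "Account name" },
  { field: "plan", label: "Plan" },
  { field: "billingNotes", label: "Billing notes", requiredRole: "billing" },
];

// Returns labeled prompt lines for the fields this user's roles are allowed to see.
function selectPermittedFields(
  record: Record<string, string | undefined>,
  roles: string[],
  rules: FieldRule[],
): string[] {
  const lines: string[] = [];
  for (const rule of rules) {
    const value = record[rule.field];
    const roleOk = rule.requiredRole === undefined || roles.includes(rule.requiredRole);
    if (roleOk && value !== undefined) {
      lines.push(`${rule.label}: ${value}`);
    }
  }
  return lines;
}
```

An allowlist fails closed: a field added to the record later stays out of the prompt until someone writes a rule for it.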
For suggestions that mutate data (send reply, update field, change status), treat the copilot as a draft assistant, not an autonomous actor:
- Model output prefills a field; user edits and submits through normal form actions
- Destructive or external actions require explicit confirmation with the same authorization checks as manual clicks
- Audit log: who asked, what context was loaded, what was suggested, what was accepted
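The three points above can be sketched as a small gate: the suggestion is only recorded, and sending goes through a confirmation step that applies the same authorization check as a manual send. The class, the record shapes, and the injected `canSend` check are hypothetical stand-ins for your existing send path.

```typescript
type ReplyDraft = { id: string; ticketId: string; text: string };

type AuditRecord =
  | { action: "suggested"; draftId: string; actor: string; contextFields: string[]; suggestedText: string }
  | { action: "accepted"; draftId: string; actor: string; finalText: string }
  | { action: "blocked"; draftId: string; actor: string };

class ReplyDraftGate {
  readonly audit: AuditRecord[] = [];
  readonly sent: { ticketId: string; text: string }[] = [];
  private canSend: (userId: string, ticketId: string) => boolean;

  // canSend stands in for the authorization check your manual send action already uses.
  constructor(canSend: (userId: string, ticketId: string) => boolean) {
    this.canSend = canSend;
  }

  // Record who asked, which context fields were loaded, and what was suggested.
  recordSuggestion(draft: ReplyDraft, actor: string, contextFields: string[]): void {
    this.audit.push({
      action: "suggested",
      draftId: draft.id,
      actor,
      contextFields,
      suggestedText: draft.text,
    });
  }

  // Nothing is sent until a user confirms, and the confirmation is re-authorized.
  confirmAndSend(draft: ReplyDraft, actor: string, editedText: string): boolean {
    if (!this.canSend(actor, draft.ticketId)) {
      this.audit.push({ action: "blocked", draftId: draft.id, actor });
      return false;
    }
    this.sent.push({ ticketId: draft.ticketId, text: editedText });
    this.audit.push({ action: "accepted", draftId: draft.id, actor, finalText: editedText });
    return true;
  }
}
```

Logging the final text separately from the suggested text shows how much users edit before accepting.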
Common mistakes
| Mistake | What goes wrong | Better path |
|---|---|---|
| Widget first, architecture never | Demo ships; security review blocks GA | Middleware + context builder before UI polish |
| Send the whole record | Token bloat, PII leakage, stale nested data | Fetch only fields the workflow needs; truncate with rules |
| RAG by default | Latency and ops for data already in the ticket API | Context assembly; add retrieval when evals prove gap |
| No eval scope | "It feels worse after the prompt change" | Golden set per workflow: summarize, draft, refuse out-of-scope |
| Same copilot everywhere | One prompt tries to serve admin, end-user, and support | One builder per view; shared middleware underneath |
| Skip confirmation on sends | Model-suggested email goes out with one click | Draft → user review → existing send path with auth |
Rollout order that survives real traffic
- Middleware route: auth, rate limit, logging, one model, streaming
- One view, one intent: e.g. summarize on ticket detail for internal users only
- Eval baseline: golden tickets with expected properties (contains status, refuses missing context)
- Expand intents on the same view: draft reply before adding new routes
- Additional views: reuse middleware; new context builders per workflow
- Retrieval or extra tools: only when metrics show the simpler path is insufficient
Measure quality, cost, and support load at each stage before expanding.
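A golden-set baseline can start as property checks on captured outputs rather than exact-match comparisons. In this sketch the case shape, the refusal marker, and the check names are assumptions; the outputs would come from running the copilot against fixed tickets.

```typescript
type GoldenCase = {
  label: string;
  output: string; // copilot output captured for a fixed golden ticket
  mustContain: string[]; // properties the output is expected to mention
  expectRefusal: boolean; // true when the context is deliberately missing
};

const REFUSAL_MARKER = "not enough context"; // assumed refusal phrasing

// Returns a list of human-readable failures; an empty list means the case passed.
function checkGoldenCase(testCase: GoldenCase): string[] {
  const failures: string[] = [];
  const text = testCase.output.toLowerCase();
  for (const needle of testCase.mustContain) {
    if (!text.includes(needle.toLowerCase())) {
      failures.push(`${testCase.label}: missing "${needle}"`);
    }
  }
  const refused = text.includes(REFUSAL_MARKER);
  if (refused !== testCase.expectRefusal) {
    failures.push(`${testCase.label}: expected refusal=${testCase.expectRefusal}, got ${refused}`);
  }
  return failures;
}

// Run the whole set and report how many cases passed, plus every failure.
function runGoldenSet(cases: GoldenCase[]): { passed: number; failures: string[] } {
  let passed = 0;
  const failures: string[] = [];
  for (const testCase of cases) {
    const caseFailures = checkGoldenCase(testCase);
    if (caseFailures.length === 0) passed += 1;
    failures.push(...caseFailures);
  }
  return { passed, failures };
}
```

Running the same set before and after a prompt change turns "it feels worse" into a list of named failures.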
Questions before tenant rollout
- Can support see which context was loaded for a bad summary?
- What is the kill switch (per tenant, per view, global)?
- What does the user see when the provider is rate-limited or down?
- How do you roll back a prompt change without redeploying the whole app?
See What production-ready LLM integration actually means and Eval pipelines for LLM features for the operational checklist and regression gates.
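Two of these questions, the kill switch and prompt rollback, can be answered with configuration that is read at request time instead of baked into a deploy. The sketch below assumes a hypothetical flags object and prompt registry loaded from a config store; the field names are illustrative.

```typescript
// Hypothetical flag shape, loaded from a config store on each request.
type CopilotFlags = {
  globalEnabled: boolean;
  disabledTenants: string[];
  disabledViews: string[];
  promptVersionByView: Record<string, string>;
};

// view -> version -> system prompt text
type PromptRegistry = Record<string, Record<string, string>>;

// Kill switch at three levels: global, per tenant, per view.
function isCopilotEnabled(flags: CopilotFlags, tenantId: string, view: string): boolean {
  return (
    flags.globalEnabled &&
    !flags.disabledTenants.includes(tenantId) &&
    !flags.disabledViews.includes(view)
  );
}

// The active prompt is chosen by a flag, so a rollback is a config change, not a deploy.
function resolveSystemPrompt(flags: CopilotFlags, registry: PromptRegistry, view: string): string | null {
  const version = flags.promptVersionByView[view];
  const versions = registry[view];
  if (version === undefined || versions === undefined) return null;
  return versions[version] ?? null;
}
```

Returning `null` for an unknown view or version lets the route refuse the request instead of falling back to an unreviewed prompt.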
Productionizing an existing copilot POC
A common engagement starts with a working demo: streaming works, stakeholders are excited, and the implementation calls the model from the client with a long system prompt.
The path to production usually looks like:
- Move calls server-side: same UX, middleware owns keys and context
- Replace client-assembled context with builders tied to your auth model
- Scope to one workflow: remove "ask anything" until evals and refusal behavior exist
- Add observability: traces, tokens per action, override/dismiss rates
- Ship behind feature flags: internal → canary tenants → GA
You keep the product vision; the integration work makes it permissioned, observable, and reversible.
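For the observability step, tokens per action and override/dismiss rates can be derived from one outcome event per assist. The event shape and metric names in this sketch are assumptions; in practice the events would come from your tracing or analytics pipeline.

```typescript
type AssistOutcomeEvent = {
  intent: string;
  tokens: number;
  result: "accepted" | "edited" | "dismissed";
};

type IntentStats = { count: number; avgTokens: number; editRate: number; dismissRate: number };

// Aggregate per intent: average tokens per action, plus edit and dismiss rates.
function summarizeAssistOutcomes(events: AssistOutcomeEvent[]): Map<string, IntentStats> {
  const totals = new Map<string, { count: number; tokens: number; edited: number; dismissed: number }>();
  for (const ev of events) {
    const running = totals.get(ev.intent) ?? { count: 0, tokens: 0, edited: 0, dismissed: 0 };
    running.count += 1;
    running.tokens += ev.tokens;
    if (ev.result === "edited") running.edited += 1;
    if (ev.result === "dismissed") running.dismissed += 1;
    totals.set(ev.intent, running);
  }
  const stats = new Map<string, IntentStats>();
  totals.forEach((running, intent) => {
    stats.set(intent, {
      count: running.count,
      avgTokens: running.tokens / running.count,
      editRate: running.edited / running.count,
      dismissRate: running.dismissed / running.count,
    });
  });
  return stats;
}
```

A rising dismiss rate on one intent is often an earlier signal than support tickets that a prompt or context change went wrong.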
Putting it together
An in-app copilot is not a chatbot skin on your app. It is assistive AI bound to a workflow: context from product state, enforcement on the server, UI embedded where the user already works.
If you are planning a copilot, start by naming one view, one user intent, and exactly which APIs may supply context. Draw the request path: auth → context builder → middleware → model → UI. If the arrow goes from browser to OpenAI with a scraped DOM, you have integration work before you scale traffic.
Scoping a copilot for your product? Describe the workflow (view, auth model, and data sources) and we will map context assembly, middleware, and a rollout plan that fits your stack without a sidebar chatbot bolt-on.
Related resources
- LLM middleware: what it is, why you need it, and how to implement it
A practical guide to the server-side layer between your app and the model — auth, rate limits, routing, logging, and the patterns that keep AI features production-ready.
- When not to use RAG
RAG is the default answer for every AI feature — but often the wrong one. A decision guide for engineering leaders scoping retrieval, tools, and middleware.
- What production-ready LLM integration actually means
A practical checklist for engineering leaders — beyond the demo and before you call an AI feature shipped.
