
Why Building an In-House Copilot Takes Quarters (And How to Ship in Days)

The hidden complexity of auth, tenant isolation, and execution layers. The "30 small things" that kill velocity.


Touchstage Team

Jan 15, 2024 · 8 min read

It’s a story we hear every week. A brilliant engineering team decides to build an AI copilot for their SaaS product. "It's just an LLM wrapper," they say. "We'll hook up OpenAI to our API, add a chat UI, and ship it next sprint."

Three months later, they are still in "beta." The copilot works great on the CTO's laptop but fails in production. It hallucinates parameters. It times out on long-running tasks. And worst of all, nobody is quite sure if it's 100% secure.

It’s not “calling OpenAI” that takes time. That’s 2 days. The rest is all the boring, ugly infrastructure nobody scopes properly until they're knee-deep in it. This is the Hidden Complexity Trap of agentic AI.

The 30 "Small" Things That Kill Velocity

When teams scope an in-house copilot, they usually account for the "happy path": user asks a question → LLM interprets → API call → success. But production software is rarely about the happy path. It's about the edge cases, the security boundaries, and the reliability at scale.

1. Tooling & Auth: The Nightmare of Credentials

Defining tools isn't just about writing JSON schemas for OpenAI functions. You have to map your internal APIs to safe, structured actions with strict validations and timeouts.
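To make "strict validations and timeouts" concrete, here is a minimal Python sketch of a tool wrapper that rejects unexpected, missing, or wrongly typed arguments before anything touches your API, and bounds how long a call may take. `Tool`, `run_tool`, and the `refund_user` tool are hypothetical names for illustration, not part of any SDK.

```python
import concurrent.futures
from dataclasses import dataclass
from typing import Any, Callable, Dict


class ToolError(Exception):
    """Raised when a model-generated tool call is rejected or fails."""


@dataclass(frozen=True)
class Tool:
    name: str
    params: Dict[str, type]        # exact argument names and types the API accepts
    handler: Callable[..., Any]    # thin wrapper around your internal API call
    timeout_s: float = 5.0


def validate_args(tool: Tool, args: Dict[str, Any]) -> None:
    unknown = set(args) - set(tool.params)
    missing = set(tool.params) - set(args)
    if unknown:
        raise ToolError(f"{tool.name}: unexpected arguments {sorted(unknown)}")
    if missing:
        raise ToolError(f"{tool.name}: missing arguments {sorted(missing)}")
    for key, expected in tool.params.items():
        # Exact type match, so e.g. True or "500" is not accepted where an int is expected.
        if type(args[key]) is not expected:
            raise ToolError(f"{tool.name}: {key} must be {expected.__name__}")


_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def run_tool(tool: Tool, args: Dict[str, Any]) -> Any:
    validate_args(tool, args)  # model-generated arguments are untrusted input
    future = _POOL.submit(tool.handler, **args)
    try:
        return future.result(timeout=tool.timeout_s)
    except concurrent.futures.TimeoutError as exc:
        # The worker thread is not cancelled; the handler should also enforce
        # its own deadline (e.g. an HTTP client timeout).
        raise ToolError(f"{tool.name}: timed out after {tool.timeout_s}s") from exc


refund = Tool(
    name="refund_user",
    params={"user_id": str, "amount_cents": int},
    handler=lambda user_id, amount_cents: {"refunded": amount_cents, "user_id": user_id},
)
print(run_tool(refund, {"user_id": "u_123", "amount_cents": 500}))
# {'refunded': 500, 'user_id': 'u_123'}
```

Passing `"amount_cents": "500"` (a string, as models sometimes produce) raises `ToolError` instead of reaching your backend.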

Then comes the authentication problem:

  • Identity Propagation: How does the AI execute actions as the user? Do you pass the user's JWT into the model's context? (Don't: anything in the prompt can surface in logs or in the model's output.) Do you create a service account?
  • Permission Scoping: If a user asks to "delete all files," and the AI calls delete_files(), does your API know which files this specific user is allowed to delete? Or is the AI running with a god-mode token?
  • Token Management: How do you handle token rotation, revocation, and refresh tokens in a long-running agent session?
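One common answer to all three questions is to keep credentials entirely outside the model: the execution layer mints a short-lived token scoped to what *this* user may do, and checks it on every tool call. The sketch below assumes a hypothetical `CredentialBroker` in front of your RBAC data; a real system would mint tokens through your identity provider rather than `secrets`.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class DelegatedToken:
    user_id: str
    tenant_id: str
    scopes: FrozenSet[str]
    expires_at: float
    # Opaque secret; repr=False keeps it out of logs and debug output.
    value: str = field(default_factory=lambda: secrets.token_urlsafe(32), repr=False)


class CredentialBroker:
    """Hypothetical broker: mints short-lived, per-user tokens that live in the
    execution layer, never in the model's prompt or context window."""

    def __init__(self, user_scopes: Dict[str, Set[str]], ttl_s: float = 300.0):
        self._user_scopes = user_scopes  # user_id -> scopes granted by your RBAC
        self._ttl_s = ttl_s
        self._revoked: Set[str] = set()

    def mint(self, user_id: str, tenant_id: str, requested: Iterable[str]) -> DelegatedToken:
        # Grant only the intersection: the agent can never exceed the user (no god mode).
        granted = frozenset(requested) & frozenset(self._user_scopes.get(user_id, set()))
        return DelegatedToken(user_id, tenant_id, granted, time.time() + self._ttl_s)

    def revoke(self, token: DelegatedToken) -> None:
        self._revoked.add(token.value)

    def authorize(self, token: DelegatedToken, required_scope: str,
                  now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        if token.value in self._revoked:
            raise PermissionError("token revoked")
        if now >= token.expires_at:
            raise PermissionError("token expired; mint a new one from the user's live session")
        if required_scope not in token.scopes:
            raise PermissionError(f"missing scope: {required_scope}")


broker = CredentialBroker({"alice": {"files.delete"}})
token = broker.mint("alice", "tenant_a", ["files.delete", "settings.write"])
print(sorted(token.scopes))  # ['files.delete']
broker.authorize(token, "files.delete")  # passes
```

Calling `broker.authorize(token, "settings.write")` raises `PermissionError`, because Alice never had that scope, no matter what the model requested. Short TTLs plus re-minting from the user's live session also sidestep most refresh-token handling inside the agent loop.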

We've seen teams accidentally leak root API keys into prompt contexts or give their agent permission to modify global settings because they used a shared service account. These are "oh shit" moments that stop a project dead in its tracks.

2. Multi-Tenant Isolation is Harder Than It Looks

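SaaS is multi-tenant by definition. Your copilot must be, too. Ensuring Tenant A’s copilot never, ever pulls data from Tenant B seems obvious, but in a RAG (Retrieval-Augmented Generation) system it's easy to get wrong: retrieval happens before the model sees anything, so one missing filter puts another tenant's documents straight into the prompt.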

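  • Vector Database Partitioning: You can't just dump every tenant's docs into one unfiltered index. Either give each tenant its own index or namespace, or apply strict metadata filtering (tenant ID at minimum) on every single query.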
  • Resource Scoping: It's not just tenant-level. Can this specific user in Tenant A see Project X? If not, the copilot shouldn't answer questions about it.
  • Role Mapping: You need a way to translate your RBAC (Role-Based Access Control) into something the model understands. If a user asks "Create an invoice," the model needs to know if they have the billing.write permission before it attempts the call.
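Here is a toy sketch of both ideas in Python: an in-memory index whose only search method requires the caller's identity and filters by tenant and project *before* ranking, plus a helper that advertises to the model only the tools the user's permissions allow. `ScopedIndex`, `UserContext`, and the `TOOL_PERMISSIONS` mapping are hypothetical; a real vector database would apply the same filter as a metadata filter inside the query.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence


@dataclass(frozen=True)
class UserContext:
    tenant_id: str
    project_ids: FrozenSet[str]   # resources this user may see
    permissions: FrozenSet[str]   # RBAC permissions, e.g. "billing.write"


@dataclass(frozen=True)
class Row:
    vector: Sequence[float]
    text: str
    tenant_id: str
    project_id: str


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


class ScopedIndex:
    """Toy in-memory vector index: there is no way to search without a user."""

    def __init__(self) -> None:
        self._rows: List[Row] = []

    def add(self, row: Row) -> None:
        self._rows.append(row)

    def search(self, query: Sequence[float], user: UserContext, k: int = 3) -> List[str]:
        # Filter BEFORE ranking: rows outside the user's tenant and projects are never scored.
        visible = [r for r in self._rows
                   if r.tenant_id == user.tenant_id and r.project_id in user.project_ids]
        visible.sort(key=lambda r: cosine(query, r.vector), reverse=True)
        return [r.text for r in visible[:k]]


# Hypothetical mapping from tool name to the RBAC permission it requires.
TOOL_PERMISSIONS: Dict[str, str] = {
    "create_invoice": "billing.write",
    "list_invoices": "billing.read",
}


def tools_for(user: UserContext) -> List[str]:
    # Only advertise tools the user may call; the API must still re-check on execution.
    return sorted(t for t, perm in TOOL_PERMISSIONS.items() if perm in user.permissions)


index = ScopedIndex()
index.add(Row([1.0, 0.0], "Tenant A, Project X roadmap", "tenant_a", "proj_x"))
index.add(Row([1.0, 0.0], "Tenant A, Project Y roadmap", "tenant_a", "proj_y"))
index.add(Row([1.0, 0.0], "Tenant B roadmap", "tenant_b", "proj_x"))
user = UserContext("tenant_a", frozenset({"proj_x"}), frozenset({"billing.read"}))
print(index.search([1.0, 0.0], user))  # ['Tenant A, Project X roadmap']
print(tools_for(user))                 # ['list_invoices']
```

Because `create_invoice` is never offered to a user without `billing.write`, the model cannot even attempt the call; the backend check remains the real enforcement point.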

3. The Execution Layer: Where "Chat" Meets Reality

Chatbots just talk. Agents do. And "doing" things in a distributed system is messy.

Orchestrating multi-step workflows requires a robust state machine:

  • Sequential vs. Parallel: If a user says "Invite John and create a project for him," can those happen in parallel? Or does the project creation depend on John's user ID?
  • Idempotency: If the "Create Project" step times out, but the API actually succeeded, what happens when the agent retries? Do you get two projects?
  • Compensating Actions: If step 3 fails after steps 1 & 2 succeeded, do you leave the system in a broken state? Or do you have a rollback mechanism?
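A common pattern for the last two questions is a saga: run dependent steps in order, pass each step's output to the next (so project creation can use John's new user ID), and if a later step fails, run compensating actions for the completed steps in reverse order. The Python sketch below uses hypothetical step names and in-memory stand-ins for real API calls.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Results = Dict[str, Any]


@dataclass
class Step:
    name: str
    action: Callable[[Results], Any]                        # may read outputs of earlier steps
    compensate: Optional[Callable[[Results], None]] = None  # undoes this step's effect


def run_saga(steps: List[Step]) -> Results:
    """Run dependent steps in order; on failure, undo completed steps in reverse."""
    results: Results = {}
    completed: List[Step] = []
    try:
        for step in steps:
            results[step.name] = step.action(results)
            completed.append(step)
    except Exception:
        for step in reversed(completed):
            if step.compensate is not None:
                step.compensate(results)
        raise  # surface the original failure to the agent and the user
    return results


log: List[str] = []


def send_welcome_email(_results: Results) -> None:
    raise RuntimeError("step 3 failed")


steps = [
    Step("invite_user",
         lambda r: {"user_id": "u_42"},
         lambda r: log.append("remove " + r["invite_user"]["user_id"])),
    Step("create_project",
         lambda r: {"project_id": "p_7", "owner": r["invite_user"]["user_id"]},
         lambda r: log.append("delete " + r["create_project"]["project_id"])),
    Step("send_welcome_email", send_welcome_email),
]

try:
    run_saga(steps)
except RuntimeError:
    pass
print(log)  # ['delete p_7', 'remove u_42']
```

This sketch runs everything sequentially; steps with no data dependency could run in parallel. A production engine would also persist progress so a crash mid-saga can resume, and retry compensations that themselves fail.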

Building a reliable execution engine that handles retries, backoff, and timeouts gracefully is a project in itself. Most teams underestimate this until users start complaining about broken workflows.
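Retries and idempotency have to be designed together. A minimal Python sketch, assuming your backend deduplicates requests by an idempotency key (for example, Stripe's API accepts an `Idempotency-Key` header): the key is generated once per logical operation and reused on every retry, and waits grow exponentially with random jitter. `call_with_retries`, `TransientError`, and `FakeProjectsAPI` are hypothetical names.

```python
import random
import time
import uuid
from typing import Any, Callable, Dict


class TransientError(Exception):
    """Timeouts, 429s, 503s: failures where a retry may succeed."""


def call_with_retries(
    send: Callable[[Dict[str, Any], str], Any],
    payload: Dict[str, Any],
    *,
    attempts: int = 4,
    base_delay_s: float = 0.2,
    max_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    # One key per logical operation, reused on every retry, so the server can
    # recognise a repeat and return the original result instead of acting twice.
    idempotency_key = str(uuid.uuid4())
    for attempt in range(attempts):
        try:
            return send(payload, idempotency_key)
        except TransientError:
            if attempt == attempts - 1:
                raise
            # Exponential backoff with "full jitter": wait a random time up to the cap.
            sleep(random.uniform(0, min(max_delay_s, base_delay_s * 2 ** attempt)))
    raise ValueError("attempts must be at least 1")


class FakeProjectsAPI:
    """Stand-in backend that dedupes by idempotency key and loses the first response."""

    def __init__(self) -> None:
        self.projects: Dict[str, Dict[str, Any]] = {}
        self._by_key: Dict[str, str] = {}
        self._drop_next_response = True

    def create(self, payload: Dict[str, Any], key: str) -> str:
        if key in self._by_key:               # retry of a request already applied
            return self._by_key[key]
        project_id = f"proj_{len(self.projects) + 1}"
        self.projects[project_id] = payload   # the write commits...
        self._by_key[key] = project_id
        if self._drop_next_response:          # ...but the response times out
            self._drop_next_response = False
            raise TransientError("timed out waiting for response")
        return project_id


api = FakeProjectsAPI()
project_id = call_with_retries(api.create, {"name": "John's project"}, sleep=lambda s: None)
print(project_id, len(api.projects))  # proj_1 1
```

This is exactly the "timed out, but actually succeeded" case: the retry returns the existing project instead of creating a second one. If the orchestrator itself can crash between attempts, the key must be stored with the workflow state rather than kept in a local variable.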

4. Observability: The "Black Box" Problem

When a user reports "the AI didn't work," how do you debug? In a traditional app, you look at the stack trace. In an AI app, you have a probabilistic black box.

You need a specialized logging framework that captures:

  • The exact user prompt.
  • The context retrieved from RAG.
  • The model's "thought process" (reasoning trace).
  • The specific tool calls and their arguments.
  • The raw API response from your backend.

Without this, you are flying blind, and "It works on staging but not prod" becomes the permanent status of your feature.
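The fields listed above map naturally onto one structured record per request. Here is a minimal Python sketch using only the standard library; `CopilotTrace` is a hypothetical name, and the `reasoning` field assumes your model provider exposes some reasoning or plan text at all (many expose only a summary, or nothing).

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class ToolCall:
    tool: str
    arguments: dict
    response: Any            # raw API response from your backend
    error: Optional[str]
    latency_ms: float


@dataclass
class CopilotTrace:
    user_prompt: str                                              # the exact user prompt
    tenant_id: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    retrieved_context: List[str] = field(default_factory=list)    # RAG chunks sent to the model
    reasoning: List[str] = field(default_factory=list)            # reasoning text, if exposed
    tool_calls: List[ToolCall] = field(default_factory=list)

    def record_tool_call(self, tool: str, arguments: dict, fn: Callable[..., Any]) -> Any:
        """Run a tool and record its arguments, raw response (or error) and latency."""
        start = time.perf_counter()
        response: Any = None
        error: Optional[str] = None
        try:
            response = fn(**arguments)
            return response
        except Exception as exc:
            error = repr(exc)
            raise
        finally:
            self.tool_calls.append(
                ToolCall(tool, arguments, response, error, (time.perf_counter() - start) * 1000)
            )

    def to_json(self) -> str:
        # One JSON line per request; default=str covers non-JSON values in responses.
        return json.dumps(asdict(self), default=str)


trace = CopilotTrace(user_prompt="Upgrade John to Pro plan", tenant_id="tenant_a")
trace.retrieved_context.append("Plans: Free, Pro, Enterprise")
trace.record_tool_call(
    "update_subscription",
    {"subscription_id": "sub_1", "plan": "pro"},
    lambda subscription_id, plan: {"status": 200, "plan": plan},
)
print(trace.to_json())
```

Recording in `finally` means failed calls are logged too, which are usually the ones you need when a user reports "the AI didn't work."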

How Touchstage Bypasses the Infrastructure Tax

At Touchstage, our philosophy is simple: Don't build the plumbing. Build the product.

We provide the entire "copilot stack" as a managed service. You plug in your existing APIs (via an OpenAPI spec or a Postman collection) and your documentation. We handle the rest.

The Touchstage Architecture

  1. The Surface: We give you a ready-made, themeable UI (chat dock, sidebar, or headless SDK) so you don't have to reinvent the chat interface.
  2. The Capability Builder: Instead of writing Python scripts for every tool, you define "Capabilities" in our UI. A capability maps a user intent (e.g., "Refund User") to an API endpoint, with built-in parameter extraction and validation.
  3. The Safety Layer: We baked policy and auth into the core. You define "Fail-Closed" policies (e.g., "Refunds > $50 require approval") and we enforce them before the API call is ever made.
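To illustrate what "fail-closed" means in general (this is not Touchstage's configuration format or API), here is a generic Python sketch: policies are evaluated before any API call, and every unexpected situation (no policy for the capability, a rule that crashes on missing arguments, an unrecognised result) resolves to deny. `Policy`, `Decision`, and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


@dataclass(frozen=True)
class Policy:
    capability: str
    rule: Callable[[Dict[str, Any]], Decision]


def evaluate(policies: List[Policy], capability: str, args: Dict[str, Any]) -> Decision:
    """Decide before any API call. Anything unexpected resolves to DENY."""
    matching = [p for p in policies if p.capability == capability]
    if not matching:
        return Decision.DENY                      # no policy for this capability: fail closed
    outcome = Decision.ALLOW
    for policy in matching:
        try:
            decision = policy.rule(args)
        except Exception:
            return Decision.DENY                  # rule crashed (e.g. missing amount): fail closed
        if not isinstance(decision, Decision) or decision is Decision.DENY:
            return Decision.DENY
        if decision is Decision.REQUIRE_APPROVAL:
            outcome = Decision.REQUIRE_APPROVAL   # strictest non-deny decision wins
    return outcome


# Hypothetical rule mirroring "Refunds > $50 require approval".
refund_policy = Policy(
    "refund_user",
    lambda a: Decision.REQUIRE_APPROVAL if a["amount_usd"] > 50 else Decision.ALLOW,
)

print(evaluate([refund_policy], "refund_user", {"amount_usd": 20}))  # Decision.ALLOW
print(evaluate([refund_policy], "refund_user", {"amount_usd": 80}))  # Decision.REQUIRE_APPROVAL
print(evaluate([refund_policy], "refund_user", {}))                  # Decision.DENY
```

The design choice is that the default path is "no": a forgotten policy or a malformed argument blocks the call instead of silently allowing it.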

The "Aha" Moment

Developers usually see the value the first time they open the playback logs and find a structured trace:

User Intent: "Upgrade John to Pro plan"
Touchstage Execution:
1. Mapped intent → update_subscription Capability
2. Fetched current subscription state (Context)
3. Checked Policy: "User is Admin? Yes."
4. Called PUT /subscriptions/{id} with payload { plan: "pro" }
5. Logged success event.

And they realize: "Oh, I didn’t have to build any of that orchestration, state management, or logging framework." They just connected their API spec, and suddenly they had a working, safe, observable agent.

Building a copilot is easy. Building a production-grade copilot is hard. Don't spend your precious engineering quarters building infrastructure that isn't your core business. Ship in days with Touchstage.

Product · SaaS · AI

Written by Touchstage Team

Building the future of agentic experiences at Touchstage. Sharing insights on product, engineering, and the AI revolution.


Alkolumi Software PVT LTD · Built within Switzerland 🇨🇭