Why Building an In-House Copilot Takes Quarters (And How to Ship in Days)
The hidden complexity of auth, tenant isolation, and execution layers. The "30 small things" that kill velocity.
Touchstage Team
It’s a story we hear every week. A brilliant engineering team decides to build an AI copilot for their SaaS product. "It's just an LLM wrapper," they say. "We'll hook up OpenAI to our API, add a chat UI, and ship it next sprint."
Three months later, they are still in "beta." The copilot works great on the CTO's laptop but fails in production. It hallucinates parameters. It times out on long-running tasks. And worst of all, nobody is quite sure if it's 100% secure.
It’s not “calling OpenAI” that takes time. That’s 2 days. The rest is all the boring, ugly infrastructure nobody scopes properly until they're knee-deep in it. This is the Hidden Complexity Trap of agentic AI.
When teams scope an in-house copilot, they usually account for the "happy path": User asks question -> LLM interprets -> API call -> Success. But production software is rarely about the happy path. It's about the edge cases, the security boundaries, and the reliability at scale.
Defining tools isn't just about writing JSON schemas for OpenAI functions. You have to map your internal APIs to safe, structured actions with strict validations and timeouts.
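To make "strict validations and timeouts" concrete, here is a minimal sketch of a tool wrapper. The names `Tool`, `ToolError`, and `call_tool` are hypothetical, not part of any SDK, and the thread-based timeout is just one simple way to bound a call.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Any, Callable


class ToolError(Exception):
    """Raised when a tool call is rejected or does not finish in time."""


@dataclass(frozen=True)
class Tool:
    # Hypothetical tool definition: every name here is illustrative.
    name: str
    handler: Callable[..., Any]
    params: dict[str, type]  # parameter name -> expected Python type
    timeout_s: float


def call_tool(tool: Tool, args: dict[str, Any]) -> Any:
    # Reject parameters the model invented instead of passing them through.
    unknown = set(args) - set(tool.params)
    if unknown:
        raise ToolError(f"{tool.name}: unknown parameters {sorted(unknown)}")
    for key, expected in tool.params.items():
        if key not in args:
            raise ToolError(f"{tool.name}: missing parameter {key!r}")
        if not isinstance(args[key], expected):
            raise ToolError(f"{tool.name}: {key!r} must be {expected.__name__}")

    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(tool.handler, **args)
    try:
        return future.result(timeout=tool.timeout_s)
    except FutureTimeout:
        # The worker thread is not killed here; a production system would
        # also need cancellable I/O behind the handler.
        raise ToolError(f"{tool.name}: timed out after {tool.timeout_s}s") from None
    finally:
        pool.shutdown(wait=False)
```

The point of the design is that the LLM's output is treated as untrusted input: it is checked against the declared parameters before any internal API is touched.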
Then comes the authentication problem:
When the copilot calls delete_files(), does your API know which files this specific user is allowed to delete? Or is the AI running with a god-mode token?
We've seen teams accidentally leak root API keys into prompt contexts or give their agent permission to modify global settings because they used a shared service account. These are "oh shit" moments that stop a project dead in its tracks.
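One way to avoid the god-mode token is to run every tool call under the requesting user's own context. The sketch below is an assumption-heavy illustration: `UserContext`, the `files.delete` scope, and the in-memory `FILE_OWNERS` table are all hypothetical stand-ins for your real auth system and database.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Raised when the acting user may not perform the requested action."""


@dataclass(frozen=True)
class UserContext:
    # Identity and scopes of the human the agent is acting for (hypothetical).
    user_id: str
    scopes: frozenset[str]


# Hypothetical ownership table standing in for a real database.
FILE_OWNERS = {"f1": "alice", "f2": "alice", "f3": "bob"}


def delete_files(ctx: UserContext, file_ids: list[str]) -> list[str]:
    # The agent never gets broader rights than the user it acts for.
    if "files.delete" not in ctx.scopes:
        raise Forbidden("missing scope: files.delete")
    not_owned = [f for f in file_ids if FILE_OWNERS.get(f) != ctx.user_id]
    if not_owned:
        # Fail the whole call rather than deleting a partial set.
        raise Forbidden(f"not allowed to delete: {not_owned}")
    for f in file_ids:
        FILE_OWNERS.pop(f, None)
    return file_ids
```

Because the check lives in the tool itself rather than in the prompt, a confused or manipulated model still cannot delete another user's files.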
SaaS is multi-tenant by definition. Your copilot must be, too. Ensuring Tenant A’s copilot never, ever pulls data from Tenant B seems obvious, but in a RAG (Retrieval Augmented Generation) system, it's complex.
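For retrieval, a common pattern is to apply the tenant filter before any similarity ranking, so another tenant's chunks are never even candidates. This is a minimal sketch with a hypothetical in-memory index; a real vector store would apply the same filter in its query, and the `Chunk` and `retrieve` names are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    # Hypothetical indexed document chunk; tenant_id is stored with every row.
    tenant_id: str
    text: str
    embedding: tuple[float, ...]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    index: list[Chunk],
    tenant_id: str,
    query_embedding: tuple[float, ...],
    k: int = 3,
) -> list[Chunk]:
    # Filter first, rank second: similarity can never outvote isolation.
    candidates = [c for c in index if c.tenant_id == tenant_id]
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]
```

The `tenant_id` argument should come from the authenticated session, never from the model's output or the user's message.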
The copilot also has to confirm that the user holds the billing.write permission before it attempts the call.
Chatbots just talk. Agents do. And "doing" things in a distributed system is messy.
Orchestrating multi-step workflows requires a robust state machine.
Building a reliable execution engine that handles retries, backoff, and timeouts gracefully is a project in itself. Most teams underestimate this until users start complaining about broken workflows.
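As a small taste of what that engine involves, here is a sketch of retrying a single step with exponential backoff. `TransientError` and `run_with_retries` are hypothetical names, the default delays are arbitrary, and a real engine would also add jitter, per-step timeouts, and idempotency keys.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """A failure worth retrying, e.g. a 503 or a dropped connection."""


def run_with_retries(
    step: Callable[[], T],
    max_attempts: int = 4,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) up to the cap.
            sleep(min(max_delay_s, base_delay_s * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")
```

Only errors explicitly marked transient are retried; anything else propagates immediately, which keeps a bad request from being replayed against your API.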
When a user reports "the AI didn't work," how do you debug? In a traditional app, you look at the stack trace. In an AI app, you have a probabilistic black box.
You need a specialized logging framework that captures what the agent actually did at each step.
Without this, you are flying blind. "It works on staging but not prod" becomes the standard state of your feature.
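A structured trace per request is one way to get that visibility. The sketch below is illustrative only: the `Trace` and `TraceEvent` classes and every field name are assumptions about what you might want to record, not a prescribed schema.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class TraceEvent:
    step: str               # e.g. "intent", "policy_check", "tool_call"
    detail: dict[str, Any]  # step-specific payload; keep it JSON-serializable
    ts: float = field(default_factory=time.time)


@dataclass
class Trace:
    # One trace per user request, so a bug report maps to a single record.
    user_id: str
    tenant_id: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    events: list[TraceEvent] = field(default_factory=list)

    def record(self, step: str, **detail: Any) -> None:
        self.events.append(TraceEvent(step, detail))

    def to_json(self) -> str:
        # asdict() recurses into the nested TraceEvent dataclasses.
        return json.dumps(asdict(self), sort_keys=True)
```

With something like this in place, "the AI didn't work" turns into a trace ID you can look up and replay step by step.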
At Touchstage, our philosophy is simple: Don't build the plumbing. Build the product.
We provide the entire "copilot stack" as a managed service. You plug in your existing APIs (via OpenAPI spec or Postman) and your documentation. We handle the rest.
The moment developers realize the value is usually when they see the playback logs for the first time. They see a structured trace:
User Intent: "Upgrade John to Pro plan"
Touchstage Execution:
1. Mapped intent → update_subscription Capability
2. Fetched current subscription state (Context)
3. Checked Policy: "User is Admin? Yes."
4. Called PUT /subscriptions/{id} with payload { plan: "pro" }
5. Logged success event.
And they realize: "Oh, I didn’t have to build any of that orchestration, state management, or logging framework." They just connected their API spec, and suddenly they had a working, safe, observable agent.
Building a copilot is easy. Building a production-grade copilot is hard. Don't spend your precious engineering quarters building infrastructure that isn't your core business. Ship in days with Touchstage.
Building the future of agentic experiences at Touchstage. Sharing insights on product, engineering, and the AI revolution.
Turn your documentation into a production-grade copilot in days, not quarters.