MCP / AI Agents · Feb 2026 · 14 min read · 3.2K reads

Building Production MCP Servers:
Lessons from Intuit

Real-world lessons from architecting and shipping MCP-based agentic AI workflows at scale — pitfalls, patterns, and performance tips from the frontlines at Intuit.

Balachandraiah Gajwala
Senior Software Engineer (SDE3) · Intuit, Bengaluru
30% manual effort saved · 5+ MCP servers shipped · 100M+ users impacted

Why MCP Changes Everything

When Anthropic released the Model Context Protocol (MCP) spec in late 2024, it quietly shifted how we think about AI integration in production systems. Instead of ad-hoc API wrappers, MCP gives you a structured, typed interface between an AI model and the tools it can use. At Intuit, I was one of the first engineers tasked with taking it from proof of concept to production, and it taught me a lot.

MCP is an open standard that lets AI models (like Claude) securely interact with external tools, databases, and APIs through a well-defined server interface — reducing hallucination and increasing reliability of tool calls significantly in production.

The Architecture We Chose

Our MCP server for QuickBooks content generation sits between the AI model and a set of internal CMS APIs. Here's the simplified flow we settled on after several iterations of design and real-world testing:

// mcp-server/src/index.ts
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

const server = new McpServer({
  name: 'quickbooks-content-agent',
  version: '1.0.0',
});

server.tool(
  'generate_content',
  'Generate QB help article from a topic',
  {
    topic:    z.string(),
    audience: z.string(),
    tone:     z.enum(['formal', 'conversational']).optional(),
  },
  async ({ topic, audience, tone }) => {
    // callInternalCMS wraps our internal CMS draft API (implementation elided).
    const draft = await callInternalCMS({ topic, audience, tone });
    return { content: [{ type: 'text', text: draft }] };
  }
);

// stdio transport shown for brevity; production runs behind HTTP.
// Note: log to stderr — stdout belongs to the stdio transport.
const transport = new StdioServerTransport();
await server.connect(transport);
console.error('MCP server connected');

Lesson 1 — Schema Validation is Your Safety Net

The single biggest reliability improvement came from strict JSON Schema validation on every tool input and output. Early in development, the AI would occasionally pass malformed inputs that crashed our downstream CMS APIs. Adding Zod validation at the MCP boundary reduced these errors to zero overnight.

import { z } from 'zod';

const GenerateInputSchema = z.object({
  topic:    z.string().min(3).max(200),
  audience: z.enum(['small-business', 'accountant', 'enterprise']),
  tone:     z.enum(['formal', 'conversational'])
              .default('conversational'),
});

// Validate before passing to the handler. Rejecting with a readable message
// lets the model see what was wrong and self-correct on its next call.
const parsed = GenerateInputSchema.safeParse(rawInput);
if (!parsed.success) {
  throw new Error(`Invalid input: ${parsed.error.message}`);
}

Lesson 2 — Rate Limiting & Cost Controls are Non-Negotiable

AI agents can be surprisingly aggressive about calling tools in loops. Without rate limiting, a single runaway agent session cost us $80 in LLM API calls during QA testing. We added a token-bucket rate limiter per session and a hard cap on tool call depth.

Always set max_iterations and per-session cost caps on your agentic workflows. An agent retrying a failed tool 100 times will burn through your API budget fast — we learned this the expensive way.

Our Rate Limiting Strategy
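In outline, a per-session token bucket plus a hard call-depth cap looks something like this — the capacities and the `allowToolCall` helper are illustrative, not our exact production values:

```typescript
// Token bucket: allows short bursts, enforces a sustained rate over time.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(
    private capacity: number,      // max burst size
    private refillPerSec: number,  // sustained calls per second
  ) {
    this.tokens = capacity;
  }

  tryConsume(): boolean {
    const now = Date.now();
    // Refill proportionally to elapsed time, capped at capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.lastRefill) / 1000) * this.refillPerSec,
    );
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const MAX_TOOL_DEPTH = 8; // hard cap on chained tool calls per session
const buckets = new Map<string, TokenBucket>();

export function allowToolCall(sessionId: string, depth: number): boolean {
  // Reject runaway chains outright before touching the rate limiter.
  if (depth > MAX_TOOL_DEPTH) return false;
  let bucket = buckets.get(sessionId);
  if (!bucket) {
    bucket = new TokenBucket(10, 0.5); // burst of 10, then 1 call per 2s
    buckets.set(sessionId, bucket);
  }
  return bucket.tryConsume();
}
```

The depth cap is checked first so a looping agent fails fast instead of slowly draining its bucket.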

Lesson 3 — Observability First, Always

Traditional request logging doesn't cut it for agentic workflows. We built a custom trace logger that records the full tool call chain per session — input, output, latency, token count, and model version. This was invaluable for debugging non-deterministic failures that happened only in production.

// trace-logger.ts — Full session observability
interface ToolCallTrace {
  sessionId:  string;
  toolName:   string;
  input:      unknown;
  output:     unknown;
  latencyMs:  number;
  tokenCount: number;
  model:      string;
  timestamp:  string;
  error?:     string;
}

// Ships a trace record to our logging pipeline (implementation elided).
declare function logTrace(trace: ToolCallTrace): Promise<void>;

export async function withTrace<T>(
  meta: Omit<ToolCallTrace, 'latencyMs' | 'timestamp'>,
  fn: () => Promise<T>
): Promise<T> {
  const start = Date.now();
  try {
    const result = await fn();
    await logTrace({ ...meta, latencyMs: Date.now() - start,
      timestamp: new Date().toISOString() });
    return result;
  } catch (err) {
    await logTrace({ ...meta, error: String(err),
      latencyMs: Date.now() - start,
      timestamp: new Date().toISOString() });
    throw err;
  }
}

Lesson 4 — Graceful Degradation Saves You

Our MCP server sits in the critical path for content authoring. When the AI model is unavailable or rate-limited, we fall back to a template-based generation system so authors are never blocked. Never make an AI agent a single point of failure in a production workflow.

The fallback strategy reduced our P0 incidents from "AI is down, all authors blocked" to zero. Always design for degraded-mode operation from day one.
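A minimal sketch of the fallback path, assuming a hypothetical `renderTemplate` helper and an injectable AI generator (names and the timeout value are illustrative):

```typescript
interface DraftInput { topic: string; audience: string }
interface Draft { body: string; degraded: boolean }

// Bound a promise so a hung model call can't block authors indefinitely.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Deterministic degraded-mode draft so authors always get something usable.
function renderTemplate(input: DraftInput): string {
  return `# ${input.topic}\n\nDraft outline for ${input.audience} (AI unavailable).`;
}

export async function generateDraft(
  input: DraftInput,
  aiGenerate: (i: DraftInput) => Promise<string>,
  timeoutMs = 10_000,
): Promise<Draft> {
  try {
    return { body: await withTimeout(aiGenerate(input), timeoutMs), degraded: false };
  } catch {
    // Model down, rate-limited, or timed out: fall back to templates.
    return { body: renderTemplate(input), degraded: true };
  }
}
```

Flagging the draft as `degraded` lets the UI tell authors they got a template rather than an AI draft.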

Lesson 5 — Prompt Versioning is Infrastructure

We made the mistake early on of hardcoding prompts inside the MCP handler functions, so every prompt tweak required a full deployment. We moved all prompts to a versioned config store: prompt updates are now zero-downtime config changes, and we can A/B test prompt variants without touching code.
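A stripped-down sketch of the idea, using an in-memory map as a stand-in for the real config store — the `registerPrompt`/`getPrompt` API is hypothetical:

```typescript
interface PromptRecord { id: string; version: number; template: string }

// In-memory stand-in for a versioned config store.
const promptStore = new Map<string, PromptRecord[]>();

export function registerPrompt(rec: PromptRecord): void {
  const versions = promptStore.get(rec.id) ?? [];
  versions.push(rec);
  versions.sort((a, b) => a.version - b.version); // keep versions ordered
  promptStore.set(rec.id, versions);
}

// No version pin → latest; a pin lets you roll back or A/B test a variant.
export function getPrompt(id: string, version?: number): PromptRecord {
  const versions = promptStore.get(id);
  if (!versions?.length) throw new Error(`unknown prompt: ${id}`);
  if (version === undefined) return versions[versions.length - 1];
  const match = versions.find((v) => v.version === version);
  if (!match) throw new Error(`prompt ${id} has no version ${version}`);
  return match;
}
```

Because handlers resolve prompts by id at call time, publishing a new version to the store changes behavior without a deploy.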

Results After 6 Months in Production

Six months in, the QuickBooks Content Agent has generated over 4,000 draft articles, cut average authoring time from 45 minutes to 30 minutes per article (a 33% reduction), and achieved a 94% author satisfaction rate on first drafts, surpassing our original 30% effort-reduction target.

The key insight: MCP isn't just a protocol — it's a forcing function for building reliable, observable, and safe AI integrations. Treat your MCP server like production infrastructure, not a hackathon prototype.
MCP · Agentic AI · Node.js · TypeScript · Intuit · LLM · Production