Best practices for building a production-grade MCP server

Learn proven patterns for building scalable MCP servers: from tool design and organization to transport protocols and production monitoring.

2025-07-10 · By Kashish Hora, Co-founder of MCPcat


Design tools around use cases, not API calls

MCP servers aren't REST APIs, and building them like traditional APIs is a recipe for frustrated users and confused LLMs. Think about it this way: when someone uses your MCP server, they want to accomplish something specific, not perform a series of technical operations.

The magic happens when you design each tool to map directly to what users actually want to do. Instead of exposing individual API operations, you're creating tools that handle entire workflows. This approach does two beautiful things: it makes LLMs way better at picking the right tool, and it stops users from having to click "Allow" five times just to create a GitHub issue with labels.

Let's look at a real example. Here's the wrong way to design GitHub tools:

// Each tool maps to a single API endpoint
server.tool("github_create_issue", createIssueHandler);
server.tool("github_add_labels", addLabelsHandler);
server.tool("github_assign_user", assignUserHandler);

Now here's the right way:

// One tool handles the entire user workflow
server.tool("create_github_issue", async ({ title, body, labels, assignees }) => {
  const issue = await api.createIssue(title, body);
  if (labels) await api.addLabels(issue.id, labels);
  if (assignees) await api.assignUser(issue.id, assignees);

  // MCP tool handlers return a content array, not a bare string
  return {
    content: [{ type: "text", text: `Created issue #${issue.number}: ${title}` }]
  };
});

See the difference? With the first approach, creating a labeled issue requires three separate tool calls, three permission prompts, and three chances for something to go wrong. With the second approach, it's one smooth operation that matches how users think about the task.

This philosophy extends to error handling too. When something goes wrong deep in your API calls, resist the urge to bubble up technical error messages. Instead, catch those errors internally and return something humans can actually understand. Nobody wants to see "Error 422: Unprocessable Entity" when what really happened is they tried to assign an issue to someone who isn't a repo collaborator.
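Here's a minimal sketch of that translation layer. The `GitHubApiError` class and the specific status codes are hypothetical stand-ins for whatever your API client actually throws:

```typescript
// Hypothetical error type standing in for your real API client's errors
class GitHubApiError extends Error {
  constructor(public status: number, message: string) {
    super(message);
  }
}

// Map raw API failures to messages a user (or an LLM) can act on
function toFriendlyMessage(err: unknown): string {
  if (err instanceof GitHubApiError) {
    switch (err.status) {
      case 404:
        return "That repository or issue could not be found. Check the name and try again.";
      case 422:
        return "GitHub rejected the request. A common cause: assigning an issue to someone who isn't a repo collaborator.";
      default:
        return `GitHub returned an unexpected error (HTTP ${err.status}).`;
    }
  }
  return "Something went wrong while talking to GitHub.";
}
```

Inside a tool handler, wrap the workflow in a try/catch and return `toFriendlyMessage(err)` as the tool's text content instead of letting the raw error bubble up.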

Naming matters here too, since the AI agent choosing which tools to call relies on each tool's name and description. Call your tools what they do, not what they are. "create_github_issue" beats "github_issue_endpoint" every time. Skip the internal jargon and technical terms.

Remember, you're not building an API client; you're building a tool that helps people get stuff done. And your primary users aren't humans; they're language models interpreting exactly what they read. Design accordingly, and watch your MCP server transform from a technical interface into something genuinely useful.

Start organizing your tools for scale

So your MCP server was cruising along with 5 or 10 tools, and suddenly you've got 20+ tools competing for your AI client's attention.

The good news is that the MCP community has figured out some solid patterns for keeping things organized as you scale.

The simplest solution: namespace your tools

When you're dealing with dozens of tools, grouping them into logical namespaces is your first line of defense. Just use a forward slash to create categories that make sense:

class CategoryBasedMCPServer {
  private toolCategories = {
    "files": ["read", "write", "search", "delete"],
    "database": ["query", "backup", "analyze"],
    "system": ["info", "health", "metrics"]
  };

  registerTools() {
    // Register as "files/read", "database/query", etc.
    Object.entries(this.toolCategories).forEach(([category, tools]) => {
      tools.forEach(tool => {
        // getHandler is this server's own lookup for the implementation
        this.server.tool(`${category}/${tool}`, this.getHandler(category, tool));
      });
    });
  }
}

This approach works great up to about 30 tools. Your LLM can easily understand that "files/write" is for file operations, and users can mentally organize what's available.

The more scalable solution: dynamic toolset management

When namespaces aren't enough, it's time to get dynamic. This is actually the officially recommended approach in the Model Context Protocol docs.

The idea is simple: instead of registering all tools upfront, you dynamically load only the tools that make sense for the current context.

GitHub demonstrates this perfectly in their production MCP server:

// GitHub's dynamic tool management
ListAvailableToolsets()  // Returns: ["files", "database", "system"] with enabled status
GetToolsetTools("files") // Returns: ["read", "write", "search", "delete"]
EnableToolset("files")   // Dynamically adds file tools to active set
// Now the AI only sees file tools instead of all 20+ tools
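Here's a sketch of what that toolset manager might look like. The class, toolset names, and `ToolHandler` shape are hypothetical; a real server would wire `activeTools()` back to its SDK instance and notify the client when the tool list changes:

```typescript
type ToolHandler = () => Promise<string>;

// Tracks which toolsets exist and which are currently enabled,
// so only relevant tools are exposed to the AI client
class ToolsetManager {
  private toolsets = new Map<string, Map<string, ToolHandler>>();
  private enabled = new Set<string>();

  defineToolset(name: string, tools: Record<string, ToolHandler>) {
    this.toolsets.set(name, new Map(Object.entries(tools)));
  }

  listAvailableToolsets(): { name: string; enabled: boolean }[] {
    return [...this.toolsets.keys()].map(name => ({
      name,
      enabled: this.enabled.has(name),
    }));
  }

  enableToolset(name: string) {
    if (!this.toolsets.has(name)) throw new Error(`Unknown toolset: ${name}`);
    this.enabled.add(name);
  }

  // Only tools in enabled toolsets are visible to the client
  activeTools(): string[] {
    return [...this.enabled].flatMap(name =>
      [...this.toolsets.get(name)!.keys()].map(tool => `${name}/${tool}`)
    );
  }
}
```

After `enableToolset("files")`, `activeTools()` returns only the `files/*` tools, which keeps the model's tool-selection problem small no matter how many toolsets you define.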

The super scalable solution: multiple MCP servers

When you're operating at the scale of AWS's MCP implementation, it's time to break things into separate servers entirely. Think of it like microservices, but for AI tools. You've got a few patterns to choose from:

  • By product area: core-mcp-server, analytics-mcp-server, billing-mcp-server
  • By permissions: read-mcp-server for safe operations, write-mcp-server for mutations
  • By performance: fast-mcp-server for quick lookups, batch-mcp-server for heavy processing

This approach requires more infrastructure work (you'll need to manage multiple server instances), but it gives you ultimate flexibility. Each server can scale independently, have its own security policies, and be maintained by different teams.

In fact, AWS has over 30 MCP servers available in their official README.

The beauty of these patterns is that they're not mutually exclusive. Start with namespaces, add dynamic loading when needed, and eventually split into multiple servers if you reach that scale.

Implement Streamable HTTP transport

Okay, let's talk about the elephant in the room: Server-Sent Events (SSE) for MCP servers are officially deprecated. If you just felt your heart skip a beat because you built your entire server on SSE, take a deep breath. You're not alone, and there's a clear path forward.

The new sheriff in town is Streamable HTTP, and honestly, it's a better solution anyway. Think of it as SSE's cooler, more capable cousin who actually shows up to production deployments. While SSE had some nice properties for real-time communication, Streamable HTTP gives you everything SSE did plus better error handling, connection management, and broader client support.

Here's the transport landscape in a nutshell:

Stdio transport: Great for local development and CLI tools, but that's about it. You can't run a stdio MCP server in the cloud. If you're building something that needs to be hosted and accessed remotely, stdio is not your friend.

SSE transport: It had a good run, but it's time to say goodbye. The deprecation isn't just about following trends; SSE had real limitations that made production deployments painful. Connection drops, proxy issues, and limited request/response patterns all contributed to its retirement.

Streamable HTTP transport: For a remotely deployed MCP server, this is where you want to be. It handles both streaming and request/response patterns elegantly, works great with modern infrastructure, and plays nicely with load balancers and proxies. If you're starting a new MCP server today, this is your only choice.
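To make the shape of this concrete, here's a minimal self-contained sketch of the single-endpoint pattern Streamable HTTP uses: clients POST JSON-RPC messages to one route, and the server replies with a JSON body (or, in real implementations, an event stream). The `ping` handler is a hypothetical stand-in; in practice you'd let the official SDK's Streamable HTTP transport handle all of this for you:

```typescript
import { createServer } from "node:http";

type JsonRpcRequest = { jsonrpc: "2.0"; id?: number; method: string; params?: unknown };

// Hypothetical dispatcher standing in for real MCP server logic
function handleJsonRpc(msg: JsonRpcRequest) {
  if (msg.method === "ping") return { jsonrpc: "2.0", id: msg.id, result: {} };
  return {
    jsonrpc: "2.0",
    id: msg.id,
    error: { code: -32601, message: `Method not found: ${msg.method}` },
  };
}

// One endpoint handles everything; no separate SSE channel to manage
const server = createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/mcp") {
    res.writeHead(404).end();
    return;
  }
  let body = "";
  req.on("data", chunk => (body += chunk));
  req.on("end", () => {
    const response = handleJsonRpc(JSON.parse(body));
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify(response));
  });
});

// unref() lets the process exit naturally when nothing else is pending
server.listen(3000).unref();
```

Because it's all plain HTTP POST under the hood, this pattern works with any load balancer or proxy that can route ordinary requests, which is exactly why it deploys so much more smoothly than SSE.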

The good news? If you're using the official modelcontextprotocol SDKs, they've got your back with automatic fallback support. Your Streamable HTTP server will gracefully handle older clients that only speak SSE, so you don't have to maintain two implementations. It's like having a translator at a party who speaks both languages fluently.

For those of you staring at an existing SSE implementation and wondering how painful migration will be, it's actually not that bad. The concepts map pretty cleanly between SSE and Streamable HTTP. The main changes involve how you structure your response streams and handle connection lifecycle.

If you're feeling overwhelmed by the migration process or just want an expert eye on your implementation, we're happy to help out. You can reach out to us at hi@mcpcat.io for help moving from SSE to Streamable HTTP. Sometimes it's worth having someone who's done this migration a dozen times guide you through it.

Use primitives (tools, prompts, resources) effectively

Most MCP servers out there are only using one third of what MCP actually offers. While everyone's familiar with tools, the resources and prompts primitives are sitting there, tragically underused (for now!).

Let's break down when to use each primitive, because choosing the right one can make the difference between a clunky experience and something that feels magical.

Tools are for actions with consequences

Tools are the workhorses of MCP, and they're perfect for anything that changes state or has side effects. Creating a database record? Tool. Sending an email? Tool. Deleting a file? Definitely a tool.

The secret sauce with tools is providing rich annotations that help both LLMs and user interfaces understand what's going on:

server.tool(
  "delete_user",
  "Permanently deletes a user and all associated data. This action cannot be undone; user data is archived for 30 days before permanent deletion.",
  { userId: z.string().describe("The unique identifier of the user to delete") },
  // Tool annotations tell clients this is a destructive operation,
  // so they can warn the user or require confirmation before calling it
  { title: "Delete user", destructiveHint: true, idempotentHint: false },
  async (params) => { /* implementation */ }
);

Resources expose data without side effects

Resources are perfect for exposing data that AI clients can read without worrying about breaking anything. Think of them as read only windows into your system. Configuration files, user profiles, system status, documentation... if it's data that should be accessible without side effects, make it a resource.

server.resource(
  "system_config",
  "config://production",
  async (uri) => ({
    contents: [{
      uri: uri.href,
      mimeType: "application/json",
      text: JSON.stringify(await getSystemConfig())
    }]
  })
);

The beauty of resources is that clients can freely explore and reference them without triggering permission prompts every time. It also gives the client the ability to intelligently handle things like caching without depending on the server.

Prompts create reusable interaction patterns

Prompts are the most underutilized primitive, and that's a shame because they're incredibly powerful for creating consistent interactions. They're essentially templates that combine system context with user input:

server.prompt(
  "debug_assistant",
  { errorMessage: z.string(), codeContext: z.string().optional() },
  ({ errorMessage, codeContext }) => ({
    messages: [
      {
        // Prompt messages use "user" or "assistant" roles, with typed content
        role: "user",
        content: {
          type: "text",
          text: `You are a debugging assistant with access to the codebase and error logs.\n\nHelp me debug this error: ${errorMessage}${codeContext ? `\nCode context: ${codeContext}` : ""}`
        }
      }
    ]
  })
);

Using all three primitives together creates an MCP server that feels complete and thoughtful. Tools handle the actions, resources provide the context, and prompts enable sophisticated interactions.

Monitor user intentions, not just technical metrics

Running an MCP server without monitoring is like driving at night with your headlights off. Sure, you might make it to your destination, but you're missing crucial information that could prevent disasters. Let's talk about what you actually need to track to keep your MCP server humming in production.

That includes basic things like:

  • Reliable and detailed logging
  • Trustworthy health checks that actually check health
  • Clear and descriptive error messages
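On the health-check point: a check that always returns 200 tells you nothing. Here's a minimal sketch of one that actually probes dependencies; the probe names (`database`, `upstream`) and wiring are hypothetical:

```typescript
type Probe = () => Promise<void>;

// Runs every dependency probe and reports per-dependency status,
// so the overall "healthy" flag reflects reality, not wishful thinking
async function checkHealth(probes: Record<string, Probe>) {
  const results: Record<string, "ok" | "failed"> = {};
  await Promise.all(
    Object.entries(probes).map(async ([name, probe]) => {
      try {
        await probe();
        results[name] = "ok";
      } catch {
        results[name] = "failed";
      }
    })
  );
  const healthy = Object.values(results).every(status => status === "ok");
  return { healthy, results };
}
```

Wire this to real probes (a database ping, an upstream API call with a short timeout) and have your health endpoint return 503 when `healthy` is false, so your load balancer can route around a degraded instance.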

But the million dollar question is always: What are users actually doing?

Here's where most monitoring falls short. You can track response times and error rates all day, but what you really need to know is: what are users trying to accomplish with your MCP server? Are they using it as intended? Are they running into walls trying to do something you didn't anticipate?

This is where MCPcat becomes invaluable. Instead of just seeing "tool X was called 47 times," MCPcat automatically generates rich user intentions for every tool call. You'll understand that users are "trying to bulk update inventory prices before the holiday sale" rather than just "calling update_price repeatedly."

Setting up comprehensive monitoring usually means stitching together multiple services: one for logs, another for metrics, a third for error tracking, and probably a fourth for alerting. MCPcat consolidates this into a single platform designed specifically for MCP servers.

With automatic error tracking, you'll know immediately when tools start failing, complete with context about what the user was trying to do. Request and response monitoring shows you performance trends and helps identify bottlenecks before users complain. And the best part? It works with any MCP server, regardless of which framework or language you used to build it.

The goal isn't just to collect data; it's to understand how your MCP server is serving real users in the wild. Because at the end of the day, that's what determines whether your server is truly production grade or just production hopeful.