Building a serverless MCP server

Kashish Hora

Co-founder of MCPcat

The Quick Answer

Deploy an MCP server to AWS Lambda using StreamableHTTP transport with Lambda Web Adapter:

$npm install @modelcontextprotocol/sdk express
# The Lambda Web Adapter is attached as a Lambda layer, not installed from npm (see below)
// index.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import express from "express";

const server = new McpServer({ name: "my-server", version: "1.0.0" });
const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });

const app = express();
app.use(express.json());
app.post("/mcp", transport.handle.bind(transport));

await server.connect(transport);
app.listen(8080);

StreamableHTTP enables stateless MCP servers, a natural fit for serverless platforms. Deploy to AWS Lambda with Function URLs, to Vercel with its MCP adapter, or to Cloudflare Workers with Durable Objects for stateful operations.

Prerequisites

  • Node.js 18+ or Python 3.8+ installed
  • AWS CLI, Vercel CLI, or Wrangler CLI configured
  • Basic understanding of MCP server architecture
  • Familiarity with your chosen serverless platform

Installation

AWS Lambda

# Install AWS SAM CLI
$brew install aws-sam-cli # macOS
# or visit https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html
 
# Install MCP SDK, Express, and zod
$npm install @modelcontextprotocol/sdk express zod
$npm install -D @types/aws-lambda
 
# The Lambda Web Adapter is not an npm package; it is attached
# as a Lambda layer in the SAM template below

Vercel

# Install Vercel CLI
$npm install -g vercel
 
# Install Vercel MCP adapter
$npm install @vercel/mcp-adapter zod

Cloudflare Workers

# Install Wrangler CLI
$npm install -g wrangler
 
# Install Cloudflare MCP dependencies
$npm install @cloudflare/mcp-agent

Configuration

AWS Lambda Configuration

Configure your Lambda function to use StreamableHTTP transport. The Lambda Web Adapter converts incoming Lambda invocation events into standard HTTP requests, so ordinary web frameworks such as Express run unchanged.

# template.yaml (SAM)
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  McpServerFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: .
      Handler: run.sh          # Small startup script, e.g. `exec node dist/server.mjs`
      Runtime: nodejs20.x
      Environment:
        Variables:
          AWS_LAMBDA_EXEC_WRAPPER: /opt/bootstrap
          PORT: 8080
      FunctionUrlConfig:
        AuthType: NONE
        InvokeMode: RESPONSE_STREAM  # Enable streaming for real-time responses
      Timeout: 900  # 15 minutes max
      MemorySize: 1024  # Recommended for better performance
      Layers:
        - !Sub arn:aws:lambda:${AWS::Region}:753240598075:layer:LambdaAdapterLayerX86:20

The RESPONSE_STREAM mode is critical for StreamableHTTP support, allowing chunked responses and long-running operations. The Lambda Web Adapter layer handles the HTTP-to-Lambda translation automatically; run.sh is a small startup script bundled with your code (for example, `exec node dist/server.mjs`) that the adapter's exec wrapper invokes.

Vercel Configuration

Vercel's MCP adapter simplifies deployment with built-in routing and session management:

// app/api/mcp/route.ts
import { createMcpHandler } from '@vercel/mcp-adapter';
import { z } from 'zod';

const handler = createMcpHandler((server) => {
  // Define your tools here
  server.tool(
    'get_weather',
    'Get current weather for a location',
    {
      location: z.string().describe('City name or coordinates'),
      units: z.enum(['celsius', 'fahrenheit']).optional()
    },
    async ({ location, units = 'celsius' }) => {
      // Implementation
      return {
        content: [{
          type: 'text',
          text: `Weather in ${location}: 22°${units === 'celsius' ? 'C' : 'F'}`
        }]
      };
    }
  );
}, {
  // Optional configuration
  name: 'weather-server',
  version: '1.0.0'
}, {
  basePath: '/api'
});

export { handler as GET, handler as POST, handler as DELETE };

Cloudflare Workers Configuration

Cloudflare leverages Durable Objects for stateful MCP servers, providing persistent storage and WebSocket support:

# wrangler.toml
name = "mcp-server"
main = "src/index.js"
compatibility_date = "2024-12-20"

[[durable_objects.bindings]]
name = "MCP_AGENT"
class_name = "McpAgent"

[[migrations]]
tag = "v1"
new_classes = ["McpAgent"]

// src/index.js
import { McpAgent } from '@cloudflare/mcp-agent';

export default {
  async fetch(request, env) {
    // Route requests to a Durable Object instance. A fixed name sends all
    // traffic to one object; derive the name from a session ID for per-session state.
    const id = env.MCP_AGENT.idFromName('default');
    const agent = env.MCP_AGENT.get(id);
    return agent.fetch(request);
  }
};

export { McpAgent };

Usage

Implementing Tools in AWS Lambda

Create a complete MCP server with tools and resources. This example demonstrates a database query tool with proper error handling:

// src/server.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import express from "express";
import { z } from "zod";
import { DynamoDBClient, QueryCommand } from "@aws-sdk/client-dynamodb";

const dynamodb = new DynamoDBClient({});
const server = new McpServer({
  name: "dynamodb-query-server",
  version: "1.0.0"
});

// Define a tool for querying DynamoDB
server.tool(
  "query_table",
  "Query items from a DynamoDB table",
  {
    tableName: z.string(),
    keyCondition: z.string(),
    limit: z.number().optional()
  },
  async ({ tableName, keyCondition, limit = 10 }) => {
    try {
      const command = new QueryCommand({
        TableName: tableName,
        KeyConditionExpression: keyCondition,
        Limit: limit
      });
      
      const response = await dynamodb.send(command);
      
      return {
        content: [{
          type: 'text',
          text: JSON.stringify(response.Items, null, 2)
        }]
      };
    } catch (error: any) {
      return {
        content: [{
          type: 'text',
          text: `Error querying table: ${error.message}`
        }],
        isError: true
      };
    }
  }
);

// Set up Express server with StreamableHTTP transport
const app = express();
app.use(express.json());

const transport = new StreamableHTTPServerTransport({
  sessionIdGenerator: undefined  // Stateless mode
});

app.post("/mcp", transport.handle.bind(transport));

// Connect server to transport
await server.connect(transport);

const port = process.env.PORT || 8080;
app.listen(port, () => {
  console.log(`MCP server listening on port ${port}`);
});

export default app;

The StreamableHTTP transport handles all protocol negotiation automatically. When deployed to Lambda, the Function URL provides a direct HTTP endpoint that AI assistants can connect to.
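Once deployed, you can exercise the endpoint with the SDK's client before wiring up an AI assistant. A minimal sketch, assuming a placeholder Function URL from your sam deploy output and the query_table tool defined above:

// test-client.ts — minimal sketch; replace the URL with your Function URL
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const transport = new StreamableHTTPClientTransport(
  new URL("https://abc123.lambda-url.us-east-1.on.aws/mcp")  // placeholder
);

const client = new Client({ name: "test-client", version: "1.0.0" });
await client.connect(transport);

// List available tools, then call one with illustrative arguments
console.log(await client.listTools());
const result = await client.callTool({
  name: "query_table",
  arguments: { tableName: "my-table", keyCondition: "pk = :pk" }
});
console.log(result);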

Deploying to Vercel

Vercel deployment leverages their Fluid Compute architecture for optimal performance:

# Test locally first
$npx @modelcontextprotocol/inspector http://localhost:3000/api/mcp
 
# Deploy to Vercel
$vercel

The MCP Inspector tool helps validate your server implementation before deployment. It simulates client connections and tests all exposed tools and resources.

Stateful Operations with Cloudflare

Cloudflare's Durable Objects enable persistent sessions, perfect for maintaining conversation context or multi-step workflows:

// src/agent.js
export class McpAgent {
  constructor(state, env) {
    this.state = state;
    this.storage = state.storage;
  }

  async fetch(request) {
    // Handle WebSocket upgrade for persistent connections
    if (request.headers.get('Upgrade') === 'websocket') {
      const [client, server] = new WebSocketPair();
      
      // Accept WebSocket and enable hibernation
      this.state.acceptWebSocket(server, ['mcp']);
      
      return new Response(null, {
        status: 101,
        webSocket: client
      });
    }

    // Handle regular HTTP requests (handleHttpRequest is app-specific, not shown)
    return this.handleHttpRequest(request);
  }

  async webSocketMessage(ws, message) {
    // Process MCP protocol messages
    const data = JSON.parse(message);

    if (data.method === 'tools/call') {
      // Execute the tool and maintain state (executeTool is app-specific, not shown)
      const result = await this.executeTool(data.params);

      // Store in persistent storage
      await this.storage.put(`call:${data.id}`, result);

      // Reply with a JSON-RPC response
      ws.send(JSON.stringify({
        jsonrpc: '2.0',
        id: data.id,
        result
      }));
    }
  }

  async webSocketClose(ws, code, reason) {
    // Clean up session data
    await this.storage.deleteAll();
  }
}

Durable Objects provide 10GB of persistent storage per object and automatic WebSocket hibernation, reducing costs for idle connections.

Advanced Usage

Multi-Region Deployment on AWS

Deploy MCP servers across multiple AWS regions for global availability. This pattern uses Route 53 for geographic routing:

# template-multi-region.yaml
Parameters:
  ServerVersion:
    Type: String
    Default: "1.0.0"
  DeploymentRegions:
    Type: CommaDelimitedList
    Default: "us-east-1,eu-west-1,ap-southeast-1"

Resources:
  McpServerStackSet:
    Type: AWS::CloudFormation::StackSet
    Properties:
      StackSetName: mcp-server-global
      Capabilities:
        - CAPABILITY_IAM
      Parameters:
        - ParameterKey: ServerVersion
          ParameterValue: !Ref ServerVersion
      PermissionModel: SELF_MANAGED
      OperationPreferences:
        RegionConcurrencyType: PARALLEL
      StackInstancesGroup:
        - DeploymentTargets:
            Accounts:
              - !Ref AWS::AccountId
          Regions: !Ref DeploymentRegions

Combine with Lambda@Edge for request routing based on client location, ensuring low latency globally.
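As a sketch of that routing piece, the Lambda@Edge handler below rewrites the origin based on CloudFront's viewer-country header. The Function URL domains and country list are placeholders, and the header must be enabled in your origin request policy:

// origin-router.ts — hypothetical Lambda@Edge origin-request handler (sketch)
import type { CloudFrontRequestHandler } from "aws-lambda";

// Placeholder regional Function URL domains
const EU_ORIGIN = "def456.lambda-url.eu-west-1.on.aws";
const US_ORIGIN = "abc123.lambda-url.us-east-1.on.aws";
const EU_COUNTRIES = new Set(["DE", "FR", "GB", "IE", "NL"]);

export const handler: CloudFrontRequestHandler = async (event) => {
  const request = event.Records[0].cf.request;
  // Requires forwarding CloudFront-Viewer-Country in the origin request policy
  const country = request.headers["cloudfront-viewer-country"]?.[0]?.value;
  const domain = country && EU_COUNTRIES.has(country) ? EU_ORIGIN : US_ORIGIN;

  if (request.origin?.custom) {
    request.origin.custom.domainName = domain;
    request.headers["host"] = [{ key: "Host", value: domain }];
  }
  return request;
};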

Rate Limiting and Security

Implement rate limiting to prevent abuse, especially important for public-facing MCP servers:

// Vercel example with built-in rate limiting
import { createMcpHandler } from '@vercel/mcp-adapter';
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, '10 s'),
});

// `handler` is the MCP handler built with createMcpHandler, as in the earlier example
const handler = createMcpHandler((server) => { /* register tools as shown above */ });

export async function POST(request: Request) {
  // Extract client identifier
  const identifier = request.headers.get('x-api-key') || 'anonymous';
  
  const { success } = await ratelimit.limit(identifier);
  
  if (!success) {
    return new Response('Rate limit exceeded', { status: 429 });
  }
  
  // Process MCP request
  return handler(request);
}

Always validate the Origin header and implement authentication for production deployments. The MCP specification recommends validating Origin to protect against DNS rebinding attacks.
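A minimal sketch of such a check as Express middleware for the Lambda setup above, assuming a placeholder allowlist:

// origin-check.ts — reject browser requests from unknown origins (sketch)
import type { Request, Response, NextFunction } from "express";

const ALLOWED_ORIGINS = new Set([
  "https://app.example.com",  // placeholder allowed origin
]);

export function validateOrigin(req: Request, res: Response, next: NextFunction) {
  const origin = req.headers.origin;
  // Non-browser MCP clients typically send no Origin header; allow those,
  // but reject any request from an origin not on the allowlist
  if (origin && !ALLOWED_ORIGINS.has(origin)) {
    res.status(403).json({ error: "Forbidden origin" });
    return;
  }
  next();
}

// Register before the /mcp route:
// app.use(validateOrigin);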

Common Issues

Error: Cold Start Timeouts

Cold starts can cause timeouts, especially with large dependencies. Lambda's cold start occurs when no warm execution environment is available.

Minimize cold start impact by using provisioned concurrency:

AutoPublishAlias: live  # Provisioned concurrency requires a published alias or version
ProvisionedConcurrencyConfig:
  ProvisionedConcurrentExecutions: 2  # Keep 2 instances warm

Alternatively, use smaller runtime dependencies and tree-shake unused code. Consider a bundler such as esbuild to reduce package size.
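For example, a small build script using esbuild's API can bundle the server into a single file (paths and options here are illustrative):

// build.ts — bundle the server with esbuild to shrink cold starts (sketch)
import { build } from "esbuild";

await build({
  entryPoints: ["src/server.ts"],
  bundle: true,          // inline dependencies into one file
  minify: true,
  platform: "node",
  target: "node20",
  format: "esm",
  outfile: "dist/server.mjs",
  // The AWS SDK v3 ships with the Node.js Lambda runtime; keep it external
  external: ["@aws-sdk/*"],
});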

Error: "Client closed for command"

This error indicates the MCP client disconnected before receiving a response. Common causes include network timeouts or server processing delays.

Send progress notifications during long-running operations so the client knows the server is still working. One way to do this with the TypeScript SDK (a sketch; the handler's second argument exposes sendNotification, and the client must supply a progress token):

// Send periodic progress notifications during long operations
server.tool(
  "long_operation",
  "Performs a time-consuming task",
  {},
  async (_args, extra) => {
    // The client opts in to progress updates by sending a progressToken
    const progressToken = extra._meta?.progressToken;

    for (let i = 0; i < 100; i++) {
      if (progressToken !== undefined) {
        await extra.sendNotification({
          method: "notifications/progress",
          params: { progressToken, progress: i, total: 100 }
        });
      }

      // Simulate work
      await new Promise(resolve => setTimeout(resolve, 100));
    }

    return {
      content: [{
        type: 'text',
        text: 'Operation completed!'
      }]
    };
  }
);

Error: Memory Limit Exceeded

Serverless platforms impose memory limits. AWS Lambda allows up to 10GB, but costs increase with memory allocation.

MCP tool results and resources are returned as text or binary content, so read large files in bounded chunks rather than buffering them entirely:

// Read a bounded slice of a large file instead of buffering the whole thing
import fs from "node:fs";

server.resource(
  "large_file",
  "file:///var/log/app.log",  // example URI; adjust to your data source
  { description: "Read a large file in bounded chunks" },
  async (uri) => {
    // Stream only the first 1 MB of the file into memory
    const stream = fs.createReadStream(uri.pathname, { start: 0, end: 1024 * 1024 - 1 });
    const chunks: Buffer[] = [];
    for await (const chunk of stream) chunks.push(chunk as Buffer);

    return {
      contents: [{
        uri: uri.href,
        text: Buffer.concat(chunks).toString("utf-8")
      }]
    };
  }
);

Examples

Building a GitHub Integration Server

This example demonstrates a production-ready MCP server that integrates with GitHub's API, showcasing authentication, error handling, and response formatting:

// github-mcp-server.ts
import { McpServer, ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";
import { Octokit } from "@octokit/rest";
import { z } from "zod";

// Query and manage GitHub repositories
const server = new McpServer({
  name: "github-integration",
  version: "1.0.0"
});

// Initialize GitHub client with token from environment
const octokit = new Octokit({
  auth: process.env.GITHUB_TOKEN
});

server.tool(
  "search_repositories",
  "Search GitHub repositories with advanced filters",
  {
    query: z.string().describe("Search query"),
    language: z.string().optional().describe("Filter by programming language"),
    sort: z.enum(["stars", "forks", "updated"]).optional(),
    limit: z.number().min(1).max(100).default(10)
  },
  async ({ query, language, sort = "stars", limit }) => {
    try {
      const searchQuery = language 
        ? `${query} language:${language}`
        : query;
      
      const response = await octokit.search.repos({
        q: searchQuery,
        sort,
        per_page: limit
      });
      
      const results = response.data.items.map(repo => ({
        name: repo.full_name,
        description: repo.description,
        stars: repo.stargazers_count,
        url: repo.html_url
      }));
      
      return {
        content: [{
          type: 'text',
          text: JSON.stringify(results, null, 2)
        }]
      };
    } catch (error: any) {
      // Handle rate limiting gracefully
      if (error.status === 403) {
        return {
          content: [{
            type: 'text',
            text: 'GitHub API rate limit exceeded. Please try again later.'
          }],
          isError: true
        };
      }
      throw error;
    }
  }
);

// Add a resource template for repository details
server.resource(
  "repository",
  new ResourceTemplate("github://repository/{owner}/{repo}", { list: undefined }),
  { description: "Get detailed information about a GitHub repository" },
  async (uri, { owner, repo }) => {
    const { data } = await octokit.repos.get({
      owner: String(owner),
      repo: String(repo)
    });

    return {
      contents: [{
        uri: uri.href,
        text: JSON.stringify({
          name: data.full_name,
          description: data.description,
          created: data.created_at,
          language: data.language,
          topics: data.topics,
          license: data.license?.name
        }, null, 2)
      }]
    };
  }
);

Production deployment requires proper secret management. Use AWS Secrets Manager, Vercel environment variables, or Cloudflare Workers secrets to store the GitHub token securely.
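On AWS, for instance, you might resolve the token from Secrets Manager at cold start rather than reading an environment variable. A sketch, assuming a hypothetical secret named github/mcp-token:

// secrets.ts — fetch the GitHub token from AWS Secrets Manager (sketch)
import {
  SecretsManagerClient,
  GetSecretValueCommand
} from "@aws-sdk/client-secrets-manager";
import { Octokit } from "@octokit/rest";

const secrets = new SecretsManagerClient({});

async function getGitHubToken(): Promise<string> {
  const result = await secrets.send(
    new GetSecretValueCommand({ SecretId: "github/mcp-token" })  // hypothetical secret name
  );
  return result.SecretString ?? "";
}

// Resolve the token once per cold start, then reuse the client
const octokit = new Octokit({ auth: await getGitHubToken() });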

Real-time Data Processing Server

This example shows how to build an MCP server that processes log data incrementally, useful for analytics or monitoring applications:

// streaming-analytics-server.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import {
  CloudWatchLogsClient,
  FilterLogEventsCommand
} from "@aws-sdk/client-cloudwatch-logs";
import { z } from "zod";

const logsClient = new CloudWatchLogsClient({});
const server = new McpServer({
  name: "log-analytics",
  version: "1.0.0"
});

server.tool(
  "analyze_logs",
  "Analyze CloudWatch logs in real-time",
  {
    logGroup: z.string(),
    startTime: z.string().datetime(),
    pattern: z.string().optional(),
    limit: z.number().default(1000)
  },
  async ({ logGroup, startTime, pattern, limit }) => {
    const lines: string[] = [];
    let nextToken: string | undefined;

    // Page through CloudWatch Logs instead of loading everything at once
    do {
      const response = await logsClient.send(new FilterLogEventsCommand({
        logGroupName: logGroup,
        startTime: new Date(startTime).getTime(),
        filterPattern: pattern,
        limit: Math.min(limit - lines.length, 100),
        nextToken
      }));

      // Collect results as NDJSON lines as each page arrives
      for (const event of response.events ?? []) {
        lines.push(JSON.stringify({
          timestamp: new Date(event.timestamp ?? 0).toISOString(),
          message: event.message
        }));
      }

      nextToken = response.nextToken;
    } while (nextToken && lines.length < limit);

    return {
      content: [{
        type: 'text',
        text: lines.join('\n')   // NDJSON: one JSON object per line
      }]
    };
  }
);

Paging through log events keeps memory usage bounded, and the NDJSON output lets clients process results line by line. Over StreamableHTTP, large responses are also delivered to clients in chunks.

[Screenshot: Example of streaming log analysis results in an AI assistant interface]

These examples demonstrate the flexibility of serverless MCP servers. Whether building simple tools or complex integrations, the combination of StreamableHTTP transport and serverless platforms provides a scalable, cost-effective solution for extending AI capabilities.