The Quick Answer
Deploy an MCP server to AWS Lambda using StreamableHTTP transport with Lambda Web Adapter:
$ npm install @modelcontextprotocol/sdk express
$ npm install -D @aws-lambda/web-adapter
// index.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import express from "express";
const server = new McpServer({ name: "my-server", version: "1.0.0" });
const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
const app = express();
app.use(express.json());
app.post("/mcp", transport.handle.bind(transport));
await server.connect(transport);
app.listen(8080);
StreamableHTTP enables stateless MCP servers, a natural fit for serverless platforms. Deploy to AWS Lambda with Function URLs, to Vercel with its MCP adapter, or to Cloudflare Workers with Durable Objects for stateful operations.
Prerequisites
- Node.js 18+ or Python 3.8+ installed
- AWS CLI, Vercel CLI, or Wrangler CLI configured
- Basic understanding of MCP server architecture
- Familiarity with your chosen serverless platform
Installation
AWS Lambda
# Install AWS SAM CLI
$ brew install aws-sam-cli  # macOS
# or visit https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html

# Install MCP SDK and Lambda Web Adapter
$ npm install @modelcontextprotocol/sdk express
$ npm install -D @aws-lambda/web-adapter @types/aws-lambda
Vercel
# Install Vercel CLI
$ npm install -g vercel

# Install Vercel MCP adapter
$ npm install @vercel/mcp-adapter zod
Cloudflare Workers
# Install Wrangler CLI
$ npm install -g wrangler

# Install Cloudflare MCP dependencies
$ npm install @cloudflare/mcp-agent
Configuration
AWS Lambda Configuration
Configure your Lambda function to use StreamableHTTP transport. The Lambda Web Adapter translates Lambda invocation events into standard HTTP requests, so ordinary web frameworks like Express run unchanged.
# template.yaml (SAM)
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  McpServerFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: .
      Handler: bootstrap
      Runtime: provided.al2023
      Environment:
        Variables:
          AWS_LAMBDA_EXEC_WRAPPER: /opt/bootstrap
          PORT: 8080
      FunctionUrlConfig:
        AuthType: NONE
        InvokeMode: RESPONSE_STREAM  # Enable streaming for real-time responses
      Timeout: 900       # 15 minutes max
      MemorySize: 1024   # Recommended for better performance
      Layers:
        - !Sub arn:aws:lambda:${AWS::Region}:753240598075:layer:LambdaAdapterLayerX86:20
The RESPONSE_STREAM mode is critical for StreamableHTTP support, allowing chunked responses and long-running operations. The Lambda Web Adapter layer handles the translation between Lambda events and HTTP requests automatically.
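To see what RESPONSE_STREAM enables end to end, here is a minimal sketch of an Express route that flushes output incrementally. The route path and chunk contents are illustrative; the pattern of repeated res.write calls followed by res.end is what the adapter forwards through the Function URL as a streamed response:
// streaming-demo.ts (illustrative route, not part of the MCP server itself)
import express from "express";

const app = express();

app.get("/stream-demo", async (_req, res) => {
  res.setHeader("Content-Type", "text/plain");
  for (let i = 1; i <= 5; i++) {
    res.write(`chunk ${i}\n`); // flushed to the client as it is written
    await new Promise((resolve) => setTimeout(resolve, 500)); // simulate slow work
  }
  res.end("done\n"); // close the streamed response
});

app.listen(8080);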
Vercel Configuration
Vercel's MCP adapter simplifies deployment with built-in routing and session management:
// app/api/mcp/route.ts
import { createMcpHandler } from '@vercel/mcp-adapter';
import { z } from 'zod';
const handler = createMcpHandler((server) => {
// Define your tools here
server.tool(
'get_weather',
'Get current weather for a location',
{
location: z.string().describe('City name or coordinates'),
units: z.enum(['celsius', 'fahrenheit']).optional()
},
async ({ location, units = 'celsius' }) => {
// Implementation
return {
content: [{
type: 'text',
text: `Weather in ${location}: 22°${units === 'celsius' ? 'C' : 'F'}`
}]
};
}
);
}, {
// Optional configuration
name: 'weather-server',
version: '1.0.0'
}, {
basePath: '/api'
});
export { handler as GET, handler as POST, handler as DELETE };
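Before wiring a real client, the route can be smoke-tested with a plain fetch. Note that StreamableHTTP servers expect the Accept header to allow both plain JSON and SSE responses; the port assumes a local next dev server:
// Hypothetical local smoke test for the /api/mcp route above
const res = await fetch("http://localhost:3000/api/mcp", {
  method: "POST",
  headers: {
    "content-type": "application/json",
    // StreamableHTTP clients must accept both response modes
    accept: "application/json, text/event-stream"
  },
  body: JSON.stringify({
    jsonrpc: "2.0",
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2025-03-26",
      capabilities: {},
      clientInfo: { name: "smoke-test", version: "1.0.0" }
    }
  })
});
console.log(res.status, await res.text());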
Cloudflare Workers Configuration
Cloudflare leverages Durable Objects for stateful MCP servers, providing persistent storage and WebSocket support:
# wrangler.toml
name = "mcp-server"
main = "src/index.js"
compatibility_date = "2024-12-20"
[[durable_objects.bindings]]
name = "MCP_AGENT"
class_name = "McpAgent"
[[migrations]]
tag = "v1"
new_classes = ["McpAgent"]
// src/index.js
import { McpAgent } from '@cloudflare/mcp-agent';
export default {
async fetch(request, env) {
// Route requests to Durable Object
const id = env.MCP_AGENT.idFromName('default');
const agent = env.MCP_AGENT.get(id);
return agent.fetch(request);
}
};
export { McpAgent };
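The worker above sends every client to the same object via idFromName('default'). For per-session isolation, a common variation derives the object name from the client's session header; using the Mcp-Session-Id header here is an assumption about what your clients send:
// src/index.js — variation: one Durable Object per MCP session
export default {
  async fetch(request, env) {
    // Fall back to a shared object when no session header is present (assumption)
    const session = request.headers.get('Mcp-Session-Id') ?? 'default';
    const id = env.MCP_AGENT.idFromName(session);
    return env.MCP_AGENT.get(id).fetch(request);
  }
};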
Usage
Implementing Tools in AWS Lambda
Create a complete MCP server with tools and resources. This example demonstrates a database query tool with proper error handling:
// src/server.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import express from "express";
import { z } from "zod";
import { DynamoDBClient, QueryCommand } from "@aws-sdk/client-dynamodb";
const dynamodb = new DynamoDBClient({});
const server = new McpServer({
name: "dynamodb-query-server",
version: "1.0.0"
});
// Define a tool for querying DynamoDB
server.tool(
"query_table",
"Query items from a DynamoDB table",
{
tableName: z.string(),
keyCondition: z.string(),
limit: z.number().optional()
},
async ({ tableName, keyCondition, limit = 10 }) => {
try {
const command = new QueryCommand({
TableName: tableName,
KeyConditionExpression: keyCondition,
Limit: limit
});
const response = await dynamodb.send(command);
return {
content: [{
type: 'text',
text: JSON.stringify(response.Items, null, 2)
}]
};
} catch (error) {
return {
content: [{
type: 'text',
text: `Error querying table: ${error.message}`
}],
isError: true
};
}
}
);
// Set up Express server with StreamableHTTP transport
const app = express();
app.use(express.json());
const transport = new StreamableHTTPServerTransport({
sessionIdGenerator: undefined // Stateless mode
});
app.post("/mcp", transport.handle.bind(transport));
// Connect server to transport
await server.connect(transport);
const port = process.env.PORT || 8080;
app.listen(port, () => {
console.log(`MCP server listening on port ${port}`);
});
export default app;
The StreamableHTTP transport handles all protocol negotiation automatically. When deployed to Lambda, the Function URL provides a direct HTTP endpoint that AI assistants can connect to.
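Once deployed, any MCP client can connect over StreamableHTTP. A minimal sketch using the SDK's client, with a placeholder Function URL (the expected tool name assumes the query_table example above):
// client-check.ts — hypothetical connectivity check for the deployed server
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const client = new Client({ name: "smoke-test", version: "1.0.0" });
const transport = new StreamableHTTPClientTransport(
  new URL("https://<function-url-id>.lambda-url.us-east-1.on.aws/mcp") // placeholder
);

await client.connect(transport);
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name)); // expect ["query_table"] for the server above
await client.close();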
Deploying to Vercel
Vercel deployment leverages their Fluid Compute architecture for optimal performance:
# Test locally first
$ npx @modelcontextprotocol/inspector http://localhost:3000/api/mcp

# Deploy to Vercel
$ vercel
The MCP Inspector tool helps validate your server implementation before deployment. It simulates client connections and tests all exposed tools and resources.
Stateful Operations with Cloudflare
Cloudflare's Durable Objects enable persistent sessions, perfect for maintaining conversation context or multi-step workflows:
// src/agent.js
export class McpAgent {
constructor(state, env) {
this.state = state;
this.storage = state.storage;
}
async fetch(request) {
// Handle WebSocket upgrade for persistent connections
if (request.headers.get('Upgrade') === 'websocket') {
const [client, server] = new WebSocketPair();
// Accept WebSocket and enable hibernation
this.state.acceptWebSocket(server, ['mcp']);
return new Response(null, {
status: 101,
webSocket: client
});
}
    // Handle regular HTTP requests (handleHttpRequest implementation elided here)
    return this.handleHttpRequest(request);
}
async webSocketMessage(ws, message) {
    // Process MCP protocol messages (messages may arrive as strings or binary)
    const data = JSON.parse(
      typeof message === 'string' ? message : new TextDecoder().decode(message)
    );
    if (data.method === 'tools/call') {
      // Execute the tool (executeTool implementation elided here) and maintain state
      const result = await this.executeTool(data.params);
// Store in persistent storage
await this.storage.put(`call:${data.id}`, result);
ws.send(JSON.stringify({
id: data.id,
result
}));
}
}
async webSocketClose(ws, code, reason) {
// Clean up session data
await this.storage.deleteAll();
}
}
Durable Objects provide 10GB of persistent storage per object and automatic WebSocket hibernation, reducing costs for idle connections.
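As a sketch of what that persistence looks like in practice, the following hypothetical methods (the class name and key scheme are assumptions) store per-session conversation history in Durable Object storage:
// src/history.js — hypothetical per-session history helpers
export class McpAgentWithHistory {
  constructor(state, env) {
    this.storage = state.storage; // Durable Object persistent storage
  }

  // Append a message to a session's history (key scheme is an assumption)
  async rememberMessage(sessionId, message) {
    const key = `history:${sessionId}`;
    const history = (await this.storage.get(key)) ?? [];
    history.push(message);
    await this.storage.put(key, history); // survives hibernation and restarts
  }

  async getHistory(sessionId) {
    return (await this.storage.get(`history:${sessionId}`)) ?? [];
  }
}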
Advanced Usage
Multi-Region Deployment on AWS
Deploy MCP servers across multiple AWS regions for global availability. This pattern uses Route 53 for geographic routing:
# template-multi-region.yaml
Parameters:
  ServerVersion:
    Type: String
    Default: "1.0.0"
  DeploymentRegions:
    Type: CommaDelimitedList
    Default: "us-east-1,eu-west-1,ap-southeast-1"
Resources:
  McpServerStackSet:
    Type: AWS::CloudFormation::StackSet
    Properties:
      StackSetName: mcp-server-global
      Capabilities:
        - CAPABILITY_IAM
      Parameters:
        - ParameterKey: ServerVersion
          ParameterValue: !Ref ServerVersion
      PermissionModel: SELF_MANAGED
      OperationPreferences:
        RegionConcurrencyType: PARALLEL
      StackInstancesGroup:
        - DeploymentTargets:
            Accounts:
              - !Ref AWS::AccountId
          Regions: !Ref DeploymentRegions
Combine with Lambda@Edge for request routing based on client location, ensuring low latency globally.
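A hedged sketch of the edge-routing piece: a Lambda@Edge origin-request handler that rewrites the origin based on CloudFront's viewer-country header. The country-to-region map and hostnames are placeholders:
// edge-router.ts — hypothetical Lambda@Edge origin-request handler
import type { CloudFrontRequestHandler } from "aws-lambda";

// Placeholder map from viewer country to regional Function URL hostname
const ORIGIN_BY_COUNTRY: Record<string, string> = {
  DE: "<id>.lambda-url.eu-west-1.on.aws",
  SG: "<id>.lambda-url.ap-southeast-1.on.aws",
};
const DEFAULT_ORIGIN = "<id>.lambda-url.us-east-1.on.aws";

export const handler: CloudFrontRequestHandler = async (event) => {
  const request = event.Records[0].cf.request;
  // Requires the CloudFront-Viewer-Country header to be forwarded to the origin
  const country = request.headers["cloudfront-viewer-country"]?.[0]?.value;
  const domain = (country && ORIGIN_BY_COUNTRY[country]) || DEFAULT_ORIGIN;

  if (request.origin?.custom) {
    request.origin.custom.domainName = domain; // point at the chosen region
    request.headers["host"] = [{ key: "Host", value: domain }];
  }
  return request;
};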
Rate Limiting and Security
Implement rate limiting to prevent abuse, especially important for public-facing MCP servers:
// Vercel example with built-in rate limiting
import { createMcpHandler } from '@vercel/mcp-adapter';
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';
const ratelimit = new Ratelimit({
redis: Redis.fromEnv(),
limiter: Ratelimit.slidingWindow(10, '10 s'),
});
export async function POST(request: Request) {
// Extract client identifier
const identifier = request.headers.get('x-api-key') || 'anonymous';
const { success } = await ratelimit.limit(identifier);
if (!success) {
return new Response('Rate limit exceeded', { status: 429 });
}
  // Forward to the MCP handler created with createMcpHandler (see above)
  return handler(request);
}
Always validate the Origin header and require authentication for production deployments; the MCP specification calls out Origin validation specifically as a defense against DNS rebinding attacks.
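For the Express-based Lambda deployment above, a minimal sketch of that Origin check (the allowlist is a placeholder):
// origin-check.ts — minimal Origin validation middleware (allowlist is an assumption)
import type { Request, Response, NextFunction } from "express";

const ALLOWED_ORIGINS = new Set(["https://example.com"]); // placeholder allowlist

export function validateOrigin(req: Request, res: Response, next: NextFunction) {
  const origin = req.headers.origin;
  // Reject browser-originated requests from unexpected origins (DNS rebinding defense)
  if (origin && !ALLOWED_ORIGINS.has(origin)) {
    res.status(403).json({ error: "Forbidden origin" });
    return;
  }
  next();
}

// Usage: app.use("/mcp", validateOrigin);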
Common Issues
Error: Cold Start Timeouts
Cold starts can cause timeouts, especially with large dependencies. Lambda's cold start occurs when no warm execution environment is available.
Minimize cold start impact by using provisioned concurrency:
      AutoPublishAlias: live  # SAM requires a published alias for provisioned concurrency
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 2  # Keep 2 instances warm
Alternatively, use smaller runtime dependencies and tree-shake unused code. Consider using ESBuild or similar bundlers to reduce package size.
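A sketch of such a bundling step with esbuild; the entry point and output path are assumptions:
// build.mjs — hypothetical esbuild bundling script
import { build } from "esbuild";

await build({
  entryPoints: ["src/server.ts"], // assumed entry point
  bundle: true,                   // inlining dependencies lets esbuild tree-shake them
  minify: true,
  platform: "node",
  target: "node18",
  outfile: "dist/index.js",
  external: ["@aws-sdk/*"],       // the AWS SDK v3 already ships in the Lambda runtime
});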
Error: "Client closed for command"
This error indicates the MCP client disconnected before receiving a response. Common causes include network timeouts or server processing delays.
Implement heartbeat mechanisms for long-running operations:
// Send periodic progress notifications during long operations. This follows the
// MCP progress-notification pattern; in recent TypeScript SDK versions the second
// handler argument exposes sendNotification and the request's _meta.
server.tool(
  "long_operation",
  "Performs a time-consuming task",
  {},
  async (_args, extra) => {
    // The client supplies a progress token in the request's _meta when it wants updates
    const progressToken = extra._meta?.progressToken;
    for (let i = 0; i < 100; i++) {
      if (progressToken !== undefined) {
        await extra.sendNotification({
          method: "notifications/progress",
          params: { progressToken, progress: i, total: 100 }
        });
      }
      // Simulate work
      await new Promise(resolve => setTimeout(resolve, 100));
    }
    return {
      content: [{
        type: 'text',
        text: 'Operation completed!'
      }]
    };
  }
);
Error: Memory Limit Exceeded
Serverless platforms impose memory limits. AWS Lambda allows up to 10GB, but costs increase with memory allocation.
Stream large responses instead of loading everything into memory:
// Stream file contents instead of reading the file into memory at once.
// Note: the 'stream' content type shown here is illustrative; confirm your
// SDK version's supported content types before relying on it.
import fs from "node:fs";

server.resource(
  "large_file",
  "Stream a large file",
  async ({ uri }) => {
    const stream = fs.createReadStream(uri);
    return {
      content: [{
        type: 'stream',
        stream: stream
      }]
    };
  }
);
Examples
Building a GitHub Integration Server
This example demonstrates a production-ready MCP server that integrates with GitHub's API, showcasing authentication, error handling, and response formatting:
// github-mcp-server.ts
import { McpServer, ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";
import { Octokit } from "@octokit/rest";
import { z } from "zod";
const server = new McpServer({
name: "github-integration",
version: "1.0.0",
description: "Query and manage GitHub repositories"
});
// Initialize GitHub client with token from environment
const octokit = new Octokit({
auth: process.env.GITHUB_TOKEN
});
server.tool(
"search_repositories",
"Search GitHub repositories with advanced filters",
{
query: z.string().describe("Search query"),
language: z.string().optional().describe("Filter by programming language"),
sort: z.enum(["stars", "forks", "updated"]).optional(),
limit: z.number().min(1).max(100).default(10)
},
async ({ query, language, sort = "stars", limit }) => {
try {
const searchQuery = language
? `${query} language:${language}`
: query;
const response = await octokit.search.repos({
q: searchQuery,
sort,
per_page: limit
});
const results = response.data.items.map(repo => ({
name: repo.full_name,
description: repo.description,
stars: repo.stargazers_count,
url: repo.html_url
}));
return {
content: [{
type: 'text',
text: JSON.stringify(results, null, 2)
}]
};
} catch (error) {
// Handle rate limiting gracefully
if (error.status === 403) {
return {
content: [{
type: 'text',
text: 'GitHub API rate limit exceeded. Please try again later.'
}],
isError: true
};
}
throw error;
}
}
);
// Add a resource template for repository details
// (the github://repos/{owner}/{repo} URI scheme is illustrative)
server.resource(
  "repository",
  new ResourceTemplate("github://repos/{owner}/{repo}", { list: undefined }),
  async (uri, { owner, repo }) => {
    const data = await octokit.repos.get({
      owner: String(owner),
      repo: String(repo)
    });
    return {
      contents: [{
        uri: uri.href,
        text: JSON.stringify({
          name: data.data.full_name,
          description: data.data.description,
          created: data.data.created_at,
          language: data.data.language,
          topics: data.data.topics,
          license: data.data.license?.name
        }, null, 2)
      }]
    };
  }
);
Production deployment requires proper secret management. Use AWS Secrets Manager, Vercel environment variables, or Cloudflare Workers secrets to store the GitHub token securely.
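For the Lambda case, a hedged sketch of loading the token from AWS Secrets Manager at startup; the secret name is a placeholder:
// secrets.ts — sketch of fetching the GitHub token (secret name is a placeholder)
import {
  SecretsManagerClient,
  GetSecretValueCommand
} from "@aws-sdk/client-secrets-manager";

const secrets = new SecretsManagerClient({});

export async function getGithubToken(): Promise<string> {
  const response = await secrets.send(
    new GetSecretValueCommand({ SecretId: "github/mcp-token" }) // placeholder name
  );
  if (!response.SecretString) throw new Error("Secret has no string value");
  return response.SecretString;
}

// Usage: const octokit = new Octokit({ auth: await getGithubToken() });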
Real-time Data Processing Server
This example shows how to build an MCP server that processes streaming data, useful for analytics or monitoring applications:
// streaming-analytics-server.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { CloudWatchLogsClient, FilterLogEventsCommand } from "@aws-sdk/client-cloudwatch-logs";
import { z } from "zod";
const logsClient = new CloudWatchLogsClient({});
const server = new McpServer({
name: "log-analytics",
version: "1.0.0"
});
server.tool(
"analyze_logs",
"Analyze CloudWatch logs in real-time",
{
logGroup: z.string(),
startTime: z.string().datetime(),
pattern: z.string().optional(),
limit: z.number().default(1000)
},
async ({ logGroup, startTime, pattern, limit }) => {
// Set up streaming response
const encoder = new TextEncoder();
const stream = new ReadableStream({
async start(controller) {
let nextToken;
let totalEvents = 0;
do {
          const response = await logsClient.send(new FilterLogEventsCommand({
            logGroupName: logGroup,
            startTime: new Date(startTime).getTime(),
            filterPattern: pattern,
            limit: Math.min(limit - totalEvents, 100),
            nextToken
          }));
// Stream results as they arrive
for (const event of response.events || []) {
const chunk = encoder.encode(
JSON.stringify({
timestamp: new Date(event.timestamp),
message: event.message
}) + '\n'
);
controller.enqueue(chunk);
totalEvents++;
}
nextToken = response.nextToken;
} while (nextToken && totalEvents < limit);
controller.close();
}
});
    // Note: the 'stream' content type is illustrative; confirm your SDK
    // version's supported content types before relying on it.
    return {
      content: [{
        type: 'stream',
        mimeType: 'application/x-ndjson',
        stream
      }]
    };
}
);
Streaming responses reduce memory usage and provide real-time feedback for long-running operations. The NDJSON format allows clients to process results incrementally.
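On the consuming side, NDJSON can be parsed line by line as chunks arrive. A minimal sketch using web streams; the endpoint URL is a placeholder:
// ndjson-consumer.ts — hypothetical incremental reader for an NDJSON stream
const response = await fetch("https://example.com/logs.ndjson"); // placeholder URL
const reader = response.body!.pipeThrough(new TextDecoderStream()).getReader();

let buffer = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += value;
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? ""; // keep the trailing partial line for the next chunk
  for (const line of lines) {
    if (line.trim()) console.log(JSON.parse(line)); // one log event per line
  }
}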
[Screenshot: Example of streaming log analysis results in an AI assistant interface]
These examples demonstrate the flexibility of serverless MCP servers. Whether building simple tools or complex integrations, the combination of StreamableHTTP transport and serverless platforms provides a scalable, cost-effective solution for extending AI capabilities.
Related Guides
Configuring MCP transport protocols for Docker containers
Configure stdio, SSE, and StreamableHTTP transport protocols for MCP servers running in Docker containers with practical examples and troubleshooting.
Error handling in custom MCP servers
Implement robust error handling in MCP servers with proper error codes, logging, and recovery strategies.
Building a health check endpoint for your MCP server
Implement health check endpoints for MCP servers to enable monitoring, load balancing, and automated recovery.