The Quick Answer
Add health check endpoints to your MCP server for monitoring and automated recovery:
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'healthy', server: 'mcp-server' });
});

app.get('/health/ready', async (req, res) => {
  const isReady = await checkDependencies();
  res.status(isReady ? 200 : 503).json({ ready: isReady });
});
Health checks validate server functionality, dependencies, and readiness. They enable load balancers to route traffic and orchestrators to restart unhealthy instances automatically.
Prerequisites
- Node.js 18+ or Python 3.10+ installed
- Basic understanding of MCP server architecture
- Express.js (TypeScript) or FastAPI (Python) framework knowledge
- Optional: Kubernetes or Docker for production deployments
Installation
Install the required dependencies for your chosen language:
# TypeScript/Node.js
$ npm install express @modelcontextprotocol/sdk
# Python
$ pip install fastapi uvicorn mcp
Configuration
Health check endpoints require careful configuration to balance responsiveness with system load. MCP servers can run over stdio or over HTTP-based transports (HTTP+SSE or the newer Streamable HTTP), but health checks are most relevant for HTTP-based deployments, where external monitoring is possible.
Configure your health check endpoints with appropriate timeouts and response codes:
const HEALTH_CHECK_TIMEOUT = 5000; // 5 seconds
const DEPENDENCY_CHECK_INTERVAL = 30000; // 30 seconds

// Cache dependency status to avoid overloading external services
let lastDependencyCheck = { time: 0, status: true };

async function checkDependencies(): Promise<boolean> {
  const now = Date.now();
  if (now - lastDependencyCheck.time < DEPENDENCY_CHECK_INTERVAL) {
    return lastDependencyCheck.status;
  }
  // Perform actual checks; a check that throws counts as a failure
  let status: boolean;
  try {
    const checks = await Promise.all([
      checkDatabase(),
      checkExternalAPI(),
      checkMCPServerInit()
    ]);
    status = checks.every(c => c);
  } catch {
    status = false;
  }
  lastDependencyCheck = { time: now, status };
  return status;
}
The caching mechanism prevents health check endpoints from overwhelming your dependencies. In production environments, adjust the DEPENDENCY_CHECK_INTERVAL based on your SLA requirements and dependency reliability.
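When tuning the interval against an SLA, it can help to estimate how long a dependency failure may go unnoticed. The sketch below is a rough bound, not a guarantee. It assumes a prober that polls at a fixed period and acts only after a number of consecutive failures (as Kubernetes does with periodSeconds and failureThreshold), and it ignores the time the checks themselves take. The function and parameter names are illustrative.

```typescript
// Rough worst case: a cached "healthy" result can be served for up to one
// cache interval, after which the prober needs `failureThreshold` consecutive
// failed probes, spaced `probePeriodMs` apart, before it acts.
function worstCaseDetectionMs(
  cacheIntervalMs: number,
  probePeriodMs: number,
  failureThreshold: number
): number {
  return cacheIntervalMs + probePeriodMs * failureThreshold;
}

// Largest cache interval that still fits a detection-time target (0 if none fits).
function maxCacheIntervalMs(
  targetDetectionMs: number,
  probePeriodMs: number,
  failureThreshold: number
): number {
  return Math.max(0, targetDetectionMs - probePeriodMs * failureThreshold);
}

// Example: 30 s cache, probes every 10 s, 3 failures required.
console.log(worstCaseDetectionMs(30000, 10000, 3)); // 60000
console.log(maxCacheIntervalMs(45000, 10000, 3));   // 15000
```

Under these assumptions, a 30-second cache with a 10-second probe period and a threshold of 3 can delay detection by up to about a minute, so a tighter target calls for a shorter interval.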
Usage
MCP servers need different types of health checks for various operational scenarios. The three primary patterns address different monitoring needs:
Basic Health Check
The simplest health check confirms the server process is running and can respond to HTTP requests:
app.get('/health', (req, res) => {
  const health = {
    status: 'healthy',
    timestamp: new Date().toISOString(),
    service: 'mcp-server',
    version: process.env.npm_package_version || '1.0.0',
    uptime: process.uptime()
  };
  res.status(200).json(health);
});
This endpoint serves as a liveness probe, indicating the server hasn't crashed. Load balancers typically check this endpoint every 5-10 seconds to detect unresponsive instances.
Readiness Check
Readiness checks verify the server can handle actual MCP requests by validating all dependencies:
app.get('/health/ready', async (req, res) => {
  try {
    // Assumes mcpServer is an application-level wrapper that tracks
    // initialization state and registered tools
    const mcpReady = mcpServer.isInitialized && mcpServer.tools.length > 0;
    // Run the dependency checks in parallel to keep the probe fast
    const [dbConnected, apiAvailable] = await Promise.all([
      checkDatabaseConnection(),
      checkExternalAPIHealth()
    ]);
    const ready = mcpReady && dbConnected && apiAvailable;
    res.status(ready ? 200 : 503).json({
      ready,
      checks: {
        mcp: mcpReady,
        database: dbConnected,
        externalAPI: apiAvailable
      },
      timestamp: new Date().toISOString()
    });
  } catch (error) {
    res.status(503).json({
      ready: false,
      error: 'Health check failed',
      timestamp: new Date().toISOString()
    });
  }
});
Kubernetes uses readiness probes to determine when to route traffic to a pod. A failing readiness check removes the instance from the load balancer pool without restarting it.
Detailed Health Status
For comprehensive monitoring, implement a detailed health endpoint that provides granular status information:
interface HealthComponent {
  name: string;
  status: 'healthy' | 'degraded' | 'unhealthy';
  responseTime: number; // milliseconds
  message?: string;
}

app.get('/health/detailed', async (req, res) => {
  const components: HealthComponent[] = [];

  // Check MCP server components
  const mcpStart = Date.now();
  components.push({
    name: 'mcp-server',
    status: mcpServer.isInitialized ? 'healthy' : 'unhealthy',
    responseTime: Date.now() - mcpStart,
    message: `${mcpServer.tools.length} tools, ${mcpServer.resources.length} resources`
  });

  // Check each dependency with timeout
  const dbStart = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    const dbOk = await Promise.race([
      checkDatabase(),
      new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error('Timeout')), 3000);
      })
    ]);
    // A check that resolves to false is a failure, not only one that throws
    if (!dbOk) throw new Error('Database check failed');
    components.push({
      name: 'database',
      status: 'healthy',
      responseTime: Date.now() - dbStart
    });
  } catch (error) {
    components.push({
      name: 'database',
      status: 'unhealthy',
      responseTime: Date.now() - dbStart,
      message: error instanceof Error ? error.message : String(error)
    });
  } finally {
    clearTimeout(timer); // don't leave the timer pending once the race settles
  }

  const overallStatus = components.every(c => c.status === 'healthy')
    ? 'healthy'
    : components.some(c => c.status === 'unhealthy')
      ? 'unhealthy'
      : 'degraded';

  res.status(overallStatus === 'healthy' ? 200 : 503).json({
    status: overallStatus,
    components,
    timestamp: new Date().toISOString()
  });
});
This pattern helps identify specific failure points during incidents. Monitoring systems can alert on degraded states before complete failures occur.
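The handler above never assigns the 'degraded' status itself, so something has to decide when a component counts as degraded. One possible rule, sketched here under assumptions, is to treat a check that succeeds slowly as degraded. The 500 ms threshold and the helper names are hypothetical, and returning 200 for a degraded instance is a design choice that differs from the handler above, which returns 503 for anything other than healthy.

```typescript
type Status = 'healthy' | 'degraded' | 'unhealthy';

// Hypothetical threshold: successful checks slower than this count as degraded.
const DEGRADED_AFTER_MS = 500;

// Classify one component from its check result and response time.
function classify(ok: boolean, responseTimeMs: number): Status {
  if (!ok) return 'unhealthy';
  return responseTimeMs > DEGRADED_AFTER_MS ? 'degraded' : 'healthy';
}

// The overall status is the worst status of any component.
function aggregate(statuses: Status[]): Status {
  if (statuses.some((s) => s === 'unhealthy')) return 'unhealthy';
  if (statuses.some((s) => s === 'degraded')) return 'degraded';
  return 'healthy';
}

// A degraded instance can still serve traffic, so only 'unhealthy' maps to 503.
function httpStatusFor(status: Status): number {
  return status === 'unhealthy' ? 503 : 200;
}
```

With this split, a load balancer keeps routing to a slow instance while the response body still gives monitoring a degraded signal to alert on.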
Common Issues
Error: Connection timeout during health checks
Health check timeouts typically occur when dependency checks take too long or when the server is under heavy load. The root cause often lies in synchronous blocking operations or missing timeout configurations.
// Problem: No timeout protection
async function checkDatabase() {
  const result = await db.query('SELECT 1'); // Can hang indefinitely
  return result.rows.length > 0;
}

// Solution: Add timeout wrapper
async function checkDatabaseWithTimeout() {
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    const result = await Promise.race([
      db.query('SELECT 1'),
      new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error('Database timeout')), 2000);
      })
    ]);
    return result.rows.length > 0;
  } catch (error) {
    console.error('Database health check failed:', error);
    return false;
  } finally {
    clearTimeout(timer); // Don't leave the timer pending once the race settles
  }
}
Implement timeouts for all external calls and use connection pooling to prevent resource exhaustion. Consider implementing circuit breakers for frequently failing dependencies.
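A circuit breaker can be sketched in a few lines. The version below is a minimal illustration rather than a production implementation: the class, its thresholds, and the guardedCheck helper are hypothetical names, and the clock is injectable only to make the behavior easy to test.

```typescript
// Minimal circuit breaker: opens after consecutive failures, then allows a
// trial call once the cool-down has elapsed.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(failureThreshold = 3, resetAfterMs = 10000, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  // True when the dependency may be called: circuit closed, or cool-down over.
  canAttempt(): boolean {
    if (this.openedAt === null) return true;
    return this.now() - this.openedAt >= this.resetAfterMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) {
      this.openedAt = this.now(); // open (or re-open) and restart the cool-down
    }
  }
}

// Wraps a dependency check so that an open circuit fails fast without a call.
async function guardedCheck(
  breaker: CircuitBreaker,
  check: () => Promise<boolean>
): Promise<boolean> {
  if (!breaker.canAttempt()) return false;
  try {
    const ok = await check();
    if (ok) breaker.recordSuccess();
    else breaker.recordFailure();
    return ok;
  } catch {
    breaker.recordFailure();
    return false;
  }
}
```

While the circuit is open, health checks return quickly instead of queuing more calls against a dependency that is already struggling.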
Error: Health check passes but server returns errors
This disconnect happens when health checks don't accurately reflect server capability. Often, basic health checks only verify the process is running without testing actual MCP functionality.
// Insufficient check - only tests HTTP server
app.get('/health', (req, res) => res.send('OK'));

// Comprehensive check - validates MCP capabilities
// (assumes a side-effect-free tool named 'test-tool' is registered for this purpose)
app.get('/health', async (req, res) => {
  try {
    // Test actual MCP functionality
    const testTool = mcpServer.tools.find(t => t.name === 'test-tool');
    if (!testTool) throw new Error('Test tool not found');

    // Verify tool execution capability
    const result = await testTool.handler({ test: true });
    if (!result) throw new Error('Tool execution failed');

    res.status(200).json({
      status: 'healthy',
      mcp: {
        tools: mcpServer.tools.length,
        resources: mcpServer.resources.length,
        capabilities: mcpServer.capabilities
      }
    });
  } catch (error) {
    res.status(503).json({
      status: 'unhealthy',
      error: error instanceof Error ? error.message : String(error)
    });
  }
});
Always include functional checks that exercise core MCP capabilities. This ensures health checks accurately represent server readiness.
Error: Flapping health status (alternating healthy/unhealthy)
Flapping occurs when health checks are too sensitive to transient issues or when thresholds are poorly configured. This causes unnecessary service disruptions and alert fatigue.
// Implement a stability window to prevent flapping
class HealthChecker {
  private history: boolean[] = [];
  private readonly windowSize = 5;
  private readonly healthyThreshold = 0.6;

  // The underlying probe is injected, e.g. () => checkDependencies()
  constructor(private readonly performHealthCheck: () => Promise<boolean>) {}

  async checkHealth(): Promise<{ stable: boolean; healthy: boolean }> {
    const currentHealth = await this.performHealthCheck();
    this.history.push(currentHealth);
    if (this.history.length > this.windowSize) {
      this.history.shift();
    }
    const healthyCount = this.history.filter(h => h).length;
    const healthyRatio = healthyCount / this.history.length;
    return {
      stable: this.history.length >= this.windowSize,
      healthy: healthyRatio >= this.healthyThreshold
    };
  }
}
Use rolling windows and percentage-based thresholds instead of single-check failures. This approach tolerates temporary glitches while still detecting persistent issues.
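To see how the window behaves, the following self-contained trace applies the same rule (window of 5, 60% threshold) to a fixed sequence of raw check results. The smoothHealth function is a hypothetical stand-in for the class above, written synchronously so the output is easy to follow.

```typescript
// Returns, for each sample, whether the instance is reported healthy when the
// last `windowSize` samples are smoothed with a ratio threshold.
function smoothHealth(
  samples: boolean[],
  windowSize = 5,
  healthyThreshold = 0.6
): boolean[] {
  const history: boolean[] = [];
  return samples.map((sample) => {
    history.push(sample);
    if (history.length > windowSize) history.shift();
    const healthyCount = history.filter((h) => h).length;
    return healthyCount / history.length >= healthyThreshold;
  });
}

// Isolated failures (indexes 2 and 4) are absorbed; the run of failures that
// follows is reported once healthy samples drop below 60% of the window.
const raw = [true, true, false, true, false, false, false];
console.log(smoothHealth(raw));
// [ true, true, true, true, true, false, false ]
```

The trade-off is detection delay: the window absorbs isolated failures, but it also takes a few extra checks to report a sustained outage.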
Examples
Production-Ready TypeScript MCP Health Check Server
This example demonstrates a complete health check implementation for a TypeScript MCP server with multiple monitoring endpoints and dependency checks:
import express from 'express';
import { Server as McpServer } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { ListToolsRequestSchema } from '@modelcontextprotocol/sdk/types.js';

const app = express();
const mcpServer = new McpServer(
  { name: 'production-mcp-server', version: '1.0.0' },
  { capabilities: { tools: {} } } // declare tool support before registering handlers
);

// Tools exposed by this server, kept in a local array so health checks can count them
const tools = [{
  name: 'query-database',
  description: 'Query the database',
  inputSchema: { type: 'object' as const, properties: { query: { type: 'string' } } }
}];

// Health check state management
const healthState = {
  startTime: Date.now(),
  isReady: false,
  lastCheck: { time: 0, results: {} }
};

// Initialize MCP server with tools and resources
async function initializeMCPServer() {
  mcpServer.setRequestHandler(ListToolsRequestSchema, async () => ({ tools }));

  // Start MCP server on stdio transport
  const transport = new StdioServerTransport();
  await mcpServer.connect(transport);
  healthState.isReady = true;
}

// Kubernetes-compatible health endpoints
app.get('/health/startup', (req, res) => {
  const startupDuration = Date.now() - healthState.startTime;
  const maxStartupTime = 60000; // 60 seconds
  if (startupDuration > maxStartupTime && !healthState.isReady) {
    res.status(503).json({
      status: 'failed',
      message: 'Startup timeout exceeded'
    });
  } else if (healthState.isReady) {
    res.status(200).json({ status: 'started' });
  } else {
    res.status(503).json({
      status: 'starting',
      duration: startupDuration
    });
  }
});

app.get('/health/live', (req, res) => {
  // Simple liveness check - process is running
  res.status(200).json({
    status: 'alive',
    pid: process.pid,
    uptime: process.uptime(),
    memory: process.memoryUsage()
  });
});

app.get('/health/ready', async (req, res) => {
  if (!healthState.isReady) {
    return res.status(503).json({ ready: false, reason: 'Server initializing' });
  }
  // Check all critical dependencies
  // (checkDatabase, checkRedis, and checkS3 are application-specific helpers
  // that resolve to a boolean)
  const checks = {
    mcp: tools.length > 0,
    database: await checkDatabase(),
    cache: await checkRedis(),
    storage: await checkS3()
  };
  const ready = Object.values(checks).every(check => check === true);
  res.status(ready ? 200 : 503).json({ ready, checks });
});

// Prometheus-compatible metrics endpoint
app.get('/metrics', (req, res) => {
  const metrics = [
    `# HELP mcp_server_up MCP server status`,
    `# TYPE mcp_server_up gauge`,
    `mcp_server_up{service="production-mcp-server"} ${healthState.isReady ? 1 : 0}`,
    `# HELP mcp_server_uptime_seconds MCP server uptime`,
    `# TYPE mcp_server_uptime_seconds counter`,
    `mcp_server_uptime_seconds ${process.uptime()}`,
    `# HELP mcp_tools_total Total number of MCP tools`,
    `# TYPE mcp_tools_total gauge`,
    `mcp_tools_total ${tools.length}`
  ];
  res.set('Content-Type', 'text/plain; version=0.0.4');
  res.send(metrics.join('\n') + '\n'); // the text format expects a trailing newline
});

// Start servers
async function start() {
  // Listen first so the startup probe can report progress during initialization
  app.listen(8080, () => {
    // Log to stderr: stdout is reserved for the stdio MCP transport
    console.error('Health check endpoints available on :8080');
  });
  await initializeMCPServer();
}

start().catch(console.error);
This implementation provides multiple health check endpoints suitable for different monitoring scenarios. The startup probe handles slow initialization, liveness confirms the process hasn't deadlocked, and readiness validates all dependencies before accepting traffic. The Prometheus metrics endpoint enables detailed monitoring and alerting based on custom thresholds.
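If the metrics endpoint grows beyond a few hand-written lines, a small formatting helper keeps the output consistent. This is a sketch with hypothetical function names, covering only gauges in the Prometheus text format. A maintained client library is usually the better choice for anything larger.

```typescript
// Escape a label value for the Prometheus text format
// (backslash, double quote, and newline must be escaped).
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

// Render one gauge sample with its HELP and TYPE lines, ending in a newline.
function renderGauge(
  name: string,
  help: string,
  value: number,
  labels: Record<string, string> = {}
): string {
  const pairs = Object.keys(labels).map(
    (key) => `${key}="${escapeLabelValue(labels[key])}"`
  );
  const labelText = pairs.length > 0 ? `{${pairs.join(',')}}` : '';
  return [
    `# HELP ${name} ${help}`,
    `# TYPE ${name} gauge`,
    `${name}${labelText} ${value}`
  ].join('\n') + '\n';
}

console.log(renderGauge('mcp_server_up', 'MCP server status', 1, {
  service: 'production-mcp-server'
}));
```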
Python FastAPI MCP Health Monitor
For Python-based MCP servers, this example shows how to implement comprehensive health monitoring with async support:
import asyncio
import time
from contextlib import asynccontextmanager
from datetime import datetime, timezone
from typing import Any, Dict

import asyncpg
import redis.asyncio as redis  # aioredis was merged into redis-py
from fastapi import FastAPI, Response
from mcp.server.fastmcp import FastMCP

mcp_server = FastMCP("python-mcp-server")

# Health check configuration
HEALTH_CHECK_CACHE_TTL = 30  # seconds
DEPENDENCY_TIMEOUT = 3  # seconds


class HealthMonitor:
    def __init__(self):
        self.cache = {}
        self.server_ready = False
        self.start_time = time.time()

    async def check_dependency(self, name: str, check_func) -> Dict[str, Any]:
        """Check a dependency with timeout and caching"""
        cache_key = f"dep_{name}"
        cached = self.cache.get(cache_key)
        if cached and (time.time() - cached['timestamp']) < HEALTH_CHECK_CACHE_TTL:
            return cached['result']

        start = time.time()
        try:
            result = await asyncio.wait_for(
                check_func(),
                timeout=DEPENDENCY_TIMEOUT
            )
            status = 'healthy' if result else 'unhealthy'
        except asyncio.TimeoutError:
            status = 'timeout'
        except Exception:
            status = 'error'

        response_time = (time.time() - start) * 1000  # ms
        check_result = {
            'status': status,
            'responseTime': response_time,
            'timestamp': time.time()
        }
        self.cache[cache_key] = {
            'result': check_result,
            'timestamp': time.time()
        }
        return check_result


health_monitor = HealthMonitor()


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Run the MCP server on stdio in the background while FastAPI serves HTTP"""
    mcp_task = asyncio.create_task(mcp_server.run_stdio_async())
    health_monitor.server_ready = True
    yield
    health_monitor.server_ready = False
    mcp_task.cancel()


app = FastAPI(lifespan=lifespan)


# Dependency check functions
async def check_postgres() -> bool:
    """Verify PostgreSQL connection"""
    try:
        conn = await asyncpg.connect(
            'postgresql://user:pass@localhost/db',
            timeout=2
        )
        try:
            await conn.fetchval('SELECT 1')
        finally:
            await conn.close()
        return True
    except Exception:
        return False


async def check_redis() -> bool:
    """Verify Redis connection"""
    try:
        async with redis.from_url('redis://localhost') as client:
            await client.ping()
        return True
    except Exception:
        return False


# Health check endpoints
@app.get("/health")
async def health_check():
    """Basic health check endpoint"""
    return {
        "status": "healthy",
        "service": "python-mcp-server",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "uptime": time.time() - health_monitor.start_time
    }


@app.get("/health/ready")
async def readiness_check(response: Response):
    """Comprehensive readiness check"""
    if not health_monitor.server_ready:
        response.status_code = 503
        return {"ready": False, "reason": "Server still initializing"}

    # Check all dependencies in parallel
    checks = await asyncio.gather(
        health_monitor.check_dependency("postgres", check_postgres),
        health_monitor.check_dependency("redis", check_redis),
        return_exceptions=True
    )

    # Process results
    dependency_results = {}
    all_healthy = True
    for name, result in zip(["postgres", "redis"], checks):
        if isinstance(result, Exception):
            dependency_results[name] = {
                "status": "error",
                "message": str(result)
            }
            all_healthy = False
        else:
            dependency_results[name] = result
            if result['status'] != 'healthy':
                all_healthy = False

    # MCP server check
    tools = await mcp_server.list_tools()
    resources = await mcp_server.list_resources()
    mcp_healthy = len(tools) > 0
    dependency_results['mcp'] = {
        'status': 'healthy' if mcp_healthy else 'unhealthy',
        'tools': len(tools),
        'resources': len(resources)
    }
    if not mcp_healthy:
        all_healthy = False

    response.status_code = 200 if all_healthy else 503
    return {
        "ready": all_healthy,
        "checks": dependency_results,
        "timestamp": datetime.now(timezone.utc).isoformat()
    }


# Register MCP tools
@mcp_server.tool()
async def query_data(query: str) -> str:
    """Example MCP tool"""
    return f"Processed query: {query}"


# Run both servers
if __name__ == "__main__":
    import uvicorn
    # The MCP server is started by the lifespan hook once the event loop is running.
    # Access logs are disabled because uvicorn writes them to stdout, which the
    # stdio transport uses for protocol messages.
    uvicorn.run(app, host="0.0.0.0", port=8080, access_log=False)
This Python implementation leverages FastAPI's async capabilities for efficient health checking. The caching mechanism prevents overwhelming dependencies during high-frequency health checks. The parallel dependency checking ensures fast response times even with multiple external services. Integration with MCP server state provides accurate readiness signals for container orchestration platforms.