Writing unit tests for MCP servers

Kashish Hora

Co-founder of MCPcat

The Quick Answer

Test MCP servers directly in memory, without subprocess overhead, using FastMCP's client-server binding:

import pytest
from fastmcp import FastMCP, Client

@pytest.mark.asyncio
async def test_tool_execution():
    server = FastMCP("TestServer")
    
    @server.tool
    def calculate(x: int, y: int) -> int:
        return x + y
    
    async with Client(server) as client:
        result = await client.call_tool("calculate", {"x": 5, "y": 3})
        assert result[0].text == "8"

This pattern keeps tests deterministic by eliminating network dependencies and subprocess management, while the direct server-client connection still exercises the full MCP protocol, so tests stay fast, reliable, and realistic.

Prerequisites

  • Python 3.9+ with pytest or unittest installed
  • TypeScript/JavaScript with vitest, jest, or mocha configured
  • Basic understanding of MCP protocol and tool implementation
  • Testing framework of choice (pytest recommended for Python, vitest for TypeScript)

Installation

Install testing dependencies for your chosen language:

# Python with pytest
$ pip install pytest pytest-asyncio fastmcp

# TypeScript with vitest
$ npm install -D vitest @vitest/ui @modelcontextprotocol/sdk

# Python with unittest (built in; async tests use unittest.IsolatedAsyncioTestCase)
$ pip install fastmcp
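
Note that pytest does not collect bare async def test functions on its own. With pytest-asyncio in its default strict mode, mark each async test (or the whole module); alternatively, set asyncio_mode = "auto" in your pytest configuration so the async examples in this guide run as written. A minimal sketch of the marker approach:

import pytest

# Option 1: mark an individual async test
@pytest.mark.asyncio
async def test_tool_execution():
    ...

# Option 2: mark every async test in the module at once
pytestmark = pytest.mark.asyncio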

Usage

Basic Tool Testing Pattern

The fundamental pattern for testing MCP tools involves creating a server instance, registering tools, and validating their execution through a client. This approach tests the complete tool lifecycle including parameter validation, execution, and response formatting.

import pytest
import json
import asyncio
from fastmcp import FastMCP, Client
from typing import List, Dict

@pytest.fixture
def mcp_server():
    """Create a test MCP server with sample tools"""
    server = FastMCP("TestServer")
    
    @server.tool
    def search_items(query: str, limit: int = 10) -> List[Dict]:
        """Search for items matching query"""
        # Mock implementation for testing
        if not query:
            raise ValueError("Query cannot be empty")
        return [{"name": f"Item {i}", "score": i} for i in range(limit)]
    
    @server.tool
    async def async_process(data: str) -> str:
        """Async tool for testing async operations"""
        await asyncio.sleep(0.1)  # Simulate async work
        return f"Processed: {data}"
    
    return server

async def test_tool_with_parameters(mcp_server):
    """Test tool execution with various parameter combinations"""
    async with Client(mcp_server) as client:
        # Test with valid parameters
        result = await client.call_tool("search_items", {
            "query": "test",
            "limit": 5
        })
        items = json.loads(result[0].text)
        assert len(items) == 5
        assert items[0]["name"] == "Item 0"
        
        # Test with default parameter
        result = await client.call_tool("search_items", {"query": "test"})
        items = json.loads(result[0].text)
        assert len(items) == 10  # Default limit

Testing Resource Handlers

MCP servers often expose resources alongside tools. Testing resources requires validating both the resource listing and individual resource retrieval, ensuring proper URI formatting and content handling.

async def test_resource_management(mcp_server):
    """Test resource listing and retrieval"""
    @mcp_server.resource("config://settings")
    async def get_settings():
        return {"theme": "dark", "language": "en"}
    
    async with Client(mcp_server) as client:
        # List available resources
        resources = await client.list_resources()
        assert len(resources) == 1
        assert resources[0].uri == "config://settings"
        
        # Read specific resource
        content = await client.read_resource("config://settings")
        settings = json.loads(content[0].text)
        assert settings["theme"] == "dark"

Mocking External Dependencies

Production MCP servers interact with databases, APIs, and file systems. Effective testing requires mocking these dependencies to ensure tests remain fast, deterministic, and isolated from external failures.

from unittest.mock import Mock, patch, AsyncMock
import aiohttp
import json
from typing import Dict

async def test_with_mocked_api(mcp_server):
    """Test tool that calls external API"""
    
    @mcp_server.tool
    async def fetch_weather(city: str) -> Dict:
        async with aiohttp.ClientSession() as session:
            async with session.get(f"https://api.weather.com/{city}") as resp:
                return await resp.json()
    
    # Mock the aiohttp response; session.get(...) is used as an async context
    # manager, so __aenter__ must hand back the mocked response itself
    mock_response = AsyncMock()
    mock_response.json = AsyncMock(return_value={"temp": 72, "condition": "sunny"})
    mock_response.__aenter__.return_value = mock_response
    mock_response.__aexit__.return_value = False

    with patch('aiohttp.ClientSession.get', return_value=mock_response):
        async with Client(mcp_server) as client:
            result = await client.call_tool("fetch_weather", {"city": "NYC"})
            weather = json.loads(result[0].text)
            assert weather["temp"] == 72
            assert weather["condition"] == "sunny"

TypeScript/Vitest Testing

TypeScript MCP servers follow the same pattern and map naturally onto vitest's promise-based testing. The examples below sketch the approach with a simplified server wrapper and a local createTestClient helper from a test-utils module, which binds a client to the server in memory just as the Python examples do; adapt the registration calls to the SDK version you use.

import { describe, it, expect, vi } from 'vitest'
import { MCPServer } from '@modelcontextprotocol/sdk'
import { createTestClient } from './test-utils'

describe('MCP Tool Tests', () => {
  it('should execute calculation tool correctly', async () => {
    const server = new MCPServer()
    
    server.tool('multiply', {
      description: 'Multiply two numbers',
      inputSchema: {
        type: 'object',
        properties: {
          a: { type: 'number' },
          b: { type: 'number' }
        },
        required: ['a', 'b']
      }
    }, async ({ a, b }) => {
      return { result: a * b }
    })
    
    const client = createTestClient(server)
    const result = await client.callTool('multiply', { a: 4, b: 7 })
    
    expect(result.result).toBe(28)
  })
  
  it('should handle errors gracefully', async () => {
    const server = new MCPServer()
    
    server.tool('divide', {
      inputSchema: {
        type: 'object',
        properties: {
          a: { type: 'number' },
          b: { type: 'number' }
        }
      }
    }, async ({ a, b }) => {
      if (b === 0) throw new Error('Division by zero')
      return { result: a / b }
    })
    
    const client = createTestClient(server)
    
    await expect(
      client.callTool('divide', { a: 10, b: 0 })
    ).rejects.toThrow('Division by zero')
  })
})

Testing Error Scenarios

Robust MCP servers handle errors gracefully. Testing should cover input validation errors, tool execution failures, and protocol-level errors to ensure proper error propagation and client feedback.

async def test_error_handling(mcp_server):
    """Test various error scenarios"""
    
    @mcp_server.tool
    def risky_operation(action: str) -> str:
        if action == "fail":
            raise RuntimeError("Operation failed")
        elif action == "invalid":
            raise ValueError("Invalid action")
        return f"Success: {action}"
    
    async with Client(mcp_server) as client:
        # Test runtime errors
        with pytest.raises(Exception) as exc_info:
            await client.call_tool("risky_operation", {"action": "fail"})
        assert "Operation failed" in str(exc_info.value)
        
        # Test validation errors
        with pytest.raises(Exception) as exc_info:
            await client.call_tool("risky_operation", {"action": "invalid"})
        assert "Invalid action" in str(exc_info.value)
        
        # Test missing required parameters
        with pytest.raises(Exception) as exc_info:
            await client.call_tool("risky_operation", {})
        assert "required" in str(exc_info.value).lower()

Common Issues

Error: "Connection refused" or "Server not started"

Many developers attempt to test MCP servers by spawning subprocess instances, leading to race conditions and connection failures. The in-memory testing pattern eliminates this issue entirely by passing the server instance directly to the client. If you must test with subprocesses, implement proper startup synchronization using health checks or retry logic with exponential backoff.

# Problematic approach
proc = subprocess.Popen(["python", "server.py"])
client = Client("stdio://...")  # Often fails with connection refused

# Recommended approach
server = FastMCP("TestServer")
async with Client(server) as client:  # Direct connection, no subprocess
    # Test code here
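
If a subprocess is truly unavoidable, for example to exercise the stdio transport end to end, synchronize on readiness with retries and exponential backoff instead of a fixed sleep. The sketch below is generic: connect stands in for whatever coroutine establishes and returns your client connection, and the exception types you retry on depend on your transport.

import asyncio

async def wait_until_ready(connect, attempts: int = 5, base_delay: float = 0.1):
    """Retry an async connect callable with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await connect()
        except (ConnectionError, OSError):
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            # Back off: 0.1s, 0.2s, 0.4s, ...
            await asyncio.sleep(base_delay * (2 ** attempt))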

Error: "Timeout waiting for tool response"

Test timeouts often indicate async operation issues or infinite loops in tool implementations. MCP clients typically enforce timeouts to prevent hanging operations. When testing long-running tools, either mock the slow operations or explicitly configure longer timeouts for integration tests.

# Configure custom timeout for slow operations
import asyncio

async def test_slow_operation():
    server = FastMCP("TestServer")
    
    @server.tool
    async def long_process(duration: int) -> str:
        await asyncio.sleep(duration)
        return "Complete"
    
    # Use custom client with extended timeout
    async with Client(server, timeout=30) as client:
        result = await client.call_tool("long_process", {"duration": 5})
        assert result[0].text == "Complete"

Error: "Invalid tool parameters" in tests that pass locally

Parameter validation differences between test and production environments often stem from type coercion issues. MCP uses JSON for parameter serialization, which can alter types (e.g., integers becoming floats). Always test with the exact types your production environment will receive, and consider using strict type validation in your tools.

@server.tool
def process_data(count: int, threshold: float) -> dict:
    # Explicitly validate types to catch JSON coercion issues
    if not isinstance(count, int):
        raise TypeError(f"Expected int, got {type(count)}")
    if not isinstance(threshold, (int, float)):
        raise TypeError(f"Expected number, got {type(threshold)}")
    
    return {"processed": count, "above_threshold": count > threshold}

Examples

Example: Testing a Database-Backed MCP Server

This example demonstrates testing an MCP server that interacts with a database, showing proper mocking strategies and transaction management for test isolation.

import pytest
import json
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from fastmcp import FastMCP, Client
from unittest.mock import patch
# Assume these are defined in your models
from myapp.models import Base, User

@pytest.fixture
def test_db():
    """Create in-memory SQLite database for testing"""
    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    SessionLocal = sessionmaker(bind=engine)
    return SessionLocal()

@pytest.fixture
def db_mcp_server(test_db):
    """MCP server with database operations"""
    server = FastMCP("DatabaseServer")
    
    @server.tool
    def create_user(name: str, email: str) -> dict:
        user = User(name=name, email=email)
        test_db.add(user)
        test_db.commit()
        return {"id": user.id, "name": user.name, "email": user.email}
    
    @server.tool
    def list_users(limit: int = 10) -> list:
        users = test_db.query(User).limit(limit).all()
        return [{"id": u.id, "name": u.name} for u in users]
    
    return server

async def test_database_operations(db_mcp_server, test_db):
    """Test database CRUD operations through MCP"""
    async with Client(db_mcp_server) as client:
        # Create users
        user1 = await client.call_tool("create_user", {
            "name": "Alice",
            "email": "alice@example.com"
        })
        user1_data = json.loads(user1[0].text)
        assert user1_data["name"] == "Alice"
        
        # Verify in database
        db_user = test_db.query(User).filter_by(email="alice@example.com").first()
        assert db_user is not None
        assert db_user.name == "Alice"
        
        # List users
        users = await client.call_tool("list_users", {"limit": 5})
        users_data = json.loads(users[0].text)
        assert len(users_data) == 1
        assert users_data[0]["name"] == "Alice"

Production considerations include implementing proper connection pooling, handling transaction rollbacks on errors, and ensuring test database cleanup. The example shows the pattern but production code would include comprehensive error handling and resource management.
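
Because the fixture creates a fresh in-memory engine per test, isolation largely comes for free; what the example omits is teardown. A minimal sketch of the same fixture with explicit cleanup (Base still comes from your application models):

import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from myapp.models import Base

@pytest.fixture
def test_db():
    """Fresh in-memory database per test, torn down explicitly afterwards."""
    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    try:
        yield session
    finally:
        session.rollback()  # discard any state left open by a failed test
        session.close()
        engine.dispose()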

Example: Testing MCP Server with State Management

Many MCP servers maintain conversation state or context across multiple tool invocations. Testing stateful servers requires validating state persistence and proper cleanup between test runs.

import { describe, it, expect, beforeEach } from 'vitest'
import { StatefulMCPServer } from './stateful-server'
import { createTestClient, type TestClient } from './test-utils'

describe('Stateful MCP Server', () => {
  let server: StatefulMCPServer
  let client: TestClient
  
  beforeEach(() => {
    server = new StatefulMCPServer()
    client = createTestClient(server)
  })
  
  it('should maintain conversation context', async () => {
    // Set context
    await client.callTool('set_context', {
      user: 'testuser',
      session: 'test123'
    })
    
    // First tool call uses context
    const result1 = await client.callTool('get_personalized_greeting', {})
    expect(result1.message).toBe('Hello, testuser!')
    
    // Subsequent calls remember context
    const result2 = await client.callTool('get_session_info', {})
    expect(result2.session).toBe('test123')
    expect(result2.callCount).toBe(2)
  })
  
  it('should isolate state between clients', async () => {
    const client2 = createTestClient(server)
    
    // Set different context for each client
    await client.callTool('set_context', { user: 'alice' })
    await client2.callTool('set_context', { user: 'bob' })
    
    // Verify isolation
    const result1 = await client.callTool('whoami', {})
    const result2 = await client2.callTool('whoami', {})
    
    expect(result1.user).toBe('alice')
    expect(result2.user).toBe('bob')
  })
})

State management testing requires careful attention to cleanup and isolation. Production implementations should consider state persistence strategies, memory management for long-running servers, and proper session timeout handling to prevent memory leaks.
