testing-expert
Expert-level testing patterns covering the test pyramid, test doubles, property-based testing, contract testing, mutation testing, async testing, and time-injection techniques. Use when writing test suites, choosing between mocks/stubs/fakes/spies, testing async code,
Testing Expert
Tests are not just a safety net — they're executable documentation that defines the contract
of your code. A test suite that passes but doesn't catch real bugs is worse than no tests at
all: it creates false confidence. Expert testing means writing tests that fail when behavior
changes unexpectedly, succeed when behavior is correct, run fast enough to be in the development
loop, and read clearly enough to document intent.
Core Mental Model
The test pyramid exists because higher-level tests are slower, more brittle, and harder to
debug, while lower-level tests are fast and precise but don't catch integration issues.
Test doubles exist to isolate the unit under test from its collaborators — but overusing mocks
couples tests to implementation rather than behavior, making refactoring painful. The test
should describe what the system does, not how it does it.
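The difference between describing what a unit does and how it does it can be sketched as follows. `Inventory` and `InMemoryStock` are hypothetical names invented for this illustration; the point is only the contrast between the two test styles, under the assumption that the stock collaborator exposes `get` and `set`.

```python
from unittest.mock import MagicMock


class InMemoryStock:
    """Hypothetical fake: real logic, dict-backed storage."""

    def __init__(self, levels: dict[str, int]) -> None:
        self._levels = dict(levels)

    def get(self, sku: str) -> int:
        return self._levels.get(sku, 0)

    def set(self, sku: str, qty: int) -> None:
        self._levels[sku] = qty


class Inventory:
    """Hypothetical unit under test."""

    def __init__(self, stock) -> None:
        self._stock = stock

    def reserve(self, sku: str, qty: int) -> bool:
        available = self._stock.get(sku)
        if qty > available:
            return False
        self._stock.set(sku, available - qty)
        return True


def test_reserve_how() -> None:
    # Interaction-based: pins the exact calls made on the collaborator, so it
    # breaks if reserve() switches to a different but equally correct call.
    stock = MagicMock()
    stock.get.return_value = 5
    assert Inventory(stock).reserve("widget", 2) is True
    stock.set.assert_called_once_with("widget", 3)


def test_reserve_what() -> None:
    # State-based: asserts only the observable outcome.
    stock = InMemoryStock({"widget": 5})
    assert Inventory(stock).reserve("widget", 2) is True
    assert stock.get("widget") == 3
    assert Inventory(stock).reserve("widget", 10) is False
    assert stock.get("widget") == 3  # a failed reservation changes nothing
```

Both tests pass today. Only the second keeps passing if `reserve` is later rewritten against a different storage method.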
Test Pyramid
┌─────────────────────┐
│ E2E (10%)           │  Playwright, Cypress
│ Real browser        │  Slow (seconds-minutes)
│ Full system         │  Test critical user journeys
├─────────────────────┤
│ Integration (20%)   │  API tests, DB tests
│ Real DB/HTTP        │  Medium speed (ms-seconds)
│ Service boundaries  │  Test components together
├─────────────────────┤
│ Unit (70%)          │  Jest, pytest, JUnit
│ Fast, isolated      │  Milliseconds
│ One unit at a time  │  Test pure logic & algorithms
└─────────────────────┘
- Unit: pure functions, domain logic, transformations — no I/O
- Integration: database queries, HTTP clients, message queues — with real dependencies
- E2E: critical user journeys in a real browser or full stack
Test Doubles — Real Differences
Type  | Replaces    | Tracks calls | Behavior
─────────────────────────────────────────────────────────────────
Stub  | Dependency  | No           | Returns canned responses
Mock  | Dependency  | Yes          | Asserts how it was called
Fake  | Dependency  | No           | Real but simplified implementation
Spy   | Real object | Yes          | Wraps the real object, records calls
Dummy | Dependency  | No           | Placeholder, never used
# pytest: Stub (returns fixed data, doesn't track calls)
class StubAgentRepository:
    def find_by_id(self, agent_id: str) -> Agent | None:
        if agent_id == "existing":
            return Agent(id="existing", name="Test Agent")
        return None
# Mock (tracks calls, asserts interactions)
from unittest.mock import MagicMock

mock_repo = MagicMock()
mock_repo.save.return_value = None
# `service` is assumed to have been constructed with mock_repo as its repository
service.register_agent(agent)
mock_repo.save.assert_called_once_with(agent)
# Fake (real logic, simplified infrastructure)
class InMemoryAgentRepository:
    def __init__(self):
        self._store: dict[str, Agent] = {}
    def find_by_id(self, agent_id: str) -> Agent | None:
        return self._store.get(agent_id)
    def save(self, agent: Agent) -> None:
        self._store[agent.id] = agent
    def find_all(self) -> list[Agent]:
        return list(self._store.values())
# Spy (wraps real object, records calls)
from unittest.mock import patch

with patch.object(real_service, "send_email", wraps=real_service.send_email) as spy:
    real_service.register(user)
    spy.assert_called_once()
    # Real send_email was still called
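The table lists the Dummy, but the examples above stop at the Spy. A minimal sketch of a dummy follows, assuming a hypothetical `AgentFormatter` whose constructor demands an audit log that the code path under test never touches. All names here are illustrative, and the `Agent` dataclass is a stand-in for whatever the real model looks like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    id: str
    name: str


class DummyAuditLog:
    """Dummy: satisfies the constructor and fails loudly if it is ever used."""

    def record(self, event: str) -> None:
        raise AssertionError("the dummy audit log should never be called")


class AgentFormatter:
    """Hypothetical collaborator that requires an audit log it rarely uses."""

    def __init__(self, audit_log) -> None:
        self._audit_log = audit_log

    def label(self, agent: Agent) -> str:
        # Pure formatting: this path never touches the audit log.
        return f"{agent.name} <{agent.id}>"

    def rename(self, agent: Agent, new_name: str) -> Agent:
        self._audit_log.record(f"rename {agent.id}")
        return Agent(id=agent.id, name=new_name)


def test_label_formats_name_and_id() -> None:
    formatter = AgentFormatter(audit_log=DummyAuditLog())
    assert formatter.label(Agent(id="a-1", name="Test Agent")) == "Test Agent <a-1>"
```

Raising from the dummy is a design choice: if a refactor starts using the audit log on this path, the test fails instead of silently passing.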
AAA Pattern (Arrange-Act-Assert)
# pytest: clear AAA structure
def test_agent_registration_sets_provisioned_status():
    # Arrange
    repo = InMemoryAgentRepository()
    email_client = StubEmailClient(succeed=True)
    service = AgentRegistrationService(repo, email_client)
    request = RegisterAgentRequest(agent_id="test-agent", display_name="Test")

    # Act
    agent = service.register(request)

    # Assert
    assert agent.status == AgentStatus.PROVISIONED
    assert agent.agent_id == "test-agent"
    assert repo.find_by_id("test-agent") is not None

# Name tests as documentation: test_<unit>_<state>_<expected>
def test_register_agent_with_duplicate_id_raises_conflict_error():
    ...

def test_fetch_agent_when_not_found_returns_none():
    ...
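The AAA test above relies on collaborators that the document never defines. The sketch below fills them in with the smallest shapes that would make such a test run, and uses them for the duplicate-ID case. Every class body, the `ConflictError` exception, and the `send_welcome` method are assumptions made for illustration, not the original implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AgentStatus(Enum):
    PROVISIONED = "provisioned"


class ConflictError(Exception):
    """Hypothetical error raised when an agent ID is already registered."""


@dataclass(frozen=True)
class RegisterAgentRequest:
    agent_id: str
    display_name: str


@dataclass
class Agent:
    agent_id: str
    display_name: str
    status: AgentStatus


class InMemoryAgentRepository:
    def __init__(self) -> None:
        self._store: dict = {}

    def find_by_id(self, agent_id: str) -> Optional[Agent]:
        return self._store.get(agent_id)

    def save(self, agent: Agent) -> None:
        self._store[agent.agent_id] = agent


class StubEmailClient:
    def __init__(self, succeed: bool = True) -> None:
        self._succeed = succeed

    def send_welcome(self, agent_id: str) -> bool:
        return self._succeed  # canned response, no network


class AgentRegistrationService:
    def __init__(self, repo: InMemoryAgentRepository, email_client: StubEmailClient) -> None:
        self._repo = repo
        self._email = email_client

    def register(self, request: RegisterAgentRequest) -> Agent:
        if self._repo.find_by_id(request.agent_id) is not None:
            raise ConflictError(request.agent_id)
        agent = Agent(request.agent_id, request.display_name, AgentStatus.PROVISIONED)
        self._repo.save(agent)
        self._email.send_welcome(agent.agent_id)
        return agent


def test_register_agent_with_duplicate_id_raises_conflict_error() -> None:
    # Arrange: a service whose repository already holds "test-agent"
    service = AgentRegistrationService(InMemoryAgentRepository(), StubEmailClient())
    request = RegisterAgentRequest(agent_id="test-agent", display_name="Test")
    service.register(request)

    # Act
    raised = None
    try:
        service.register(request)
    except ConflictError as exc:
        raised = exc

    # Assert
    assert raised is not None
```

Under pytest, the Act and Assert steps would usually collapse into `with pytest.raises(ConflictError):`. The explicit try/except is used here only to keep the sketch free of third-party imports.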
pytest: Fixtures and Parametrize
import pytest
from typing import Generator

# Fixtures: dependency injection for tests
@pytest.fixture
def agent_repo() -> InMemoryAgentRepository:
    return InMemoryAgentRepository()

@pytest.fixture
def registration_service(agent_repo: InMemoryAgentRepository) -> AgentRegistrationService:
    return AgentRegistrationService(
        repo=agent_repo,
        email=StubEmailClient(),
        events=StubEventBus(),
    )

# Fixtures with scope (session = one instance for all tests)
@pytest.fixture(scope="session")
def db_connection() -> Generator:
    conn = create_test_db()
    yield conn
    conn.drop_all()  # teardown runs after the last test in the session

# parametrize for table-driven tests
@pytest.mark.parametrize("agent_id,valid", [
    ("valid-agent", True),
    ("also-valid-123", True),
    ("", False),
    ("with spaces", False),
    ("UPPERCASE", False),
    ("a" * 65, False),  # too long
])
def test_agent_id_validation(agent_id: str, valid: bool):
    result = validate_agent_id(agent_id)
    assert result.is_valid == valid
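The parametrized table exercises a `validate_agent_id` function that is not shown. One possible implementation consistent with those six cases is sketched below. The rule (1 to 64 characters drawn from lowercase ASCII letters, digits, and hyphens) is inferred only from the table and is an assumption, as are `ValidationResult` and its `reason` field.

```python
import re
from dataclasses import dataclass

# Assumed rule, inferred from the test table: 1-64 characters,
# lowercase ASCII letters, digits, and hyphens only.
_AGENT_ID_PATTERN = re.compile(r"[a-z0-9-]{1,64}")


@dataclass(frozen=True)
class ValidationResult:
    is_valid: bool
    reason: str = ""


def validate_agent_id(agent_id: str) -> ValidationResult:
    if _AGENT_ID_PATTERN.fullmatch(agent_id):
        return ValidationResult(is_valid=True)
    return ValidationResult(is_valid=False, reason="expected 1-64 chars of a-z, 0-9, '-'")


CASES = [
    ("valid-agent", True),
    ("also-valid-123", True),
    ("", False),
    ("with spaces", False),
    ("UPPERCASE", False),
    ("a" * 65, False),  # too long
    ("a" * 64, True),   # boundary value that the original table does not cover
]

for agent_id, expected in CASES:
    assert validate_agent_id(agent_id).is_valid == expected, agent_id
```

The extra 64-character case shows why table-driven tests pair well with boundary values: the 65-character row alone would not catch an off-by-one in the length limit.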
Testing Async Code
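# pytest-asyncio
import pytest
import pytest_asyncio
from httpx import ASGITransport, AsyncClient  # assumes AsyncClient is httpx's

@pytest.mark.asyncio
async def test_fetch_agent_profile():
    # httpx 0.28 removed AsyncClient(app=...); route requests through ASGITransport instead
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/agents/test-agent")
    assert response.status_code == 200
    assert response.json()["agent_id"] == "test-agent"

# Async fixtures (pytest-asyncio's strict mode requires pytest_asyncio.fixture)
@pytest_asyncio.fixture
async def async_repo():
    repo = AsyncAgentRepository(":memory:")
    await repo.initialize()
    yield repo
    await repo.close()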
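When the coroutine under test only awaits a collaborator, the standard library is enough to test it. The sketch below uses `unittest.mock.AsyncMock` and drives the coroutines with `asyncio.run`, so it needs no plugin. `fetch_display_name`, `fetch_many`, and the client's `get_profile` method are hypothetical names chosen for illustration.

```python
import asyncio
from unittest.mock import AsyncMock


async def fetch_display_name(client, agent_id: str) -> str:
    """Hypothetical coroutine under test: awaits a client and extracts one field."""
    profile = await client.get_profile(agent_id)
    return profile["display_name"]


async def fetch_many(client, agent_ids: list[str]) -> list[str]:
    # gather returns results in the order of its arguments
    return await asyncio.gather(*(fetch_display_name(client, a) for a in agent_ids))


def test_fetch_display_name() -> None:
    # Arrange
    client = AsyncMock()
    client.get_profile.return_value = {"display_name": "Test Agent"}

    # Act
    name = asyncio.run(fetch_display_name(client, "test-agent"))

    # Assert
    assert name == "Test Agent"
    client.get_profile.assert_awaited_once_with("test-agent")


def test_fetch_many_preserves_order() -> None:
    client = AsyncMock()
    client.get_profile.side_effect = lambda agent_id: {"display_name": agent_id.upper()}

    names = asyncio.run(fetch_many(client, ["a", "b", "c"]))

    assert names == ["A", "B", "C"]
    assert client.get_profile.await_count == 3
```

`assert_awaited_once_with` checks that the coroutine was actually awaited, not merely called, which catches a forgotten `await` in the code under test.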
# JavaScript / Jest with async
test("fetches agent profile", async () => {
  // Arrange
  const mockFetch = jest.fn().mockResolvedValue({
    ok: true,
    json: async () => ({ agentId: "test", displayName: "Test Agent" }),
  });
  global.fetch = mockFetch;

  // Act
  const result = await fetchAgentProfile("test");

  // Assert
  expect(result.displayName).toBe("Test Agent");
  expect(mockFetch).toHaveBeenCalledWith("/api/agents/test");
});
Jest with MSW for API Mocking
// MSW (Mock Service Worker): intercept at network level, not at fetch level
import { setupServer } from "msw/node";
import { http, HttpResponse } from "msw";
import { renderHook, waitFor } from "@testing-library/react";
import { useAgent } from "../hooks/useAgent";

const server = setupServer(
  http.get("/api/agents/:id", ({ params }) => {
    if (params.id === "missing") {
      return HttpResponse.json({ error: "Not found" }, { status: 404 });
    }
    return HttpResponse.json({
      agentId: params.id,
      displayName: "Test Agent",
      capabilities: ["chat", "code"],
    });
  })
);

beforeAll(() => server.listen());
afterEach(() => server.resetHandlers());
afterAll(() => server.close());

test("useAgent returns agent data", async () => {
  const { result } = renderHook(() => useAgent("test-agent"));
  await waitFor(() => expect(result.current.loading).toBe(false));
  expect(result.current.data?.displayName).toBe("Test Agent");
  expect(result.current.error).toBeNull();
});
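test("useAgent handles 404", async () => {
  const { result } = renderHook(() => useAgent("missing"));
  // waitFor retries only while the callback throws; returning false would resolve immediately
  await waitFor(() => expect(result.current.loading).toBe(false));
  expect(result.current.error?.status).toBe(404);
});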
Property-Based Testing (Hypothesis)
from collections import Counter
from hypothesis import assume, given
from hypothesis import strategies as st

# Property: encode then decode is identity
@given(text=st.text(min_size=1, max_size=1000))
def test_encode_decode_roundtrip(text: str):
    encoded = encode_message(text)
    decoded = decode_message(encoded)
    assert decoded == text

# Property: sorting preserves the elements
@given(lst=st.lists(st.integers()))
def test_sort_preserves_elements(lst: list[int]):
    result = sorted(lst)
    assert sorted(result) == result  # sorting is idempotent
    assert len(result) == len(lst)
    assert Counter(result) == Counter(lst)  # same elements, duplicates included (a set would ignore them)

# Property with assumptions
@given(
    # `categories` replaces the deprecated `whitelist_categories` in recent Hypothesis releases
    agent_id=st.text(alphabet=st.characters(categories=("Ll", "Nd")), min_size=3),
    capability=st.sampled_from(["chat", "code", "image", "audio"]),
)
def test_agent_can_always_be_registered_with_valid_inputs(agent_id, capability):
    assume(len(agent_id) <= 64)  # skip if too long
    agent = register_agent(agent_id, [capability])
    assert agent.agent_id == agent_id
    assert capability in agent.capabilities
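The choice of property matters as much as the generator. The sketch below shows a deliberately buggy sort that a set-based "same elements" property cannot detect, while a multiset (`Counter`) property can. It uses a hand-rolled random loop as a stand-in for Hypothesis so that it runs with the standard library alone; it has no shrinking and is not a substitute for a real property-based runner. All function names are illustrative.

```python
import random
from collections import Counter


def buggy_sort(items: list[int]) -> list[int]:
    """Deliberately wrong: sorting via a set silently drops duplicates."""
    return sorted(set(items))


def holds_set_property(sort_fn, items: list[int]) -> bool:
    return set(sort_fn(items)) == set(items)


def holds_multiset_property(sort_fn, items: list[int]) -> bool:
    return Counter(sort_fn(items)) == Counter(items)


def find_counterexample(prop, sort_fn, trials: int = 200, seed: int = 0):
    """Hand-rolled stand-in for a property-based runner (no shrinking)."""
    rng = random.Random(seed)
    for _ in range(trials):
        items = [rng.randint(0, 5) for _ in range(rng.randint(0, 8))]
        if not prop(sort_fn, items):
            return items
    return None


# The set-based property cannot see the bug...
assert find_counterexample(holds_set_property, buggy_sort) is None
# ...while the multiset property finds an input containing a duplicate.
assert find_counterexample(holds_multiset_property, buggy_sort) is not None
# The built-in sorted() satisfies the stronger property.
assert find_counterexample(holds_multiset_property, sorted) is None
```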
Testing Time-Dependent Code
from datetime import datetime, timezone
from unittest.mock import patch
import freezegun

# Method 1: inject clock as dependency (best approach)
def utc_now() -> datetime:
    # Timezone-aware; datetime.utcnow() returns naive values and is deprecated since Python 3.12
    return datetime.now(timezone.utc)

class AgentActivityTracker:
    def __init__(self, clock=utc_now):
        self._clock = clock
        self._records: dict[str, datetime] = {}

    def record_activity(self, agent_id: str) -> None:
        self._records[agent_id] = self._clock()

    def last_activity(self, agent_id: str) -> datetime | None:
        return self._records.get(agent_id)

def test_records_activity_with_injected_time():
    fixed_time = datetime(2026, 3, 14, 12, 0, 0, tzinfo=timezone.utc)
    tracker = AgentActivityTracker(clock=lambda: fixed_time)
    tracker.record_activity("agent-1")
    assert tracker.last_activity("agent-1") == fixed_time
# Method 2: freezegun decorator
@freezegun.freeze_time("2026-03-14 12:00:00")
def test_token_expires_after_one_hour():
    token = create_token()
    assert not token.is_expired()
    with freezegun.freeze_time("2026-03-14 13:01:00"):
        assert token.is_expired()

# Method 3: patch datetime (less clean)
with patch("mymodule.datetime") as mock_dt:
    mock_dt.utcnow.return_value = datetime(2026, 3, 14)
    ...
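Clock injection also covers tests that need time to move, not just stand still. The sketch below pairs a controllable `FakeClock` with a hypothetical `Token` class; neither name comes from the original, and the one-hour time-to-live and the "expired at the deadline" rule are assumptions chosen to mirror the expiry test above.

```python
from datetime import datetime, timedelta, timezone


class FakeClock:
    """Controllable clock: tests advance time explicitly instead of sleeping."""

    def __init__(self, start: datetime) -> None:
        self._now = start

    def __call__(self) -> datetime:
        return self._now

    def advance(self, delta: timedelta) -> None:
        self._now += delta


class Token:
    """Hypothetical token with a fixed time-to-live."""

    def __init__(self, clock, ttl: timedelta = timedelta(hours=1)) -> None:
        self._clock = clock
        self._expires_at = clock() + ttl

    def is_expired(self) -> bool:
        return self._clock() >= self._expires_at


def test_token_expires_after_one_hour() -> None:
    clock = FakeClock(datetime(2026, 3, 14, 12, 0, tzinfo=timezone.utc))
    token = Token(clock)

    assert not token.is_expired()

    clock.advance(timedelta(minutes=59, seconds=59))
    assert not token.is_expired()  # one second before the deadline

    clock.advance(timedelta(seconds=1))
    assert token.is_expired()  # exactly at the deadline counts as expired
```

Because the clock is an ordinary object, the boundary at exactly one hour can be tested to the second without patching any module.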
Contract Testing with Pact
# Consumer test: defines what the consumer expects from the provider
import pytest
from pact import Consumer, EachLike, Like, Provider

@pytest.fixture
def pact():
    pact = Consumer("agent-dashboard").has_pact_with(
        Provider("agent-api"),
        pact_dir="./pacts",
    )
    pact.start_service()
    yield pact
    pact.stop_service()

def test_get_agent_profile(pact):
    (pact
     .given("agent 'test-agent' exists")
     .upon_receiving("a request for agent profile")
     .with_request("GET", "/agents/test-agent")
     .will_respond_with(200, body={
         "agent_id": "test-agent",
         "display_name": Like("Test Agent"),    # any string matches
         "capabilities": EachLike("chat"),      # non-empty list of strings
     }))

    with pact:
        result = fetch_agent_profile("test-agent")
        assert result["agent_id"] == "test-agent"
Mutation Testing
# Python: mutmut
pip install mutmut
mutmut run --paths-to-mutate src/ --tests-dir tests/
mutmut results
mutmut show 5 # show what mutation #5 is
# JavaScript: Stryker
npx stryker run
# What mutation testing does:
# Makes small changes (mutations) to your code:
# - Changes == to !=
# - Changes + to -
# - Removes a condition
# - Changes True to False
# If your tests still pass after a mutation → tests don't cover that behavior
# Mutation score: % of mutations that were killed (caught) by tests
# Target: >80% mutation score for critical business logic
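What "killed" and "survived" mean can be shown by hand. The sketch below writes out one mutant manually (`>=` changed to `>`) and runs two small suites against both versions. The function names are invented for illustration; a tool such as mutmut or Stryker would generate and run mutants like this automatically.

```python
def is_eligible(age: int) -> bool:
    """Original code under test."""
    return age >= 18


def is_eligible_mutant(age: int) -> bool:
    """The kind of mutant a tool might generate: >= changed to >."""
    return age > 18


def weak_suite(fn) -> bool:
    """Checks only inputs far from the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn) -> bool:
    """Adds the boundary values, where the original and the mutant differ."""
    return weak_suite(fn) and fn(18) is True and fn(17) is False


# Both suites accept the original code.
assert weak_suite(is_eligible) and strong_suite(is_eligible)
# The weak suite also accepts the mutant: the mutant survives.
assert weak_suite(is_eligible_mutant)
# The strong suite rejects it: the mutant is killed.
assert not strong_suite(is_eligible_mutant)
```

Both suites give the same line coverage of `is_eligible`, which is why a mutation score says more about test strength than coverage does.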
Coverage Targets by Layer
Unit tests:        Line coverage 80-90%, branch coverage >75%
                   Focus: all happy paths + error paths + edge cases
Integration tests: Cover all API endpoints + DB operations
                   Don't obsess over line coverage
E2E tests:         Cover critical user journeys only
                   Registration, login, core features
Business logic:    100% branch coverage for anything touching money/security
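The targets above separate line coverage from branch coverage because the two can disagree. In the sketch below, a single test executes every line of a hypothetical `apply_discount` function yet takes only one of the two branches of its `if`. The function and its 10% rule are invented for illustration.

```python
def apply_discount(total_cents: int, is_member: bool) -> int:
    """Hypothetical pricing rule: members get 10% off."""
    if is_member:
        total_cents = total_cents * 90 // 100
    return total_cents


def test_member_gets_discount() -> None:
    # Executes every line of apply_discount (full line coverage),
    # but only the True side of the `if`.
    assert apply_discount(1000, is_member=True) == 900


def test_non_member_pays_full_price() -> None:
    # Exercises the implicit "else" path that line coverage cannot see.
    assert apply_discount(1000, is_member=False) == 1000
```

With coverage.py, branch measurement is opt-in (for example `coverage run --branch -m pytest`). Without it, the first test alone would report the function as fully covered.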
Snapshot Testing — When to Use (and Not)
import { render, screen } from "@testing-library/react";

// ✅ Snapshot testing for: stable, intentional output (serialization, CLI output)
test("renders agent card HTML", () => {
  const { asFragment } = render(<AgentCard agent={testAgent} />);
  expect(asFragment()).toMatchSnapshot();
});

// ❌ Don't snapshot: large component trees, rapidly changing UI
//   → snapshots become "accept all the changes" button
//   → reviewers stop reviewing them

// ✅ Prefer explicit assertions for what you actually care about
test("shows agent name and capabilities", () => {
  render(<AgentCard agent={testAgent} />);
  expect(screen.getByText("Test Agent")).toBeInTheDocument();
  expect(screen.getByRole("list")).toContainElement(screen.getByText("chat"));
});
Anti-Patterns
# ❌ Testing implementation, not behavior
mock_repo.save.assert_called_with(...) # breaks on every refactor
# ✅ Test observable behavior
assert repo.find_by_id(agent.id) == agent
# ❌ Tests with no assertions (always pass)
def test_register_agent():
    service.register(request)  # no assert!
# ✅ Always assert the expected outcome
# ❌ Test interacting with real external services (slow, flaky, expensive)
def test_sends_email():
    send_real_email("user@example.com")
# ✅ Fake/stub email service, verify integration separately
# ❌ Tests that depend on each other
def test_1(): global_state.add(item)
def test_2(): assert len(global_state) == 1 # depends on test_1 running first
# ✅ Each test is fully isolated, arranges its own data
# ❌ Overusing @pytest.mark.skip or xit() — "todo" tests
# ✅ Delete tests that are wrong, fix tests that are broken
# ❌ Asserting too much in one test (hard to diagnose on failure)
# ✅ One logical concept per test
Quick Reference
Pyramid: Unit 70% (fast) → Integration 20% → E2E 10% (slow)
Doubles: Stub (returns), Mock (asserts calls), Fake (real logic), Spy (wraps real)
AAA: Arrange → Act → Assert (one concept per test)
Naming: test_<unit>_<context>_<expected_behavior>
Async: pytest-asyncio, jest + mockResolvedValue, waitFor
MSW: intercept at network level (not fetch), realistic API simulation
Property: Hypothesis/fast-check for invariants and roundtrip properties
Time: inject clock as dependency → freezegun for pinning
Mutation: mutmut/Stryker to verify tests actually catch bugs
Coverage: line+branch for unit, critical paths for E2E, mutation for correctness