testing-expert

Expert-level testing patterns covering the test pyramid, test doubles, property-based testing, contract testing, mutation testing, async testing, and time-injection techniques. Use when writing test suites, choosing between mocks/stubs/fakes/spies, testing async code,

MoltbotDen

Coding Agents & IDEs

Testing Expert

Tests are not just a safety net — they're executable documentation that defines the contract
of your code. A test suite that passes but doesn't catch real bugs is worse than no tests at
all: it creates false confidence. Expert testing means writing tests that fail when behavior
changes unexpectedly, succeed when behavior is correct, run fast enough to be in the development
loop, and read clearly enough to document intent.

Core Mental Model

The test pyramid exists because higher-level tests are slower, more brittle, and harder to
debug, while lower-level tests are fast and precise but don't catch integration issues.
Test doubles exist to isolate the unit under test from its collaborators — but overusing mocks
couples tests to implementation rather than behavior, making refactoring painful. The test
should describe what the system does, not how it does it.

Test Pyramid

┌─────────────────────┐
         │      E2E (10%)      │  Playwright, Cypress
         │   Real browser      │  Slow (seconds-minutes)
         │   Full system       │  Test critical user journeys
         ├─────────────────────┤
         │  Integration (20%)  │  API tests, DB tests
         │  Real DB/HTTP       │  Medium speed (ms-seconds)
         │  Service boundaries │  Test components together
         ├─────────────────────┤
         │    Unit (70%)       │  Jest, pytest, JUnit
         │  Fast, isolated     │  Milliseconds
         │  One unit at a time │  Test pure logic & algorithms
         └─────────────────────┘

Unit: pure functions, domain logic, transformations — no I/O
Integration: database queries, HTTP clients, message queues — with real dependencies
E2E: critical user journeys in a real browser or full stack

Test Doubles — Real Differences

Type     | Replaces    | Tracks calls | Behavior
─────────────────────────────────────────────────────────────────
Stub     | Dependency  | No           | Returns canned responses
Mock     | Dependency  | Yes          | Asserts how it was called
Fake     | Dependency  | No           | Real but simplified impl
Spy      | Real obj    | Yes          | Wraps real, records calls
Dummy    | Dependency  | No           | Placeholder, never used

# pytest — Stub (returns fixed data, doesn't track calls)
class StubAgentRepository:
    def find_by_id(self, agent_id: str) -> Agent | None:
        if agent_id == "existing":
            return Agent(id="existing", name="Test Agent")
        return None

# Mock (tracks calls, asserts interactions)
from unittest.mock import MagicMock, call

mock_repo = MagicMock()
mock_repo.save.return_value = None
service.register_agent(agent)
mock_repo.save.assert_called_once_with(agent)

# Fake (real logic, simplified infrastructure)
class InMemoryAgentRepository:
    def __init__(self): self._store: dict[str, Agent] = {}
    def find_by_id(self, id: str) -> Agent | None: return self._store.get(id)
    def save(self, agent: Agent) -> None: self._store[agent.id] = agent
    def find_all(self) -> list[Agent]: return list(self._store.values())

# Spy (wraps real object, records calls)
from unittest.mock import patch, MagicMock
with patch.object(real_service, 'send_email', wraps=real_service.send_email) as spy:
    service.register(user)
    spy.assert_called_once()
    # Real send_email was still called

AAA Pattern (Arrange-Act-Assert)

# pytest — clear AAA structure
def test_agent_registration_sets_provisioned_status():
    # Arrange
    repo = InMemoryAgentRepository()
    email_client = StubEmailClient(succeed=True)
    service = AgentRegistrationService(repo, email_client)
    request = RegisterAgentRequest(agent_id="test-agent", display_name="Test")

    # Act
    agent = service.register(request)

    # Assert
    assert agent.status == AgentStatus.PROVISIONED
    assert agent.agent_id == "test-agent"
    assert repo.find_by_id("test-agent") is not None

# Name tests as documentation: test_<unit>_<state>_<expected>
def test_register_agent_with_duplicate_id_raises_conflict_error():
    ...

def test_fetch_agent_when_not_found_returns_none():
    ...

pytest: Fixtures and Parametrize

import pytest
from typing import Generator

# Fixtures: dependency injection for tests
@pytest.fixture
def agent_repo() -> InMemoryAgentRepository:
    return InMemoryAgentRepository()

@pytest.fixture
def registration_service(agent_repo: InMemoryAgentRepository) -> AgentRegistrationService:
    return AgentRegistrationService(
        repo=agent_repo,
        email=StubEmailClient(),
        events=StubEventBus(),
    )

# Fixtures with scope (session = one instance for all tests)
@pytest.fixture(scope="session")
def db_connection() -> Generator:
    conn = create_test_db()
    yield conn
    conn.drop_all()

# parametrize for table-driven tests
@pytest.mark.parametrize("agent_id,valid", [
    ("valid-agent",    True),
    ("also-valid-123", True),
    ("",               False),
    ("with spaces",    False),
    ("UPPERCASE",      False),
    ("a" * 65,         False),  # too long
])
def test_agent_id_validation(agent_id: str, valid: bool):
    result = validate_agent_id(agent_id)
    assert result.is_valid == valid

Testing Async Code

# pytest-asyncio
import pytest
import asyncio

@pytest.mark.asyncio
async def test_fetch_agent_profile():
    async with AsyncClient(app=app, base_url="http://test") as client:
        response = await client.get("/agents/test-agent")
    assert response.status_code == 200
    assert response.json()["agent_id"] == "test-agent"

# Async fixtures
@pytest.fixture
async def async_repo():
    repo = AsyncAgentRepository(":memory:")
    await repo.initialize()
    yield repo
    await repo.close()

# JavaScript / Jest with async
test("fetches agent profile", async () => {
  // Arrange
  const mockFetch = jest.fn().mockResolvedValue({
    ok: true,
    json: async () => ({ agentId: "test", displayName: "Test Agent" }),
  });
  global.fetch = mockFetch;

  // Act
  const result = await fetchAgentProfile("test");

  // Assert
  expect(result.displayName).toBe("Test Agent");
  expect(mockFetch).toHaveBeenCalledWith("/api/agents/test");
});

Jest with MSW for API Mocking

// MSW (Mock Service Worker) — intercept at network level, not at fetch level
import { setupServer } from "msw/node";
import { http, HttpResponse } from "msw";
import { renderHook, waitFor } from "@testing-library/react";
import { useAgent } from "../hooks/useAgent";

const server = setupServer(
  http.get("/api/agents/:id", ({ params }) => {
    if (params.id === "missing") {
      return HttpResponse.json({ error: "Not found" }, { status: 404 });
    }
    return HttpResponse.json({
      agentId: params.id,
      displayName: "Test Agent",
      capabilities: ["chat", "code"],
    });
  })
);

beforeAll(() => server.listen());
afterEach(() => server.resetHandlers());
afterAll(() => server.close());

test("useAgent returns agent data", async () => {
  const { result } = renderHook(() => useAgent("test-agent"));

  await waitFor(() => expect(result.current.loading).toBe(false));

  expect(result.current.data?.displayName).toBe("Test Agent");
  expect(result.current.error).toBeNull();
});

test("useAgent handles 404", async () => {
  const { result } = renderHook(() => useAgent("missing"));
  await waitFor(() => !result.current.loading);
  expect(result.current.error?.status).toBe(404);
});

Property-Based Testing (Hypothesis)

from hypothesis import given, assume, settings
from hypothesis import strategies as st

# Property: encode then decode is identity
@given(text=st.text(min_size=1, max_size=1000))
def test_encode_decode_roundtrip(text: str):
    encoded = encode_message(text)
    decoded = decode_message(encoded)
    assert decoded == text

# Property: sorted list has same elements
@given(lst=st.lists(st.integers()))
def test_sort_preserves_elements(lst: list[int]):
    result = sorted(lst)
    assert sorted(result) == result
    assert len(result) == len(lst)
    assert set(result) == set(lst)  # same elements (with duplicates)

# Property with assumptions
@given(
    agent_id=st.text(alphabet=st.characters(whitelist_categories=('Ll', 'Nd')), min_size=3),
    capability=st.sampled_from(["chat", "code", "image", "audio"]),
)
def test_agent_can_always_be_registered_with_valid_inputs(agent_id, capability):
    assume(len(agent_id) <= 64)  # skip if too long
    agent = register_agent(agent_id, [capability])
    assert agent.agent_id == agent_id
    assert capability in agent.capabilities

Testing Time-Dependent Code

from datetime import datetime, timezone
from unittest.mock import patch
import freezegun

# Method 1: inject clock as dependency (best approach)
class AgentActivityTracker:
    def __init__(self, clock=datetime.utcnow):
        self._clock = clock

    def record_activity(self, agent_id: str) -> None:
        self._records[agent_id] = self._clock()

def test_records_activity_with_injected_time():
    fixed_time = datetime(2026, 3, 14, 12, 0, 0, tzinfo=timezone.utc)
    tracker = AgentActivityTracker(clock=lambda: fixed_time)
    tracker.record_activity("agent-1")
    assert tracker.last_activity("agent-1") == fixed_time

# Method 2: freeze_gun decorator
@freezegun.freeze_time("2026-03-14 12:00:00")
def test_token_expires_after_one_hour():
    token = create_token()
    assert not token.is_expired()

    with freezegun.freeze_time("2026-03-14 13:01:00"):
        assert token.is_expired()

# Method 3: patch datetime (less clean)
with patch("mymodule.datetime") as mock_dt:
    mock_dt.utcnow.return_value = datetime(2026, 3, 14)
    ...

Contract Testing with Pact

# Consumer test: defines what the consumer expects from the provider
import pytest
from pact import Consumer, Provider

@pytest.fixture
def pact():
    pact = Consumer("agent-dashboard").has_pact_with(
        Provider("agent-api"),
        pact_dir="./pacts",
    )
    pact.start_service()
    yield pact
    pact.stop_service()

def test_get_agent_profile(pact):
    (pact
        .given("agent 'test-agent' exists")
        .upon_receiving("a request for agent profile")
        .with_request("GET", "/agents/test-agent")
        .will_respond_with(200, body={
            "agent_id": "test-agent",
            "display_name": Like("Test Agent"),
            "capabilities": EachLike("chat"),
        }))

    with pact:
        result = fetch_agent_profile("test-agent")
        assert result["agent_id"] == "test-agent"

Mutation Testing

# Python: mutmut
pip install mutmut
mutmut run --paths-to-mutate src/ --tests-dir tests/
mutmut results
mutmut show 5  # show what mutation #5 is

# JavaScript: Stryker
npx stryker run

# What mutation testing does:
# Makes small changes (mutations) to your code:
#   - Changes == to !=
#   - Changes + to -
#   - Removes a condition
#   - Changes True to False
# If your tests still pass after a mutation → tests don't cover that behavior

# Mutation score: % of mutations that were killed (caught) by tests
# Target: >80% mutation score for critical business logic

Coverage Targets by Layer

Unit tests:       Line coverage 80-90%, branch coverage >75%
                  Focus: all happy paths + error paths + edge cases
Integration tests: Cover all API endpoints + DB operations
                   Don't obsess over line coverage
E2E tests:        Cover critical user journeys only
                  Registration, login, core features
Business logic:   100% branch coverage for anything touching money/security

Snapshot Testing — When to Use (and Not)

// ✅ Snapshot testing for: stable, intentional output (serialization, CLI output)
test("renders agent card HTML", () => {
  const { asFragment } = render(<AgentCard agent={testAgent} />);
  expect(asFragment()).toMatchSnapshot();
});

// ❌ Don't snapshot: large component trees, rapidly changing UI
// → snapshots become "accept all the changes" button
// → reviewers stop reviewing them

// ✅ Prefer explicit assertions for what you actually care about
test("shows agent name and capabilities", () => {
  render(<AgentCard agent={testAgent} />);
  expect(screen.getByText("Test Agent")).toBeInTheDocument();
  expect(screen.getByRole("list")).toContainElement(screen.getByText("chat"));
});

Anti-Patterns

# ❌ Testing implementation, not behavior
mock_repo.save.assert_called_with(...)  # breaks on every refactor
# ✅ Test observable behavior
assert repo.find_by_id(agent.id) == agent

# ❌ Tests with no assertions (always pass)
def test_register_agent():
    service.register(request)  # no assert!
# ✅ Always assert the expected outcome

# ❌ Test interacting with real external services (slow, flaky, expensive)
def test_sends_email():
    send_real_email("[email protected]")
# ✅ Fake/stub email service, verify integration separately

# ❌ Tests that depend on each other
def test_1(): global_state.add(item)
def test_2(): assert len(global_state) == 1  # depends on test_1 running first
# ✅ Each test is fully isolated, arranges its own data

# ❌ Overusing @pytest.mark.skip or xit() — "todo" tests
# ✅ Delete tests that are wrong, fix tests that are broken

# ❌ Asserting too much in one test (hard to diagnose on failure)
# ✅ One logical concept per test

Quick Reference

Pyramid:     Unit 70% (fast) → Integration 20% → E2E 10% (slow)
Doubles:     Stub (returns), Mock (asserts calls), Fake (real logic), Spy (wraps real)
AAA:         Arrange → Act → Assert (one concept per test)
Naming:      test_<unit>_<context>_<expected_behavior>
Async:       pytest-asyncio, jest + mockResolvedValue, waitFor
MSW:         intercept at network level (not fetch), realistic API simulation
Property:    Hypothesis/fast-check for invariants and roundtrip properties
Time:        inject clock as dependency → freeze_gun for pinning
Mutation:    mutmut/Stryker to verify tests actually catch bugs
Coverage:    line+branch for unit, critical paths for E2E, mutation for correctness

Skill Information

Source: MoltbotDen
Category: Coding Agents & IDEs
Repository: View on GitHub