Architecture

swarmtest is a CLI tool built in TypeScript that orchestrates swarms of headless game-testing agents. This page explains how the major components interact.

System Overview

┌──────────────────────────────────────────────────────────────┐
│                        swarmtest CLI                          │
│                                                               │
│  ┌─────────────┐    ┌──────────────┐    ┌────────────────┐  │
│  │   Swarm      │───▶│   Agent (N)   │───▶│  GameAdapter   │  │
│  │  orchestrator│    │  tick loop    │    │  (tipo, p2...) │  │
│  └──────┬───────┘    └──────┬───────┘    └───────┬────────┘  │
│         │                   │                     │           │
│         │            ┌──────▼───────┐      ┌──────▼────────┐ │
│         │            │ BehaviorTree  │      │   WebSocket    │ │
│         │            │ (JSON → exec) │      │   connection   │ │
│         │            └──────────────┘      └───────┬────────┘ │
│         │                                          │          │
│  ┌──────▼──────────────────────────────────────────┤          │
│  │          Detectors (6 built-in)                  │          │
│  │  crash | protocol | desync | latency |           │          │
│  │  invariant | message_rate                        │          │
│  └──────┬──────────────────────────────────────────┘          │
│         │                                                     │
│  ┌──────▼───────┐    ┌──────────────┐    ┌────────────────┐  │
│  │  Reporters    │    │ TreeRecorder  │    │ TreeGenerator  │  │
│  │  console/json │    │ saves on bug  │    │ Claude API     │  │
│  └──────────────┘    └──────────────┘    └────────────────┘  │
└──────────────────────────────────────────────────────────────┘
                              │
                     ┌────────▼────────┐
                     │  Game Server     │
                     │  (WebSocket)     │
                     └─────────────────┘

The Swarm Orchestrator

The Swarm class is the top-level coordinator. When you call swarm.run(config), it:

Loads the tree library from disk (e.g., ./trees/tipo/)
Allocates agents to behavior types based on the behavior mix (regression / LLM / handwritten)
Spawns agents one at a time with a configurable stagger delay
Assigns each agent a behavior tree – from the library, from Claude, or handwritten
Starts behavior loops for connected agents (each agent ticks independently)
Runs detection loops – a cross-agent check every 2 seconds, plus per-agent checks on every tick
Optionally regenerates LLM trees during the run (every 30s by default)
Waits for the configured duration (cancellable via SIGINT/SIGTERM)
Shuts down – stops all agents, drains remaining findings, saves recorded trees, trims the library
Produces a report via configured reporters
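
The allocation step can be sketched as follows. This is a minimal illustration, not the Swarm class's actual code: the BehaviorMix shape, the allocateAgents name, and the largest-remainder rounding are all assumptions made for the example.

```typescript
// Hypothetical shape of the behavior mix: relative weights per behavior type.
interface BehaviorMix {
  regression: number;
  llm: number;
  handwritten: number;
}

type BehaviorKind = keyof BehaviorMix;

// Split agentCount agents across behavior kinds in proportion to the mix.
// Largest-remainder rounding keeps the counts summing to agentCount.
function allocateAgents(
  agentCount: number,
  mix: BehaviorMix,
): Record<BehaviorKind, number> {
  const kinds: BehaviorKind[] = ["regression", "llm", "handwritten"];
  const total = kinds.reduce((sum, kind) => sum + mix[kind], 0);
  if (total <= 0) {
    throw new Error("behavior mix must have a positive total weight");
  }

  const exact = kinds.map((kind) => (agentCount * mix[kind]) / total);
  const counts = exact.map((value) => Math.floor(value));
  let remaining = agentCount - counts.reduce((a, b) => a + b, 0);

  // Give leftover agents to the kinds with the largest fractional parts.
  const order = kinds
    .map((_, index) => index)
    .sort((a, b) => (exact[b] - counts[b]) - (exact[a] - counts[a]));
  for (const index of order) {
    if (remaining === 0) break;
    counts[index] += 1;
    remaining -= 1;
  }

  return { regression: counts[0], llm: counts[1], handwritten: counts[2] };
}
```

With weights of 5, 3, and 2 and ten agents, this yields five regression agents, three LLM agents, and two handwritten agents.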

Agents

Each Agent represents a single headless game client. An agent:

Opens a WebSocket connection through the adapter
Sends a connect/authenticate message
Waits for the server to accept the connection
Enters a tick loop that executes its behavior tree at a fixed interval (default 100ms)
Maintains local game state by processing server messages through the adapter
Sends periodic pings for latency measurement (every 50 ticks)
Reports to detectors on every tick, message, connect, disconnect, and error event

Agents progress through five phases: idle -> connecting -> connected -> playing -> disconnected.
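
One way to model the phases and the ping cadence is shown below. It is a sketch under assumptions: the transition table (in particular, that any active phase may drop straight to disconnected) and the helper names canTransition and shouldPing are illustrative, not taken from the Agent class.

```typescript
type AgentPhase =
  | "idle"
  | "connecting"
  | "connected"
  | "playing"
  | "disconnected";

// Assumed transition table: forward progress, plus early disconnects.
const NEXT_PHASES: Record<AgentPhase, AgentPhase[]> = {
  idle: ["connecting"],
  connecting: ["connected", "disconnected"],
  connected: ["playing", "disconnected"],
  playing: ["disconnected"],
  disconnected: [],
};

function canTransition(from: AgentPhase, to: AgentPhase): boolean {
  return NEXT_PHASES[from].indexOf(to) !== -1;
}

// Pings go out every 50 ticks; with the default 100ms tick, roughly every 5s.
const PING_EVERY_TICKS = 50;

function shouldPing(tickCount: number): boolean {
  return tickCount > 0 && tickCount % PING_EVERY_TICKS === 0;
}
```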

Behavior Trees

Behavior trees are the decision-making engine. They are defined as JSON (TreeNode) and hydrated into executable BehaviorNode objects. On each tick, the runtime traverses the tree from the root, evaluating conditions and executing actions.

The tree runtime supports eight node types: sequence, selector, repeat, random_selector, action, wait, condition, and probability. See the Writing Behavior Trees guide for details.

Actions are resolved through the adapter’s getAvailableActions() – if an action is not available in the current game state, the action node returns failure, which may cause the tree to try alternative branches.
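
The hydration step and the failure-driven fallback can be illustrated with a reduced runtime. This sketch covers only three of the eight node types, and the TreeNode field names, the TickContext shape, and the function-based BehaviorNode are assumptions; the real schema may differ.

```typescript
type Status = "success" | "failure" | "running";

// Simplified TreeNode: sequence, selector, and action only (field names assumed).
type TreeNode =
  | { type: "sequence"; children: TreeNode[] }
  | { type: "selector"; children: TreeNode[] }
  | { type: "action"; action: string };

interface TickContext {
  availableActions: Set<string>; // stand-in for the adapter's getAvailableActions()
  performed: string[];           // records actions that would have been sent
}

type BehaviorNode = (ctx: TickContext) => Status;

// Turn the JSON description into an executable closure tree.
function hydrate(node: TreeNode): BehaviorNode {
  switch (node.type) {
    case "action": {
      const name = node.action;
      return (ctx) => {
        // An unavailable action fails, letting a parent selector try another branch.
        if (!ctx.availableActions.has(name)) return "failure";
        ctx.performed.push(name);
        return "success";
      };
    }
    case "sequence": {
      const children = node.children.map(hydrate);
      return (ctx) => {
        // Runs children in order; stops at the first non-success.
        for (const child of children) {
          const status = child(ctx);
          if (status !== "success") return status;
        }
        return "success";
      };
    }
    case "selector": {
      const children = node.children.map(hydrate);
      return (ctx) => {
        // Tries children in order; stops at the first non-failure.
        for (const child of children) {
          const status = child(ctx);
          if (status !== "failure") return status;
        }
        return "failure";
      };
    }
  }
}
```

For example, a selector whose first child is a sequence starting with an unavailable "attack" action skips that branch and falls through to its second child.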

Game Adapters

The GameAdapter interface abstracts all game-specific logic:

Connection – Opening WebSocket connections and creating connect/disconnect messages
Protocol – Parsing server messages and serializing client messages
State – Initializing and updating per-agent game state from server messages
Actions – Mapping high-level action names to protocol messages
Invariants – Game-specific rules that should always hold (checked by InvariantDetector)
LLM context – Descriptions of the game and available actions for Claude

This means swarmtest itself has zero knowledge of any specific game protocol. All game logic lives in the adapter.
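
To make the division of responsibility concrete, here is a toy adapter for an imaginary game. Only getAvailableActions is named on this page; every other member name, the GameAdapterSketch interface itself, and the message shapes are hypothetical and chosen for illustration.

```typescript
// Hypothetical message and state shapes for a toy game.
interface ServerMessage {
  type: string;
  [key: string]: unknown;
}

interface ToyState {
  room: string;
  hp: number;
}

// Assumed adapter surface covering connection, protocol, state, actions, invariants.
interface GameAdapterSketch<S> {
  createConnectMessage(agentId: string): string;
  parseServerMessage(raw: string): ServerMessage;
  initialState(): S;
  applyMessage(state: S, msg: ServerMessage): S;
  getAvailableActions(state: S): string[];
  actionToMessage(action: string, state: S): string;
  checkInvariants(state: S): string[]; // descriptions of violated rules
}

const toyAdapter: GameAdapterSketch<ToyState> = {
  createConnectMessage: (agentId) => JSON.stringify({ type: "connect", agentId }),
  parseServerMessage: (raw) => JSON.parse(raw) as ServerMessage,
  initialState: () => ({ room: "lobby", hp: 100 }),
  applyMessage: (state, msg) => {
    const { room, amount } = msg;
    if (msg.type === "moved" && typeof room === "string") {
      return { ...state, room };
    }
    if (msg.type === "damage" && typeof amount === "number") {
      return { ...state, hp: state.hp - amount };
    }
    return state; // unknown messages leave state untouched
  },
  getAvailableActions: (state) => (state.hp > 0 ? ["move", "attack"] : []),
  actionToMessage: (action, state) => JSON.stringify({ type: action, from: state.room }),
  checkInvariants: (state) => (state.hp < 0 ? ["hp must never be negative"] : []),
};
```

Because the core only ever talks to this surface, supporting a new game would mean writing another object of this shape rather than changing the orchestrator.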

Detection System

Detectors are pluggable modules that monitor agent activity and produce Finding objects. Each finding has:

severity – crash, bug, jank, warning, or info
category – Which detector produced it (e.g., crash, protocol, desync)
context – Recent message log entries for debugging

Detectors receive callbacks for key events:

Callback	When
`onSwarmStart`	Test run begins
`onAgentTick`	Each agent’s tick
`onMessage`	Any sent or received message
`onAgentConnect`	Agent successfully connects
`onAgentDisconnect`	Agent disconnects (with close code)
`onAgentError`	WebSocket error
`onCrossAgentCheck`	Periodic cross-agent comparison (every 2s)

The six built-in detectors are: CrashDetector, ProtocolErrorDetector, DesyncDetector, LatencyDetector, InvariantDetector, and MessageRateDetector.
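
A custom detector might look roughly like the sketch below. The callback names come from the table above, but their parameter lists, the drainFindings method, the description field, and the AbnormalCloseDetector class are assumptions made for this example. The only external fact relied on is that WebSocket close code 1000 means normal closure.

```typescript
type Severity = "crash" | "bug" | "jank" | "warning" | "info";

interface Finding {
  severity: Severity;
  category: string;
  context: string[];   // recent message log entries
  description: string; // hypothetical extra field
}

// Callback names from the table; signatures are assumed. All callbacks optional.
interface DetectorSketch {
  onSwarmStart?(): void;
  onAgentTick?(agentId: string): void;
  onMessage?(agentId: string, direction: "sent" | "received", raw: string): void;
  onAgentConnect?(agentId: string): void;
  onAgentDisconnect?(agentId: string, closeCode: number): void;
  onAgentError?(agentId: string, error: Error): void;
  onCrossAgentCheck?(): void;
  drainFindings(): Finding[];
}

// Toy detector: reports any disconnect that is not a normal closure.
class AbnormalCloseDetector implements DetectorSketch {
  private findings: Finding[] = [];
  private recent = new Map<string, string[]>();

  onMessage(agentId: string, direction: "sent" | "received", raw: string): void {
    const log = this.recent.get(agentId) ?? [];
    log.push(`${direction}: ${raw}`);
    this.recent.set(agentId, log.slice(-5)); // keep the last five entries
  }

  onAgentDisconnect(agentId: string, closeCode: number): void {
    if (closeCode === 1000) return; // 1000 = normal closure
    this.findings.push({
      severity: "crash",
      category: "crash",
      context: this.recent.get(agentId) ?? [],
      description: `agent ${agentId} closed with code ${closeCode}`,
    });
  }

  // Hand accumulated findings to the caller and reset the buffer.
  drainFindings(): Finding[] {
    const drained = this.findings;
    this.findings = [];
    return drained;
  }
}
```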

LLM Integration

The TreeGenerator uses the Anthropic SDK to prompt Claude (default model: claude-haiku-4-5-20251001) with:

The game context from the adapter
All available actions with parameter specs
The TreeNode JSON schema
Summaries of existing trees (to encourage diversity)
A focus area (e.g., “edge cases in movement and room transitions”)

Claude’s response is parsed as JSON and validated against the TreeNode schema. Invalid responses cause a fallback to the random walk tree.
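
The parse-validate-fallback step might be sketched like this. It deliberately leaves out the API call and starts from the response text. The structural check is far looser than a real schema validation, and the contents of RANDOM_WALK_FALLBACK, like the function names, are invented for illustration.

```typescript
interface TreeNode {
  type: string;
  children?: TreeNode[];
  [key: string]: unknown;
}

// The eight node types supported by the tree runtime.
const NODE_TYPES = new Set([
  "sequence", "selector", "repeat", "random_selector",
  "action", "wait", "condition", "probability",
]);

// Loose structural check: known type, and children (if any) are valid nodes.
function isTreeNode(value: unknown): value is TreeNode {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const type = record["type"];
  const children = record["children"];
  if (typeof type !== "string" || !NODE_TYPES.has(type)) return false;
  if (children === undefined) return true;
  return Array.isArray(children) && children.every((child) => isTreeNode(child));
}

// Hypothetical stand-in for the random walk tree.
const RANDOM_WALK_FALLBACK: TreeNode = {
  type: "repeat",
  children: [
    {
      type: "random_selector",
      children: [
        { type: "action", action: "move" },
        { type: "wait", ticks: 5 },
      ],
    },
  ],
};

// Malformed JSON and schema violations both end in the fallback tree.
function parseGeneratedTree(responseText: string): TreeNode {
  try {
    const parsed: unknown = JSON.parse(responseText);
    return isTreeNode(parsed) ? parsed : RANDOM_WALK_FALLBACK;
  } catch {
    return RANDOM_WALK_FALLBACK;
  }
}
```

Treating malformed JSON and schema violations the same way keeps the agent running on a known-good tree instead of aborting the run.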

During a run, regeneration happens periodically. A random 10% of playing agents may receive a fresh tree at each regeneration interval (default 30s).
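
Selecting the agents for a regeneration round could be done as in the sketch below. It assumes the 10% is taken as a fixed-size sample (rounded down) rather than an independent 10% chance per agent; the page does not say which. The function name and the injectable random source are also assumptions.

```typescript
// Pick a uniform random sample of floor(fraction * n) items from the playing agents.
function pickAgentsForRegeneration<T>(
  playing: T[],
  fraction: number = 0.1,
  random: () => number = Math.random,
): T[] {
  const count = Math.floor(playing.length * fraction);
  const pool = [...playing]; // copy so the caller's array is not reordered

  // Partial Fisher-Yates shuffle: after the loop, the first `count`
  // slots of `pool` hold a uniform random sample.
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    const tmp = pool[i];
    pool[i] = pool[j];
    pool[j] = tmp;
  }
  return pool.slice(0, count);
}
```

Passing a deterministic random function makes the selection reproducible in tests; with fewer than ten playing agents this version selects none.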

Tree Lifecycle

Generation – Trees are created by Claude, written by hand, or loaded from the library
Execution – Trees drive agent behavior during the test run
Recording – If a tree’s agent triggers a meaningful finding, the TreeRecorder saves it
Persistence – After the run, recorded trees are flushed to the TreeLibrary on disk
Trimming – The TreeTrimmer removes near-duplicate trees, those whose action sequences are more than 80% similar to another tree's
Regression – On the next run, saved trees are loaded and assigned to agents for replay
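
The trimming step can be sketched as a greedy de-duplication pass. The similarity metric here (longest common subsequence length divided by the longer sequence's length) is an assumption chosen for the example, as are the RecordedTree shape and the function names; the TreeTrimmer may measure similarity differently.

```typescript
// Hypothetical record: a tree id plus its flattened action sequence.
interface RecordedTree {
  id: string;
  actions: string[];
}

// Length of the longest common subsequence, using a single rolling row.
function lcsLength(a: string[], b: string[]): number {
  const row: number[] = [];
  for (let j = 0; j <= b.length; j++) row.push(0);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = 0; // value of dp[i-1][j-1]
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // value of dp[i-1][j]
      row[j] = a[i - 1] === b[j - 1] ? diagonal + 1 : Math.max(row[j], row[j - 1]);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Assumed metric: 1.0 for identical sequences, 0.0 for nothing in common.
function similarity(a: string[], b: string[]): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : lcsLength(a, b) / longest;
}

// Keep a tree only if it is not too similar to one that was already kept.
function trimLibrary(trees: RecordedTree[], threshold: number = 0.8): RecordedTree[] {
  const kept: RecordedTree[] = [];
  for (const tree of trees) {
    const duplicate = kept.some((k) => similarity(k.actions, tree.actions) > threshold);
    if (!duplicate) kept.push(tree);
  }
  return kept;
}
```

Because the pass is greedy, earlier trees win ties: of two near-duplicates, the one that appears first in the input survives.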

Reporting

Reporters receive findings in real time and a complete summary at the end. The SwarmSummary includes:

Duration, agent count, connection success/failure counts
Total messages sent and received
All findings grouped by severity
Per-agent statistics (phase, message counts, latency, finding count)

The ConsoleReporter prints a formatted table to stdout. The JsonReporter writes the full summary to a timestamped JSON file in ./reports/.
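
Two pieces of the reporting path lend themselves to a short sketch: grouping findings by severity and deriving a timestamped file name. The swarmtest- file name prefix, the timestamp format, and both function names are assumptions; only the ./reports/ directory and the five severities come from this page.

```typescript
type Severity = "crash" | "bug" | "jank" | "warning" | "info";

interface Finding {
  severity: Severity;
  category: string;
  context: string[];
}

// Bucket findings by severity; every severity gets a (possibly empty) list.
function groupBySeverity(findings: Finding[]): Record<Severity, Finding[]> {
  const groups: Record<Severity, Finding[]> = {
    crash: [],
    bug: [],
    jank: [],
    warning: [],
    info: [],
  };
  for (const finding of findings) {
    groups[finding.severity].push(finding);
  }
  return groups;
}

// Build a file-system-safe name from an ISO timestamp (hypothetical naming scheme).
function reportFileName(now: Date): string {
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  return `./reports/swarmtest-${stamp}.json`;
}
```

Replacing colons and dots in the timestamp keeps the name valid on file systems that reject those characters.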