Architecture
swarmtest is a CLI tool built in TypeScript that orchestrates swarms of headless game-testing agents. This page explains how the major components interact.
System Overview
┌──────────────────────────────────────────────────────────────┐
│                        swarmtest CLI                         │
│                                                              │
│  ┌──────────────┐    ┌──────────────┐    ┌────────────────┐  │
│  │    Swarm     │───▶│  Agent (N)   │───▶│  GameAdapter   │  │
│  │ orchestrator │    │  tick loop   │    │ (tipo, p2...)  │  │
│  └──────┬───────┘    └──────┬───────┘    └───────┬────────┘  │
│         │                   │                    │           │
│         │            ┌──────▼───────┐    ┌───────▼────────┐  │
│         │            │ BehaviorTree │    │   WebSocket    │  │
│         │            │ (JSON → exec)│    │   connection   │  │
│         │            └──────────────┘    └───────┬────────┘  │
│         │                                        │           │
│  ┌──────▼────────────────────────────────────────┤           │
│  │            Detectors (6 built-in)             │           │
│  │     crash | protocol | desync | latency |     │           │
│  │     invariant | message_rate                  │           │
│  └──────┬────────────────────────────────────────┘           │
│         │                                                    │
│  ┌──────▼───────┐    ┌──────────────┐    ┌────────────────┐  │
│  │  Reporters   │    │ TreeRecorder │    │ TreeGenerator  │  │
│  │ console/json │    │ saves on bug │    │   Claude API   │  │
│  └──────────────┘    └──────────────┘    └────────────────┘  │
└──────────────────────────────────────────────────────────────┘
                                │
                       ┌────────▼────────┐
                       │   Game Server   │
                       │   (WebSocket)   │
                       └─────────────────┘
The Swarm Orchestrator
The Swarm class is the top-level coordinator. When you call swarm.run(config), it:
- Loads the tree library from disk (e.g., ./trees/tipo/)
- Allocates agents to behavior types based on the behavior mix (regression / LLM / handwritten)
- Spawns agents one at a time with a configurable stagger delay
- Assigns each agent a behavior tree: from the library, from Claude, or handwritten
- Starts behavior loops for connected agents (each agent ticks independently)
- Runs detection loops: a cross-agent check every 2 seconds, plus per-agent checks on every tick
- Optionally regenerates LLM trees during the run (every 30s by default)
- Waits for the configured duration (cancellable via SIGINT/SIGTERM)
- Shuts down: stops all agents, drains remaining findings, saves recorded trees, trims the library
- Produces a report via configured reporters
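The allocation step in the list above can be illustrated with a small sketch. This is not swarmtest's actual code: the BehaviorMix shape, the allocateAgents name, and the largest-remainder rounding are all assumptions chosen to show one reasonable way to turn a behavior mix into per-type agent counts.

```typescript
// Hypothetical shape: one weight per behavior type (weights need not sum to 1).
type BehaviorKind = "regression" | "llm" | "handwritten";
type BehaviorMix = Record<BehaviorKind, number>;

// Split `total` agents across behavior kinds using largest-remainder rounding,
// so the returned counts always add up to `total`.
function allocateAgents(total: number, mix: BehaviorMix): Record<BehaviorKind, number> {
  const kinds: BehaviorKind[] = ["regression", "llm", "handwritten"];
  const sum = kinds.reduce((acc, k) => acc + mix[k], 0);
  if (sum <= 0) throw new Error("behavior mix must contain a positive weight");

  const exact = kinds.map((k) => (mix[k] / sum) * total);
  const counts = exact.map((x) => Math.floor(x));
  let remaining = total - counts.reduce((a, b) => a + b, 0);

  // Hand leftover agents to the kinds with the largest fractional parts.
  const order = exact
    .map((x, i) => ({ i, frac: x - Math.floor(x) }))
    .sort((a, b) => b.frac - a.frac);
  for (const { i } of order) {
    if (remaining === 0) break;
    counts[i] += 1;
    remaining -= 1;
  }

  return { regression: counts[0], llm: counts[1], handwritten: counts[2] };
}
```

Rounding each share independently can leave the swarm one agent short or over; distributing the remainder explicitly avoids that.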
Agents
Each Agent represents a single headless game client. An agent:
- Opens a WebSocket connection through the adapter
- Sends a connect/authenticate message
- Waits for the server to accept the connection
- Enters a tick loop that executes its behavior tree at a fixed interval (default 100ms)
- Maintains local game state by processing server messages through the adapter
- Sends periodic pings for latency measurement (every 50 ticks)
- Reports to detectors on every tick, message, connect, disconnect, and error event
Agents have phases: idle -> connecting -> connected -> playing -> disconnected.
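The phase progression and ping cadence can be sketched as below. The transition table is an assumption (the page only lists the happy path; the early drops to disconnected are a guess), and canTransition and shouldPing are hypothetical helper names.

```typescript
type AgentPhase = "idle" | "connecting" | "connected" | "playing" | "disconnected";

// Assumed transition table: the documented happy path, plus a drop to
// "disconnected" from any phase that holds an open or opening socket.
const NEXT_PHASES: Record<AgentPhase, AgentPhase[]> = {
  idle: ["connecting"],
  connecting: ["connected", "disconnected"],
  connected: ["playing", "disconnected"],
  playing: ["disconnected"],
  disconnected: [],
};

function canTransition(from: AgentPhase, to: AgentPhase): boolean {
  return NEXT_PHASES[from].indexOf(to) !== -1;
}

const TICK_INTERVAL_MS = 100; // default tick interval from this page
const PING_EVERY_TICKS = 50;  // ping cadence from this page

// True on ticks 50, 100, 150, ... (skipping tick 0 is an assumption).
function shouldPing(tickCount: number): boolean {
  return tickCount > 0 && tickCount % PING_EVERY_TICKS === 0;
}
```

At the default settings, 50 ticks of 100 ms each means one latency ping roughly every 5 seconds per agent.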
Behavior Trees
Behavior trees are the decision-making engine. They are defined as JSON (TreeNode) and hydrated into executable BehaviorNode objects. On each tick, the tree traverses from the root, evaluating conditions and executing actions.
The tree runtime supports eight node types: sequence, selector, repeat, random_selector, action, wait, condition, and probability. See the Writing Behavior Trees guide for details.
Actions are resolved through the adapter's getAvailableActions(). If an action is not available in the current game state, the action node returns failure, which may cause the tree to try alternative branches.
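A minimal sketch of how an unavailable action makes the tree fall through to another branch. It covers only three of the eight node types, and the TreeNode field names (type, name, children) are assumptions, since the real schema is not shown on this page.

```typescript
type Status = "success" | "failure";

// Assumed TreeNode shape; the real schema may use different field names.
interface TreeNode {
  type: "sequence" | "selector" | "action";
  name?: string;         // action name, for "action" nodes
  children?: TreeNode[]; // child nodes, for composite nodes
}

// Evaluate one tick. `available` stands in for the adapter's
// getAvailableActions(); `sent` collects the actions the agent would send.
function tick(node: TreeNode, available: Set<string>, sent: string[]): Status {
  const children = node.children ?? [];
  switch (node.type) {
    case "action":
      if (node.name === undefined || !available.has(node.name)) return "failure";
      sent.push(node.name);
      return "success";
    case "sequence": // fails as soon as one child fails
      for (const child of children) {
        if (tick(child, available, sent) === "failure") return "failure";
      }
      return "success";
    case "selector": // succeeds as soon as one child succeeds
      for (const child of children) {
        if (tick(child, available, sent) === "success") return "success";
      }
      return "failure";
  }
}

// Try "attack then loot"; if that fails, fall back to "move".
const exampleTree: TreeNode = {
  type: "selector",
  children: [
    { type: "sequence", children: [{ type: "action", name: "attack" }, { type: "action", name: "loot" }] },
    { type: "action", name: "move" },
  ],
};
```

With only move available, the attack action fails, the sequence fails with it, and the selector moves on to the move action.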
Game Adapters
The GameAdapter interface abstracts all game-specific logic:
- Connection: Opening WebSocket connections and creating connect/disconnect messages
- Protocol: Parsing server messages and serializing client messages
- State: Initializing and updating per-agent game state from server messages
- Actions: Mapping high-level action names to protocol messages
- Invariants: Game-specific rules that should always hold (checked by InvariantDetector)
- LLM context: Descriptions of the game and available actions for Claude
This means swarmtest itself has zero knowledge of any specific game protocol. All game logic lives in the adapter.
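To make the division of responsibilities concrete, here is a hypothetical sketch of what such an interface could look like, with a toy implementation. Only getAvailableActions() is named on this page; every other member name, the generic parameters, and the toy game itself are assumptions for illustration.

```typescript
// Hypothetical sketch of the adapter surface, one member per responsibility.
interface GameState { [key: string]: unknown }

interface GameAdapterSketch<S extends GameState, M> {
  createConnectMessage(agentId: string): M;  // Connection
  parse(raw: string): M;                     // Protocol (inbound)
  serialize(message: M): string;             // Protocol (outbound)
  initialState(): S;                         // State
  applyMessage(state: S, message: M): S;     // State
  getAvailableActions(state: S): string[];   // Actions
  checkInvariants(state: S): string[];       // Invariants: descriptions of violated rules
  describeGame(): string;                    // LLM context
}

// Toy game: a single hit-point counter that the server can reduce.
interface ToyState extends GameState { hp: number }
type ToyMessage =
  | { kind: "connect"; agentId: string }
  | { kind: "damage"; amount: number };

const toyAdapter: GameAdapterSketch<ToyState, ToyMessage> = {
  createConnectMessage: (agentId) => ({ kind: "connect", agentId }),
  parse: (raw) => JSON.parse(raw) as ToyMessage,
  serialize: (message) => JSON.stringify(message),
  initialState: () => ({ hp: 10 }),
  applyMessage: (state, message) =>
    message.kind === "damage" ? { ...state, hp: state.hp - message.amount } : state,
  getAvailableActions: (state) => (state.hp > 0 ? ["move", "attack"] : []),
  checkInvariants: (state) => (state.hp < 0 ? ["hp must never be negative"] : []),
  describeGame: () => "A toy game with a single hit-point counter.",
};
```

Because the core only talks to this surface, supporting a new game would mean writing another object of this shape rather than changing the swarm or the detectors.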
Detection System
Detectors are pluggable modules that monitor agent activity and produce Finding objects. Each finding has:
- severity: crash, bug, jank, warning, or info
- category: Which detector produced it (e.g., crash, protocol, desync)
- context: Recent message log entries for debugging
Detectors receive callbacks for key events:
| Callback | When |
|---|---|
| onSwarmStart | Test run begins |
| onAgentTick | Each agent's tick |
| onMessage | Any sent or received message |
| onAgentConnect | Agent successfully connects |
| onAgentDisconnect | Agent disconnects (with close code) |
| onAgentError | WebSocket error |
| onCrossAgentCheck | Periodic cross-agent comparison (every 2s) |
The six built-in detectors are: CrashDetector, ProtocolErrorDetector, DesyncDetector, LatencyDetector, InvariantDetector, and MessageRateDetector.
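As an illustration of the plug-in shape, the sketch below implements a tiny custom detector on top of one callback. The callback name comes from the table above, but its signature, the message field on Finding, the drain() method, and the AbnormalCloseDetector class are assumptions and not one of the built-in detectors.

```typescript
type Severity = "crash" | "bug" | "jank" | "warning" | "info";

interface Finding {
  severity: Severity;
  category: string;
  message: string;   // assumed field, not listed on this page
  context: string[]; // recent message log entries
}

// Assumed callback signature; only the callback name is documented.
interface DetectorSketch {
  onAgentDisconnect?(agentId: string, closeCode: number, recent: string[]): void;
  drain(): Finding[]; // hypothetical: hands collected findings to the swarm
}

class AbnormalCloseDetector implements DetectorSketch {
  private findings: Finding[] = [];

  onAgentDisconnect(agentId: string, closeCode: number, recent: string[]): void {
    // 1000 is the WebSocket "normal closure" code; anything else is treated as suspect.
    if (closeCode === 1000) return;
    this.findings.push({
      // 1006 means the connection dropped without a close frame.
      severity: closeCode === 1006 ? "crash" : "warning",
      category: "crash",
      message: `agent ${agentId} closed with code ${closeCode}`,
      context: recent.slice(-5), // keep only the last few log entries
    });
  }

  drain(): Finding[] {
    const out = this.findings;
    this.findings = [];
    return out;
  }
}
```

Keeping findings in the detector until they are drained mirrors the shutdown step that drains remaining findings, though the real hand-off mechanism may differ.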
LLM Integration
The TreeGenerator uses the Anthropic SDK to prompt Claude (default model: claude-haiku-4-5-20251001) with:
- The game context from the adapter
- All available actions with parameter specs
- The TreeNode JSON schema
- Summaries of existing trees (to encourage diversity)
- A focus area (e.g., "edge cases in movement and room transitions")
Claude's response is parsed as JSON and validated against the TreeNode schema. Invalid responses cause a fallback to the random walk tree.
During a run, regeneration happens periodically. A random 10% of playing agents may receive a fresh tree at each regeneration interval (default 30s).
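The parse-validate-fallback step and the regeneration pick can be sketched as follows. The validator here checks only a recursive shape rather than the real TreeNode schema, the fallback tree is a placeholder, and reading "a random 10%" as an independent 10% chance per playing agent is an assumption. No Anthropic SDK calls are shown; the sketch starts from the response text.

```typescript
interface TreeNodeLike { type: string; children?: TreeNodeLike[] }

// Stand-in validator: checks only the recursive shape, not the full schema.
function isTreeNodeLike(value: unknown): value is TreeNodeLike {
  if (typeof value !== "object" || value === null) return false;
  const node = value as { type?: unknown; children?: unknown };
  if (typeof node.type !== "string") return false;
  if (node.children === undefined) return true;
  return Array.isArray(node.children) && node.children.every(isTreeNodeLike);
}

// Placeholder fallback; the real random walk tree is not shown on this page.
const RANDOM_WALK_FALLBACK: TreeNodeLike = { type: "random_selector", children: [] };

function parseTreeOrFallback(response: string): TreeNodeLike {
  try {
    const parsed: unknown = JSON.parse(response);
    return isTreeNodeLike(parsed) ? parsed : RANDOM_WALK_FALLBACK;
  } catch {
    return RANDOM_WALK_FALLBACK; // response was not valid JSON
  }
}

// Assumed reading of "a random 10%": each playing agent is picked independently
// with probability `fraction`. `rng` is injectable so the choice can be tested.
function pickForRegeneration<A extends { phase: string }>(
  agents: A[],
  fraction = 0.1,
  rng: () => number = Math.random,
): A[] {
  return agents.filter((a) => a.phase === "playing" && rng() < fraction);
}
```

Falling back to a known-good tree keeps an agent busy even when the model returns malformed output, so a bad generation costs coverage but not a connection slot.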
Tree Lifecycle
- Generation: Trees are created by Claude, written by hand, or loaded from the library
- Execution: Trees drive agent behavior during the test run
- Recording: If a tree's agent triggers a meaningful finding, the TreeRecorder saves it
- Persistence: After the run, recorded trees are flushed to the TreeLibrary on disk
- Trimming: The TreeTrimmer removes trees with >80% action sequence similarity
- Regression: On the next run, saved trees are loaded and assigned to agents for replay
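The trimming step can be sketched with a simple similarity measure. This page does not say how action sequence similarity is computed, so the metric below (longest common subsequence divided by the longer sequence's length) and the greedy keep-first policy are assumptions, as are the function names.

```typescript
// Length of the longest common subsequence of two action-name sequences,
// computed with a single rolling row of the usual DP table.
function lcsLength(a: string[], b: string[]): number {
  const row = new Array<number>(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    let diag = 0; // dp[i-1][j-1]
    for (let j = 1; j <= b.length; j++) {
      const up = row[j]; // dp[i-1][j]
      row[j] = a[i - 1] === b[j - 1] ? diag + 1 : Math.max(row[j], row[j - 1]);
      diag = up;
    }
  }
  return row[b.length];
}

// Assumed metric: shared subsequence length relative to the longer sequence.
function similarity(a: string[], b: string[]): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : lcsLength(a, b) / longest;
}

// Greedy trim: keep a sequence only if it is not more than `threshold`
// similar to any sequence already kept.
function trimSimilar(sequences: string[][], threshold = 0.8): string[][] {
  const kept: string[][] = [];
  for (const seq of sequences) {
    if (kept.every((k) => similarity(k, seq) <= threshold)) kept.push(seq);
  }
  return kept;
}
```

Under this metric, two five-action sequences that differ in one action score exactly 0.8 and are both kept, because the rule removes only trees above 80%.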
Reporting
Reporters receive findings in real time and a complete summary at the end. The SwarmSummary includes:
- Duration, agent count, connection success/failure counts
- Total messages sent and received
- All findings grouped by severity
- Per-agent statistics (phase, message counts, latency, finding count)
The ConsoleReporter prints a formatted table to stdout. The JsonReporter writes the full summary to a timestamped JSON file in ./reports/.
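A short sketch of two pieces a reporter might need: grouping findings by severity for the summary, and building a timestamped file name. The FindingLike shape, both function names, and the swarm- file name pattern are assumptions; only the ./reports/ directory and the severity levels come from this page.

```typescript
type Severity = "crash" | "bug" | "jank" | "warning" | "info";
interface FindingLike { severity: Severity; category: string }

// Bucket findings by severity, with an empty bucket for every level so
// reporters can print a stable table even when a level has no findings.
function groupBySeverity(findings: FindingLike[]): Record<Severity, FindingLike[]> {
  const groups: Record<Severity, FindingLike[]> = {
    crash: [], bug: [], jank: [], warning: [], info: [],
  };
  for (const finding of findings) groups[finding.severity].push(finding);
  return groups;
}

// Hypothetical naming scheme: ISO timestamp with ":" and "." replaced so the
// name is valid on file systems that reject those characters.
function reportFileName(now: Date): string {
  return `./reports/swarm-${now.toISOString().replace(/[:.]/g, "-")}.json`;
}
```

For example, a run finishing at 2025-01-02T03:04:05.000Z would be written to ./reports/swarm-2025-01-02T03-04-05-000Z.json under this scheme.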