Architecture

swarmtest is a CLI tool built in TypeScript that orchestrates swarms of headless game-testing agents. This page explains how the major components interact.

System Overview

┌───────────────────────────────────────────────────────────────┐
│                        swarmtest CLI                          │
│                                                               │
│  ┌──────────────┐    ┌───────────────┐    ┌────────────────┐  │
│  │   Swarm      │───▶│   Agent (N)   │───▶│  GameAdapter   │  │
│  │  orchestrator│    │  tick loop    │    │  (tipo, p2...) │  │
│  └──────┬───────┘    └──────┬────────┘    └───────┬────────┘  │
│         │                   │                     │           │
│         │            ┌──────▼────────┐    ┌───────▼────────┐  │
│         │            │ BehaviorTree  │    │   WebSocket    │  │
│         │            │ (JSON → exec) │    │   connection   │  │
│         │            └───────────────┘    └───────┬────────┘  │
│         │                                         │           │
│  ┌──────▼───────────────────────────────────────────┐         │
│  │          Detectors (6 built-in)                  │         │
│  │  crash | protocol | desync | latency |           │         │
│  │  invariant | message_rate                        │         │
│  └──────┬───────────────────────────────────────────┘         │
│         │                                                     │
│  ┌──────▼───────┐    ┌───────────────┐    ┌────────────────┐  │
│  │  Reporters   │    │ TreeRecorder  │    │ TreeGenerator  │  │
│  │ console/json │    │ saves on bug  │    │ Claude API     │  │
│  └──────────────┘    └───────────────┘    └────────────────┘  │
└───────────────────────────────────────────────────────────────┘
                              │
                     ┌────────▼────────┐
                     │  Game Server    │
                     │  (WebSocket)    │
                     └─────────────────┘

The Swarm Orchestrator

The Swarm class is the top-level coordinator. When you call swarm.run(config), it:

  1. Loads the tree library from disk (e.g., ./trees/tipo/)
  2. Allocates agents to behavior types based on the behavior mix (regression / LLM / handwritten)
  3. Spawns agents one at a time, with a configurable stagger delay between spawns
  4. Assigns each agent a behavior tree – from the library, from Claude, or handwritten
  5. Starts behavior loops for connected agents (each agent ticks independently)
  6. Runs detection loops – a cross-agent check every 2 seconds, plus per-agent checks on every tick
  7. Optionally regenerates LLM trees during the run (every 30s by default)
  8. Waits for the configured duration (cancellable via SIGINT/SIGTERM)
  9. Shuts down – stops all agents, drains remaining findings, saves recorded trees, trims the library
  10. Produces a report via configured reporters
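
Step 2 above can be illustrated with a small sketch. It assumes the behavior mix is expressed as weights per behavior source and uses largest-remainder rounding so the counts always add up to the agent total. The names BehaviorMix and allocateAgents are hypothetical and are not taken from the swarmtest source.

```typescript
// Hypothetical shape: one weight per behavior source (names are assumptions).
type BehaviorMix = { regression: number; llm: number; handwritten: number };
type BehaviorKind = keyof BehaviorMix;

// Split `total` agents across behavior kinds with largest-remainder rounding,
// so the resulting counts always sum to exactly `total`.
function allocateAgents(total: number, mix: BehaviorMix): Record<BehaviorKind, number> {
  const kinds: BehaviorKind[] = ["regression", "llm", "handwritten"];
  const weightSum = kinds.reduce((acc, k) => acc + mix[k], 0);
  if (weightSum <= 0) throw new Error("behavior mix must have a positive total weight");

  const exact = kinds.map((k) => (total * mix[k]) / weightSum);
  const counts = exact.map((value) => Math.floor(value));
  let remaining = total - counts.reduce((a, b) => a + b, 0);

  // Hand leftover agents to the kinds with the largest fractional parts.
  const order = exact
    .map((value, index) => ({ index, frac: value - Math.floor(value) }))
    .sort((a, b) => b.frac - a.frac);
  for (const entry of order) {
    if (remaining <= 0) break;
    counts[entry.index] += 1;
    remaining -= 1;
  }

  return { regression: counts[0], llm: counts[1], handwritten: counts[2] };
}

// Ten agents split 50/30/20 across the three behavior sources.
const allocation = allocateAgents(10, { regression: 0.5, llm: 0.3, handwritten: 0.2 });
console.log(allocation);
```

Rounding each share independently could leave the swarm one agent short or one over, which is why the sketch distributes the remainder explicitly.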

Agents

Each Agent represents a single headless game client. An agent:

  • Opens a WebSocket connection through the adapter
  • Sends a connect/authenticate message
  • Waits for the server to accept the connection
  • Enters a tick loop that executes its behavior tree at a fixed interval (default 100ms)
  • Maintains local game state by processing server messages through the adapter
  • Sends periodic pings for latency measurement (every 50 ticks)
  • Reports to detectors on every tick, message, connect, disconnect, and error event

Agents have phases: idle -> connecting -> connected -> playing -> disconnected.
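
The phase sequence can be modelled as a small state machine. The sketch below is an illustration only: the phase names come from the line above, while the transition table (in particular, that a connection may drop from any active phase) and the canTransition helper are assumptions.

```typescript
type AgentPhase = "idle" | "connecting" | "connected" | "playing" | "disconnected";

// Assumed transition table: the forward path from the text, plus the assumption
// that an agent can drop to "disconnected" from any active phase.
const allowedTransitions: Record<AgentPhase, AgentPhase[]> = {
  idle: ["connecting"],
  connecting: ["connected", "disconnected"],
  connected: ["playing", "disconnected"],
  playing: ["disconnected"],
  disconnected: [],
};

// True when moving from `from` to `to` is permitted by the table above.
function canTransition(from: AgentPhase, to: AgentPhase): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}

console.log(canTransition("idle", "connecting")); // allowed by the table
console.log(canTransition("idle", "playing"));    // not allowed: skips the handshake
```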

Behavior Trees

Behavior trees are the decision-making engine. They are defined as JSON (TreeNode) and hydrated into executable BehaviorNode objects. On each tick, the tree is traversed from the root, evaluating conditions and executing actions.

The tree runtime supports eight node types: sequence, selector, repeat, random_selector, action, wait, condition, and probability. See the Writing Behavior Trees guide for details.

Actions are resolved through the adapter’s getAvailableActions() – if an action is not available in the current game state, the action node returns failure, which may cause the tree to try alternative branches.
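
To make the fallback behavior concrete, here is a minimal sketch of a tick over three of the eight node types. The field names (type, children, name), the two-value Status, and the perform callback are assumptions for illustration; the real TreeNode schema and BehaviorNode runtime may differ.

```typescript
type Status = "success" | "failure";

// Hypothetical, trimmed-down TreeNode covering three of the eight node types.
type TreeNode =
  | { type: "sequence"; children: TreeNode[] }
  | { type: "selector"; children: TreeNode[] }
  | { type: "action"; name: string };

// Evaluate one tick. `available` stands in for the result of the adapter's
// getAvailableActions(); `perform` is a hypothetical callback that sends the action.
function tick(node: TreeNode, available: string[], perform: (name: string) => void): Status {
  switch (node.type) {
    case "action":
      // An action that is unavailable in the current state fails without side effects.
      if (available.indexOf(node.name) === -1) return "failure";
      perform(node.name);
      return "success";
    case "sequence":
      // A sequence fails as soon as one child fails.
      for (const child of node.children) {
        if (tick(child, available, perform) === "failure") return "failure";
      }
      return "success";
    case "selector":
      // A selector succeeds as soon as one child succeeds.
      for (const child of node.children) {
        if (tick(child, available, perform) === "success") return "success";
      }
      return "failure";
  }
}

// "attack" is unavailable, so the selector falls through to the second branch.
const exampleTree: TreeNode = {
  type: "selector",
  children: [
    { type: "action", name: "attack" },
    { type: "sequence", children: [{ type: "action", name: "move" }, { type: "action", name: "look" }] },
  ],
};
const performed: string[] = [];
const result = tick(exampleTree, ["move", "look"], (name) => performed.push(name));
console.log(result, performed);
```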

Game Adapters

The GameAdapter interface abstracts all game-specific logic:

  • Connection – Opening WebSocket connections and creating connect/disconnect messages
  • Protocol – Parsing server messages and serializing client messages
  • State – Initializing and updating per-agent game state from server messages
  • Actions – Mapping high-level action names to protocol messages
  • Invariants – Game-specific rules that should always hold (checked by InvariantDetector)
  • LLM context – Descriptions of the game and available actions for Claude

This means swarmtest itself has zero knowledge of any specific game protocol. All game logic lives in the adapter.
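
As a rough picture of that boundary, the sketch below shows one way such an interface could be shaped, with one or two methods per responsibility listed above. Only getAvailableActions() is named in this page; every other method name, the generic parameters, and the toy adapter are assumptions made for illustration.

```typescript
// Hypothetical interface: only getAvailableActions() is named in the text above.
interface GameAdapter<State, ServerMsg, ClientMsg> {
  createConnectMessage(agentId: string): ClientMsg;         // Connection
  parseServerMessage(raw: string): ServerMsg;               // Protocol
  serializeClientMessage(msg: ClientMsg): string;           // Protocol
  createInitialState(): State;                              // State
  applyServerMessage(state: State, msg: ServerMsg): State;  // State
  getAvailableActions(state: State): string[];              // Actions
  createActionMessage(action: string): ClientMsg;           // Actions
  checkInvariants(state: State): string[];                  // Invariants (returns violations)
  describeGame(): string;                                   // LLM context
}

// Toy adapter for an imaginary game in which the server reports a room name.
type ToyState = { room: string | null };
type ToyServerMsg = { kind: "room"; name: string };
type ToyClientMsg = { op: string; arg?: string };

const toyAdapter: GameAdapter<ToyState, ToyServerMsg, ToyClientMsg> = {
  createConnectMessage: (agentId) => ({ op: "hello", arg: agentId }),
  parseServerMessage: (raw) => JSON.parse(raw) as ToyServerMsg,
  serializeClientMessage: (msg) => JSON.stringify(msg),
  createInitialState: () => ({ room: null }),
  applyServerMessage: (state, msg) => ({ ...state, room: msg.name }),
  // Actions only become available once the server has placed the agent in a room.
  getAvailableActions: (state) => (state.room === null ? [] : ["move", "look"]),
  createActionMessage: (action) => ({ op: action }),
  checkInvariants: (state) => (state.room === "" ? ["room name must not be empty"] : []),
  describeGame: () => "A toy game with named rooms.",
};

const initial = toyAdapter.createInitialState();
const afterJoin = toyAdapter.applyServerMessage(
  initial,
  toyAdapter.parseServerMessage('{"kind":"room","name":"lobby"}')
);
console.log(toyAdapter.getAvailableActions(afterJoin));
```

Because the core only sees the interface, swapping in a different game means writing a new adapter object and leaving the swarm, agent, and detector code untouched.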

Detection System

Detectors are pluggable modules that monitor agent activity and produce Finding objects. Each finding has:

  • severity – crash, bug, jank, warning, or info
  • category – Which detector produced it (e.g., crash, protocol, desync)
  • context – Recent message log entries for debugging

Detectors receive callbacks for key events:

Callback            When
onSwarmStart        Test run begins
onAgentTick         Each agent’s tick
onMessage           Any sent or received message
onAgentConnect      Agent successfully connects
onAgentDisconnect   Agent disconnects (with close code)
onAgentError        WebSocket error
onCrossAgentCheck   Periodic cross-agent comparison (every 2s)

The six built-in detectors are: CrashDetector, ProtocolErrorDetector, DesyncDetector, LatencyDetector, InvariantDetector, and MessageRateDetector.
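
The callback model can be sketched with a small custom detector. This is not one of the built-in detectors: the AbnormalCloseDetector class, the callback signatures, the message field on Finding, and the drainFindings method are all assumptions, and only two of the callbacks from the table are shown.

```typescript
type Severity = "crash" | "bug" | "jank" | "warning" | "info";

interface Finding {
  severity: Severity;
  category: string;
  message: string; // assumed field, not listed in the text above
  context: string[];
}

// Hypothetical subset of the detector callbacks; signatures are assumptions.
interface Detector {
  onMessage?(agentId: string, direction: "sent" | "received", raw: string): void;
  onAgentDisconnect?(agentId: string, closeCode: number): void;
  drainFindings(): Finding[];
}

class AbnormalCloseDetector implements Detector {
  private findings: Finding[] = [];
  private recent: { [agentId: string]: string[] } = {};

  onMessage(agentId: string, direction: "sent" | "received", raw: string): void {
    const log = this.recent[agentId] || (this.recent[agentId] = []);
    log.push(direction + ": " + raw);
    if (log.length > 5) log.shift(); // keep only the last five entries as context
  }

  onAgentDisconnect(agentId: string, closeCode: number): void {
    // 1000 (normal closure) and 1001 (going away) are expected WebSocket close codes.
    if (closeCode === 1000 || closeCode === 1001) return;
    this.findings.push({
      severity: "bug",
      category: "abnormal_close",
      message: "agent " + agentId + " closed with code " + closeCode,
      context: (this.recent[agentId] || []).slice(),
    });
  }

  // Hand over accumulated findings and reset the internal buffer.
  drainFindings(): Finding[] {
    const drained = this.findings;
    this.findings = [];
    return drained;
  }
}

const detector = new AbnormalCloseDetector();
detector.onMessage("agent-1", "sent", "ping");
detector.onAgentDisconnect("agent-1", 1006); // 1006 = abnormal closure
const drainedFindings = detector.drainFindings();
console.log(drainedFindings);
```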

LLM Integration

The TreeGenerator uses the Anthropic SDK to prompt Claude (default model: claude-haiku-4-5-20251001) with:

  1. The game context from the adapter
  2. All available actions with parameter specs
  3. The TreeNode JSON schema
  4. Summaries of existing trees (to encourage diversity)
  5. A focus area (e.g., “edge cases in movement and room transitions”)

Claude’s response is parsed as JSON and validated against the TreeNode schema. Invalid responses cause a fallback to the random walk tree.
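
The parse-validate-fallback step might look roughly like the sketch below. It checks only the node type and the shape of children, which is far looser than a full schema check. The TreeNode fields, the parseTreeOrFallback helper, and the placeholder fallback tree are assumptions; the node type names are the eight listed under Behavior Trees.

```typescript
const NODE_TYPES = [
  "sequence", "selector", "repeat", "random_selector",
  "action", "wait", "condition", "probability",
];

// Hypothetical minimal TreeNode shape; the real schema carries more fields.
interface TreeNode {
  type: string;
  name?: string;
  children?: TreeNode[];
}

// Structural check: known node type, and children (if present) are valid nodes.
function isTreeNode(value: unknown): value is TreeNode {
  if (typeof value !== "object" || value === null) return false;
  const node = value as { type?: unknown; children?: unknown };
  if (typeof node.type !== "string" || NODE_TYPES.indexOf(node.type) === -1) return false;
  if (node.children === undefined) return true;
  return Array.isArray(node.children) && node.children.every((child) => isTreeNode(child));
}

// Return the parsed tree when it is valid JSON and passes the check, else the fallback.
function parseTreeOrFallback(responseText: string, fallback: TreeNode): TreeNode {
  try {
    const parsed: unknown = JSON.parse(responseText);
    return isTreeNode(parsed) ? parsed : fallback;
  } catch {
    return fallback;
  }
}

// Placeholder standing in for the random walk tree; its real structure is not shown here.
const randomWalkFallback: TreeNode = {
  type: "random_selector",
  children: [{ type: "action", name: "move" }],
};

const accepted = parseTreeOrFallback('{"type":"sequence","children":[{"type":"wait"}]}', randomWalkFallback);
const rejected = parseTreeOrFallback('{"type":"teleport"}', randomWalkFallback);
console.log(accepted.type, rejected.type);
```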

During a run, regeneration happens periodically. A random 10% of playing agents may receive a fresh tree at each regeneration interval (default 30s).

Tree Lifecycle

  1. Generation – Trees are created by Claude, written by hand, or loaded from the library
  2. Execution – Trees drive agent behavior during the test run
  3. Recording – If a tree’s agent triggers a meaningful finding, the TreeRecorder saves it
  4. Persistence – After the run, recorded trees are flushed to the TreeLibrary on disk
  5. Trimming – The TreeTrimmer removes trees with >80% action sequence similarity
  6. Regression – On the next run, saved trees are loaded and assigned to agents for replay
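
The trimming step can be sketched as follows. This page does not say how action sequence similarity is computed, so the sketch assumes a longest-common-subsequence ratio, 2 × LCS / (len(a) + len(b)), over the ordered action names of each tree. The function names are hypothetical, and the actual TreeTrimmer may use a different metric.

```typescript
// Length of the longest common subsequence of two action-name sequences.
function lcsLength(a: string[], b: string[]): number {
  // table[i][j] = LCS length of the first i items of a and the first j items of b
  const table: number[][] = [];
  for (let i = 0; i <= a.length; i++) {
    table.push([]);
    for (let j = 0; j <= b.length; j++) {
      if (i === 0 || j === 0) table[i].push(0);
      else if (a[i - 1] === b[j - 1]) table[i].push(table[i - 1][j - 1] + 1);
      else table[i].push(Math.max(table[i - 1][j], table[i][j - 1]));
    }
  }
  return table[a.length][b.length];
}

// Assumed similarity metric in [0, 1]; 1 means identical sequences.
function similarity(a: string[], b: string[]): number {
  if (a.length === 0 && b.length === 0) return 1;
  return (2 * lcsLength(a, b)) / (a.length + b.length);
}

// Keep a sequence only if it is not more than `threshold` similar to one already kept.
function trimSimilar(sequences: string[][], threshold: number = 0.8): string[][] {
  const kept: string[][] = [];
  for (const candidate of sequences) {
    const nearDuplicate = kept.some((existing) => similarity(existing, candidate) > threshold);
    if (!nearDuplicate) kept.push(candidate);
  }
  return kept;
}

const trimmed = trimSimilar([
  ["move", "move", "attack", "look", "move", "move"],
  ["move", "move", "attack", "look", "move"], // near-duplicate of the first
  ["say", "say"],
]);
console.log(trimmed.length);
```

In this sketch the earliest tree in a group of near-duplicates wins, so the order in which trees are considered determines which ones survive.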

Reporting

Reporters receive findings in real-time and a complete summary at the end. The SwarmSummary includes:

  • Duration, agent count, connection success/failure counts
  • Total messages sent and received
  • All findings grouped by severity
  • Per-agent statistics (phase, message counts, latency, finding count)

The ConsoleReporter prints a formatted table to stdout. The JsonReporter writes the full summary to a timestamped JSON file in ./reports/.
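
As an illustration of the summary structure, the sketch below groups findings by severity and renders a one-line overview. The SwarmSummary field names, groupBySeverity, and summaryLine are assumptions chosen to mirror the bullet list above, not the actual reporter code.

```typescript
type Severity = "crash" | "bug" | "jank" | "warning" | "info";

interface Finding {
  severity: Severity;
  category: string;
}

// Hypothetical subset of SwarmSummary; field names are illustrative.
interface SwarmSummary {
  durationMs: number;
  agentCount: number;
  messagesSent: number;
  messagesReceived: number;
  findingsBySeverity: Record<Severity, Finding[]>;
}

// Bucket findings so that every severity is present, even when empty.
function groupBySeverity(findings: Finding[]): Record<Severity, Finding[]> {
  const groups: Record<Severity, Finding[]> = { crash: [], bug: [], jank: [], warning: [], info: [] };
  for (const finding of findings) groups[finding.severity].push(finding);
  return groups;
}

// Render a compact one-line overview, most severe first.
function summaryLine(summary: SwarmSummary): string {
  const order: Severity[] = ["crash", "bug", "jank", "warning", "info"];
  const parts = order.map((s) => s + "=" + summary.findingsBySeverity[s].length);
  return summary.agentCount + " agents, " + parts.join(" ");
}

const summary: SwarmSummary = {
  durationMs: 60000,
  agentCount: 3,
  messagesSent: 120,
  messagesReceived: 340,
  findingsBySeverity: groupBySeverity([
    { severity: "bug", category: "desync" },
    { severity: "crash", category: "crash" },
    { severity: "bug", category: "protocol" },
  ]),
};
console.log(summaryLine(summary));
console.log(JSON.stringify(summary).length > 0); // the same object can be written out as JSON
```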