CI/CD Integration

swarmtest is designed to run headless, making it straightforward to integrate into CI/CD pipelines for automated game server testing.

Basic CI Setup

GitHub Actions

name: Swarm Test
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 */6 * * *'  # Every 6 hours

jobs:
  swarmtest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install swarmtest
        run: |
          cd swarmtest
          npm ci
          npm run build

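      - name: Start game server
        run: |
          # Replace with your own server's build and start commands.
          # Building first keeps compile time out of the readiness wait.
          cd game-server
          cargo build
          # Log to a file so the server does not depend on this step's output stream
          cargo run > server.log 2>&1 &
          # Wait up to 60s for the server to accept TCP connections on port 5001
          for i in $(seq 1 60); do
            (: > /dev/tcp/localhost/5001) 2>/dev/null && exit 0
            sleep 1
          done
          echo "Game server did not become ready on port 5001" >&2
          exit 1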

      - name: Run swarm test
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          node swarmtest/dist/cli.js run \
            --game tipo \
            --url ws://localhost:5001 \
            --agents 20 \
            --duration 120 \
            --json

      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: swarm-report
          path: reports/
          retention-days: 30

      - name: Upload regression trees
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: regression-trees
          path: trees/

Without LLM

If you do not want to use an API key in CI, disable LLM generation:

- name: Run swarm test (no LLM)
  run: |
    node swarmtest/dist/cli.js run \
      --game tipo \
      --url ws://localhost:5001 \
      --agents 20 \
      --duration 120 \
      --no-llm \
      --json

With --no-llm, the behavior mix is 50% regression trees and 50% handwritten trees. As long as you persist the trees/ directory between runs, regression coverage grows over time.

Persisting Regression Trees

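The tree library (trees/ directory) should be committed to version control or stored between CI runs. This ensures that behavior trees which triggered bugs in past runs are replayed as regression tests in future runs.

To commit from CI, set a git identity and give the job write access to the repository (permissions: contents: write) so that git push can use the default GITHUB_TOKEN:

- name: Commit new regression trees
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
    git add trees/
    git diff --staged --quiet || git commit -m "Update regression trees"
    git push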

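Alternatively, keep the trees out of the repository and carry them between runs with the Actions cache. actions/download-artifact@v4 only fetches artifacts from the current workflow run unless you also pass run-id and github-token, so on its own it cannot restore trees from a previous run. Restore the cache before running swarmtest:

- name: Restore regression trees
  uses: actions/cache/restore@v4
  with:
    path: trees/
    key: regression-trees-${{ github.run_id }}-${{ github.run_attempt }}
    restore-keys: regression-trees-

Then save it afterwards, even when the test step fails, because failing runs are the ones most likely to add trees:

- name: Save regression trees
  if: always()
  uses: actions/cache/save@v4
  with:
    path: trees/
    key: regression-trees-${{ github.run_id }}-${{ github.run_attempt }}

A cache miss on the first run is not an error. GitHub evicts caches that have not been accessed for 7 days, so committing the trees remains the more durable option.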

Interpreting Results

Exit Codes

swarmtest currently exits with code 0 regardless of findings. To fail CI on critical issues, check the JSON report:

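# Run the swarm (swarmtest exits 0 even when it finds problems)
node swarmtest/dist/cli.js run --game tipo --url ws://localhost:5001 --json

# Pick the newest report in case older ones are still in reports/
REPORT=$(ls -t reports/swarm-report-*.json | head -n 1)

# Fail if any crash or bug findings exist
CRASHES=$(jq '.findings | map(select(.severity == "crash" or .severity == "bug")) | length' "$REPORT")
if [ "$CRASHES" -gt 0 ]; then
  echo "Found $CRASHES critical findings"
  exit 1
fi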
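If you prefer to keep the gate in Node instead of jq, the same check can be written in TypeScript. This is a minimal sketch that, like the jq filter, assumes each entry in findings carries a severity string. The names Finding, countCritical and exitCodeFor are hypothetical and are not part of swarmtest.

```typescript
// Minimal shape of a finding. Only `severity` is assumed, matching the jq filter.
interface Finding {
  severity: string;
}

// Severities that should fail the CI job.
const CRITICAL_SEVERITIES: string[] = ["crash", "bug"];

// Count findings whose severity is in the critical list.
function countCritical(findings: Finding[]): number {
  return findings.filter((f) => CRITICAL_SEVERITIES.indexOf(f.severity) !== -1).length;
}

// Exit code a CI wrapper would use: 1 if anything critical was found, else 0.
function exitCodeFor(report: { findings: Finding[] }): number {
  return countCritical(report.findings) > 0 ? 1 : 0;
}

// Inline sample. A real wrapper would JSON.parse the newest report file instead.
const sample = {
  findings: [{ severity: "jank" }, { severity: "bug" }, { severity: "warning" }],
};
console.log(`critical findings: ${countCritical(sample.findings)}, exit code: ${exitCodeFor(sample)}`);
```

In a real job, the wrapper would pass the returned code to process.exit so that the step fails.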

JSON Report Structure

The JSON report contains the full SwarmSummary:

{
  "durationMs": 120000,
  "agentCount": 20,
  "connectedCount": 18,
  "failedCount": 2,
  "totalMessagesSent": 4521,
  "totalMessagesReceived": 12340,
  "totalErrors": 3,
  "findings": [...],
  "findingsBySeverity": {
    "crash": 1,
    "bug": 2,
    "jank": 5,
    "warning": 3
  },
  "agentSummaries": [...]
}
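When tracking runs over time, it helps to derive a few rates from the summary. The sketch below types the fields shown in the example. The element shapes of findings and agentSummaries are not documented here, so they stay loose. The healthMetrics function and its metric names are hypothetical.

```typescript
// Hypothetical TypeScript view of the report, typed from the example above.
interface SwarmSummary {
  durationMs: number;
  agentCount: number;
  connectedCount: number;
  failedCount: number;
  totalMessagesSent: number;
  totalMessagesReceived: number;
  totalErrors: number;
  findings: unknown[];
  findingsBySeverity: { [severity: string]: number };
  agentSummaries: unknown[];
}

// Derived numbers that are handy for trend dashboards.
function healthMetrics(s: SwarmSummary) {
  const seconds = s.durationMs / 1000;
  return {
    // Fraction of agents that managed to connect.
    connectRate: s.agentCount > 0 ? s.connectedCount / s.agentCount : 0,
    // Aggregate outbound message throughput across the swarm.
    sentPerSecond: seconds > 0 ? s.totalMessagesSent / seconds : 0,
    // Total findings, summed from the per-severity counts.
    totalFindings: Object.keys(s.findingsBySeverity).reduce(
      (sum, key) => sum + s.findingsBySeverity[key],
      0,
    ),
  };
}

// The example report above (its elided arrays are left empty here).
const example: SwarmSummary = {
  durationMs: 120000,
  agentCount: 20,
  connectedCount: 18,
  failedCount: 2,
  totalMessagesSent: 4521,
  totalMessagesReceived: 12340,
  totalErrors: 3,
  findings: [],
  findingsBySeverity: { crash: 1, bug: 2, jank: 5, warning: 3 },
  agentSummaries: [],
};
console.log(healthMetrics(example));
```

For the example report, this yields a 90% connect rate, about 37.7 messages sent per second, and 11 findings.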

Scaling

Agent Count

Start with 10-20 agents for basic testing. Increase to 50-100 for load testing. The main bottleneck is the game server, not swarmtest.

Duration

Short runs (60s) are good for smoke tests. Longer runs (5-10 minutes) catch latency degradation and memory leaks. Overnight runs can catch rare race conditions.

Tick Interval

The default 100ms tick interval is suitable for most games. For real-time games that process inputs at higher rates, lower the tick to 50ms or even 20ms.
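Lowering the tick raises the rate at which the swarm makes decisions, so it is worth estimating the load before pointing the swarm at a shared server. This is a small sketch that assumes every agent ticks once per interval. If a tick sends at most one message (an assumption, since actual traffic depends on the behavior trees), the result is also an upper bound on outbound messages per second. The function name is hypothetical.

```typescript
// Swarm-wide ticks per second, assuming each agent ticks once per interval.
// If each tick sends at most one message, this also bounds messages per second.
function ticksPerSecond(agents: number, tickMs: number): number {
  if (tickMs <= 0) throw new RangeError("tickMs must be positive");
  return agents * (1000 / tickMs);
}

for (const tickMs of [100, 50, 20]) {
  console.log(`20 agents @ ${tickMs}ms -> ${ticksPerSecond(20, tickMs)} ticks/s`);
}
```

Moving 20 agents from 100ms to 20ms takes the swarm from 200 to 1000 ticks per second, a fivefold increase in potential load.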

Connection Stagger

The 200ms default stagger prevents a thundering herd of simultaneous connections at startup. For load testing, reduce it to 50ms or 0ms to stress the server’s connection handling.
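The stagger also sets how long the swarm takes to reach full size, which eats into short runs. This is a sketch that assumes agent i (0-based) starts connecting at i × stagger and ignores handshake time. The function names are hypothetical.

```typescript
// Time until the last agent starts connecting, assuming agent i (0-based)
// is started at i * staggerMs. Handshake time is not included.
function rampUpMs(agents: number, staggerMs: number): number {
  return Math.max(0, agents - 1) * staggerMs;
}

// Fraction of a run that elapses before the full swarm has started.
function rampUpFraction(agents: number, staggerMs: number, durationSec: number): number {
  return rampUpMs(agents, staggerMs) / (durationSec * 1000);
}

console.log(rampUpMs(100, 200)); // 19800
console.log(rampUpFraction(10, 200, 30)); // 0.06
```

Under this assumption, 100 agents at the default 200ms need almost 20 seconds to start, which is a third of a 60-second smoke test. At 50ms, the ramp shrinks to about 5 seconds.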

Suggested Test Tiers

  1. Smoke test (on every push) – 10 agents, 30 seconds, --no-llm
  2. Standard test (on merge to main) – 20 agents, 2 minutes, full behavior mix
  3. Load test (nightly or weekly) – 50-100 agents, 5-10 minutes
  4. Exploration run (weekly) – 20 agents, --explore-only, 5 minutes, to discover new bug patterns with fresh LLM trees
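To keep these tiers in one place, a CI script can map each tier to its swarmtest arguments. This sketch uses only flags that appear in this document (--game, --url, --agents, --duration, --no-llm, --explore-only, --json). It assumes --duration is in seconds, which matches --duration 120 producing durationMs 120000 in the report example. The Tier type, the TIERS table and argsForTier are hypothetical. The load tier uses 50 agents and 5 minutes, taken from the suggested ranges.

```typescript
type Tier = "smoke" | "standard" | "load" | "explore";

interface TierConfig {
  agents: number;
  durationSec: number; // assumed to be seconds, as in --duration 120
  extraFlags: string[];
}

// One entry per tier from the list above.
const TIERS: Record<Tier, TierConfig> = {
  smoke: { agents: 10, durationSec: 30, extraFlags: ["--no-llm"] },
  standard: { agents: 20, durationSec: 120, extraFlags: [] },
  load: { agents: 50, durationSec: 300, extraFlags: [] },
  explore: { agents: 20, durationSec: 300, extraFlags: ["--explore-only"] },
};

// Build the argument list for `node swarmtest/dist/cli.js run ...`.
function argsForTier(tier: Tier, game: string, url: string): string[] {
  const cfg = TIERS[tier];
  return [
    "run",
    "--game", game,
    "--url", url,
    "--agents", String(cfg.agents),
    "--duration", String(cfg.durationSec),
    ...cfg.extraFlags,
    "--json",
  ];
}

const smokeArgs = argsForTier("smoke", "tipo", "ws://localhost:5001");
console.log(["node", "swarmtest/dist/cli.js"].concat(smokeArgs).join(" "));
```

A workflow can then choose the tier from the triggering event. For example, it could use smoke on push and load on schedule.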

Tree Library Management

Over time, the tree library grows. Use the trim command periodically to deduplicate:

swarmtest trim --game tipo --threshold 0.2

You can also manually review trees with list-trees and delete trees that are no longer relevant.
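The similarity measure behind trim --threshold is not described here. To illustrate what a distance threshold does in general, here is a hypothetical greedy deduplication, which is not swarmtest's implementation. Each tree is reduced to a list of node labels. A tree is dropped when its Jaccard distance to an already-kept tree is below the threshold. The TreeEntry shape, the labels, and both functions are assumptions made for the example.

```typescript
// Jaccard distance between two label lists treated as sets: 1 - |A ∩ B| / |A ∪ B|.
function jaccardDistance(a: string[], b: string[]): number {
  const ua = a.filter((x, i) => a.indexOf(x) === i); // unique labels of a
  const ub = b.filter((x, i) => b.indexOf(x) === i); // unique labels of b
  if (ua.length === 0 && ub.length === 0) return 0;
  const intersection = ua.filter((x) => ub.indexOf(x) !== -1).length;
  const union = ua.length + ub.length - intersection;
  return 1 - intersection / union;
}

interface TreeEntry {
  id: string;
  labels: string[]; // hypothetical: node labels extracted from a behavior tree
}

// Greedy dedup: keep a tree only if it is at least `threshold` away from every kept tree.
function dedupe(trees: TreeEntry[], threshold: number): TreeEntry[] {
  const kept: TreeEntry[] = [];
  for (const t of trees) {
    if (kept.every((k) => jaccardDistance(k.labels, t.labels) >= threshold)) {
      kept.push(t);
    }
  }
  return kept;
}

const trees: TreeEntry[] = [
  { id: "a", labels: ["Sequence", "Move", "Attack"] },
  { id: "b", labels: ["Sequence", "Move", "Attack", "Wait"] },
  { id: "c", labels: ["Selector", "Chat", "Disconnect"] },
];
console.log(dedupe(trees, 0.3).map((t) => t.id).join(",")); // a,c
```

Trees a and b are 0.25 apart, so b is dropped at a threshold of 0.3 but kept at 0.2. In this scheme, a higher threshold keeps fewer, more distinct trees.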