CI/CD Integration

swarmtest is designed to run headless, making it straightforward to integrate into CI/CD pipelines for automated game server testing.

Basic CI Setup

GitHub Actions

name: Swarm Test
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 */6 * * *'  # Every 6 hours

jobs:
  swarmtest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install swarmtest
        run: |
          cd swarmtest
          npm ci
          npm run build

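      - name: Start game server
        run: |
          # Compile first so build time does not eat into the readiness wait
          cd game-server && cargo build
          # Start the server in the background; it keeps running for later steps
          cargo run &
          # Wait up to 60 seconds for the server to accept connections on port 5001
          timeout 60 bash -c 'until (echo > /dev/tcp/localhost/5001) 2>/dev/null; do sleep 1; done'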

      - name: Run swarm test
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          node swarmtest/dist/cli.js run \
            --game tipo \
            --url ws://localhost:5001 \
            --agents 20 \
            --duration 120 \
            --json

      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: swarm-report
          path: reports/
          retention-days: 30

      - name: Upload regression trees
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: regression-trees
          path: trees/

Without LLM

If you do not want to use an API key in CI, disable LLM generation:

- name: Run swarm test (no LLM)
  run: |
    node swarmtest/dist/cli.js run \
      --game tipo \
      --url ws://localhost:5001 \
      --agents 20 \
      --duration 120 \
      --no-llm \
      --json

This runs with 50% regression trees and 50% handwritten trees. As long as you commit the trees/ directory, regression coverage grows over time.

Persisting Regression Trees

The tree library (trees/ directory) should be committed to version control or stored as a CI artifact, so that behavior trees that triggered bugs in past runs are replayed as regression tests in future runs. For example, add a step that commits new trees after each run:

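- name: Commit new regression trees
  run: |
    # Pushing with the default GITHUB_TOKEN requires "permissions: contents: write" on the job
    git config user.name "github-actions[bot]"
    git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
    git add trees/
    git diff --staged --quiet || git commit -m "Update regression trees"
    git push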

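Alternatively, use CI artifacts to pass trees between runs. By default, actions/download-artifact@v4 only looks in the current workflow run, so look up the previous run's ID and pass it along with a token (depending on your repository's default token permissions, the job may also need actions: read):

- name: Find previous run
  id: prev
  env: { GH_TOKEN: "${{ github.token }}" }
  run: echo "run_id=$(gh run list --workflow 'Swarm Test' --status completed --limit 1 --json databaseId --jq '.[0].databaseId')" >> "$GITHUB_OUTPUT"
- name: Restore regression trees
  uses: actions/download-artifact@v4
  with:
    name: regression-trees
    path: trees/
    run-id: ${{ steps.prev.outputs.run_id }}  # defaults to the current run if omitted
    github-token: ${{ github.token }}
  continue-on-error: true  # First run won't have any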

Interpreting Results

Exit Codes

swarmtest currently exits with code 0 regardless of findings. To fail CI on critical issues, check the JSON report:

# Run the swarm and write a JSON report to reports/
node swarmtest/dist/cli.js run --game tipo --url ws://localhost:5001 --json

# Use only the newest report in case earlier ones are present
REPORT=$(ls -t reports/swarm-report-*.json | head -n 1)

# Fail if any crash or bug findings exist
CRITICAL=$(jq '.findings | map(select(.severity == "crash" or .severity == "bug")) | length' "$REPORT")
if [ "$CRITICAL" -gt 0 ]; then
  echo "Found $CRITICAL critical findings"
  exit 1
fi
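A variant of this check can rely on the findingsBySeverity totals and jq's -e flag, which sets jq's own exit status from the result instead of comparing counts in the shell. This is a minimal sketch assuming the report layout shown under JSON Report Structure; the check_report helper name is hypothetical and not part of swarmtest.

```shell
# Sketch: succeed (exit 0) only if a swarmtest JSON report has no crash or bug
# findings. check_report is a hypothetical helper; field names follow the
# SwarmSummary example. A severity missing from findingsBySeverity counts as 0.
check_report() {
  # jq -e exits 0 when the last output is truthy and 1 when it is false or null;
  # an unreadable or invalid report also yields a non-zero status
  jq -e '((.findingsBySeverity.crash // 0) + (.findingsBySeverity.bug // 0)) == 0' \
    "$1" > /dev/null
}

# Example use in a CI step, after the run has written its report:
#   REPORT=$(ls -t reports/swarm-report-*.json | head -n 1)
#   check_report "$REPORT" || { echo "Critical findings in $REPORT"; exit 1; }
```

Because the helper only returns a status, it composes with `||` and `if` in any step without extra parsing.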

JSON Report Structure

The JSON report contains the full SwarmSummary:

{
  "durationMs": 120000,
  "agentCount": 20,
  "connectedCount": 18,
  "failedCount": 2,
  "totalMessagesSent": 4521,
  "totalMessagesReceived": 12340,
  "totalErrors": 3,
  "findings": [...],
  "findingsBySeverity": {
    "crash": 1,
    "bug": 2,
    "jank": 5,
    "warning": 3
  },
  "agentSummaries": [...]
}
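These fields also work for a one-line summary in the CI log. The sketch below uses jq and assumes the field names shown above; summarize_report and its output wording are illustrative, not part of swarmtest.

```shell
# Sketch: print a one-line summary of a swarmtest JSON report.
# summarize_report is a hypothetical helper; missing severities print as 0.
summarize_report() {
  jq -r '
    "agents: \(.connectedCount)/\(.agentCount) connected, " +
    "errors: \(.totalErrors), " +
    "crash: \(.findingsBySeverity.crash // 0), bug: \(.findingsBySeverity.bug // 0)"
  ' "$1"
}
```

For the example summary above, this prints `agents: 18/20 connected, errors: 3, crash: 1, bug: 2`.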

Recommended Pipeline Stages

Smoke test (on every push) – 10 agents, 30 seconds, --no-llm
Standard test (on merge to main) – 20 agents, 2 minutes, full behavior mix
Load test (nightly or weekly) – 50-100 agents, 5-10 minutes
Exploration run (weekly) – 20 agents, --explore-only, 5 minutes, to discover new bug patterns with fresh LLM trees
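One way to keep these stages consistent across workflows is a small shell helper that maps a stage name to run flags. This is a sketch: the stage names are hypothetical labels, the load stage picks the low end of the recommended ranges (50 agents, 300 seconds), and --duration is given in seconds, as in the 120-second examples above.

```shell
# Sketch: map a recommended pipeline stage to swarmtest run flags.
# Stage names are hypothetical labels; --duration is in seconds.
stage_args() {
  case "$1" in
    smoke)    printf '%s\n' "--agents 10 --duration 30 --no-llm" ;;
    standard) printf '%s\n' "--agents 20 --duration 120" ;;  # full behavior mix
    load)     printf '%s\n' "--agents 50 --duration 300" ;;  # low end of 50-100 agents, 5-10 min
    explore)  printf '%s\n' "--agents 20 --duration 300 --explore-only" ;;
    *)        printf 'unknown stage: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Example: the flags are left unquoted on purpose so they split into words.
#   node swarmtest/dist/cli.js run --game tipo --url ws://localhost:5001 --json $(stage_args smoke)
```

An unknown stage name returns a non-zero status, so a typo in a workflow fails fast instead of silently running with defaults.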

Tree Library Management

Over time, the tree library grows. Use the trim command periodically to deduplicate:

swarmtest trim --game tipo --threshold 0.2

You can also review trees manually with list-trees and delete any that are no longer relevant.
