CI/CD Integration
swarmtest is designed to run headless, making it straightforward to integrate into CI/CD pipelines for automated game server testing.
Basic CI Setup
GitHub Actions
name: Swarm Test
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 */6 * * *' # Every 6 hours
jobs:
  swarmtest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install swarmtest
        run: |
          cd swarmtest
          npm ci
          npm run build
      - name: Start game server
        run: |
          # Start your game server in the background
          cd game-server && cargo run &
          sleep 5 # Wait for server to be ready
      - name: Run swarm test
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          node swarmtest/dist/cli.js run \
            --game tipo \
            --url ws://localhost:5001 \
            --agents 20 \
            --duration 120 \
            --json
      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: swarm-report
          path: reports/
          retention-days: 30
      - name: Upload regression trees
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: regression-trees
          path: trees/
Without LLM
If you do not want to use an API key in CI, disable LLM generation:
- name: Run swarm test (no LLM)
  run: |
    node swarmtest/dist/cli.js run \
      --game tipo \
      --url ws://localhost:5001 \
      --agents 20 \
      --duration 120 \
      --no-llm \
      --json
This runs with 50% regression trees and 50% handwritten trees. As long as you commit the trees/ directory, regression coverage grows over time.
Persisting Regression Trees
The tree library (trees/ directory) should be committed to version control or stored as a CI artifact. This ensures that behavior trees which triggered bugs in past runs are replayed as regression tests in future runs.
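- name: Commit new regression trees
  run: |
    # CI runners have no git identity by default, so set one before committing
    git config user.name "github-actions[bot]"
    git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
    git add trees/
    git diff --staged --quiet || git commit -m "Update regression trees"
    git push
Alternatively, use CI artifacts to pass trees between runs: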
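- name: Restore regression trees
  uses: actions/download-artifact@v4
  with:
    name: regression-trees
    path: trees/
    # By default, v4 only sees artifacts from the current workflow run. To pull
    # trees uploaded by an earlier run, also set run-id and github-token.
  continue-on-error: true # First run won't have any
Interpreting Results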
Exit Codes
swarmtest currently exits with code 0 regardless of findings. To fail CI on critical issues, check the JSON report:
# Fail if any crash or bug findings exist
node swarmtest/dist/cli.js run --game tipo --url ws://localhost:5001 --json
CRASHES=$(jq '.findings | map(select(.severity == "crash" or .severity == "bug")) | length' reports/swarm-report-*.json)
if [ "$CRASHES" -gt 0 ]; then
  echo "Found $CRASHES critical findings"
  exit 1
fi
JSON Report Structure
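For scripts that consume the report, the severity counts can be described with a small TypeScript type and used as a CI gate. This is a minimal sketch based only on the sample report in this section, not swarmtest's actual SwarmSummary definition. `SeverityCounts`, `ReportLike`, `criticalCount`, and `exitCodeFor` are hypothetical names, and treating a missing severity key as zero is an assumption.

```typescript
// Severity counts as they appear under "findingsBySeverity" in the sample report.
// Keys are optional because it is an ASSUMPTION that absent severities may be omitted.
interface SeverityCounts {
  crash?: number;
  bug?: number;
  jank?: number;
  warning?: number;
}

// Only the part of the report this gate needs (hypothetical type).
interface ReportLike {
  findingsBySeverity: SeverityCounts;
}

// Number of findings that should fail the build: crashes plus bugs.
function criticalCount(report: ReportLike): number {
  const s = report.findingsBySeverity;
  return (s.crash ?? 0) + (s.bug ?? 0);
}

// Exit code a CI wrapper could return: 1 if any critical finding exists, else 0.
function exitCodeFor(report: ReportLike): number {
  return criticalCount(report) > 0 ? 1 : 0;
}

// Example using the counts from the sample report.
const sample = JSON.parse(
  '{"findingsBySeverity":{"crash":1,"bug":2,"jank":5,"warning":3}}'
) as ReportLike;
console.log(exitCodeFor(sample)); // prints 1
```

In a CI wrapper, the parsed contents of the report file would be passed to `exitCodeFor` and the result used as the process exit code.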
The JSON report contains the full SwarmSummary:
{
  "durationMs": 120000,
  "agentCount": 20,
  "connectedCount": 18,
  "failedCount": 2,
  "totalMessagesSent": 4521,
  "totalMessagesReceived": 12340,
  "totalErrors": 3,
  "findings": [...],
  "findingsBySeverity": {
    "crash": 1,
    "bug": 2,
    "jank": 5,
    "warning": 3
  },
  "agentSummaries": [...]
}
Scaling
Agent Count
Start with 10-20 agents for basic testing. Increase to 50-100 for load testing. The main bottleneck is the game server, not swarmtest.
Duration
Short runs (60s) are good for smoke tests. Longer runs (5-10 minutes) catch latency degradation and memory leaks. Overnight runs can catch rare race conditions.
Tick Interval
The default 100ms tick interval is suitable for most games. For real-time games that process inputs at higher rates, lower the tick to 50ms or even 20ms.
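To get a feel for how the tick interval changes the load on the server, the arithmetic can be sketched as follows. It assumes each agent performs at most one action per tick, which is an assumption about swarmtest's scheduling rather than documented behavior. `ticksPerSecond` and `maxActions` are hypothetical helper names.

```typescript
// Ticks per second for a given tick interval in milliseconds.
function ticksPerSecond(tickMs: number): number {
  if (tickMs <= 0) throw new Error("tickMs must be positive");
  return 1000 / tickMs;
}

// Upper bound on actions in one run, ASSUMING at most one action per agent per tick.
function maxActions(agents: number, durationSec: number, tickMs: number): number {
  if (tickMs <= 0) throw new Error("tickMs must be positive");
  const ticksPerAgent = Math.floor((durationSec * 1000) / tickMs);
  return agents * ticksPerAgent;
}

// 20 agents for 120 seconds at the tick intervals mentioned above.
console.log(maxActions(20, 120, 100)); // 24000
console.log(maxActions(20, 120, 50));  // 48000
console.log(maxActions(20, 120, 20));  // 120000
```

Under that assumption, dropping from 100ms to 20ms multiplies the potential message volume by five for the same agent count, so it may be worth lowering the tick and raising the agent count in separate runs.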
Connection Stagger
The 200ms default stagger prevents thundering herd on connection. For load testing, reduce to 50ms or 0ms to stress the server's connection handling.
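The effect of the stagger on ramp-up time can be estimated with a short calculation. This sketch assumes agents connect one after another at a fixed spacing, with agent i starting at i times the stagger; that model and the helper names `connectOffsets` and `rampUpMs` are illustrative assumptions, not swarmtest internals.

```typescript
// Offset in milliseconds from the start of the run at which each agent begins connecting,
// ASSUMING a fixed spacing of staggerMs between consecutive agents.
function connectOffsets(agentCount: number, staggerMs: number): number[] {
  return Array.from({ length: agentCount }, (_, i) => i * staggerMs);
}

// Time until the last agent starts connecting.
function rampUpMs(agentCount: number, staggerMs: number): number {
  return agentCount > 0 ? (agentCount - 1) * staggerMs : 0;
}

console.log(connectOffsets(4, 200)); // [ 0, 200, 400, 600 ]
console.log(rampUpMs(20, 200));      // 3800
console.log(rampUpMs(100, 200));     // 19800
console.log(rampUpMs(100, 50));      // 4950
```

If ramp-up counts toward the run duration (also an assumption), 100 agents at the default stagger would spend nearly 20 seconds just connecting, which is one more reason to shorten the stagger for load tests.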
Recommended Pipeline Stages
- Smoke test (on every push): 10 agents, 30 seconds, --no-llm
- Standard test (on merge to main): 20 agents, 2 minutes, full behavior mix
- Load test (nightly or weekly): 50-100 agents, 5-10 minutes
- Exploration run (weekly): 20 agents, --explore-only, 5 minutes, to discover new bug patterns with fresh LLM trees
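The stages above can be kept in one place and turned into CLI arguments by a small script. The following is a sketch only: the `Stage` type, the `stages` table, and `buildArgs` are hypothetical, the load stage uses the lower end of the suggested ranges (50 agents, 5 minutes), and only flags that already appear in this document are used.

```typescript
type StageName = "smoke" | "standard" | "load" | "explore";

interface Stage {
  agents: number;
  durationSec: number;
  extraFlags: string[];
}

// Values taken from the list above; the load stage picks the low end of each range.
const stages: Record<StageName, Stage> = {
  smoke: { agents: 10, durationSec: 30, extraFlags: ["--no-llm"] },
  standard: { agents: 20, durationSec: 120, extraFlags: [] },
  load: { agents: 50, durationSec: 300, extraFlags: [] },
  explore: { agents: 20, durationSec: 300, extraFlags: ["--explore-only"] },
};

// Build the argument list to pass after `node swarmtest/dist/cli.js`.
function buildArgs(stage: StageName, game: string, url: string): string[] {
  const s = stages[stage];
  return [
    "run",
    "--game", game,
    "--url", url,
    "--agents", String(s.agents),
    "--duration", String(s.durationSec),
    ...s.extraFlags,
    "--json",
  ];
}

console.log(buildArgs("smoke", "tipo", "ws://localhost:5001").join(" "));
// run --game tipo --url ws://localhost:5001 --agents 10 --duration 30 --no-llm --json
```

Each CI job can then select a stage by name, so the agent counts and durations are not duplicated across workflow files.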
Tree Library Management
Over time, the tree library grows. Use the trim command periodically to deduplicate:
swarmtest trim --game tipo --threshold 0.2
You can also manually review trees with list-trees and delete trees that are no longer relevant.