SimC Simulation Service
Serverless simulation infrastructure that lets users run SimulationCraft against their characters or selected groups (raid comps, dungeon teams), producing DPS/HPS estimates without any always-on compute cost.
Architecture Overview
flowchart TD
subgraph API["NestJS API"]
direction TB
sim["POST /characters/:id/sim"] --> svc[SimulationService]
group["POST /organizations/:id/sim-group"] --> svc
svc -->|1. Create SimJob row| db[("PostgreSQL")]
svc -->|2. Send SQS message| sqs
callback["POST /internal/sim-callback"] --> store[Store SimResults]
store --> update[Update Character]
store --> ws[Emit WebSocket]
read1["GET /sim-jobs/:id"]
read2["GET /sim-jobs/:id/results"]
read3["GET /sim-jobs/mine"]
end
sqs[("SQS: sim-jobs")]
subgraph Worker["ECS Fargate Spot (Rust worker)"]
direction TB
w1["1. Receive SQS message"]
w2["2. Send 'running' callback"]
w3["3. Per character (parallel):<br/>a. Generate armory profile<br/>b. Run SimC binary<br/>c. Parse JSON output<br/>d. Upload raw to S3<br/>e. Send progress callback"]
w4["4. POST final results to API"]
w5["5. Delete SQS message, exit"]
w1 --> w2 --> w3 --> w4 --> w5
end
sqs --> Worker
Worker -->|callbacks| callback
Components
NestJS API (Producer)
The API is responsible for:
- Job creation: Validates the request, creates a SimJob row in PostgreSQL, and enqueues an SQS message with character metadata (name, realm, region).
- Authorization: Character owners can sim their own characters. Guild officers (canManageEvents) can sim a selected group of org characters (raid comp, dungeon team).
- Callback endpoint: Internal POST /internal/sim-callback receives status updates and per-character results from the worker (API-key auth).
- Result storage: On callback, stores SimResult rows per character, updates each Character's cached simDps/simHps/simmedAt fields, and emits WebSocket events.
- WebSocket relay: On callback, emits progress/completion events via the /sim-sync gateway so the frontend can show live status.
SQS Queue
Standard SQS queue (sim-jobs). The message body contains the job ID and the character metadata needed for armory import. In dev, LocalStack provides SQS locally. In production, a native AWS SQS queue is paired with an EventBridge rule that launches Fargate tasks when messages arrive.
Message schema:
{
  "jobId": "uuid",
  "jobType": "single_character | group_batch",
  "characters": [
    { "id": "uuid", "name": "Thrall", "realm": "area-52", "region": "us" }
  ],
  "s3ResultPrefix": "sim-results/2026/03/05/",
  "callbackUrl": "https://api.example.com/api/v1/internal/sim-callback",
  "callbackApiKey": "secret"
}
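The schema above could be mirrored on the consumer side with a typed parser. This is a minimal sketch rather than the service's actual code: the type names and `parseSimJobMessage` are hypothetical, and it only checks the fields shown in the schema.

```typescript
type JobType = "single_character" | "group_batch";

interface SimCharacterRef {
  id: string;
  name: string;
  realm: string;
  region: string;
}

interface SimJobMessage {
  jobId: string;
  jobType: JobType;
  characters: SimCharacterRef[];
  s3ResultPrefix: string;
  callbackUrl: string;
  callbackApiKey: string;
}

// Hypothetical helper: parses a raw SQS message body and rejects anything
// that does not match the documented schema.
function parseSimJobMessage(body: string): SimJobMessage {
  const raw: unknown = JSON.parse(body);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("message body must be a JSON object");
  }
  const msg = raw as Record<string, unknown>;

  for (const key of ["jobId", "s3ResultPrefix", "callbackUrl", "callbackApiKey"]) {
    if (typeof msg[key] !== "string") {
      throw new Error(`missing or invalid field: ${key}`);
    }
  }
  const jobType = msg["jobType"];
  if (jobType !== "single_character" && jobType !== "group_batch") {
    throw new Error("jobType must be single_character or group_batch");
  }
  const characters = msg["characters"];
  if (!Array.isArray(characters) || characters.length === 0) {
    throw new Error("characters must be a non-empty array");
  }
  for (const c of characters) {
    for (const key of ["id", "name", "realm", "region"]) {
      if (typeof c?.[key] !== "string") {
        throw new Error(`character is missing field: ${key}`);
      }
    }
  }
  return msg as unknown as SimJobMessage;
}
```

Rejecting malformed bodies up front keeps a bad message from reaching the point where SimC is launched.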
Rust Worker (Consumer)
A standalone Rust binary packaged in a Docker image alongside the SimC C++ binary. Runs as an ECS Fargate Spot task.
- SQS long-poll: In dev mode, continuously polls the queue. In prod, the container processes one batch and exits (SIMC_ONE_SHOT=true).
- Profile generation: Uses SimC's armory=region,realm,name import to fetch character gear/talents directly from Blizzard's API.
- Parallel execution: Runs SimC processes across multiple characters concurrently (bounded by SIMC_MAX_PARALLEL, default 4).
- Result parsing: Reads SimC JSON output to extract dps.mean, hps.mean, and fight_length.mean per character.
- S3 upload: Raw SimC JSON output stored under the job's S3 prefix.
- Callback: Sends progress updates per-character and a final callback with all results.
- Mock mode: If SimC binary is not available (dev without C++ binary), generates deterministic mock DPS/HPS values.
- Graceful shutdown: Handles SIGTERM for Spot interruptions.
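The result-parsing and mock-mode steps can be illustrated with a short sketch. The worker itself is written in Rust; TypeScript is used here only for illustration. The `CollectedData` shape is an assumption (where these fields sit inside the full SimC report is not specified in this document), and the mock value range and fixed fight length are arbitrary choices, not the worker's actual numbers.

```typescript
interface MeanStat {
  mean: number;
}

// Assumed shape of the per-character statistics block in SimC's JSON output.
interface CollectedData {
  dps?: MeanStat;
  hps?: MeanStat;
  fight_length?: MeanStat;
}

interface SimMetrics {
  dps: number;
  hps: number;
  simDuration: number | null;
}

// Extracts the three means named above. HPS falls back to 0 (non-healers)
// and the fight length to null, matching the SimResult column types.
function extractMetrics(data: CollectedData): SimMetrics {
  if (!data.dps || !Number.isFinite(data.dps.mean)) {
    throw new Error("SimC output is missing dps.mean");
  }
  return {
    dps: data.dps.mean,
    hps: data.hps?.mean ?? 0,
    simDuration: data.fight_length?.mean ?? null,
  };
}

// FNV-1a-style 32-bit hash over UTF-16 code units, used only to derive
// stable mock numbers from a character ID.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Hypothetical mock mode: the same character ID always yields the same DPS.
function mockMetrics(characterId: string): SimMetrics {
  return {
    dps: 50000 + (fnv1a(characterId) % 50000), // arbitrary illustrative range
    hps: 0,
    simDuration: 300, // arbitrary fixed fight length in seconds
  };
}
```

Deriving mock values from the character ID, rather than from a random source, keeps end-to-end tests repeatable.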
S3 Bucket
Raw SimC output stored under a prefix per job:
sim-results/<year>/<month>/<day>/
<characterId>.json # SimC JSON output
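As a small illustration of the layout above, the prefix and object key could be derived as follows. The function names are hypothetical, and using UTC for the date components is an assumption; the document does not state which time zone the prefix uses.

```typescript
// Builds the date-based prefix, zero-padding month and day
// (for example "sim-results/2026/03/05/").
function simResultPrefix(date: Date): string {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  const day = String(date.getUTCDate()).padStart(2, "0");
  return `sim-results/${year}/${month}/${day}/`;
}

// Builds the key for one character's raw SimC JSON output.
function rawOutputKey(prefix: string, characterId: string): string {
  return `${prefix}${characterId}.json`;
}
```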
Data Model
SimJob
Tracks simulation requests from creation through completion.
| Field | Type | Notes |
|---|---|---|
| id | UUID | Primary key |
| organizationId | UUID? | Nullable — single-character sims have no org context |
| status | enum | queued, running, completed, failed, cancelled |
| jobType | enum | group_batch, single_character |
| characterCount | int | Total characters to simulate |
| completedCount | int | Characters finished so far (for progress) |
| sqsMessageId | string? | SQS message ID for tracking/cancellation |
| s3ResultPrefix | string? | S3 key prefix where results are stored |
| requestedBy | UUID | User who triggered the sim |
| error | string? | Error message if failed |
| startedAt | datetime? | When worker began processing |
| completedAt | datetime? | When worker finished |
| createdAt | datetime | Row creation time |
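
The status and counter fields lend themselves to a small progress helper. This is a sketch under assumptions: the names are hypothetical, and treating completed, failed, and cancelled as terminal states is inferred from the enum rather than stated in the table.

```typescript
type SimJobStatus = "queued" | "running" | "completed" | "failed" | "cancelled";

interface SimJobProgress {
  status: SimJobStatus;
  characterCount: number;
  completedCount: number;
}

// Assumption: these three statuses are final and receive no further callbacks.
function isTerminal(status: SimJobStatus): boolean {
  return status === "completed" || status === "failed" || status === "cancelled";
}

// Whole-number percentage for a progress bar, clamped to 0..100.
function progressPercent(job: SimJobProgress): number {
  if (job.characterCount <= 0) {
    return 0;
  }
  const done = Math.max(0, Math.min(job.completedCount, job.characterCount));
  return Math.round((done / job.characterCount) * 100);
}
```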
SimResult
Per-character simulation results. One row per character per job.
| Field | Type | Notes |
|---|---|---|
| id | UUID | Primary key |
| simJobId | UUID | FK to SimJob (cascade delete) |
| characterId | UUID | FK to Character (cascade delete) |
| dps | float | Mean DPS from SimC |
| hps | float | Mean HPS from SimC (0 for non-healers) |
| simDuration | float? | Mean fight length in seconds |
| error | string? | Error message if individual sim failed |
| rawOutputKey | string? | S3 key for raw SimC JSON output |
| createdAt | datetime | Row creation time |
Unique constraint: (simJobId, characterId) — prevents duplicate results.
Character (Cached Sim Fields)
Each Character caches its latest sim result for quick reads:
| Field | Type | Notes |
|---|---|---|
| simDps | float? | Latest simulated DPS |
| simHps | float? | Latest simulated HPS |
| simmedAt | datetime? | When the latest sim was run |
These fields are updated atomically when the callback stores SimResult rows.
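A minimal in-memory sketch of that cache update is shown below. In the real service this would presumably happen inside a database transaction; the `Map` stands in for the Character table, the names are hypothetical, and skipping results that carry an error is an assumption.

```typescript
interface CharacterSimCache {
  simDps: number | null;
  simHps: number | null;
  simmedAt: Date | null;
}

interface CharacterResult {
  characterId: string;
  dps: number;
  hps: number;
  error?: string | null;
}

// Returns a new map instead of mutating the input, so callers see either
// the old state or the fully updated state, never a partial update.
function applySimResults(
  cache: Map<string, CharacterSimCache>,
  results: CharacterResult[],
  simmedAt: Date,
): Map<string, CharacterSimCache> {
  const next = new Map(cache);
  for (const result of results) {
    if (result.error) {
      continue; // assumption: a failed sim keeps the previously cached values
    }
    next.set(result.characterId, {
      simDps: result.dps,
      simHps: result.hps,
      simmedAt,
    });
  }
  return next;
}
```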
Authorization
| Endpoint | Who | How |
|---|---|---|
| POST /characters/:id/sim | Character owner | character.userId === currentUser.id |
| POST /organizations/:id/sim-group | Guild officer | CASL canManageEvents permission |
| GET /sim-jobs/:id | Job requester | simJob.requestedBy === currentUser.id |
| GET /sim-jobs/:id/results | Job requester | simJob.requestedBy === currentUser.id |
| GET /sim-jobs/mine | Any authenticated user | Scoped to own jobs |
| POST /internal/sim-callback | Worker only | X-Sim-Callback-Key header |
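
The rules in the table can be expressed as plain predicates. This is a sketch, not the service's guards: `CurrentUser.manageEventsOrgIds` is a hypothetical stand-in for the CASL canManageEvents check, and the function names are invented for illustration.

```typescript
interface CurrentUser {
  id: string;
  // Hypothetical stand-in for CASL: organizations where the user holds
  // the canManageEvents permission.
  manageEventsOrgIds: string[];
}

function canSimCharacter(user: CurrentUser, character: { userId: string }): boolean {
  return character.userId === user.id;
}

function canSimGroup(user: CurrentUser, organizationId: string): boolean {
  return user.manageEventsOrgIds.includes(organizationId);
}

function canReadJob(user: CurrentUser, simJob: { requestedBy: string }): boolean {
  return simJob.requestedBy === user.id;
}

// Checks the X-Sim-Callback-Key header value against the shared secret.
// Every character is compared so timing does not reveal the first mismatch.
function isValidCallbackKey(headerValue: string | undefined, expectedKey: string): boolean {
  if (expectedKey.length === 0 || headerValue === undefined) {
    return false;
  }
  if (headerValue.length !== expectedKey.length) {
    return false;
  }
  let diff = 0;
  for (let i = 0; i < expectedKey.length; i++) {
    diff |= headerValue.charCodeAt(i) ^ expectedKey.charCodeAt(i);
  }
  return diff === 0;
}
```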
Callback Payload
The worker sends status updates via POST /internal/sim-callback:
{
  "jobId": "uuid",
  "status": "running | completed | failed",
  "completedCount": 5,
  "error": null,
  "results": [
    {
      "characterId": "uuid",
      "dps": 95432.5,
      "hps": 0,
      "simDuration": 300.2,
      "rawOutputKey": "sim-results/2026/03/05/uuid.json"
    }
  ]
}
The results array is included in the final completed callback. Progress callbacks during execution include completedCount but not results.
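The payload rules above could be checked on the API side roughly as follows. This is a sketch with hypothetical names; requiring an error message on failed callbacks is an assumption, not something the payload description states.

```typescript
type CallbackStatus = "running" | "completed" | "failed";

interface SimCallbackResult {
  characterId: string;
  dps: number;
  hps: number;
  simDuration?: number | null;
  rawOutputKey?: string | null;
}

interface SimCallback {
  jobId: string;
  status: CallbackStatus;
  completedCount: number;
  error: string | null;
  results?: SimCallbackResult[];
}

// Returns a list of human-readable problems; an empty list means the
// callback is consistent with the rules described above.
function validateCallback(cb: SimCallback, characterCount: number): string[] {
  const problems: string[] = [];
  if (
    !Number.isInteger(cb.completedCount) ||
    cb.completedCount < 0 ||
    cb.completedCount > characterCount
  ) {
    problems.push("completedCount is out of range");
  }
  const hasResults = (cb.results?.length ?? 0) > 0;
  if (cb.status === "completed" && !hasResults) {
    problems.push("completed callback must include results");
  }
  if (cb.status === "running" && hasResults) {
    problems.push("progress callbacks must not include results");
  }
  if (cb.status === "failed" && !cb.error) {
    problems.push("failed callback should carry an error message"); // assumption
  }
  return problems;
}
```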
Infrastructure Decisions
Why SQS + ECS Fargate Spot?
- Zero cost at rest: No compute charges when no sims are running. Fargate tasks spin up only when SQS has messages.
- Up to about 70% cheaper than on-demand: AWS advertises Fargate Spot discounts of up to 70%, which suits CPU-intensive SimC workloads.
- Process isolation: SimC is a CPU-heavy C++ binary. Running it in-process with Node.js would block the event loop and risk OOM. A separate container keeps the API healthy.
- Horizontal scaling: Multiple Fargate tasks can process different jobs concurrently.
Why Rust for the Worker?
- Consistent with the GDL service (services/gdl/).
- Low memory footprint for a container that mostly shells out to SimC.
- Strong async runtime (tokio) for parallel character processing.
- Compile-time safety for SQS message parsing and S3 uploads.
Why Not Bull?
Bull queues run inside the Node.js process. SimC needs:
- A C++ binary (not available in the API container)
- Multi-core parallel execution
- Ability to scale to zero when idle
- Spot pricing for cost efficiency
SQS + ECS is the right abstraction for this workload.
Why Armory Import?
SimC's armory=region,realm,name command fetches the character's full gear, talents, and stats directly from the Blizzard API. This avoids:
- Maintaining our own SimC profile generation logic
- Keeping our local character data perfectly in sync with Blizzard
- Complex item ID → SimC item string translation
The trade-off is that the worker needs Blizzard API access, but SimC handles this internally.
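To make the armory import concrete, here is a sketch of how a SimC argument list might be assembled. The worker is written in Rust, so this TypeScript version is purely illustrative; `buildSimcArgs` is a hypothetical name, and the `threads`, `iterations`, and `json2` option names are assumed to match the SimC build in use.

```typescript
interface ArmoryTarget {
  name: string;
  realm: string;
  region: string;
}

interface SimcOptions {
  threads: number;
  iterations: number;
  jsonPath: string;
}

// Builds the argument list for one SimC invocation. The armory option
// follows the region,realm,name order described above.
function buildSimcArgs(target: ArmoryTarget, options: SimcOptions): string[] {
  return [
    `armory=${target.region},${target.realm},${target.name}`,
    `threads=${options.threads}`,
    `iterations=${options.iterations}`,
    `json2=${options.jsonPath}`, // assumed name of the JSON report option
  ];
}
```

Passing the arguments as an array, rather than as one shell string, avoids quoting problems with character names.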
Callback vs Direct DB Write
The worker POSTs results back to the API rather than writing directly to PostgreSQL because:
- The API owns the database schema and can validate/transform results.
- WebSocket events are emitted server-side — the worker doesn't run a WS server.
- Keeps the worker stateless and replaceable.
Development Setup
LocalStack provides SQS and S3 locally. The Rust worker runs as a Docker service alongside the API:
# Start everything (API + GDL + SimC worker + LocalStack)
docker compose -f docker-compose.yml -f docker-compose.dev.yml up
In dev, the SimC binary is not installed — the worker runs in mock mode, generating deterministic DPS/HPS values. This allows end-to-end pipeline testing without the C++ binary.
Environment variables:
| Variable | Dev Default | Description |
|---|---|---|
| SQS_ENDPOINT | http://localstack:4566 | SQS/S3 endpoint (LocalStack in dev, omit for prod) |
| SQS_SIM_QUEUE_URL | http://localstack:4566/000000000000/sim-jobs | Full queue URL |
| S3_SIM_BUCKET | sim-results | S3 bucket for raw SimC output |
| SIM_CALLBACK_API_KEY | (generated) | Shared secret for worker-to-API auth |
| SIM_CALLBACK_URL | http://api:3001/api/v1/internal/sim-callback | API callback endpoint (Docker network) |
| SIMC_BINARY | simc | Path to SimC binary |
| SIMC_THREADS | 2 | Threads per SimC process |
| SIMC_ITERATIONS | 10000 | Iterations per sim (higher = more accurate, slower) |
| SIMC_MAX_PARALLEL | 4 | Max concurrent SimC processes per job |
| SIMC_ONE_SHOT | false | Process one message and exit (true for Fargate prod) |
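
The worker-side variables and their defaults could be loaded as sketched below. The real worker is a Rust binary, so this only illustrates the defaults in the table; `WorkerConfig` and `loadWorkerConfig` are hypothetical names, and rejecting non-positive integers is an assumption.

```typescript
type Env = Record<string, string | undefined>;

interface WorkerConfig {
  endpoint: string | undefined; // unset in prod, LocalStack URL in dev
  bucket: string;
  simcBinary: string;
  threads: number;
  iterations: number;
  maxParallel: number;
  oneShot: boolean;
}

// Reads a positive integer, falling back to the documented default when
// the variable is unset or empty.
function positiveInt(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw === "") {
    return fallback;
  }
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadWorkerConfig(env: Env): WorkerConfig {
  return {
    endpoint: env["SQS_ENDPOINT"],
    bucket: env["S3_SIM_BUCKET"] ?? "sim-results",
    simcBinary: env["SIMC_BINARY"] ?? "simc",
    threads: positiveInt(env, "SIMC_THREADS", 2),
    iterations: positiveInt(env, "SIMC_ITERATIONS", 10000),
    maxParallel: positiveInt(env, "SIMC_MAX_PARALLEL", 4),
    oneShot: env["SIMC_ONE_SHOT"] === "true",
  };
}
```

Taking the environment as a parameter, instead of reading a global, makes the defaults easy to test.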
Future Work
- Frontend: RTK Query endpoints, progress UI, sim history page
- ECS task definition / Terraform / CDK for production deployment
- Raid/dungeon group entity modeling (named groups of characters)
- Cron scheduling for periodic group sims
- Rate limiting (prevent spamming sim requests)
- Gear change detection to invalidate cached sim results during enrichment