
SimC Simulation Service

Serverless simulation infrastructure that lets users run SimulationCraft against their characters or selected groups (raid comps, dungeon teams), producing DPS/HPS estimates without any always-on compute cost.

Architecture Overview

```mermaid
flowchart TD
sqs[("SQS: sim-jobs")]
db[("PostgreSQL")]

subgraph API["NestJS API"]
direction TB
sim["POST /characters/:id/sim"] --> svc[SimulationService]
group["POST /organizations/:id/sim-group"] --> svc
callback["POST /internal/sim-callback"] --> store[Store SimResults]
store --> update[Update Character]
store --> ws[Emit WebSocket]
read1["GET /sim-jobs/:id"]
read2["GET /sim-jobs/:id/results"]
read3["GET /sim-jobs/mine"]
end

svc -->|1. Create SimJob row| db
svc -->|2. Send SQS message| sqs

subgraph Worker["ECS Fargate Spot (Rust worker)"]
direction TB
w1["1. Receive SQS message"]
w2["2. Send 'running' callback"]
w3["3. Per character (parallel):<br/>a. Generate armory profile<br/>b. Run SimC binary<br/>c. Parse JSON output<br/>d. Upload raw to S3<br/>e. Send progress callback"]
w4["4. POST final results to API"]
w5["5. Delete SQS message, exit"]
w1 --> w2 --> w3 --> w4 --> w5
end

sqs --> Worker
Worker -->|callbacks| callback
```

Components

NestJS API (Producer)

The API is responsible for:

  • Job creation: Validates the request, creates a SimJob row in PostgreSQL, and enqueues an SQS message with character metadata (name, realm, region).
  • Authorization: Character owners can sim their own characters. Guild officers (users with the CASL canManageEvents permission) can sim a selected group of org characters (raid comp, dungeon team).
  • Callback endpoint: The internal POST /internal/sim-callback endpoint receives status updates and per-character results from the worker, authenticated with a shared API key in the X-Sim-Callback-Key header.
  • Result storage: On callback, stores one SimResult row per character and updates each Character's cached simDps/simHps/simmedAt fields.
  • WebSocket relay: On callback, emits progress/completion events via the /sim-sync gateway so the frontend can show live status.

SQS Queue

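A standard SQS queue named sim-jobs. Each message body carries the job ID and the character metadata needed for armory import. Standard queues deliver messages at least once, so a message can occasionally be received twice; callback handling and result storage need to tolerate duplicates. In dev, LocalStack provides SQS locally. In production, a native AWS SQS queue is used, and an Amazon EventBridge integration starts a Fargate task when messages arrive.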

Message schema:

```json
{
  "jobId": "uuid",
  "jobType": "single_character | group_batch",
  "characters": [
    { "id": "uuid", "name": "Thrall", "realm": "area-52", "region": "us" }
  ],
  "s3ResultPrefix": "sim-results/2026/03/05/",
  "callbackUrl": "https://api.example.com/api/v1/internal/sim-callback",
  "callbackApiKey": "secret"
}
```
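The same schema can be written as a typed shape with a runtime check, so a malformed message is rejected before any SimC work starts. The sketch below is in TypeScript, the language of the NestJS producer; the actual worker deserializes the message in Rust. `SimJobMessage`, `parseSimJobMessage`, and the specific checks are hypothetical, not the documented validation rules.

```typescript
// TypeScript mirror of the sim-jobs message schema (names are illustrative).
type JobType = "single_character" | "group_batch";

interface SimCharacterRef {
  id: string;
  name: string;
  realm: string; // realm slug, e.g. "area-52"
  region: string; // e.g. "us"
}

interface SimJobMessage {
  jobId: string;
  jobType: JobType;
  characters: SimCharacterRef[];
  s3ResultPrefix: string;
  callbackUrl: string;
  callbackApiKey: string;
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

// Parse an SQS message body, throwing on anything that does not match the schema.
function parseSimJobMessage(body: string): SimJobMessage {
  const data: unknown = JSON.parse(body);
  if (typeof data !== "object" || data === null) {
    throw new Error("message body must be a JSON object");
  }
  const msg = data as Record<string, unknown>;
  for (const field of ["jobId", "s3ResultPrefix", "callbackUrl", "callbackApiKey"]) {
    if (!isNonEmptyString(msg[field])) {
      throw new Error(`missing or empty field: ${field}`);
    }
  }
  if (msg["jobType"] !== "single_character" && msg["jobType"] !== "group_batch") {
    throw new Error(`unknown jobType: ${String(msg["jobType"])}`);
  }
  const characters = msg["characters"];
  if (!Array.isArray(characters) || characters.length === 0) {
    throw new Error("characters must be a non-empty array");
  }
  for (const entry of characters) {
    if (typeof entry !== "object" || entry === null) {
      throw new Error("each character must be an object");
    }
    const ref = entry as Record<string, unknown>;
    if (!["id", "name", "realm", "region"].every((key) => isNonEmptyString(ref[key]))) {
      throw new Error("each character needs id, name, realm, and region");
    }
  }
  return msg as unknown as SimJobMessage;
}
```

Rejecting the whole message up front keeps a half-valid job from producing partial results that the API then has to reconcile.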

Rust Worker (Consumer)

A standalone Rust binary packaged in a Docker image alongside the SimC C++ binary. Runs as an ECS Fargate Spot task.

  • SQS long-poll: In dev mode, the worker long-polls the queue continuously. In prod (SIMC_ONE_SHOT=true), the container processes a single message (one job) and exits.
  • Profile generation: Uses SimC's armory=region,realm,name import to fetch character gear/talents directly from Blizzard's API.
  • Parallel execution: Runs SimC processes across multiple characters concurrently (bounded by SIMC_MAX_PARALLEL, default 4).
  • Result parsing: Reads SimC JSON output to extract dps.mean, hps.mean, and fight_length.mean per character.
  • S3 upload: Raw SimC JSON output is stored under the job's S3 prefix.
  • Callback: Sends a progress callback as each character finishes and a final callback with all results.
  • Mock mode: If the SimC binary is not available (dev without the C++ binary), generates deterministic mock DPS/HPS values.
  • Graceful shutdown: Handles SIGTERM, which ECS sends about two minutes before reclaiming a Fargate Spot task.
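A sketch of two worker steps: building SimC's command-line arguments and pulling the summary numbers out of its JSON report. It is written in TypeScript to keep every example in one language; the real worker does this in Rust. `armory=`, `iterations=`, `threads=`, and `json2=` are SimC options. The function and interface names are hypothetical, and the `sim.players[0].collected_data` path is an assumption about SimC's JSON report layout that should be checked against the SimC build shipped in the image.

```typescript
interface SimcRunOptions {
  region: string; // e.g. "us"
  realm: string; // realm slug, e.g. "area-52"
  name: string; // character name
  iterations: number; // SIMC_ITERATIONS
  threads: number; // SIMC_THREADS
  jsonOutputPath: string; // file SimC writes its JSON report to
}

// Arguments for one SimC run: armory import plus run settings.
function buildSimcArgs(opts: SimcRunOptions): string[] {
  return [
    `armory=${opts.region},${opts.realm},${opts.name}`,
    `iterations=${opts.iterations}`,
    `threads=${opts.threads}`,
    `json2=${opts.jsonOutputPath}`,
  ];
}

interface SimcSummary {
  dps: number;
  hps: number;
  simDuration: number | null;
}

// Read dps/hps/fight_length means for the first player in the report.
// Assumes the sim.players[0].collected_data layout of SimC's JSON output.
function parseSimcReport(reportJson: string): SimcSummary {
  const report = JSON.parse(reportJson);
  const data = report?.sim?.players?.[0]?.collected_data;
  const dps = data?.dps?.mean;
  if (typeof dps !== "number") {
    throw new Error("SimC report is missing players[0].collected_data.dps.mean");
  }
  const hps = data.hps?.mean;
  const fightLength = data.fight_length?.mean;
  return {
    dps,
    hps: typeof hps === "number" ? hps : 0, // 0 for non-healers
    simDuration: typeof fightLength === "number" ? fightLength : null,
  };
}
```

The worker would pass these arguments to the binary named by SIMC_BINARY and parse the file written by `json2=`. Defaulting a missing `hps.mean` to 0 mirrors the SimResult convention for non-healers.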

S3 Bucket

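Raw SimC output is stored under the job's s3ResultPrefix. The prefix is date-based rather than job-specific, so all jobs created on the same day share it: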

```text
sim-results/<year>/<month>/<day>/
  <characterId>.json   # SimC JSON output
```
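A minimal sketch of deriving the prefix and the per-character key, assuming the date components come from the job's creation time in UTC. `s3ResultPrefix` and `rawOutputKey` are hypothetical helper names.

```typescript
// Date-based prefix: sim-results/<year>/<month>/<day>/ (UTC, zero-padded).
function s3ResultPrefix(createdAt: Date): string {
  const year = createdAt.getUTCFullYear();
  const month = String(createdAt.getUTCMonth() + 1).padStart(2, "0");
  const day = String(createdAt.getUTCDate()).padStart(2, "0");
  return `sim-results/${year}/${month}/${day}/`;
}

// Key for one character's raw SimC JSON under a job's prefix.
function rawOutputKey(prefix: string, characterId: string): string {
  const base = prefix.endsWith("/") ? prefix : `${prefix}/`;
  return `${base}${characterId}.json`;
}
```

Using UTC avoids a job landing under a different day depending on the server's time zone.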

Data Model

SimJob

Tracks simulation requests from creation through completion.

| Field | Type | Notes |
| --- | --- | --- |
| id | UUID | Primary key |
| organizationId | UUID? | Nullable; single-character sims have no org context |
| status | enum | queued, running, completed, failed, cancelled |
| jobType | enum | group_batch, single_character |
| characterCount | int | Total characters to simulate |
| completedCount | int | Characters finished so far (for progress) |
| sqsMessageId | string? | SQS message ID for tracking/cancellation |
| s3ResultPrefix | string? | S3 key prefix where results are stored |
| requestedBy | UUID | User who triggered the sim |
| error | string? | Error message if the job failed |
| startedAt | datetime? | When the worker began processing |
| completedAt | datetime? | When the worker finished |
| createdAt | datetime | Row creation time |
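The status enum implies a small state machine. The sketch below assumes completed, failed, and cancelled are terminal, so a late progress callback cannot move a finished job back to running; the transition table and helper names are assumptions, not documented rules. It also turns completedCount and characterCount into a progress percentage for the UI.

```typescript
type SimJobStatus = "queued" | "running" | "completed" | "failed" | "cancelled";

// Assumed transitions: terminal states have no outgoing edges, and
// running -> running covers repeated progress callbacks.
const ALLOWED_TRANSITIONS: Record<SimJobStatus, readonly SimJobStatus[]> = {
  queued: ["running", "completed", "failed", "cancelled"],
  running: ["running", "completed", "failed", "cancelled"],
  completed: [],
  failed: [],
  cancelled: [],
};

function canTransition(from: SimJobStatus, to: SimJobStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}

// Whole-number progress percentage (0 when there is nothing to simulate).
function progressPercent(completedCount: number, characterCount: number): number {
  if (characterCount <= 0) return 0;
  return Math.min(100, Math.floor((completedCount / characterCount) * 100));
}
```

Allowing queued to jump straight to completed or failed keeps results from being dropped if the worker's initial running callback is lost.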

SimResult

Per-character simulation results. One row per character per job.

| Field | Type | Notes |
| --- | --- | --- |
| id | UUID | Primary key |
| simJobId | UUID | FK to SimJob (cascade delete) |
| characterId | UUID | FK to Character (cascade delete) |
| dps | float | Mean DPS from SimC |
| hps | float | Mean HPS from SimC (0 for non-healers) |
| simDuration | float? | Mean fight length in seconds |
| error | string? | Error message if this character's sim failed |
| rawOutputKey | string? | S3 key for raw SimC JSON output |
| createdAt | datetime | Row creation time |

Unique constraint: (simJobId, characterId). A retried or duplicated callback for the same job and character cannot create a second result row.
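In PostgreSQL this constraint pairs naturally with an upsert (`INSERT ... ON CONFLICT ("simJobId", "characterId") DO UPDATE`), so a redelivered result replaces the existing row instead of failing the whole callback. The in-memory sketch below models that behavior with a map keyed on the same pair; `SimResultStore` is hypothetical and only stands in for the real table.

```typescript
interface SimResultRow {
  simJobId: string;
  characterId: string;
  dps: number;
  hps: number;
  simDuration: number | null;
  error: string | null;
  rawOutputKey: string | null;
}

// In-memory stand-in for the SimResult table, keyed like the
// (simJobId, characterId) unique constraint.
class SimResultStore {
  private readonly rows = new Map<string, SimResultRow>();

  private static key(simJobId: string, characterId: string): string {
    return `${simJobId}:${characterId}`;
  }

  // Insert-or-replace: a duplicate callback overwrites rather than duplicates.
  upsert(row: SimResultRow): void {
    this.rows.set(SimResultStore.key(row.simJobId, row.characterId), row);
  }

  forJob(simJobId: string): SimResultRow[] {
    return Array.from(this.rows.values()).filter((r) => r.simJobId === simJobId);
  }
}
```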

Character (Cached Sim Fields)

Each Character caches its latest sim result for quick reads:

| Field | Type | Notes |
| --- | --- | --- |
| simDps | float? | Latest simulated DPS |
| simHps | float? | Latest simulated HPS |
| simmedAt | datetime? | When the latest sim was run |

These fields are updated in the same database transaction that stores the SimResult rows.

Authorization

| Endpoint | Who | How |
| --- | --- | --- |
| POST /characters/:id/sim | Character owner | character.userId === currentUser.id |
| POST /organizations/:id/sim-group | Guild officer | CASL canManageEvents permission |
| GET /sim-jobs/:id | Job requester | simJob.requestedBy === currentUser.id |
| GET /sim-jobs/:id/results | Job requester | simJob.requestedBy === currentUser.id |
| GET /sim-jobs/mine | Any authenticated user | Scoped to own jobs |
| POST /internal/sim-callback | Worker only | X-Sim-Callback-Key header |
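The rules in the table reduce to simple predicates. The sketch below shows them together with a callback-key check that compares every character instead of returning at the first mismatch; inside a NestJS guard, Node's `crypto.timingSafeEqual` (which requires equal-length buffers) is the usual tool for this. `AuthUser`, the function names, and reducing the per-organization CASL permission to a boolean are simplifications for illustration.

```typescript
interface AuthUser {
  id: string;
  canManageEvents: boolean; // stand-in for the CASL permission in the target org
}

// POST /characters/:id/sim
function canSimCharacter(user: AuthUser, character: { userId: string }): boolean {
  return character.userId === user.id;
}

// POST /organizations/:id/sim-group
function canSimGroup(user: AuthUser): boolean {
  return user.canManageEvents;
}

// GET /sim-jobs/:id and GET /sim-jobs/:id/results
function canReadSimJob(user: AuthUser, job: { requestedBy: string }): boolean {
  return job.requestedBy === user.id;
}

// POST /internal/sim-callback: compare X-Sim-Callback-Key with the shared
// secret, accumulating differences so the loop does not exit early.
function isValidCallbackKey(header: string | undefined, secret: string): boolean {
  if (header === undefined || header.length !== secret.length) return false;
  let diff = 0;
  for (let i = 0; i < secret.length; i++) {
    diff |= header.charCodeAt(i) ^ secret.charCodeAt(i);
  }
  return diff === 0;
}
```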

Callback Payload

The worker sends status updates via POST /internal/sim-callback:

```json
{
  "jobId": "uuid",
  "status": "running | completed | failed",
  "completedCount": 5,
  "error": null,
  "results": [
    {
      "characterId": "uuid",
      "dps": 95432.5,
      "hps": 0,
      "simDuration": 300.2,
      "rawOutputKey": "sim-results/2026/03/05/uuid.json"
    }
  ]
}
```

Only the final completed callback includes the results array. Progress callbacks sent during execution (status running) carry completedCount but omit results.
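A sketch of how the API could fold callbacks into job state and the Character cache, assuming callbacks may arrive late or more than once. Finished jobs ignore further callbacks, completedCount never decreases, and only the completed callback touches the cache. All type and function names are hypothetical; a real handler would perform the job update, SimResult inserts, and cache update inside one database transaction and then emit the /sim-sync WebSocket event.

```typescript
type CallbackStatus = "running" | "completed" | "failed";

interface CallbackResult {
  characterId: string;
  dps: number;
  hps: number;
  simDuration: number | null;
  rawOutputKey: string | null;
}

interface SimCallback {
  jobId: string;
  status: CallbackStatus;
  completedCount: number;
  error: string | null;
  results?: CallbackResult[]; // present only on the completed callback
}

interface JobState {
  status: "queued" | "running" | "completed" | "failed" | "cancelled";
  completedCount: number;
  error: string | null;
}

interface CachedSim {
  simDps: number;
  simHps: number;
  simmedAt: Date;
}

function applyCallback(job: JobState, cb: SimCallback, cache: Map<string, CachedSim>, now: Date): JobState {
  const terminal: string[] = ["completed", "failed", "cancelled"];
  if (terminal.includes(job.status)) {
    return job; // late or duplicate callback after the job finished
  }
  if (cb.status === "completed") {
    for (const r of cb.results ?? []) {
      cache.set(r.characterId, { simDps: r.dps, simHps: r.hps, simmedAt: now });
    }
  }
  return {
    status: cb.status,
    completedCount: Math.max(job.completedCount, cb.completedCount), // tolerate reordering
    error: cb.status === "failed" ? cb.error : null,
  };
}
```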

Infrastructure Decisions

Why SQS + ECS Fargate Spot?

  • Zero cost at rest: No compute charges when no sims are running. Fargate tasks spin up only when SQS has messages.
  • Up to 70% cheaper than regular Fargate: AWS prices Fargate Spot at up to a 70% discount, which suits CPU-intensive, interruptible SimC workloads.
  • Process isolation: SimC is a CPU-heavy C++ binary. Running it in-process with Node.js would block the event loop and risk OOM. A separate container keeps the API healthy.
  • Horizontal scaling: Multiple Fargate tasks can process different jobs concurrently.

Why Rust for the Worker?

  • Consistent with the GDL service (services/gdl/).
  • Low memory footprint for a container that mostly shells out to SimC.
  • Strong async runtime (tokio) for parallel character processing.
  • Compile-time safety for SQS message parsing and S3 uploads.

Why Not Bull?

Bull runs its job processors in Node.js, which here would mean inside the API container. SimC needs:

  • A C++ binary (not available in the API container)
  • Multi-core parallel execution
  • Ability to scale to zero when idle
  • Spot pricing for cost efficiency

SQS + ECS is the right abstraction for this workload.

Why Armory Import?

SimC's armory=region,realm,name option fetches the character's full gear, talents, and stats directly from the Blizzard API. This avoids:

  • Maintaining our own SimC profile generation logic
  • Keeping our local character data perfectly in sync with Blizzard
  • Complex item ID → SimC item string translation

The trade-off is that the worker needs Blizzard API access, but SimC handles this internally.

Callback vs Direct DB Write

The worker POSTs results back to the API rather than writing directly to PostgreSQL because:

  • The API owns the database schema and can validate/transform results.
  • WebSocket events are emitted server-side — the worker doesn't run a WS server.
  • Keeps the worker stateless and replaceable.

Development Setup

LocalStack provides SQS and S3 locally. The Rust worker runs as a Docker service alongside the API:

```bash
# Start everything (API + GDL + SimC worker + LocalStack)
docker compose -f docker-compose.yml -f docker-compose.dev.yml up
```

In dev, the SimC binary is not installed — the worker runs in mock mode, generating deterministic DPS/HPS values. This allows end-to-end pipeline testing without the C++ binary.
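The document does not specify how the mock values are derived. One deterministic approach, sketched here as an assumption, hashes the character ID with 32-bit FNV-1a and maps the hash into fixed ranges, so the same character always gets the same numbers across runs and restarts. `mockSimResult`, the value ranges, and the fixed 300-second duration are hypothetical.

```typescript
// 32-bit FNV-1a over UTF-16 code units (equal to bytes for ASCII IDs).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime mod 2^32
  }
  return hash >>> 0;
}

// Stable mock output: DPS in [50_000, 150_000), HPS in [0, 20_000).
function mockSimResult(characterId: string): { dps: number; hps: number; simDuration: number } {
  const h = fnv1a32(characterId);
  return {
    dps: 50_000 + (h % 100_000),
    hps: (h >>> 16) % 20_000,
    simDuration: 300,
  };
}
```

A hash of the ID is preferable to a seeded random generator here because it needs no shared state between worker runs.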

Environment variables:

| Variable | Dev default | Description |
| --- | --- | --- |
| SQS_ENDPOINT | http://localstack:4566 | SQS/S3 endpoint (LocalStack in dev; omit in prod) |
| SQS_SIM_QUEUE_URL | http://localstack:4566/000000000000/sim-jobs | Full queue URL |
| S3_SIM_BUCKET | sim-results | S3 bucket for raw SimC output |
| SIM_CALLBACK_API_KEY | (generated) | Shared secret for worker-to-API auth |
| SIM_CALLBACK_URL | http://api:3001/api/v1/internal/sim-callback | API callback endpoint (Docker network) |
| SIMC_BINARY | simc | Path to SimC binary |
| SIMC_THREADS | 2 | Threads per SimC process |
| SIMC_ITERATIONS | 10000 | Iterations per sim (higher = more accurate, slower) |
| SIMC_MAX_PARALLEL | 4 | Max concurrent SimC processes per job |
| SIMC_ONE_SHOT | false | Process one message and exit (true for Fargate prod) |
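A sketch of reading the SimC variables with the documented dev defaults and failing fast on bad values. `loadWorkerConfig` and `peakSimcThreads` are hypothetical names, and the environment object is passed in (for example `process.env`) so the function is easy to test. Because SIMC_THREADS applies per process and SIMC_MAX_PARALLEL caps concurrent processes, the defaults allow up to 4 × 2 = 8 SimC threads at once, a number worth matching to the Fargate task's vCPU allocation.

```typescript
interface WorkerConfig {
  simcBinary: string;
  threads: number;
  iterations: number;
  maxParallel: number;
  oneShot: boolean;
}

// Parse a positive integer, falling back to the default when unset.
function positiveInt(raw: string | undefined, fallback: number, name: string): number {
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadWorkerConfig(env: Record<string, string | undefined>): WorkerConfig {
  return {
    simcBinary: env["SIMC_BINARY"] || "simc",
    threads: positiveInt(env["SIMC_THREADS"], 2, "SIMC_THREADS"),
    iterations: positiveInt(env["SIMC_ITERATIONS"], 10000, "SIMC_ITERATIONS"),
    maxParallel: positiveInt(env["SIMC_MAX_PARALLEL"], 4, "SIMC_MAX_PARALLEL"),
    oneShot: env["SIMC_ONE_SHOT"] === "true",
  };
}

// Peak SimC threads for one job: concurrent processes x threads per process.
function peakSimcThreads(cfg: WorkerConfig): number {
  return cfg.maxParallel * cfg.threads;
}
```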

Future Work

  • Frontend: RTK Query endpoints, progress UI, sim history page
  • ECS task definition / Terraform / CDK for production deployment
  • Raid/dungeon group entity modeling (named groups of characters)
  • Cron scheduling for periodic group sims
  • Rate limiting (prevent spamming sim requests)
  • Gear change detection to invalidate cached sim results during enrichment