Concurrency Limits
Silo provides three types of limits to control job execution: concurrency limits, floating concurrency limits, and rate limits. These limits can be attached to jobs when enqueueing and are checked in order before the job executes.
This guide focuses on the two concurrency-based limits. For rate limits, see the Enqueueing guide.
Concurrency Limits
Concurrency limits restrict how many jobs with the same key can run simultaneously. Jobs that share a limit key compete for available “slots” in a queue.
Basic Usage
```typescript
await client.enqueue({
  payload: { task: "process-upload", userId: 123 },
  limits: [
    {
      type: "concurrency",
      key: "user:123",      // Jobs with same key share the limit
      maxConcurrency: 5     // Max 5 concurrent jobs for this user
    }
  ]
});
```

How It Works
When you enqueue a job with a concurrency limit, Silo:
- Checks capacity: If fewer than `maxConcurrency` jobs are running for that key, the job is granted a “ticket” immediately
- Queues if full: If the limit is at capacity, the job waits in a request queue ordered by priority and start time
- Grants on completion: When a running job finishes, the next waiting job in the queue is automatically granted a ticket
Jobs with granted tickets proceed to any additional limit checks (if configured) or directly to execution.
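The slot/queue behavior above can be modeled in a few lines. This is an illustrative sketch only; the class and field names are invented here and are not Silo’s internals:

```typescript
// Hypothetical model of a concurrency limit's slot/queue mechanics.
type WaitingJob = { id: string; priority: number; enqueuedAt: number };

class ConcurrencySlots {
  private running = new Set<string>();
  private waiting: WaitingJob[] = [];

  constructor(private maxConcurrency: number) {}

  // Returns true if the job gets a ticket immediately, false if it queues.
  tryAcquire(job: WaitingJob): boolean {
    if (this.running.size < this.maxConcurrency) {
      this.running.add(job.id);
      return true;
    }
    // At capacity: wait in a queue ordered by priority, then start time
    this.waiting.push(job);
    this.waiting.sort(
      (a, b) => b.priority - a.priority || a.enqueuedAt - b.enqueuedAt
    );
    return false;
  }

  // Called when a running job finishes; grants the next waiting job, if any.
  release(jobId: string): string | undefined {
    this.running.delete(jobId);
    const next = this.waiting.shift();
    if (next) this.running.add(next.id);
    return next?.id;
  }
}
```

With `maxConcurrency: 2`, a third job queues until one of the first two releases its slot.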
Common Use Cases
Per-user limits:

```typescript
{
  type: "concurrency",
  key: `user:${userId}`,
  maxConcurrency: 10
}
```

Per-tenant limits:

```typescript
{
  type: "concurrency",
  key: `tenant:${tenantId}`,
  maxConcurrency: 100
}
```

Resource protection:

```typescript
{
  type: "concurrency",
  key: "database:analytics",  // Protect a shared resource
  maxConcurrency: 50
}
```

Floating Concurrency Limits
Sometimes fixed concurrency limits aren’t appropriate because the value of the limit needs to change over time.
Floating concurrency limits are dynamic concurrency limits where the maximum concurrency value is computed by your worker code and refreshed periodically. This enables adaptive concurrency based on external factors like system load, quota availability, or business logic — it’s up to you!
Basic Usage
```typescript
await client.enqueue({
  payload: { task: "api-call", endpoint: "https://api.example.com" },
  limits: [
    {
      type: "floatingConcurrency",
      key: "external-api",
      defaultMaxConcurrency: 10,   // Initial value until first refresh
      refreshIntervalMs: 60000n,   // Refresh every 60 seconds
      metadata: {
        apiEndpoint: "https://api.example.com",
        apiKey: "key-123"
      }
    }
  ]
});
```

How It Works
Floating limits combine the mechanics of regular concurrency limits with worker-driven refresh tasks:
- Initial state: When first created, the limit starts at `defaultMaxConcurrency`
- Lazy refresh scheduling: When a job with a floating limit is enqueued (or dequeued) and the refresh interval has elapsed, Silo schedules a refresh task to run on a worker
- Worker processing: A worker receives the refresh task and processes it just like regular job tasks
- Computing new values: The worker executes custom logic registered client-side to compute the new `maxConcurrency` value
- Reporting back: The worker reports the new value to Silo
- State update: Silo updates the limit’s current max concurrency, which affects all future jobs (until the next refresh)
Between refreshes, the floating limit behaves exactly like a regular concurrency limit, using the most recently reported `maxConcurrency` value.
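The lazy refresh scheduling described above amounts to an elapsed-time check each time a job touches the limit. A minimal sketch, assuming a hypothetical `refreshDue` helper and millisecond timestamps:

```typescript
// Hypothetical check for whether a refresh task should be scheduled.
// A refresh is only considered when a job is enqueued or dequeued,
// so a completely idle limit is never refreshed.
function refreshDue(
  lastRefreshAtMs: number,
  refreshIntervalMs: bigint,
  nowMs: number
): boolean {
  return BigInt(nowMs - lastRefreshAtMs) >= refreshIntervalMs;
}
```

With `refreshIntervalMs: 60000n`, a job arriving 59 seconds after the last refresh triggers nothing; one arriving at or after the 60-second mark schedules a refresh task.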
Worker Implementation
Workers handle floating limit refreshes using the `refreshHandler` option on `SiloWorker`. When a floating limit needs refreshing, the worker receives a refresh task and calls your handler to compute the new max concurrency value:
```typescript
import { SiloGRPCClient, SiloWorker } from "@silo-ai/client";

const worker = new SiloWorker({
  client,
  workerId: "worker-1",
  taskGroup: "api-calls",
  handler: async (ctx) => {
    // Handle regular job tasks
    await callExternalApi(ctx.task.payload);
    return { type: "success", result: {} };
  },
  refreshHandler: async (ctx) => {
    // ctx.task contains the refresh task details:
    // - ctx.task.queueKey: the floating limit key
    // - ctx.task.currentMaxConcurrency: the current value
    // - ctx.task.metadata: metadata from the limit definition

    // Compute and return the new max concurrency
    return await computeMaxConcurrency(
      ctx.task.queueKey,
      ctx.task.currentMaxConcurrency,
      ctx.task.metadata
    );
  }
});

worker.start();
```

The worker handles outcome reporting automatically. If your `refreshHandler` returns a number, it’s reported as a success. If it throws an error, it’s reported as a failure and Silo will retry with exponential backoff.
Computing Max Concurrency
Workers have complete freedom in how they compute the new concurrency value. Examples:
External API rate limits:
```typescript
async function computeMaxConcurrency(queueKey, currentMax, metadata) {
  // Query an external API for current quota/limits
  const apiStatus = await fetch(metadata.apiEndpoint + "/status", {
    headers: { Authorization: `Bearer ${metadata.apiKey}` }
  });
  const quota = await apiStatus.json();

  // Set concurrency based on available quota
  return Math.floor(quota.remaining / 100);
}
```

System metrics:
```typescript
async function computeMaxConcurrency(queueKey, currentMax, metadata) {
  // Check system load, memory, etc.
  const cpuUsage = await getSystemCPUUsage();

  if (cpuUsage > 0.8) {
    // Reduce concurrency under high load
    return Math.max(1, Math.floor(currentMax * 0.5));
  } else if (cpuUsage < 0.3) {
    // Increase concurrency when idle
    return Math.min(100, currentMax + 5);
  }

  return currentMax; // Keep current level
}
```

Time-based:
```typescript
async function computeMaxConcurrency(queueKey, currentMax, metadata) {
  const hour = new Date().getHours();

  // More concurrency during business hours
  if (hour >= 9 && hour <= 17) {
    return 50;
  }

  // Less during off-hours
  return 10;
}
```

Boundaries and Constraints
The computed `maxConcurrency` value must satisfy:
- Minimum: Must be at least `1` (Silo enforces this as a `uint32`)
- Maximum: Limited by the `uint32` max value (4,294,967,295), but practical limits depend on your system resources
- Responsiveness: The refresh task should complete quickly (seconds, not minutes) to avoid blocking worker capacity
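Since your handler’s return value must land in this range, it can be worth clamping defensively before returning. A small sketch; the `clampMaxConcurrency` helper is hypothetical, not part of Silo’s API:

```typescript
// Valid range for a floating limit's computed value (stored as uint32,
// minimum 1).
const UINT32_MAX = 4_294_967_295;

// Hypothetical defensive clamp for values returned from a refreshHandler.
function clampMaxConcurrency(computed: number): number {
  return Math.min(UINT32_MAX, Math.max(1, Math.floor(computed)));
}
```

Wrapping your computation in a clamp like this prevents a buggy quota calculation from producing 0, a negative number, or a fractional value.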
Refresh Failures and Backoff
If a worker reports a refresh failure, Silo:
- Keeps using the current `maxConcurrency` value
- Schedules a retry with exponential backoff:
  - Initial backoff: 1 second
  - Maximum backoff: 60 seconds
  - Multiplier: 2.0
- Increments the retry counter on the floating limit state
- Resets the retry counter to 0 when a refresh succeeds
This ensures transient failures (network issues, API downtime) don’t permanently break the limit.
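Under these parameters, the retry delay follows a standard capped exponential curve. A sketch of the schedule; the function name is illustrative, since Silo computes this internally:

```typescript
// Illustrative reproduction of the documented backoff schedule:
// initial 1s, multiplier 2.0, capped at 60s.
function refreshRetryBackoffMs(retryCount: number): number {
  const initialMs = 1_000;
  const maxMs = 60_000;
  return Math.min(maxMs, initialMs * Math.pow(2, retryCount));
}
```

So consecutive failures wait 1s, 2s, 4s, 8s, 16s, 32s, then 60s for every retry after that, until a refresh succeeds and the counter resets.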
Metadata Usage
The `metadata` map is opaque to Silo - it’s passed through to workers unchanged. Use it to provide context for computing the new max concurrency:
```typescript
{
  type: "floatingConcurrency",
  key: "stripe-api",
  defaultMaxConcurrency: 20,
  refreshIntervalMs: 30000n,
  metadata: {
    apiKey: "sk_test_...",
    apiEndpoint: "https://api.stripe.com",
    accountId: "acct_123",
    // Any other data your worker needs
  }
}
```

Handling Limit Changes
Section titled “Handling Limit Changes”When you enqueue a new job with a different limit value for the same key, Silo’s behavior depends on the limit type:
Regular Concurrency Limits
For regular concurrency limits, each job carries its own limit definition. When granting tickets:
- Silo checks the in-memory count of currently running jobs for that key
- It compares against the `maxConcurrency` value from the job being granted
- If capacity is available, the job is granted a ticket
This means:
- Different jobs with the same key can specify different `maxConcurrency` values
- The limit check at grant time uses the incoming job’s limit value
- Currently running jobs are not affected by new enqueues with different limits
- As older jobs complete, newer jobs with their own limit values will be granted
Example scenario:
```typescript
// Enqueue job A with maxConcurrency=5
await client.enqueue({
  payload: { id: "A" },
  limits: [{ type: "concurrency", key: "queue-1", maxConcurrency: 5 }]
});

// Later, enqueue job B with maxConcurrency=10
await client.enqueue({
  payload: { id: "B" },
  limits: [{ type: "concurrency", key: "queue-1", maxConcurrency: 10 }]
});
```

Job B’s grant check will use `maxConcurrency=10` when determining if it can run. This allows you to adjust limits per-job based on changing conditions.
Floating Concurrency Limits
For floating concurrency limits, there is shared state stored per key:
- The first job to use a floating limit key creates the state with `defaultMaxConcurrency`
- Subsequent jobs with the same key share that state
- The `defaultMaxConcurrency` value in later enqueue requests is ignored - the stored state takes precedence
- Workers update the shared state via refresh tasks
- All grant decisions use the `currentMaxConcurrency` from the shared state
This means:
- Changing `defaultMaxConcurrency` in new enqueue requests has no effect on existing limits
- The only way to change the max concurrency is via worker-driven refreshes
- All jobs using the same floating limit key see the same, consistent max concurrency value
- The limit evolves over time as workers report new values
Example scenario:
```typescript
// First job creates state with defaultMaxConcurrency=5
await client.enqueue({
  payload: { id: "A" },
  limits: [{
    type: "floatingConcurrency",
    key: "dynamic-queue",
    defaultMaxConcurrency: 5,
    refreshIntervalMs: 60000n,
    metadata: {}
  }]
});

// Later job tries to use defaultMaxConcurrency=10 - this is ignored!
await client.enqueue({
  payload: { id: "B" },
  limits: [{
    type: "floatingConcurrency",
    key: "dynamic-queue",        // Same key
    defaultMaxConcurrency: 10,   // Ignored - state already exists
    refreshIntervalMs: 60000n,
    metadata: {}
  }]
});
```

Both jobs will use the same floating limit state with the original `defaultMaxConcurrency=5` (until a refresh task updates it).
Changing Metadata and Refresh Interval
For floating limits, `metadata` and `refreshIntervalMs` are also stored in the shared state:
- First job’s values are saved
- Subsequent jobs’ values are ignored
- To change these, you would need to delete the old floating limit state (not currently exposed in the API) or use a different key
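One practical workaround, given that the first writer wins, is to version the key itself so that new jobs create fresh state. This is a hypothetical pattern, not a Silo feature; `LIMIT_VERSION` and `floatingLimit` are names invented for this sketch:

```typescript
// Hypothetical pattern: embed a version in the floating limit key.
// Bumping LIMIT_VERSION makes new jobs create fresh limit state with
// the new metadata and refresh interval, abandoning the old state.
const LIMIT_VERSION = 2;

function floatingLimit(baseKey: string) {
  return {
    type: "floatingConcurrency" as const,
    key: `${baseKey}:v${LIMIT_VERSION}`,
    defaultMaxConcurrency: 10,
    refreshIntervalMs: 30_000n,
    metadata: { rotatedAt: "2024-01-01" } // whatever your worker needs
  };
}
```

The trade-off: jobs already queued under the old key keep using the old state, and the two keys do not share concurrency accounting during the transition.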
Multiple Limits
Jobs can have multiple limits that are checked sequentially:
```typescript
await client.enqueue({
  payload: { task: "expensive-op" },
  limits: [
    // Checked first
    { type: "concurrency", key: "tenant:abc", maxConcurrency: 10 },

    // Checked second (only if first passes)
    {
      type: "floatingConcurrency",
      key: "external-api",
      defaultMaxConcurrency: 5,
      refreshIntervalMs: 30000n,
      metadata: {}
    },

    // Checked third
    {
      type: "rateLimit",
      name: "tenant-hourly",
      uniqueKey: "tenant:abc",
      limit: 1000n,
      durationMs: 3600000n
    }
  ]
});
```

The job proceeds to the next limit only after passing the current one. If a limit blocks (at capacity or rate limited), the job waits until that limit clears.
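Conceptually, the sequential check behaves like awaiting each limit in order: the job does not reach limit N+1 until limit N has granted it. An illustrative model, not Silo’s implementation:

```typescript
// Each check resolves once its limit grants the job (possibly after
// waiting in that limit's queue).
type LimitCheck = () => Promise<void>;

// Hypothetical model of sequential limit checking: later limits are
// never even consulted while an earlier one is blocking.
async function passAllLimits(checks: LimitCheck[]): Promise<void> {
  for (const check of checks) {
    await check();
  }
}
```

This ordering is why putting cheap or highly selective limits first matters: a job blocked on the first limit consumes no capacity in the limits behind it.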
Best Practices
- Choose the right limit type:
  - Use regular concurrency limits when the max concurrency is static and known upfront
  - Use floating limits when you need dynamic adjustment based on external factors
  - Use rate limits when you care about requests-per-time-window rather than concurrency
- Keep refresh intervals reasonable:
  - Shorter intervals (10-60s) for rapidly changing systems
  - Longer intervals (5-15min) for slower-changing external quotas
  - Balance freshness against worker overhead
- Handle refresh failures gracefully:
  - Implement retries in your refresh logic for transient errors
  - Log failures for debugging
  - Consider falling back to safe defaults in your computation logic
- Use descriptive keys:
  - Include entity type: `user:123`, `tenant:abc`, `api:stripe`
  - Make keys meaningful for debugging and monitoring
- Monitor limit utilization:
  - Use Silo’s query API to track how many jobs are waiting on limits
  - Watch for persistent queues that may indicate limits are too restrictive
- Limit ordering matters:
  - Put cheaper checks first (e.g., concurrency before expensive rate limit API calls)
  - Order by specificity (per-user limits before global limits)