
Cancel, Restart, Delete

Jobs in Silo can be cancelled to stop processing, restarted to run again after failure or cancellation, deleted to permanently remove them from the system, or expedited to run immediately if they were scheduled for the future.

Cancellation requests that a job stop processing. The behavior depends on the job’s current state:

  • Scheduled jobs: The job is immediately marked as Cancelled and will never run.
  • Running jobs: A cancellation flag is set, and the worker discovers it on its next heartbeat. Your JS task handler should then stop processing and report a cancelled outcome. It can take arbitrarily long to do so, as long as the worker continues to hold the lease.
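The state-dependent behavior above can be sketched as a small pure function. This is a model only, not Silo's implementation. The `JobState` type, the `cancelRequested` flag name, and the error strings are hypothetical. The error cases loosely mirror the cancelJob() errors listed further down.

```typescript
// Hypothetical model of how a cancel request affects each job state.
// Not Silo's implementation; names and messages are illustrative only.
type JobState =
  | { status: "Scheduled" }
  | { status: "Running"; cancelRequested: boolean }
  | { status: "Cancelled" }
  | { status: "Succeeded" }
  | { status: "Failed" };

function requestCancel(job: JobState): JobState {
  switch (job.status) {
    case "Scheduled":
      // The job never started, so it is cancelled outright.
      return { status: "Cancelled" };
    case "Running":
      // Only a flag is set. The worker sees it on its next heartbeat,
      // and the job stays Running until the handler reports back.
      return { status: "Running", cancelRequested: true };
    case "Cancelled":
      throw new Error("FAILED_PRECONDITION: job is already cancelled");
    case "Succeeded":
    case "Failed":
      throw new Error(`FAILED_PRECONDITION: job already ${job.status.toLowerCase()}`);
  }
}
```

The key point the model captures: cancelling a running job changes nothing visible until the worker cooperates.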

You can cancel a job directly using the client:

import { SiloGRPCClient, JobNotFoundError } from "@silo-ai/client";
const client = new SiloGRPCClient({
  servers: ["localhost:7450"],
});
// Cancel a job by ID
try {
  await client.cancelJob("job-123");
  console.log("Job cancelled");
} catch (error) {
  if (error instanceof JobNotFoundError) {
    console.log("Job not found");
  } else {
    // Anything else (e.g. the job is already terminal) is unexpected here
    throw error;
  }
}

If you’re using tenancy, include the tenant:

await client.cancelJob("job-123", "customer-456");

Job handles provide a convenient cancel() method:

// From enqueue
const handle = await client.enqueue({
  payload: { task: "process-data" },
});
// Cancel anytime later
await handle.cancel();

Or create a handle for an existing job:

// Create a handle from a known job ID
const handle = client.handle("job-123");
await handle.cancel();
// Or with a tenant
const tenantHandle = client.handle("job-456", "customer-123");
await tenantHandle.cancel();

The cancelJob() method can throw several errors:

Error | Condition
--- | ---
JobNotFoundError | The job ID does not exist
RpcError (FAILED_PRECONDITION) | The job is already cancelled
RpcError (FAILED_PRECONDITION) | The job is already in a terminal state (Succeeded or Failed)

The SiloWorker handles heartbeats automatically. When a job is cancelled, the worker detects the cancellation on its next heartbeat and aborts ctx.cancellationSignal, the AbortSignal passed to your handler. Your handler should check this signal and stop work:

const worker = new SiloWorker({
  client,
  workerId: "worker-1",
  taskGroup: "data-processing",
  handler: async (ctx) => {
    for (const item of ctx.task.payload.items) {
      // Check for cancellation between units of work
      if (ctx.cancellationSignal.aborted) {
        return { type: "cancelled" };
      }
      // processItem is your own application function
      await processItem(item);
    }
    return { type: "success", result: { processed: true } };
  },
});
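To see the check-between-units pattern in isolation, the sketch below drives a handler-shaped function with a plain AbortController. The controller stands in for the one behind ctx.cancellationSignal. The `processItems` function and `HandlerResult` type mirror the example above but are illustrative, not part of the Silo API.

```typescript
// Result shapes loosely mirroring the handler example above (illustrative only).
type HandlerResult =
  | { type: "success"; result: { processed: number } }
  | { type: "cancelled" };

// Processes items one at a time and checks the signal before each item,
// so a cancellation takes effect at the next unit-of-work boundary.
function processItems(
  items: string[],
  signal: AbortSignal,
  processItem: (item: string) => void,
): HandlerResult {
  let processed = 0;
  for (const item of items) {
    if (signal.aborted) {
      return { type: "cancelled" };
    }
    processItem(item);
    processed++;
  }
  return { type: "success", result: { processed } };
}

// Usage: abort after the second item, as a heartbeat might.
const controller = new AbortController();
const seen: string[] = [];
const outcome = processItems(["a", "b", "c", "d"], controller.signal, (item) => {
  seen.push(item);
  if (seen.length === 2) controller.abort();
});
// outcome.type is "cancelled" and seen is ["a", "b"]
```

The item that was in flight when the signal fired still finishes. This is why the granularity of your units of work determines how quickly a cancellation is honored.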

You can also pass the signal to APIs that accept AbortSignal:

handler: async (ctx) => {
  const response = await fetch(ctx.task.payload.url, {
    signal: ctx.cancellationSignal,
  });
  // ...
}
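Node.js's promise-based `setTimeout` from `node:timers/promises` also accepts an AbortSignal. This is handy when a handler waits between steps, for example while polling an external system. The sketch below is minimal: `pauseBetweenSteps` is a hypothetical helper. Inside a real handler you would pass `ctx.cancellationSignal` as the signal.

```typescript
import { setTimeout as sleep } from "node:timers/promises";

// Waits for `ms` milliseconds, but rejects with an AbortError as soon as the
// signal is aborted, so a cancelled job does not sit out the full delay.
async function pauseBetweenSteps(ms: number, signal: AbortSignal): Promise<void> {
  await sleep(ms, undefined, { signal });
}
```

Because the rejection is an ordinary thrown error, you can let it propagate, or catch it and return `{ type: "cancelled" }` from the handler.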

Deletion permanently removes a job and all its data from Silo. Unlike cancellation, which leaves the job in storage with a Cancelled status (so it can still be restarted), deletion erases the job completely.

// Delete a job by ID
await client.deleteJob("job-123");
// With tenant
await client.deleteJob("job-456", "customer-123");
// Or through a job handle
const handle = client.handle("job-123");
await handle.delete();
The deleteJob() method can throw several errors:

Error | Condition
--- | ---
JobNotFoundError | The job ID does not exist
RpcError (INTERNAL) | The job is still in progress (Scheduled or Running)

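To delete a running job, first cancel it, then delete it once the cancellation has taken effect:

import { JobStatus } from "@silo-ai/client";
const handle = client.handle("job-123");
// First, request cancellation
await handle.cancel();
// A running job only stops once its worker notices the cancellation on a
// heartbeat, so poll (with an upper bound) until it is no longer Running
let status = await handle.getStatus();
for (let i = 0; i < 30 && status === JobStatus.Running; i++) {
  await new Promise((resolve) => setTimeout(resolve, 1000));
  status = await handle.getStatus();
}
if (status === JobStatus.Cancelled) {
  await handle.delete();
}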
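If you need to do this for many jobs, you can factor the wait into a small polling helper. This sketch makes several assumptions. It takes the status lookup as a callback (for example `() => handle.getStatus()`). It uses plain strings as stand-ins for `JobStatus` values. The interval and attempt limit are arbitrary defaults.

```typescript
// Stand-in for JobStatus values; the real enum comes from @silo-ai/client.
type Status = "Scheduled" | "Running" | "Cancelled" | "Succeeded" | "Failed";

// Polls until the job is no longer Running, or gives up after maxAttempts
// lookups. Returns the last observed status so the caller can decide what to do.
async function waitUntilNotRunning(
  getStatus: () => Promise<Status>,
  intervalMs = 1_000,
  maxAttempts = 30,
): Promise<Status> {
  let status = await getStatus();
  for (let attempt = 1; attempt < maxAttempts && status === "Running"; attempt++) {
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    status = await getStatus();
  }
  return status;
}
```

Returning the last status, rather than throwing on timeout, leaves the policy to the caller. You can delete only if the result is Cancelled, or alert if the job is still Running.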

Restarting re-runs a job that has stopped, either because it was cancelled or because it failed after exhausting its retries. The job is re-queued with a fresh retry counter, giving it a full new set of attempts to complete successfully.

Restart is useful in several scenarios:

  • Accidental cancellation: A job was cancelled by mistake and needs to run.
  • Transient failures: A job failed because of a temporary issue (service outage, rate limits) that has since been resolved, and an operator wants to manually give it more retries.
  • Manual retry: You want to give a failed job another attempt outside of its automatic retry policy.

You can restart a job directly using the client:

import { SiloGRPCClient } from "@silo-ai/client";
const client = new SiloGRPCClient({
  servers: ["localhost:7450"],
});
// Restart a job by ID
await client.restartJob("job-123");
console.log("Job restarted and re-queued");

If you’re using tenancy, include the tenant:

await client.restartJob("job-123", "customer-456");

Job handles provide a convenient restart() method:

// Create a handle for an existing job
const handle = client.handle("job-123");
await handle.restart();
// Or with a tenant
const tenantHandle = client.handle("job-456", "customer-123");
await tenantHandle.restart();

When you restart a job, Silo:

  1. Clears the cancellation flag (if the job was cancelled)
  2. Creates a new task with attempt_number = 1, resetting the retry counter
  3. Sets the status to Scheduled, placing the job back in the queue for immediate processing
  4. Preserves the original job data including payload, priority, limits, and metadata

The job will be picked up by the next available worker and processed as if it were newly enqueued.
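The four steps above amount to a state transition that resets the attempt counter while keeping everything else. This is a minimal model, assuming a simplified job record. The `JobRecord` fields are hypothetical and only mirror the properties the text mentions. The guard reflects the eligibility rules in the table that follows.

```typescript
// Simplified, hypothetical job record; not Silo's storage format.
interface JobRecord {
  id: string;
  status: "Scheduled" | "Running" | "Cancelled" | "Succeeded" | "Failed";
  cancelRequested: boolean;
  attemptNumber: number;
  payload: unknown;
  priority: number;
  metadata: Record<string, string>;
}

// Models the restart steps: clear the cancel flag, reset the attempt number
// to 1, mark the job Scheduled, and keep payload, priority, and metadata.
function restart(job: JobRecord): JobRecord {
  if (job.status !== "Cancelled" && job.status !== "Failed") {
    throw new Error(`FAILED_PRECONDITION: cannot restart a ${job.status} job`);
  }
  return { ...job, status: "Scheduled", cancelRequested: false, attemptNumber: 1 };
}
```

Using a spread keeps every untouched field (payload, priority, metadata) identical to the original, which matches step 4.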

Only jobs in terminal-but-recoverable states can be restarted:

Status | Can restart? | Reason
--- | --- | ---
Cancelled | ✅ Yes | Job was stopped before completion
Failed | ✅ Yes | Job failed but can be retried
Succeeded | ❌ No | Job completed successfully; nothing to retry
Scheduled | ❌ No | Job is already queued to run
Running | ❌ No | Job is currently being processed

The restartJob() method can throw several errors:

Error | Condition
--- | ---
JobNotFoundError | The job ID does not exist
RpcError (FAILED_PRECONDITION) | Job already succeeded (truly terminal)
RpcError (FAILED_PRECONDITION) | Job is still in progress (Scheduled or Running)

import { RpcError } from "@protobuf-ts/runtime-rpc";
const handle = client.handle("job-123");
try {
  await handle.restart();
  console.log("Job restarted successfully");
} catch (error) {
  if (error instanceof RpcError && error.code === "FAILED_PRECONDITION") {
    // The message explains why, e.g. "job already succeeded" or "job is still in progress"
    console.log("Cannot restart job:", error.message);
  } else {
    throw error;
  }
}

A common pattern is to monitor for failed jobs and restart them after fixing the underlying issue:

import { JobStatus } from "@silo-ai/client";
// Check if a job failed
const handle = client.handle("job-123", "customer-456");
const status = await handle.getStatus();
if (status === JobStatus.Failed) {
  // Get job details to understand the failure
  const job = await handle.getJob();
  console.log(`Job failed at ${job.statusChangedAtMs}`);
  // After fixing the issue, restart the job
  await handle.restart();
  console.log("Job restarted");
}
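To apply the same check across many jobs, you can wrap the pattern in a loop. This is only a sketch. `RestartableHandle` is a hypothetical minimal interface matching the two handle methods used above, `getStatus()` and `restart()`. Status values are plain strings standing in for `JobStatus`.

```typescript
// Minimal, hypothetical view of a job handle: only the two methods used above.
interface RestartableHandle {
  getStatus(): Promise<string>;
  restart(): Promise<void>;
}

// Restarts every handle whose status is "Failed" and returns how many were
// restarted. Jobs in any other state are left alone.
async function restartFailed(handles: RestartableHandle[]): Promise<number> {
  let restarted = 0;
  for (const handle of handles) {
    if ((await handle.getStatus()) === "Failed") {
      await handle.restart();
      restarted++;
    }
  }
  return restarted;
}
```

The jobs are processed sequentially to keep load on the server predictable. If you have many jobs, a bounded-concurrency version would be a natural extension.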

If a job was cancelled by mistake, you can restart it to allow processing:

import { JobStatus } from "@silo-ai/client";
const handle = client.handle("job-123");
const status = await handle.getStatus();
if (status === JobStatus.Cancelled) {
  // Restart the cancelled job
  await handle.restart();
  console.log("Cancelled job has been restarted");
}

Expediting makes a future-scheduled job, or a future-scheduled retry attempt, run immediately, skipping any remaining delay. This is useful for pulling forward jobs that were scheduled for later or for bypassing retry backoff delays.

Expedite is useful in several scenarios:

  • User-initiated urgency: A user requests immediate processing of a scheduled job
  • Skip retry delays: A job is waiting for retry backoff, but you’ve fixed the issue and want it to run now
  • Testing scheduled jobs: You want to test a future-scheduled job without waiting
  • Priority escalation: Business needs change and a scheduled job needs to run immediately

You can expedite a job directly using the client:

import { SiloGRPCClient, JobNotFoundError } from "@silo-ai/client";
const client = new SiloGRPCClient({
  servers: ["localhost:7450"],
});
// Expedite a job by ID
try {
  await client.expediteJob("job-123");
  console.log("Job expedited and ready to run immediately");
} catch (error) {
  if (error instanceof JobNotFoundError) {
    console.log("Job not found");
  } else {
    throw error;
  }
}

If you’re using tenancy, include the tenant:

await client.expediteJob("job-123", "customer-456");

Job handles provide a convenient expedite() method:

// Create a handle for an existing job
const handle = client.handle("job-123");
await handle.expedite();
// Or with a tenant
const tenantHandle = client.handle("job-456", "customer-123");
await tenantHandle.expedite();

When you expedite a job, Silo:

  1. Finds the future-scheduled task in the task queue
  2. Updates the task timestamp to the current time, making it immediately ready
  3. Wakes up the task broker to pick up the newly available task
  4. Preserves all other job data including attempt number, priority, limits, and metadata

The job becomes immediately available for workers to lease and process.

Only jobs with future-scheduled tasks can be expedited:

Condition | Can expedite? | Reason
--- | --- | ---
Future-scheduled task | ✅ Yes | Task timestamp is in the future
Mid-retry with backoff | ✅ Yes | Retry is scheduled for the future due to exponential backoff
Ready to run now | ❌ No | Task is already at the current time or earlier
Running | ❌ No | Job is currently being processed
Terminal (Succeeded/Failed) | ❌ No | Job has finished processing
Cancelled | ❌ No | Job was cancelled
No pending task | ❌ No | Job has no task in the queue
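You can model the eligibility rules and the timestamp update together. This sketch makes several assumptions. `PendingJob`, its fields, and the reason strings are hypothetical. Times are plain millisecond numbers rather than the bigint `startAtMs` the client uses.

```typescript
// Hypothetical snapshot of a job and its queued task (if any).
interface PendingJob {
  status: "Scheduled" | "Running" | "Cancelled" | "Succeeded" | "Failed";
  taskRunAtMs: number | null; // null when no task is in the queue
  attemptNumber: number;
  priority: number;
}

// Returns why the job cannot be expedited, or null if it can.
function expediteBlocker(job: PendingJob, nowMs: number): string | null {
  if (job.status === "Running") return "job is currently running";
  if (job.status === "Succeeded" || job.status === "Failed") return "job is terminal";
  if (job.status === "Cancelled") return "job is cancelled";
  if (job.taskRunAtMs === null) return "job has no pending task";
  if (job.taskRunAtMs <= nowMs) return "task is already ready to run";
  return null;
}

// Moves the task's timestamp to now; attempt number and priority are untouched.
function expedite(job: PendingJob, nowMs: number): PendingJob {
  const blocker = expediteBlocker(job, nowMs);
  if (blocker !== null) throw new Error(`FAILED_PRECONDITION: ${blocker}`);
  return { ...job, taskRunAtMs: nowMs };
}
```

A job waiting on retry backoff is simply a Scheduled job whose task time is in the future. The same check therefore covers both rows marked ✅ in the table.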

The expediteJob() method can throw several errors:

Error | Condition
--- | ---
JobNotFoundError | The job ID does not exist
RpcError (FAILED_PRECONDITION) | Job is currently running
RpcError (FAILED_PRECONDITION) | Job is terminal (Succeeded or Failed)
RpcError (FAILED_PRECONDITION) | Job is cancelled
RpcError (FAILED_PRECONDITION) | Task is already ready to run (not future-scheduled)
RpcError (FAILED_PRECONDITION) | Job has no pending task in queue

import { RpcError } from "@protobuf-ts/runtime-rpc";
const handle = client.handle("job-123");
try {
  await handle.expedite();
  console.log("Job expedited successfully");
} catch (error) {
  if (error instanceof RpcError && error.code === "FAILED_PRECONDITION") {
    // The message explains why, e.g. "job is already running" or "task is already ready to run"
    console.log("Cannot expedite job:", error.message);
  } else {
    throw error;
  }
}

The most common use case is expediting jobs that were enqueued with a future startAtMs:

import { JobStatus } from "@silo-ai/client";
// Enqueue a job to run 1 hour from now
const handle = await client.enqueue({
  payload: { task: "process-data" },
  startAtMs: BigInt(Date.now() + 3_600_000), // 1 hour
});
// Check that it's scheduled
const status = await handle.getStatus();
console.log(status); // JobStatus.Scheduled
// Business needs changed - run it now!
await handle.expedite();
// Job is now immediately available for workers

When a job is retrying with exponential backoff, you can skip the waiting period:

import { JobStatus } from "@silo-ai/client";
// A job failed and is scheduled to retry in 5 minutes
const handle = client.handle("failed-job-123");
const status = await handle.getStatus();
if (status === JobStatus.Scheduled) {
  // You fixed the underlying issue and want to retry immediately
  await handle.expedite();
  console.log("Retry backoff skipped - job will run now");
}

If you want a job to run sooner but it’s not necessarily urgent, consider using priority instead of expediting:

// During enqueue, use higher priority
const handle = await client.enqueue({
  payload: { task: "process-data" },
  taskGroup: "data-processing",
  priority: 0, // 0 is highest priority, processed sooner
});
// Expedite is for jobs that must run NOW
// Priority is for jobs that should run SOONER

Expedite removes the remaining wait entirely by moving the task's scheduled time to now. Priority only changes the order in which already-ready jobs are picked up. Priority can't be changed once a job has been enqueued, so choose it at enqueue time.
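A toy dequeue function illustrates the difference. A task is eligible only once its run time has passed, and priority orders only the eligible tasks. This is a simplified illustration, not Silo's task broker. The `QueuedTask` type and the tie-break by run time are assumptions. The rule that a lower number means higher priority follows the comment in the example above.

```typescript
interface QueuedTask {
  jobId: string;
  runAtMs: number;
  priority: number; // 0 is highest
}

// Picks the next task a worker would lease at `nowMs`, or undefined if none is ready.
function nextReady(tasks: QueuedTask[], nowMs: number): QueuedTask | undefined {
  return tasks
    .filter((t) => t.runAtMs <= nowMs) // future tasks are invisible, whatever their priority
    .sort((a, b) => a.priority - b.priority || a.runAtMs - b.runAtMs)[0];
}

const now = 10_000;
const queue: QueuedTask[] = [
  { jobId: "future-urgent", runAtMs: now + 60_000, priority: 0 },
  { jobId: "ready-normal", runAtMs: now - 5_000, priority: 50 },
];
console.log(nextReady(queue, now)?.jobId); // ready-normal
// Expediting moves the future task's run time to now, making it eligible.
queue[0].runAtMs = now;
console.log(nextReady(queue, now)?.jobId); // future-urgent
```

Even priority 0 cannot pull a future task forward. Only changing its run time, which is what expedite does, makes it visible to workers.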