Cancel, Restart, Delete, Expedite

Jobs in Silo can be cancelled to stop processing, restarted to run again after a failure or cancellation, deleted to permanently remove them from the system, or expedited to run immediately if they are scheduled for the future.

Cancelling Jobs

Cancellation requests that a job stop processing. The behavior depends on the job’s current state:

Scheduled jobs: The job is immediately marked as Cancelled and will never run.
Running jobs: A cancellation flag is set, and the worker discovers it on its next heartbeat. Your task handler should then stop processing and report a cancelled outcome. Silo does not forcibly stop the handler: it can take arbitrarily long to finish, as long as its worker continues to hold the lease.

Using the Client

You can cancel a job directly using the client:

import { SiloGRPCClient, JobNotFoundError } from "@silo-ai/client";

const client = new SiloGRPCClient({
  servers: ["localhost:7450"],
});

// Cancel a job by ID
try {
  await client.cancelJob("job-123");
  console.log("Job cancelled");
} catch (error) {
  if (error instanceof JobNotFoundError) {
    console.log("Job not found");
  } else {
    // Anything else (e.g. the job is already terminal) is unexpected here
    throw error;
  }
}

If you’re using tenancy, include the tenant:

await client.cancelJob("job-123", "customer-456");

Using Job Handles

Job handles provide a convenient cancel() method:

// From enqueue
const handle = await client.enqueue({
  payload: { task: "process-data" }
});

// Cancel anytime later
await handle.cancel();

Or create a handle for an existing job:

// Create a handle from a known job ID
const handle = client.handle("job-123");
await handle.cancel();

// Or with a tenant
const tenantHandle = client.handle("job-456", "customer-123");
await tenantHandle.cancel();

Cancellation Errors

The cancelJob() method can throw several errors:

Error	Condition
`JobNotFoundError`	The job ID does not exist
`RpcError` (FAILED_PRECONDITION)	The job is already cancelled
`RpcError` (FAILED_PRECONDITION)	The job is already in a terminal state (Succeeded or Failed)
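
To make these rules concrete, here is a small in-memory model of how a cancel request affects a job in each state. It is a sketch of the documented behavior only: the `JobState` type, the `requestCancel` function, and the `FailedPrecondition` error class are hypothetical and are not part of `@silo-ai/client`. It also assumes a running job keeps the Running status until its worker reports the cancelled outcome.

```typescript
// Hypothetical in-memory model of the documented cancellation rules.
// Not part of @silo-ai/client; it only illustrates the behavior.
type Status = "Scheduled" | "Running" | "Succeeded" | "Failed" | "Cancelled";

interface JobState {
  status: Status;
  cancelRequested: boolean; // the "cancellation flag" a worker sees on heartbeat
}

// Stand-in for an RpcError with code FAILED_PRECONDITION.
class FailedPrecondition extends Error {
  readonly code = "FAILED_PRECONDITION";
}

function requestCancel(job: JobState): JobState {
  switch (job.status) {
    case "Scheduled":
      // Never started: marked Cancelled immediately and will never run.
      return { status: "Cancelled", cancelRequested: true };
    case "Running":
      // Only the flag is set; the worker must notice it on a heartbeat
      // and report a cancelled outcome itself.
      return { ...job, cancelRequested: true };
    case "Cancelled":
      throw new FailedPrecondition("job is already cancelled");
    default:
      // Succeeded or Failed: already in a terminal state.
      throw new FailedPrecondition(`job is already ${job.status}`);
  }
}

const scheduled = requestCancel({ status: "Scheduled", cancelRequested: false });
console.log(scheduled.status); // Cancelled

const running = requestCancel({ status: "Running", cancelRequested: false });
console.log(running.status, running.cancelRequested); // Running true
```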

How Workers Discover Cancellation

The SiloWorker handles heartbeats automatically. When a job is cancelled, the worker detects it on the next heartbeat and aborts the cancellationSignal passed to your handler. Your handler should check this signal and stop work:

const worker = new SiloWorker({
  client,
  workerId: "worker-1",
  taskGroup: "data-processing",
  handler: async (ctx) => {
    for (const item of ctx.task.payload.items) {
      // Check for cancellation between units of work
      if (ctx.cancellationSignal.aborted) {
        return { type: "cancelled" };
      }
      await processItem(item);
    }
    return { type: "success", result: { processed: true } };
  }
});

You can also pass the signal to APIs that accept AbortSignal:

handler: async (ctx) => {
  const response = await fetch(ctx.task.payload.url, {
    signal: ctx.cancellationSignal
  });
  // ...
}
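
The same cooperative pattern can be tried without Silo, using only the standard `AbortController`. In this sketch a timer stands in for the heartbeat that discovers the cancellation; `simulatedHandler`, `processItem`, and the outcome shape are hypothetical stand-ins that mirror the handler shown earlier, not the SiloWorker API.

```typescript
// Standalone simulation of cooperative cancellation with AbortSignal.
// The timer plays the role of the worker noticing cancellation on a heartbeat.
type Outcome =
  | { type: "success"; result: { processed: number } }
  | { type: "cancelled" };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical unit of work.
async function processItem(_item: number): Promise<void> {
  await sleep(10);
}

async function simulatedHandler(items: number[], signal: AbortSignal): Promise<Outcome> {
  let processed = 0;
  for (const item of items) {
    // Check between units of work, as in the handler above.
    if (signal.aborted) {
      return { type: "cancelled" };
    }
    await processItem(item);
    processed++;
  }
  return { type: "success", result: { processed } };
}

async function demo(): Promise<Outcome> {
  const controller = new AbortController();
  // The simulated "heartbeat" aborts the signal after about 35 ms.
  setTimeout(() => controller.abort(), 35);
  const items = Array.from({ length: 100 }, (_, i) => i);
  return simulatedHandler(items, controller.signal);
}

demo().then((outcome) => console.log(outcome.type)); // cancelled
```

When the signal is passed to `fetch` instead, aborting it makes the pending request reject with an `AbortError`, so the handler stops at the await rather than at the next explicit check.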

Deleting Jobs

Deletion permanently removes a job and all its data from Silo. Unlike cancellation, which keeps the job so it can still be inspected or restarted, deletion erases the job from storage entirely.

Using the Client

// Delete a job by ID
await client.deleteJob("job-123");

// With tenant
await client.deleteJob("job-456", "customer-123");

Using Job Handles

const handle = client.handle("job-123");
await handle.delete();

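Deletion Requirements

Only jobs that are no longer in progress (Succeeded, Failed, or Cancelled) can be deleted. Scheduled and Running jobs must be cancelled first.
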
Deletion Errors

Error	Condition
`JobNotFoundError`	The job ID does not exist
`RpcError` (INTERNAL)	The job is still in progress (Scheduled or Running)

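To delete a job that is still in progress, cancel it first, then delete it once it has stopped:

import { JobStatus } from "@silo-ai/client";

const handle = client.handle("job-123");

// First cancel the job
await handle.cancel();

// A Scheduled job becomes Cancelled immediately, but a Running job stays
// Running until its worker notices the cancellation and reports back.
// Poll for a bounded amount of time rather than waiting forever.
let status = await handle.getStatus();
for (let i = 0; i < 30 && status === JobStatus.Running; i++) {
  await new Promise((resolve) => setTimeout(resolve, 1000));
  status = await handle.getStatus();
}

if (status === JobStatus.Cancelled) {
  await handle.delete();
}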
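
The cancel-then-delete flow can also be wrapped in a reusable helper that is easy to test without a Silo server. This is a sketch under stated assumptions: `JobHandleLike` is a hypothetical interface standing in for the handle's cancel(), getStatus(), and delete() methods, status values are plain strings rather than the `JobStatus` enum, and the polling interval and limit are arbitrary.

```typescript
// Hypothetical subset of a job handle, mirroring cancel(), getStatus(), and
// delete() from the examples above. Not the @silo-ai/client types.
type Status = "Scheduled" | "Running" | "Succeeded" | "Failed" | "Cancelled";

interface JobHandleLike {
  cancel(): Promise<void>;
  getStatus(): Promise<Status>;
  delete(): Promise<void>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Cancel the job, wait (bounded) for it to stop running, then delete it.
// Resolves to true if the job was deleted, false if it was still in progress.
// Errors from cancel() (e.g. the job is already terminal) are propagated.
async function cancelAndDelete(
  handle: JobHandleLike,
  { intervalMs = 1000, maxPolls = 30 } = {},
): Promise<boolean> {
  await handle.cancel();
  let status = await handle.getStatus();
  for (let i = 0; i < maxPolls && status === "Running"; i++) {
    await sleep(intervalMs);
    status = await handle.getStatus();
  }
  // The worker may finish (Succeeded or Failed) before it notices the
  // cancellation; those jobs are no longer in progress and can be deleted too.
  if (status === "Scheduled" || status === "Running") {
    return false;
  }
  await handle.delete();
  return true;
}

// Fake handle for trying the helper: the "worker" acknowledges the
// cancellation on the third status check.
class FakeHandle implements JobHandleLike {
  deleted = false;
  private cancelled = false;
  private checks = 0;

  async cancel(): Promise<void> {
    this.cancelled = true;
  }

  async getStatus(): Promise<Status> {
    if (!this.cancelled) return "Running";
    this.checks++;
    return this.checks >= 3 ? "Cancelled" : "Running";
  }

  async delete(): Promise<void> {
    this.deleted = true;
  }
}

const fake = new FakeHandle();
cancelAndDelete(fake, { intervalMs: 10 }).then((ok) => console.log(ok, fake.deleted)); // true true
```

Deleting any stopped job, rather than only Cancelled ones, is a design choice: the goal is removal, and a job that happened to succeed before the cancellation landed is just as safe to delete.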

Restarting Jobs

Restarting lets you re-run a job that has stopped, either because it was cancelled or because it failed after exhausting its retries. The job is re-queued with a fresh retry counter, so it gets its full set of attempts again.

When to Restart

Restart is useful in several scenarios:

Accidental cancellation: A job was cancelled by mistake and needs to run
Transient failures: A job failed due to temporary issues (service outage, rate limits) that have since been resolved, and an operator wants to manually give it more retries
Manual retry: You want to give a failed job another attempt outside of its automatic retry policy

Using the Client

You can restart a job directly using the client:

import { SiloGRPCClient, JobNotFoundError } from "@silo-ai/client";

const client = new SiloGRPCClient({
  servers: ["localhost:7450"],
});

// Restart a job by ID
await client.restartJob("job-123");
console.log("Job restarted and re-queued");

If you’re using tenancy, include the tenant:

await client.restartJob("job-123", "customer-456");

Using Job Handles

Job handles provide a convenient restart() method:

// Create a handle for an existing job
const handle = client.handle("job-123");
await handle.restart();

// Or with a tenant
const tenantHandle = client.handle("job-456", "customer-123");
await tenantHandle.restart();

What Restart Does

When you restart a job, Silo:

Clears the cancellation flag (if the job was cancelled)
Creates a new task with attempt_number = 1, resetting the retry counter
Sets the status to Scheduled, placing the job back in the queue for immediate processing
Preserves the original job data including payload, priority, limits, and metadata

The job will be picked up by the next available worker and processed as if it were newly enqueued.
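
The restart steps can be modeled as a pure transformation of a job record. This is a hypothetical sketch: the `JobRecord` shape and `applyRestart` function are illustrative, not Silo's storage format, limits are omitted for brevity, and it deliberately models only the state change, without checking which statuses may be restarted.

```typescript
// Hypothetical in-memory job record; not Silo's actual storage layout.
type Status = "Scheduled" | "Running" | "Succeeded" | "Failed" | "Cancelled";

interface JobRecord {
  id: string;
  status: Status;
  cancelRequested: boolean;
  attemptNumber: number;
  payload: unknown;
  priority: number;
  metadata: Record<string, string>;
}

// Mirrors the documented restart steps. Does not validate the current status.
function applyRestart(job: JobRecord): JobRecord {
  return {
    ...job,                 // payload, priority, and metadata are preserved
    cancelRequested: false, // clear the cancellation flag
    attemptNumber: 1,       // the new task starts the retry count over
    status: "Scheduled",    // back in the queue, ready to run now
  };
}

const failed: JobRecord = {
  id: "job-123",
  status: "Failed",
  cancelRequested: false,
  attemptNumber: 5,
  payload: { task: "process-data" },
  priority: 10,
  metadata: { source: "import" },
};

const restarted = applyRestart(failed);
console.log(restarted.status, restarted.attemptNumber); // Scheduled 1
```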

Restart Requirements

Only jobs in terminal-but-recoverable states can be restarted:

Status	Can Restart?	Reason
Cancelled	✅ Yes	Job was stopped before completion
Failed	✅ Yes	Job failed but can be retried
Succeeded	❌ No	Job completed successfully; nothing to retry
Scheduled	❌ No	Job is already queued to run
Running	❌ No	Job is currently being processed
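
The table reduces to a simple lookup. A minimal sketch, assuming the status names shown above; the `Status` type and `canRestart` helper are hypothetical and not exported by `@silo-ai/client`:

```typescript
type Status = "Scheduled" | "Running" | "Succeeded" | "Failed" | "Cancelled";

// Restart eligibility per status, taken from the table above.
const RESTARTABLE: Record<Status, boolean> = {
  Cancelled: true,  // stopped before completion
  Failed: true,     // failed but can be retried
  Succeeded: false, // nothing to retry
  Scheduled: false, // already queued to run
  Running: false,   // currently being processed
};

function canRestart(status: Status): boolean {
  return RESTARTABLE[status];
}

console.log(canRestart("Failed"), canRestart("Succeeded")); // true false
```

A client-side check like this is only a convenience: the status can change between reading it and calling restart(), so the server's FAILED_PRECONDITION error still needs to be handled.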

Restart Errors

The restartJob() method can throw several errors:

Error	Condition
`JobNotFoundError`	The job ID does not exist
`RpcError` (FAILED_PRECONDITION)	Job already succeeded (truly terminal)
`RpcError` (FAILED_PRECONDITION)	Job is still in progress (Scheduled or Running)

import { RpcError } from "@protobuf-ts/runtime-rpc";

try {
  await handle.restart();
  console.log("Job restarted successfully");
} catch (error) {
  if (error instanceof RpcError && error.code === "FAILED_PRECONDITION") {
    // Check the message to understand why
    console.log("Cannot restart job:", error.message);
    // e.g., "job already succeeded" or "job is still in progress"
  } else {
    throw error;
  }
}

Restarting Failed Jobs

A common pattern is to monitor for failed jobs and restart them after fixing the underlying issue:

import { JobStatus } from "@silo-ai/client";

// Check if a job failed
const handle = client.handle("job-123", "customer-456");
const status = await handle.getStatus();

if (status === JobStatus.Failed) {
  // Get job details to understand the failure
  const job = await handle.getJob();
  console.log(`Job failed at ${job.statusChangedAtMs}`);

  // After fixing the issue, restart the job
  await handle.restart();
  console.log("Job restarted");
}

Restarting Cancelled Jobs

If a job was cancelled by mistake, you can restart it to allow processing:

import { JobStatus } from "@silo-ai/client";

const handle = client.handle("job-123");
const status = await handle.getStatus();

if (status === JobStatus.Cancelled) {
  // Restart the cancelled job
  await handle.restart();
  console.log("Cancelled job has been restarted");
}

Expediting Jobs

Expediting makes a job whose next task is scheduled for the future run immediately, skipping the remaining delay. This applies both to jobs enqueued with a future start time and to retries waiting out a backoff delay, so it is useful for pulling scheduled work forward or for bypassing retry backoff.

When to Expedite

Expedite is useful in several scenarios:

User-initiated urgency: A user requests immediate processing of a scheduled job
Skip retry delays: A job is waiting for retry backoff, but you’ve fixed the issue and want it to run now
Testing scheduled jobs: You want to test a future-scheduled job without waiting
Priority escalation: Business needs change and a scheduled job needs to run immediately

Using the Client

You can expedite a job directly using the client:

import { SiloGRPCClient, JobNotFoundError } from "@silo-ai/client";

const client = new SiloGRPCClient({
  servers: ["localhost:7450"],
});

// Expedite a job by ID
try {
  await client.expediteJob("job-123");
  console.log("Job expedited and ready to run immediately");
} catch (error) {
  if (error instanceof JobNotFoundError) {
    console.log("Job not found");
  } else {
    throw error;
  }
}

If you’re using tenancy, include the tenant:

await client.expediteJob("job-123", "customer-456");

Using Job Handles

Job handles provide a convenient expedite() method:

// Create a handle for an existing job
const handle = client.handle("job-123");
await handle.expedite();

// Or with a tenant
const tenantHandle = client.handle("job-456", "customer-123");
await tenantHandle.expedite();

What Expedite Does

When you expedite a job, Silo:

Finds the future-scheduled task in the task queue
Updates the task timestamp to the current time, making it immediately ready
Wakes up the task broker to pick up the newly available task
Preserves all other job data including attempt number, priority, limits, and metadata

The job becomes immediately available for workers to lease and process.
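
A hypothetical in-memory model of these steps, including the checks that decide whether a job can be expedited, may help. The `Task` and `Job` shapes, the `expedite` function, and the error messages are illustrative assumptions, not Silo's implementation; the broker wake-up is not modeled, and `nowMs` is passed in so the behavior is deterministic. It assumes a job waiting for a retry reports the Scheduled status.

```typescript
// Hypothetical model of expedite; not Silo's implementation.
type Status = "Scheduled" | "Running" | "Succeeded" | "Failed" | "Cancelled";

interface Task {
  jobId: string;
  runAtMs: number;       // when the task becomes ready to lease
  attemptNumber: number; // preserved by expedite
  priority: number;      // preserved by expedite
}

interface Job {
  id: string;
  status: Status;
}

function expedite(job: Job, queue: Task[], nowMs: number): Task {
  if (job.status !== "Scheduled") {
    // Running, Succeeded, Failed, or Cancelled
    throw new Error(`FAILED_PRECONDITION: job is ${job.status}`);
  }
  const task = queue.find((t) => t.jobId === job.id);
  if (!task) {
    throw new Error("FAILED_PRECONDITION: job has no pending task");
  }
  if (task.runAtMs <= nowMs) {
    throw new Error("FAILED_PRECONDITION: task is already ready to run");
  }
  task.runAtMs = nowMs; // only the timestamp changes
  return task;
}

const now = 1_000_000;
const queue: Task[] = [
  { jobId: "job-123", runAtMs: now + 3_600_000, attemptNumber: 1, priority: 10 },
];
const task = expedite({ id: "job-123", status: "Scheduled" }, queue, now);
console.log(task.runAtMs === now); // true
```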

Expedite Requirements

Only jobs with future-scheduled tasks can be expedited:

Condition	Can Expedite?	Reason
Future-scheduled task	✅ Yes	Task timestamp is in the future
Mid-retry with backoff	✅ Yes	Retry is scheduled for future due to exponential backoff
Ready to run now	❌ No	Task is already at current time or earlier
Running	❌ No	Job is currently being processed
Terminal (Succeeded/Failed)	❌ No	Job has finished processing
Cancelled	❌ No	Job was cancelled
No pending task	❌ No	Job has no task in the queue

Expedite Errors

The expediteJob() method can throw several errors:

Error	Condition
`JobNotFoundError`	The job ID does not exist
`RpcError` (FAILED_PRECONDITION)	Job is currently running
`RpcError` (FAILED_PRECONDITION)	Job is terminal (Succeeded or Failed)
`RpcError` (FAILED_PRECONDITION)	Job is cancelled
`RpcError` (FAILED_PRECONDITION)	Task is already ready to run (not future-scheduled)
`RpcError` (FAILED_PRECONDITION)	Job has no pending task in queue

import { RpcError } from "@protobuf-ts/runtime-rpc";

try {
  await handle.expedite();
  console.log("Job expedited successfully");
} catch (error) {
  if (error instanceof RpcError && error.code === "FAILED_PRECONDITION") {
    // Check the message to understand why
    console.log("Cannot expedite job:", error.message);
    // e.g., "job is already running" or "task is already ready to run"
  } else {
    throw error;
  }
}

Expediting Scheduled Jobs

The most common use case is expediting jobs that were enqueued with a future startAtMs:

import { JobStatus } from "@silo-ai/client";

// Enqueue a job to run 1 hour from now
const handle = await client.enqueue({
  payload: { task: "process-data" },
  startAtMs: BigInt(Date.now() + 3_600_000), // 1 hour
});

// Check that it's scheduled
const status = await handle.getStatus();
console.log(status); // JobStatus.Scheduled

// Business needs changed - run it now!
await handle.expedite();

// Job is now immediately available for workers

Expediting Mid-Retry Jobs

When a job is retrying with exponential backoff, you can skip the waiting period:

import { JobStatus } from "@silo-ai/client";

// A job failed and is scheduled to retry in 5 minutes
const handle = client.handle("failed-job-123");
const status = await handle.getStatus();

if (status === JobStatus.Scheduled) {
  // You fixed the underlying issue and want to retry immediately
  await handle.expedite();
  console.log("Retry backoff skipped - job will run now");
}

Expediting vs Higher Priority

If you want a job to run sooner but it is not urgent, consider giving it a higher priority when you enqueue it instead of expediting it later:

// During enqueue, use higher priority
const handle = await client.enqueue({
  payload: { task: "process-data" },
  taskGroup: "data-processing",
  priority: 0, // 0 is highest priority, processed sooner
});

// Expedite is for jobs that must run NOW
// Priority is for jobs that should run SOONER

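Expedite moves a scheduled task's time to now, so the job becomes ready immediately regardless of its original schedule. Priority only changes the order in which ready jobs are picked up; it does not make a future-scheduled job run early. Priority cannot be changed after a job has been enqueued.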
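
To see why priority alone does not make a future-scheduled job run early, here is a toy dispatcher. It assumes the ordering described above: only tasks whose time has arrived are eligible, and among those the lowest priority number wins. The `nextTask` function and `Task` shape are hypothetical, and Silo's broker may order ties differently.

```typescript
// Toy dispatcher; not Silo's broker.
interface Task {
  jobId: string;
  runAtMs: number;
  priority: number; // 0 is highest
}

function nextTask(queue: Task[], nowMs: number): Task | undefined {
  return queue
    .filter((t) => t.runAtMs <= nowMs)           // time decides eligibility
    .sort((a, b) => a.priority - b.priority)[0]; // priority orders the ready set
}

const now = 1_000;
const queue: Task[] = [
  { jobId: "urgent-but-later", runAtMs: now + 60_000, priority: 0 },
  { jobId: "ready-low-priority", runAtMs: now, priority: 50 },
];

console.log(nextTask(queue, now)?.jobId); // ready-low-priority

// Expediting moves the timestamp to now, so the priority-0 task then wins.
queue[0].runAtMs = now;
console.log(nextTask(queue, now)?.jobId); // urgent-but-later
```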

Next Steps

Learn about running workers to handle job execution and cancellation
Set up observability to monitor cancellations and failures
Explore concurrency limits to control job execution