
Querying Jobs

Silo includes a built-in SQL query engine powered by Apache DataFusion. You can run SQL queries to inspect jobs, monitor queue health, and debug issues — from the TypeScript client, the siloctl CLI, or the web UI.

// Rows are automatically deserialized from MessagePack — no manual decoding needed.
const result = await client.query(
  "SELECT id, status_kind, priority FROM jobs WHERE tenant = 'customer-123' LIMIT 10",
  "customer-123" // tenant for shard routing
);

for (const row of result.rows) {
  console.log(row); // { id: "job-abc", status_kind: "Running", priority: 10 }
}

// You can pass a generic type parameter for typed rows:
interface JobRow {
  id: string;
  status_kind: string;
  count: number;
}

const typed = await client.query<JobRow>(
  "SELECT id, status_kind, COUNT(*) as count FROM jobs GROUP BY id, status_kind",
  "customer-123"
);

for (const row of typed.rows) {
  console.log(row.id, row.status_kind, row.count); // fully typed
}

The second argument is the tenant used for routing the query to the correct shard. If you’re not using multi-tenancy, you can omit it.

You can also pass positional bind parameters as a third argument. Placeholders use $1, $2, … and are matched by position.

const tenant = "customer-123";
const minPriority = 10;

const result = await client.query(
  "SELECT id, priority FROM jobs WHERE tenant = $1 AND priority >= $2 ORDER BY priority ASC LIMIT 25",
  tenant,
  [tenant, minPriority]
);

Supported bind parameter types in the TypeScript client are boolean, number, string, Uint8Array, and null.
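As a standalone sketch, here is a parameter array covering each supported type. The `BindParam` alias is purely illustrative; it is not part of the client API.

```typescript
// The documented bind parameter types, expressed as a union.
// NOTE: the alias name `BindParam` is illustrative, not a client export.
type BindParam = boolean | number | string | Uint8Array | null;

// One value per placeholder: $1, $2, $3, $4, $5.
const params: BindParam[] = [
  true,                         // boolean    -> $1
  42,                           // number     -> $2
  "emails",                     // string     -> $3
  new Uint8Array([0xca, 0xfe]), // Uint8Array -> $4 (binary)
  null,                         // null       -> $5
];

console.log(params.length); // 5
```

Such an array would be passed as the third argument to `client.query`, as in the example above.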

The result includes:

  • columns — schema information (name and dataType for each column)
  • rows — deserialized JavaScript objects, one per row
  • rowCount — total number of rows returned
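For illustration, here is a sketch of consuming that shape. The `QueryResult` interface and the sample data below are assumptions modeled on the fields listed above, not the client's actual exports.

```typescript
// The documented result shape: columns (name + dataType), rows, rowCount.
// NOTE: this interface and the sample data are illustrative assumptions.
interface QueryResult<Row> {
  columns: { name: string; dataType: string }[];
  rows: Row[];
  rowCount: number;
}

const sample: QueryResult<Record<string, string>> = {
  columns: [
    { name: "id", dataType: "String" },
    { name: "status_kind", dataType: "String" },
  ],
  rows: [
    { id: "job-abc", status_kind: "Running" },
    { id: "job-def", status_kind: "Failed" },
  ],
  rowCount: 2,
};

// Render a tab-separated view using the schema information in `columns`.
const header = sample.columns.map((c) => c.name).join("\t");
const lines = sample.rows.map((row) =>
  sample.columns.map((c) => row[c.name] ?? "null").join("\t")
);
console.log([header, ...lines].join("\n"));
```

Because `columns` carries the schema, generic tooling (CSV export, table rendering) can be written once and reused for any query.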
You can also query from the command line with siloctl:

# Query a specific shard
siloctl query <shard-id> "SELECT id, status_kind FROM jobs LIMIT 10"
# With a tenant
siloctl -t customer-123 query <shard-id> "SELECT id, status_kind FROM jobs WHERE tenant = 'customer-123'"
# JSON output for scripting
siloctl --json query <shard-id> "SELECT id, status_kind FROM jobs WHERE status_kind = 'Failed'"

Silo exposes two tables: jobs and queues.

The jobs table is the primary table, containing all job data.

  • shard_id (String) — UUID of the shard storing this job
  • tenant (String) — Tenant identifier
  • id (String) — Job ID
  • priority (UInt8) — Priority (0 = highest, 99 = lowest)
  • enqueue_time_ms (Int64) — Unix timestamp in milliseconds when the job was enqueued
  • payload (String, nullable) — JSON-encoded job payload
  • status_kind (String, nullable) — Current job status (see Status Values)
  • status_changed_at_ms (Int64, nullable) — Unix timestamp in milliseconds of the last status change
  • task_group (String) — Task group the job belongs to
  • current_attempt (UInt32, nullable) — Current attempt number (1-indexed). Present for Waiting/Scheduled jobs, null for Running and terminal statuses
  • next_attempt_starts_after_ms (Int64, nullable) — Unix timestamp in milliseconds when the next attempt is scheduled to start. Present for Waiting/Scheduled jobs, null otherwise
  • metadata (Map<String, String>, nullable) — Key-value metadata attached to the job

The queues table shows the current state of concurrency queues — which jobs hold concurrency tickets and which are waiting.

  • shard_id (String) — UUID of the shard
  • tenant (String) — Tenant identifier
  • queue_name (String) — Name of the concurrency queue (the limit key)
  • entry_type (String) — "holder" (has a ticket) or "requester" (waiting for one)
  • task_id (String) — Task or request ID
  • job_id (String, nullable) — Job ID (present for requesters)
  • priority (UInt8, nullable) — Job priority (present for requesters)
  • timestamp_ms (Int64) — When the ticket was granted or the request was made

The status_kind column has the following values:

  • Waiting — Ready to run: the job’s start time has arrived and it’s in the task queue
  • Scheduled — Scheduled for the future: the job’s start time hasn’t arrived yet, or it’s waiting for a retry backoff
  • Running — Currently being processed by a worker
  • Succeeded — Completed successfully
  • Failed — Failed after exhausting all retries
  • Cancelled — Cancelled by the user

-- All jobs for a tenant
SELECT id, status_kind, priority, task_group
FROM jobs
WHERE tenant = 'customer-123'
ORDER BY enqueue_time_ms DESC
LIMIT 50
-- Jobs in a specific task group
SELECT id, status_kind, enqueue_time_ms
FROM jobs
WHERE tenant = 'customer-123' AND task_group = 'emails'
ORDER BY enqueue_time_ms DESC
-- All currently running jobs
SELECT id, task_group, priority
FROM jobs
WHERE tenant = 'customer-123' AND status_kind = 'Running'
-- Failed jobs
SELECT id, task_group, status_changed_at_ms
FROM jobs
WHERE tenant = 'customer-123' AND status_kind = 'Failed'
ORDER BY status_changed_at_ms DESC
-- Jobs waiting to be picked up
SELECT id, priority, task_group
FROM jobs
WHERE tenant = 'customer-123' AND status_kind = 'Waiting'
ORDER BY priority ASC
-- Future-scheduled jobs (including retry backoffs)
SELECT id, task_group
FROM jobs
WHERE tenant = 'customer-123' AND status_kind = 'Scheduled'
-- New future-scheduled jobs (never attempted)
SELECT id, task_group, next_attempt_starts_after_ms
FROM jobs
WHERE tenant = 'customer-123' AND status_kind = 'Scheduled' AND current_attempt = 1
-- Jobs waiting to retry after a failure (attempted at least once)
SELECT id, task_group, current_attempt, next_attempt_starts_after_ms
FROM jobs
WHERE tenant = 'customer-123' AND status_kind = 'Scheduled' AND current_attempt > 1
-- Count jobs by status
SELECT status_kind, COUNT(*) as count
FROM jobs
WHERE tenant = 'customer-123'
GROUP BY status_kind
-- Count by task group and status
SELECT task_group, status_kind, COUNT(*) as count
FROM jobs
WHERE tenant = 'customer-123'
GROUP BY task_group, status_kind
ORDER BY task_group, status_kind
-- Average priority by task group
SELECT task_group, AVG(CAST(priority AS DOUBLE)) as avg_priority
FROM jobs
WHERE tenant = 'customer-123'
GROUP BY task_group

Jobs can have arbitrary key-value metadata. Querying metadata uses DataFusion’s map functions.

Exact match — find jobs where a metadata key equals a specific value:

SELECT id, status_kind
FROM jobs
WHERE tenant = 'customer-123'
AND array_contains(element_at(metadata, 'env'), 'production')

Prefix match — find jobs where a metadata value starts with a prefix:

SELECT id, status_kind
FROM jobs
WHERE tenant = 'customer-123'
AND starts_with(array_any_value(element_at(metadata, 'region')), 'us-')

You can also use LIKE for prefix matching:

SELECT id, status_kind
FROM jobs
WHERE tenant = 'customer-123'
AND array_any_value(element_at(metadata, 'region')) LIKE 'us-%'

Combining metadata filters with status:

SELECT id, status_kind, priority
FROM jobs
WHERE tenant = 'customer-123'
AND array_contains(element_at(metadata, 'env'), 'production')
AND status_kind = 'Running'

-- See all holders and requesters for a queue
SELECT queue_name, entry_type, task_id, job_id, timestamp_ms
FROM queues
WHERE tenant = 'customer-123' AND queue_name = 'user:456'
-- Count holders vs requesters per queue
SELECT queue_name, entry_type, COUNT(*) as count
FROM queues
WHERE tenant = 'customer-123'
GROUP BY queue_name, entry_type
-- Find queues with waiting requesters (indicates jobs are blocked on concurrency)
SELECT queue_name, COUNT(*) as waiting
FROM queues
WHERE tenant = 'customer-123' AND entry_type = 'requester'
GROUP BY queue_name
ORDER BY waiting DESC

-- Get full details for a job by ID
SELECT *
FROM jobs
WHERE tenant = 'customer-123' AND id = 'job-abc-123'

Silo uses predicate pushdown to select the most efficient scan strategy based on your WHERE clause. Understanding how this works helps you write fast queries.

When multi-tenancy is enabled, always include a WHERE tenant = '...' condition. Without it, Silo must scan all tenants on the shard, which is much slower.

-- Fast: scans only one tenant's data
SELECT id FROM jobs WHERE tenant = 'customer-123' AND status_kind = 'Running'
-- Slow: scans all tenants on the shard
SELECT id FROM jobs WHERE status_kind = 'Running'

Silo picks the best scan strategy from your filters in this priority order:

  1. Exact ID lookup: WHERE id = '...' — direct key lookup, fastest
  2. Metadata exact match: WHERE array_contains(element_at(metadata, 'key'), 'value') — uses the metadata index
  3. Metadata prefix: WHERE starts_with(array_any_value(element_at(metadata, 'key')), 'prefix') — uses the metadata index with prefix scan
  4. Status filter: WHERE status_kind = '...' — uses the status/time index
  5. Full scan — no indexed filter; scans all jobs for the tenant

Only one index-backed filter is used per query. If you combine filters (e.g., status + metadata), the highest-priority one drives the scan and the rest are applied as post-filters.

-- Uses ExactId scan (priority 1), even though status filter is also present
SELECT * FROM jobs WHERE tenant = 'customer-123' AND id = 'job-abc' AND status_kind = 'Running'
-- Uses MetadataExact scan (priority 2)
SELECT * FROM jobs WHERE tenant = 'customer-123' AND array_contains(element_at(metadata, 'env'), 'prod')
-- Uses Status scan (priority 4)
SELECT * FROM jobs WHERE tenant = 'customer-123' AND status_kind = 'Failed'

Always add LIMIT to exploratory queries. Without it, Silo will scan and return all matching rows.

-- Good: bounded result set
SELECT id, status_kind FROM jobs WHERE tenant = 'customer-123' LIMIT 100
-- Risky with large tenants: returns everything
SELECT id, status_kind FROM jobs WHERE tenant = 'customer-123'

The TypeScript client routes queries to a single shard based on the tenant. If you need to query across all shards (e.g., for cluster-wide aggregations), use siloctl to query each shard, or use the cluster query engine if available.

# Query each shard separately
siloctl query <shard-0> "SELECT COUNT(*) as count FROM jobs"
siloctl query <shard-1> "SELECT COUNT(*) as count FROM jobs"

Silo’s query engine supports standard SQL via Apache DataFusion:

  • SELECT with column projection and *
  • WHERE with AND, OR, comparisons (=, !=, >, <, >=, <=), LIKE, IN
  • ORDER BY (single or multiple columns, ASC/DESC)
  • LIMIT and OFFSET
  • GROUP BY with aggregate functions (COUNT, SUM, AVG, MIN, MAX)
  • CAST for type conversions
  • CASE expressions
  • Arithmetic expressions
  • Map and string functions: element_at(), array_contains(), array_any_value(), starts_with()
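These features compose. For example, a CASE expression combined with GROUP BY can bucket jobs by priority band. This is an illustrative sketch, not verified against a live cluster; the expression is repeated in GROUP BY for maximum SQL-dialect compatibility.

```sql
-- Illustrative: count jobs per priority band for one tenant
SELECT
  CASE
    WHEN priority < 10 THEN 'high'
    WHEN priority < 50 THEN 'medium'
    ELSE 'low'
  END AS priority_band,
  COUNT(*) AS count
FROM jobs
WHERE tenant = 'customer-123'
GROUP BY
  CASE
    WHEN priority < 10 THEN 'high'
    WHEN priority < 50 THEN 'medium'
    ELSE 'low'
  END
ORDER BY count DESC
```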

Next steps:

  • Use siloctl for command-line query access and common workflows
  • Set up observability for continuous monitoring alongside ad-hoc queries
  • Learn about concurrency limits to understand the queues table