# siloctl CLI Reference
siloctl is a command-line tool for administering Silo clusters. It connects to Silo servers via gRPC and provides commands for cluster inspection, job management, and debugging.
## Installation

siloctl is included in the Silo distribution. Build it with:

```sh
cargo build --release --bin siloctl
```

The binary will be at `target/release/siloctl`.
## Global Options

These options apply to all commands:

| Option | Description |
|---|---|
| `-a, --address <URL>` | Silo server address (default: `http://localhost:7450`) |
| `-t, --tenant <ID>` | Tenant ID for multi-tenant clusters |
| `--json` | Output in JSON format instead of human-readable tables |
| `-h, --help` | Print help information |
| `-V, --version` | Print version information |
## Commands

### cluster info

Show cluster topology and shard ownership information.

```sh
siloctl cluster info
```

Example output:

```
Cluster Information
===================
Total shards: 8
Connected to: node-1 (10.0.0.5:7450)

Shard Ownership:
 Shard   Node ID   gRPC Address
------------------------------------------------------------
     0   node-1    10.0.0.5:7450
     1   node-1    10.0.0.5:7450
     2   node-2    10.0.0.6:7450
     3   node-2    10.0.0.6:7450
     4   node-3    10.0.0.7:7450
     5   node-3    10.0.0.7:7450
     6   node-4    10.0.0.8:7450
     7   node-4    10.0.0.8:7450
```

JSON output:

```sh
siloctl --json cluster info
```

```json
{
  "num_shards": 8,
  "this_node_id": "node-1",
  "this_grpc_addr": "10.0.0.5:7450",
  "shard_owners": [
    {"shard_id": 0, "node_id": "node-1", "grpc_addr": "10.0.0.5:7450"},
    {"shard_id": 1, "node_id": "node-1", "grpc_addr": "10.0.0.5:7450"}
  ]
}
```
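The `shard_owners` array is convenient for scripting questions like "which node owns shard N?". A minimal jq sketch, run here against a sample of the JSON above (against a live cluster you would pipe `siloctl --json cluster info` into the same filter):

```sh
# Sample cluster-info JSON (abridged from the example output above).
cluster_json='{"shard_owners": [
  {"shard_id": 0, "node_id": "node-1", "grpc_addr": "10.0.0.5:7450"},
  {"shard_id": 1, "node_id": "node-1", "grpc_addr": "10.0.0.5:7450"}
]}'

# Find the node that owns shard 1. Live equivalent:
#   siloctl --json cluster info | jq -r '.shard_owners[] | select(.shard_id == 1) | .node_id'
owner=$(printf '%s' "$cluster_json" \
  | jq -r '.shard_owners[] | select(.shard_id == 1) | .node_id')
echo "$owner"
```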
### job get

Get detailed information about a specific job.

```sh
siloctl job get <shard> <id> [--attempts]
```

| Argument | Description |
|---|---|
| `<shard>` | Shard ID where the job is stored |
| `<id>` | Job ID |
| `--attempts` | Include attempt history |

Example:

```sh
siloctl job get 0 abc123-def456
```

```
Job Details
===========
ID:             abc123-def456
Status:         running
Priority:       50
Task Group:     default
Enqueued:       1706123456 (1706123456000ms)
Status Changed: 1706123460 (1706123460000ms)

Metadata:
  user_id: 12345
  request_id: req-789
```

With attempts:

```sh
siloctl job get 0 abc123-def456 --attempts
```

```
Job Details
===========
ID:             abc123-def456
Status:         failed
Priority:       50
Task Group:     default
Enqueued:       1706123456 (1706123456000ms)
Status Changed: 1706123500 (1706123500000ms)

Attempts:
 Attempt   Task ID                Status   Finished
--------------------------------------------------------------------------------
       1   task-111-222-333-444   failed   1706123480 (1706123480000ms)
       2   task-555-666-777-888   failed   1706123500 (1706123500000ms)
```
### job cancel

Cancel a running or scheduled job. Workers will be notified via heartbeat and should stop processing.

```sh
siloctl job cancel <shard> <id>
```

| Argument | Description |
|---|---|
| `<shard>` | Shard ID where the job is stored |
| `<id>` | Job ID |

Example:

```sh
siloctl job cancel 3 job-to-cancel
```

```
Job job-to-cancel cancelled successfully
```
### job restart

Restart a cancelled or failed job. The job will be rescheduled with a fresh set of retry attempts.

```sh
siloctl job restart <shard> <id>
```

| Argument | Description |
|---|---|
| `<shard>` | Shard ID where the job is stored |
| `<id>` | Job ID |

Example:

```sh
siloctl job restart 2 failed-job-123
```

```
Job failed-job-123 restarted successfully
```
### job expedite

Expedite a scheduled job to run immediately. This is useful for:
- Pulling forward a job scheduled for the future
- Skipping retry backoff delays for a job that’s waiting to retry
```sh
siloctl job expedite <shard> <id>
```

| Argument | Description |
|---|---|
| `<shard>` | Shard ID where the job is stored |
| `<id>` | Job ID |

Example:

```sh
siloctl job expedite 1 scheduled-job-456
```

```
Job scheduled-job-456 expedited successfully
```
### job delete

Permanently delete a job and all its data.

```sh
siloctl job delete <shard> <id>
```

| Argument | Description |
|---|---|
| `<shard>` | Shard ID where the job is stored |
| `<id>` | Job ID |

Example:

```sh
siloctl job delete 0 old-job-789
```

```
Job old-job-789 deleted successfully
```
### shard force-release

Force-release a shard’s ownership lease regardless of the current holder. This is an operator escape hatch for recovering from permanently lost nodes.
With permanent shard leases, a crashed node retains its shard leases to protect unflushed WAL data. If the node will never come back (e.g., the disk was destroyed), use this command to release the lease so another node can acquire the shard.
```sh
siloctl shard force-release <shard>
```

| Argument | Description |
|---|---|
| `<shard>` | Shard ID (UUID) to force-release |
Example:
```sh
siloctl shard force-release a1b2c3d4-e5f6-7890-abcd-ef1234567890
```

```
Force-released shard lease for a1b2c3d4-e5f6-7890-abcd-ef1234567890
```

### query

Execute an SQL query against a shard’s data. Useful for debugging and inspection.
```sh
siloctl query <shard> "<sql>"
```

| Argument | Description |
|---|---|
| `<shard>` | Shard ID to query |
| `<sql>` | SQL query string |

Example:

```sh
siloctl query 0 "SELECT id, status, task_group FROM jobs LIMIT 5"
```

```
 id             | status    | task_group
----------------------------------------------------
 abc123-def456  | running   | default
 ghi789-jkl012  | scheduled | default
 mno345-pqr678  | succeeded | payments
 stu901-vwx234  | failed    | payments
 yza567-bcd890  | scheduled | emails

5 row(s) returned
```

JSON output for scripting:

```sh
siloctl --json query 0 "SELECT id, status FROM jobs WHERE status = 'failed' LIMIT 10"
```

```json
{
  "columns": [
    {"name": "id", "data_type": "Utf8"},
    {"name": "status", "data_type": "Utf8"}
  ],
  "row_count": 2,
  "rows": [
    {"id": "job-123", "status": "failed"},
    {"id": "job-456", "status": "failed"}
  ]
}
```
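Because the JSON result carries its own column metadata, rows can be post-processed generically without hard-coding field names. A small sketch that flattens rows to TSV in declared column order, run here against the sample result above (live, pipe `siloctl --json query …` into the same jq filter):

```sh
# Sample query result (copied from the JSON output above).
result='{"columns":[{"name":"id","data_type":"Utf8"},{"name":"status","data_type":"Utf8"}],
         "row_count":2,
         "rows":[{"id":"job-123","status":"failed"},{"id":"job-456","status":"failed"}]}'

# Use the column metadata to emit each row as a TSV line.
tsv=$(printf '%s' "$result" \
  | jq -r '(.columns | map(.name)) as $cols | .rows[] | [.[$cols[]]] | @tsv')
printf '%s\n' "$tsv"
```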
### profile

Capture a CPU profile from the connected Silo node. Useful for performance debugging in production.

```sh
siloctl profile [OPTIONS]
```

| Option | Description |
|---|---|
| `-d, --duration <SECONDS>` | Profile duration in seconds (1-300, default: 30) |
| `-f, --frequency <HZ>` | Sampling frequency in Hz (1-1000, default: 100) |
| `-o, --output <FILE>` | Output file path (default: `profile-{timestamp}.pb.gz`) |

Example:

```sh
siloctl -a http://silo-node:7450 profile --duration 30
```

```
Starting CPU profile for 30 seconds at 100Hz...
Profile saved to: profile-1706123456.pb.gz
Duration: 30s, Samples: 2847, Size: 45632 bytes

Analyze with:
  pprof -http=:8080 profile-1706123456.pb.gz
  go tool pprof -http=:8080 profile-1706123456.pb.gz
```

With custom options:

```sh
siloctl profile --duration 60 --frequency 250 --output my-profile.pb.gz
```

JSON output:

```sh
siloctl --json profile --duration 10
```

```json
{
  "status": "completed",
  "output_file": "profile-1706123456.pb.gz",
  "duration_seconds": 10,
  "samples": 987,
  "profile_bytes": 12345
}
```
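The JSON form makes profile capture scriptable, e.g. grabbing the output file name for archiving only when the capture succeeded. A sketch against the sample JSON above (live, pipe `siloctl --json profile …` into the same filter):

```sh
# Sample profile result (copied from the JSON output above).
profile_json='{"status":"completed","output_file":"profile-1706123456.pb.gz",
               "duration_seconds":10,"samples":987,"profile_bytes":12345}'

# Extract the file name only if the profile actually completed.
file=$(printf '%s' "$profile_json" \
  | jq -r 'select(.status == "completed") | .output_file')
echo "$file"
```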
### validate-config

Validate a Silo configuration file without starting the server. This is useful for checking configuration syntax and semantics before deployment.

```sh
siloctl validate-config --config <path>
```

| Option | Description |
|---|---|
| `-c, --config <PATH>` | Path to the TOML configuration file to validate |

Example:

```sh
siloctl validate-config --config /etc/silo/config.toml
```

```
Config is valid: /etc/silo/config.toml
```

JSON output:

```sh
siloctl --json validate-config --config /etc/silo/config.toml
```

```json
{"status": "valid", "config_path": "/etc/silo/config.toml"}
```

Error example:

```sh
siloctl validate-config --config invalid-config.toml
```

```
Config error: missing field `database`
Error: Config validation failed: missing field `database`
```
## Common Workflows

### Debugging a Stuck Job

- Find the job status:

  ```sh
  siloctl job get 0 problematic-job-id --attempts
  ```

- If the job is stuck in retry backoff, expedite it:

  ```sh
  siloctl job expedite 0 problematic-job-id
  ```

- Or cancel and restart it:

  ```sh
  siloctl job cancel 0 problematic-job-id
  siloctl job restart 0 problematic-job-id
  ```
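Which action fits depends on the job's current status. A hypothetical sketch of that decision as a shell `case` statement — the status value is hard-coded here for illustration, and would in practice be parsed from `siloctl job get` output; the status strings (`scheduled`, `running`, `failed`) match the examples earlier in this page:

```sh
# Decide which siloctl action fits a job's status. Hard-coded for
# illustration; live, read it from `siloctl job get <shard> <id>`.
status="scheduled"
case "$status" in
  scheduled) action="job expedite" ;;   # pull the run time forward
  running)   action="job cancel"   ;;   # stop it, then optionally restart
  failed)    action="job restart"  ;;   # fresh set of retry attempts
  *)         action="none"         ;;
esac
echo "siloctl $action 0 problematic-job-id"
```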
### Recovering from a Lost Node

If a node is permanently lost (e.g., disk destroyed, VM terminated), its shard leases persist and block other nodes from acquiring those shards. Force-release the leases to recover:

```sh
# Check which shards the lost node owned
siloctl cluster info

# Force-release each shard owned by the lost node
siloctl shard force-release <shard-id-1>
siloctl shard force-release <shard-id-2>

# Verify other nodes acquired the shards
siloctl cluster info
```
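With `--json` output, the per-shard releases can be scripted rather than typed by hand. A sketch, run here against a small sample of the cluster-info JSON (`node-2` stands in for the lost node; against a live cluster, replace the sample with `siloctl --json cluster info`, and drop the `echo` to actually release):

```sh
# Sample cluster-info JSON (same schema as `siloctl --json cluster info`).
cluster_json='{"shard_owners":[
  {"shard_id": 0, "node_id": "node-1", "grpc_addr": "10.0.0.5:7450"},
  {"shard_id": 2, "node_id": "node-2", "grpc_addr": "10.0.0.6:7450"},
  {"shard_id": 3, "node_id": "node-2", "grpc_addr": "10.0.0.6:7450"}]}'
lost_node="node-2"

# Collect every shard the lost node owned and release each one.
released=""
for shard in $(printf '%s' "$cluster_json" \
    | jq -r --arg n "$lost_node" '.shard_owners[] | select(.node_id == $n) | .shard_id'); do
  echo siloctl shard force-release "$shard"
  released="$released $shard"
done
```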
### Inspecting Cluster Health

```sh
# Check cluster topology
siloctl cluster info

# Query job counts per status
siloctl query 0 "SELECT status, COUNT(*) as count FROM jobs GROUP BY status"
```
### Scripting with JSON Output

```sh
# Get all failed jobs and process with jq
siloctl --json query 0 "SELECT id FROM jobs WHERE status = 'failed'" \
  | jq -r '.rows[].id' \
  | while read job_id; do
      siloctl job restart 0 "$job_id"
    done
```
### Multi-Tenant Operations

When working with a multi-tenant cluster, always specify the tenant:

```sh
siloctl -t customer-123 job get 0 job-id
siloctl -t customer-123 job cancel 0 job-id
```
### Connecting to a Remote Cluster

```sh
# Connect to production
siloctl -a http://silo.production.internal:7450 cluster info

# siloctl does not read the address from an environment variable,
# but a shell alias works just as well
alias siloctl-prod='siloctl -a http://silo.production.internal:7450'
siloctl-prod cluster info
```