# siloctl CLI Reference
siloctl is a command-line tool for administering Silo clusters. It connects to Silo servers via gRPC and provides commands for cluster inspection, job management, and debugging.
## Installation

siloctl is included in the Silo distribution. Build it with:

```sh
cargo build --release --bin siloctl
```

The binary will be at `target/release/siloctl`.
## Global Options

These options apply to all commands:

| Option | Description |
|---|---|
| `-a, --address <URL>` | Silo server address (default: `http://localhost:7450`) |
| `-t, --tenant <ID>` | Tenant ID for multi-tenant clusters |
| `--json` | Output in JSON format instead of human-readable tables |
| `-h, --help` | Print help information |
| `-V, --version` | Print version information |
## Commands

### cluster info

Show cluster topology and shard ownership information.

```sh
siloctl cluster info
```

Example output:

```
Cluster Information
===================
Total shards: 8
Connected to: node-1 (10.0.0.5:7450)

Shard Ownership:
 Shard   Node ID   gRPC Address
------------------------------------------------------------
     0   node-1    10.0.0.5:7450
     1   node-1    10.0.0.5:7450
     2   node-2    10.0.0.6:7450
     3   node-2    10.0.0.6:7450
     4   node-3    10.0.0.7:7450
     5   node-3    10.0.0.7:7450
     6   node-4    10.0.0.8:7450
     7   node-4    10.0.0.8:7450
```

JSON output:

```sh
siloctl --json cluster info
```

```json
{
  "num_shards": 8,
  "this_node_id": "node-1",
  "this_grpc_addr": "10.0.0.5:7450",
  "shard_owners": [
    {"shard_id": 0, "node_id": "node-1", "grpc_addr": "10.0.0.5:7450"},
    {"shard_id": 1, "node_id": "node-1", "grpc_addr": "10.0.0.5:7450"}
  ]
}
```
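The `shard_owners` array is convenient for scripting questions like "which node owns shard N?". A minimal jq sketch, run here against a sample of the JSON above (against a live cluster you would pipe `siloctl --json cluster info` into the same filter):

```sh
# Sample cluster-info JSON (abridged from the example output above).
cluster_json='{"shard_owners": [
  {"shard_id": 0, "node_id": "node-1", "grpc_addr": "10.0.0.5:7450"},
  {"shard_id": 1, "node_id": "node-1", "grpc_addr": "10.0.0.5:7450"}
]}'

# Find the node that owns shard 1. Live equivalent:
#   siloctl --json cluster info | jq -r '.shard_owners[] | select(.shard_id == 1) | .node_id'
owner=$(printf '%s' "$cluster_json" \
  | jq -r '.shard_owners[] | select(.shard_id == 1) | .node_id')
echo "$owner"
```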
### job get

Get detailed information about a specific job.

```sh
siloctl job get <shard> <id> [--attempts]
```

| Argument | Description |
|---|---|
| `<shard>` | Shard ID where the job is stored |
| `<id>` | Job ID |
| `--attempts` | Include attempt history |

Example:

```sh
siloctl job get 0 abc123-def456
```

```
Job Details
===========
ID:             abc123-def456
Status:         running
Priority:       50
Task Group:     default
Enqueued:       1706123456 (1706123456000ms)
Status Changed: 1706123460 (1706123460000ms)

Metadata:
  user_id: 12345
  request_id: req-789
```

With attempts:

```sh
siloctl job get 0 abc123-def456 --attempts
```

```
Job Details
===========
ID:             abc123-def456
Status:         failed
Priority:       50
Task Group:     default
Enqueued:       1706123456 (1706123456000ms)
Status Changed: 1706123500 (1706123500000ms)

Attempts:
 Attempt   Task ID                Status   Finished
--------------------------------------------------------------------------------
       1   task-111-222-333-444   failed   1706123480 (1706123480000ms)
       2   task-555-666-777-888   failed   1706123500 (1706123500000ms)
```
### job cancel

Cancel a running or scheduled job. Workers will be notified via heartbeat and should stop processing.

```sh
siloctl job cancel <shard> <id>
```

| Argument | Description |
|---|---|
| `<shard>` | Shard ID where the job is stored |
| `<id>` | Job ID |

Example:

```sh
siloctl job cancel 3 job-to-cancel
```

```
Job job-to-cancel cancelled successfully
```
### job restart

Restart a cancelled or failed job. The job will be rescheduled with a fresh set of retry attempts.

```sh
siloctl job restart <shard> <id>
```

| Argument | Description |
|---|---|
| `<shard>` | Shard ID where the job is stored |
| `<id>` | Job ID |

Example:

```sh
siloctl job restart 2 failed-job-123
```

```
Job failed-job-123 restarted successfully
```
### job expedite

Expedite a scheduled job to run immediately. This is useful for:
- Pulling forward a job scheduled for the future
- Skipping retry backoff delays for a job that’s waiting to retry
```sh
siloctl job expedite <shard> <id>
```

| Argument | Description |
|---|---|
| `<shard>` | Shard ID where the job is stored |
| `<id>` | Job ID |

Example:

```sh
siloctl job expedite 1 scheduled-job-456
```

```
Job scheduled-job-456 expedited successfully
```
### job delete

Permanently delete a job and all its data.

```sh
siloctl job delete <shard> <id>
```

| Argument | Description |
|---|---|
| `<shard>` | Shard ID where the job is stored |
| `<id>` | Job ID |

Example:

```sh
siloctl job delete 0 old-job-789
```

```
Job old-job-789 deleted successfully
```
### shard force-release

Force-release a shard’s ownership lease regardless of the current holder. This is an operator escape hatch for recovering from permanently lost nodes.
With permanent shard leases, a crashed node retains its shard leases to protect unflushed WAL data. If the node will never come back (e.g., the disk was destroyed), use this command to release the lease so another node can acquire the shard.
```sh
siloctl shard force-release <shard>
```

| Argument | Description |
|---|---|
| `<shard>` | Shard ID (UUID) to force-release |
Example:
```sh
siloctl shard force-release a1b2c3d4-e5f6-7890-abcd-ef1234567890
```

```
Force-released shard lease for a1b2c3d4-e5f6-7890-abcd-ef1234567890
```

### query

Execute an SQL query against a shard’s data. Useful for debugging and inspection.
```sh
siloctl query <shard> "<sql>"
```

| Argument | Description |
|---|---|
| `<shard>` | Shard ID to query |
| `<sql>` | SQL query string |

Example:

```sh
siloctl query 0 "SELECT id, status, task_group FROM jobs LIMIT 5"
```

```
 id             | status    | task_group
----------------------------------------------------
 abc123-def456  | running   | default
 ghi789-jkl012  | scheduled | default
 mno345-pqr678  | succeeded | payments
 stu901-vwx234  | failed    | payments
 yza567-bcd890  | scheduled | emails

5 row(s) returned
```

JSON output for scripting:

```sh
siloctl --json query 0 "SELECT id, status FROM jobs WHERE status = 'failed' LIMIT 10"
```

```json
{
  "columns": [
    {"name": "id", "data_type": "Utf8"},
    {"name": "status", "data_type": "Utf8"}
  ],
  "row_count": 2,
  "rows": [
    {"id": "job-123", "status": "failed"},
    {"id": "job-456", "status": "failed"}
  ]
}
```
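Because the JSON result carries its own column metadata, rows can be post-processed generically without hard-coding field names. A small sketch that flattens rows to TSV in declared column order, run here against the sample result above (live, pipe `siloctl --json query …` into the same jq filter):

```sh
# Sample query result (copied from the JSON output above).
result='{"columns":[{"name":"id","data_type":"Utf8"},{"name":"status","data_type":"Utf8"}],
         "row_count":2,
         "rows":[{"id":"job-123","status":"failed"},{"id":"job-456","status":"failed"}]}'

# Use the column metadata to emit each row as a TSV line.
tsv=$(printf '%s' "$result" \
  | jq -r '(.columns | map(.name)) as $cols | .rows[] | [.[$cols[]]] | @tsv')
printf '%s\n' "$tsv"
```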
### profile

Capture a CPU profile from the connected Silo node. Useful for performance debugging in production.

```sh
siloctl profile [OPTIONS]
```

| Option | Description |
|---|---|
| `-d, --duration <SECONDS>` | Profile duration in seconds (1-300, default: 30) |
| `-f, --frequency <HZ>` | Sampling frequency in Hz (1-1000, default: 100) |
| `-o, --output <FILE>` | Output file path (default: `profile-{timestamp}.pb.gz`) |

Example:

```sh
siloctl -a http://silo-node:7450 profile --duration 30
```

```
Starting CPU profile for 30 seconds at 100Hz...
Profile saved to: profile-1706123456.pb.gz
Duration: 30s, Samples: 2847, Size: 45632 bytes

Analyze with:
  pprof -http=:8080 profile-1706123456.pb.gz
  go tool pprof -http=:8080 profile-1706123456.pb.gz
```

With custom options:

```sh
siloctl profile --duration 60 --frequency 250 --output my-profile.pb.gz
```

JSON output:

```sh
siloctl --json profile --duration 10
```

```json
{
  "status": "completed",
  "output_file": "profile-1706123456.pb.gz",
  "duration_seconds": 10,
  "samples": 987,
  "profile_bytes": 12345
}
```
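The JSON form makes profile capture scriptable, e.g. grabbing the output file name for archiving only when the capture succeeded. A sketch against the sample JSON above (live, pipe `siloctl --json profile …` into the same filter):

```sh
# Sample profile result (copied from the JSON output above).
profile_json='{"status":"completed","output_file":"profile-1706123456.pb.gz",
               "duration_seconds":10,"samples":987,"profile_bytes":12345}'

# Extract the file name only if the profile actually completed.
file=$(printf '%s' "$profile_json" \
  | jq -r 'select(.status == "completed") | .output_file')
echo "$file"
```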
### validate-config

Validate a Silo configuration file without starting the server. This is useful for checking configuration syntax and semantics before deployment.

```sh
siloctl validate-config --config <path>
```

| Option | Description |
|---|---|
| `-c, --config <PATH>` | Path to the TOML configuration file to validate |

Example:

```sh
siloctl validate-config --config /etc/silo/config.toml
```

```
Config is valid: /etc/silo/config.toml
```

JSON output:

```sh
siloctl --json validate-config --config /etc/silo/config.toml
```

```json
{"status": "valid", "config_path": "/etc/silo/config.toml"}
```

Error example:

```sh
siloctl validate-config --config invalid-config.toml
```

```
Config error: missing field `database`
Error: Config validation failed: missing field `database`
```
## Common Workflows

### Debugging a Stuck Job

- Find the job status:

  ```sh
  siloctl job get 0 problematic-job-id --attempts
  ```

- If the job is stuck in retry backoff, expedite it:

  ```sh
  siloctl job expedite 0 problematic-job-id
  ```

- Or cancel and restart it:

  ```sh
  siloctl job cancel 0 problematic-job-id
  siloctl job restart 0 problematic-job-id
  ```
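Which action fits depends on the job's current status. A hypothetical sketch of that decision as a shell `case` statement — the status value is hard-coded here for illustration, and would in practice be parsed from `siloctl job get` output; the status strings (`scheduled`, `running`, `failed`) match the examples earlier in this page:

```sh
# Decide which siloctl action fits a job's status. Hard-coded for
# illustration; live, read it from `siloctl job get <shard> <id>`.
status="scheduled"
case "$status" in
  scheduled) action="job expedite" ;;   # pull the run time forward
  running)   action="job cancel"   ;;   # stop it, then optionally restart
  failed)    action="job restart"  ;;   # fresh set of retry attempts
  *)         action="none"         ;;
esac
echo "siloctl $action 0 problematic-job-id"
```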
### Recovering from a Lost Node

If a node is permanently lost (e.g., disk destroyed, VM terminated), its shard leases persist and block other nodes from acquiring those shards. Force-release the leases to recover:

```sh
# Check which shards the lost node owned
siloctl cluster info

# Force-release each shard owned by the lost node
siloctl shard force-release <shard-id-1>
siloctl shard force-release <shard-id-2>

# Verify other nodes acquired the shards
siloctl cluster info
```
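With `--json` output, the per-shard releases can be scripted rather than typed by hand. A sketch, run here against a small sample of the cluster-info JSON (`node-2` stands in for the lost node; against a live cluster, replace the sample with `siloctl --json cluster info`, and drop the `echo` to actually release):

```sh
# Sample cluster-info JSON (same schema as `siloctl --json cluster info`).
cluster_json='{"shard_owners":[
  {"shard_id": 0, "node_id": "node-1", "grpc_addr": "10.0.0.5:7450"},
  {"shard_id": 2, "node_id": "node-2", "grpc_addr": "10.0.0.6:7450"},
  {"shard_id": 3, "node_id": "node-2", "grpc_addr": "10.0.0.6:7450"}]}'
lost_node="node-2"

# Collect every shard the lost node owned and release each one.
released=""
for shard in $(printf '%s' "$cluster_json" \
    | jq -r --arg n "$lost_node" '.shard_owners[] | select(.node_id == $n) | .shard_id'); do
  echo siloctl shard force-release "$shard"
  released="$released $shard"
done
```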
### Inspecting Cluster Health

```sh
# Check cluster topology
siloctl cluster info

# Query job counts per status
siloctl query 0 "SELECT status, COUNT(*) as count FROM jobs GROUP BY status"
```
### Scripting with JSON Output

```sh
# Get all failed jobs and process with jq
siloctl --json query 0 "SELECT id FROM jobs WHERE status = 'failed'" \
  | jq -r '.rows[].id' \
  | while read job_id; do
      siloctl job restart 0 "$job_id"
    done
```
### Multi-Tenant Operations

When working with a multi-tenant cluster, always specify the tenant:

```sh
siloctl -t customer-123 job get 0 job-id
siloctl -t customer-123 job cancel 0 job-id
```
### Connecting to a Remote Cluster

```sh
# Connect to production
siloctl -a http://silo.production.internal:7450 cluster info

# siloctl does not read the address from an environment variable,
# but a shell alias works just as well
alias siloctl-prod='siloctl -a http://silo.production.internal:7450'
siloctl-prod cluster info
```