# Server Configuration

Silo is configured using a TOML configuration file. You can specify the configuration file path using the `-c` or `--config` CLI flag:

```shell
silo -c /path/to/config.toml
```

If no configuration file is specified, Silo uses sensible defaults suitable for local development.
## CLI Arguments

| Argument | Description |
|---|---|
| `-c, --config <path>` | Path to a TOML configuration file |
| `-v` | Enable verbose output |
## Validating Your Config

Validate your configuration without starting the server using `siloctl validate-config`:

```shell
siloctl validate-config --config config.toml
```

This command parses the configuration file and reports any errors.
## Server Configuration

The `[server]` section configures the main gRPC server.

```toml
[server]
grpc_addr = "127.0.0.1:7450"
dev_mode = false
statement_timeout_ms = 5000
```

| Option | Type | Default | Description |
|---|---|---|---|
| `grpc_addr` | string | `"127.0.0.1:7450"` | Address and port for the gRPC server to listen on |
| `dev_mode` | bool | `false` | Enable development mode features like the `ResetShards` RPC. Never enable in production. |
| `statement_timeout_ms` | number | `5000` | Maximum SQL statement execution time in milliseconds. Query execution is aborted when this timeout is hit. Set to `0` to disable the statement timeout. |
| `auth_token` | string | (none) | Shared secret for gRPC authentication. When set, all incoming gRPC requests must include this token as a Bearer token in the `authorization` metadata header. When unset (the default), authentication is disabled. |
## gRPC Authentication

Silo supports optional shared-secret authentication for gRPC requests. When `auth_token` is set in the `[server]` section, all incoming RPCs must include an `authorization: Bearer <token>` metadata header matching the configured value. Requests without a valid token are rejected with `UNAUTHENTICATED`.

When `auth_token` is not set (the default), authentication is disabled and all clients can connect freely.

```toml
[server]
grpc_addr = "0.0.0.0:7450"
auth_token = "${SILO_AUTH_TOKEN}"
```

When authentication is enabled, node-to-node cluster communication and WebUI remote queries automatically use the configured token. External clients (workers, `siloctl`) must provide the token themselves.

`siloctl` accepts the token via the `--auth-token` flag or the `SILO_AUTH_TOKEN` environment variable:

```shell
siloctl --auth-token <token> cluster info
# or
SILO_AUTH_TOKEN=<token> siloctl cluster info
```

## Database Configuration
The `[database]` section configures how Silo stores job data. Silo uses SlateDB as its embedded database, which stores data in object storage.

```toml
[database]
backend = "gcs"
path = "gs://my-bucket/silo/%shard%"
apply_wal_on_close = true
# Optional: periodic self-healing scan for pending concurrency requests
# concurrency_reconcile_interval_ms = 5000

# Optional: separate WAL storage
[database.wal]
backend = "fs"
path = "/var/lib/silo/wal/%shard%"
```

### Database Options
| Option | Type | Default | Description |
|---|---|---|---|
| `backend` | string | `"fs"` | Storage backend type (see below) |
| `path` | string | `"/tmp/silo/%shard%"` | Path or URL for data storage. Use `%shard%` as a placeholder for the shard number. |
| `apply_wal_on_close` | bool | `true` | Flush the WAL to object storage before closing shards (recommended for durability) |
| `concurrency_reconcile_interval_ms` | number | `5000` | Optional interval for periodic pending-request reconciliation in the concurrency manager |
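With no `[database]` section at all, the defaults above amount to a local-filesystem store. Spelled out explicitly, the equivalent configuration is a sketch like:

```toml
[database]
backend = "fs"
path = "/tmp/silo/%shard%"
apply_wal_on_close = true
```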
### Storage Backends

| Backend | Description | Path Format |
|---|---|---|
| `fs` | Local filesystem | `/var/lib/silo/%shard%` |
| `s3` | Amazon S3 | `s3://bucket-name/prefix/%shard%` |
| `gcs` | Google Cloud Storage | `gs://bucket-name/prefix/%shard%` |
| `memory` | In-memory (testing only) | Any string |
| `url` | Generic URL-based object store | Any URL understood by SlateDB |
### Cloud Storage Authentication

For the S3 and GCS backends, Silo uses the standard credential chain:

- S3: AWS credential chain (`AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`, instance profiles, etc.)
- GCS: the `GOOGLE_APPLICATION_CREDENTIALS` environment variable or GKE Workload Identity
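Because credentials come from the environment rather than the config file, an S3-backed store needs nothing auth-related in the TOML itself; a minimal sketch (bucket name and prefix are hypothetical):

```toml
# Credentials are resolved from the AWS credential chain, not from this file
[database]
backend = "s3"
path = "s3://my-silo-bucket/prod/%shard%"
```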
### WAL Configuration

By default, the Write-Ahead Log (WAL) uses the same backend and location as the main data store. For better write performance, you can configure a separate local WAL:

```toml
[database]
backend = "gcs"
path = "gs://my-bucket/silo/%shard%"

[database.wal]
backend = "fs"
path = "/var/lib/silo/wal/%shard%"
```

| Option | Type | Description |
|---|---|---|
| `backend` | string | Storage backend for the WAL |
| `path` | string | Path for WAL storage (supports the `%shard%` placeholder) |
When using a local WAL with cloud object storage:

- Writes are faster because they go to local disk first
- On graceful shard close (or node shutdown), the WAL is flushed to object storage to ensure durability
- The local WAL directory is deleted after a successful flush
- On crash, shard leases are permanent and persist until the node restarts and recovers the WAL. Set `node_id` to a stable value (e.g., `"${POD_NAME}"`) to enable automatic WAL recovery after restarts. See the Internals guide for details.
### apply_wal_on_close

When running on pure object storage, your Silo instances don’t necessarily need to apply the WAL to the rest of the storage before shutting down. But when running with a split WAL, where the WAL is not in object storage, you should apply the WAL to the rest of the storage before shutting down to ensure durability. `apply_wal_on_close` defaults to `true`, which triggers this behavior.

Note that `apply_wal_on_close` only applies during graceful shutdowns. If a node crashes, the WAL is not flushed. Silo’s permanent shard leases ensure the crashed node retains ownership of its shards, so when the node restarts, it can recover the unflushed WAL from local disk.
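Putting these pieces together, a sketch of a crash-tolerant split-WAL setup (bucket name and paths are illustrative):

```toml
[database]
backend = "gcs"
path = "gs://my-bucket/silo/%shard%"
apply_wal_on_close = true  # flush the local WAL to object storage on graceful close

[database.wal]
backend = "fs"
path = "/var/lib/silo/wal/%shard%"

[coordination]
# A stable identity lets a restarted node reclaim its shard leases and
# recover any unflushed WAL from local disk after a crash
node_id = "${POD_NAME}"
```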
### concurrency_reconcile_interval_ms

Silo runs a background reconciliation loop while each shard is open. This loop periodically scans pending concurrency request records and re-signals grant processing. It is a self-healing mechanism for cases where durable requests exist but in-memory notifications were missed (for example, due to a crash between separate write/notify phases).

This option is optional; if omitted, Silo uses the default of 5000 milliseconds.
## SlateDB Configuration

Silo uses SlateDB as its embedded storage engine. You can configure SlateDB-specific options via the `[database.slatedb]` section. All SlateDB configuration options are passed directly to SlateDB, so you can refer to the SlateDB Configuration Documentation for the full list of available options.

```toml
[database]
backend = "gcs"
path = "gs://my-bucket/silo/%shard%"

[database.slatedb]
flush_interval = "100ms"
l0_sst_size_bytes = 67108864
l0_max_ssts = 8

[database.slatedb.compactor_options]
poll_interval = "5s"
max_sst_size = 1073741824
```

### Common SlateDB Options
| Option | Type | Default | Description |
|---|---|---|---|
| `flush_interval` | duration | `"100ms"` | How often to flush the memtable to SST files |
| `l0_sst_size_bytes` | number | `67108864` | Target size for L0 SST files (64 MB default) |
| `l0_max_ssts` | number | `8` | Maximum number of L0 SSTs before compaction triggers |
| `max_unflushed_bytes` | number | `536870912` | Maximum unflushed data in memory (512 MB default) |
### Compactor Options

Configure compaction behavior via `[database.slatedb.compactor_options]`:

| Option | Type | Default | Description |
|---|---|---|---|
| `poll_interval` | duration | `"5s"` | How often to check for compaction work |
| `max_sst_size` | number | `1073741824` | Maximum size of compacted SST files (1 GB default) |
| `max_concurrent_compactions` | number | `4` | Maximum concurrent compaction jobs |
### Garbage Collector Options

Configure garbage collection via `[database.slatedb.garbage_collector_options]`:

```toml
[database.slatedb.garbage_collector_options.manifest_options]
interval = "300s"
min_age = "86400s"

[database.slatedb.garbage_collector_options.wal_options]
interval = "60s"
min_age = "60s"
```

### Partial Configuration
You can specify only the SlateDB options you want to customize; unspecified options will use SlateDB’s defaults. For example, to configure only the flush interval and object store cache:

```toml
[database.slatedb]
flush_interval = "1ms"

[database.slatedb.object_store_cache_options]
root_folder = "/var/silo-cache"
cache_puts = true
```

All other SlateDB settings (like `l0_sst_size_bytes`, `manifest_poll_interval`, etc.) will automatically use their default values. This allows you to tune specific parameters without needing to specify the entire configuration.
## Coordination Configuration

The `[coordination]` section configures how Silo nodes discover each other and coordinate shard ownership in a cluster.

```toml
[coordination]
backend = "etcd"
cluster_prefix = "silo-prod"
num_shards = 8
lease_ttl_secs = 10
```

### Coordination Options
| Option | Type | Default | Description |
|---|---|---|---|
| `backend` | string | `"none"` | Coordination backend: `"none"`, `"etcd"`, or `"k8s"` |
| `cluster_prefix` | string | `"silo"` | Prefix for namespacing coordination keys/leases |
| `num_shards` | number | `8` | Total number of shards in the cluster |
| `lease_ttl_secs` | number | `10` | TTL for the membership lease in seconds. This controls how quickly the cluster detects that a node has crashed. Shard ownership leases are permanent and not affected by this TTL. |
| `advertised_grpc_addr` | string | (none) | Address other nodes use to connect to this node |
| `node_id` | string | (random UUID) | Stable node identity for this instance. If set, the node will reclaim shard leases from a previous run on startup, enabling WAL recovery after crashes. See Permanent shard leases. |
### Coordination Backends

For local development or single-node deployments:

```toml
[coordination]
backend = "none"
```

In this mode, a single Silo instance owns all shards and no coordination is needed.

For distributed deployments using etcd:

```toml
[coordination]
backend = "etcd"
cluster_prefix = "silo-prod"
num_shards = 16
lease_ttl_secs = 10
etcd_endpoints = ["http://etcd-0:2379", "http://etcd-1:2379", "http://etcd-2:2379"]
```

| Option | Type | Default | Description |
|---|---|---|---|
| `etcd_endpoints` | array | `["http://127.0.0.1:2379"]` | List of etcd endpoint URLs |
For Kubernetes deployments using native Lease objects:

```toml
[coordination]
backend = "k8s"
cluster_prefix = "silo-prod"
num_shards = 16
lease_ttl_secs = 15
k8s_namespace = "silo"
advertised_grpc_addr = "${POD_IP}:7450"
node_id = "${POD_NAME}"
```

| Option | Type | Default | Description |
|---|---|---|---|
| `k8s_namespace` | string | `"default"` | Kubernetes namespace for Lease objects |

The `k8s` backend uses Kubernetes Lease objects for coordination. Each shard gets a Lease named `{cluster_prefix}-shard-{n}`.
### Advertised gRPC Address

In clustered deployments, `advertised_grpc_addr` tells other nodes how to connect to this node. This is important when:

- You bind to `0.0.0.0` but need to advertise a specific IP
- You’re running in Kubernetes and need to advertise the pod IP

```toml
[server]
grpc_addr = "0.0.0.0:7450"  # Bind to all interfaces

[coordination]
backend = "k8s"
# Advertise the pod IP (injected via the Downward API)
advertised_grpc_addr = "${POD_IP}:7450"
```

## Web UI Configuration
The `[webui]` section configures the built-in web dashboard.

```toml
[webui]
enabled = true
addr = "127.0.0.1:8080"
```

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `true` | Enable the web UI server |
| `addr` | string | `"127.0.0.1:8080"` | Address and port for the web UI |
The web UI provides:
- Cluster overview and health status
- Queue inspection and job browsing
- SQL query interface for debugging
- Configuration viewer
## Metrics Configuration

The `[metrics]` section configures the Prometheus metrics endpoint.

```toml
[metrics]
enabled = true
addr = "127.0.0.1:9090"
```

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `true` | Enable the Prometheus metrics endpoint |
| `addr` | string | `"127.0.0.1:9090"` | Address and port for the metrics server |

Metrics are exposed in Prometheus format at `/metrics`. See the Observability Guide for available metrics.
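To collect these metrics, point a Prometheus scrape job at the configured address. A minimal sketch (the `silo-host` target name is hypothetical; `/metrics` is the default scrape path in Prometheus, so it needs no explicit `metrics_path`):

```yaml
scrape_configs:
  - job_name: "silo"
    static_configs:
      - targets: ["silo-host:9090"]  # matches the addr configured above
```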
## Logging Configuration

The `[logging]` section configures log output format.

```toml
[logging]
format = "json"
```

| Option | Type | Default | Description |
|---|---|---|---|
| `format` | string | `"text"` | Log format: `"text"` (human-readable) or `"json"` (structured) |

Use the `json` format for production deployments to enable log aggregation and analysis.
## Gubernator (Rate Limiting) Configuration

The `[gubernator]` section configures integration with Gubernator for distributed rate limiting.

```toml
[gubernator]
address = "http://gubernator:9991"
coalesce_interval_ms = 5
max_batch_size = 100
connect_timeout_ms = 5000
request_timeout_ms = 10000
```

| Option | Type | Default | Description |
|---|---|---|---|
| `address` | string | (none) | Gubernator server URL. If not set, rate limiting is disabled. |
| `coalesce_interval_ms` | number | `5` | Maximum time to wait before sending a batch |
| `max_batch_size` | number | `100` | Maximum number of requests to batch together |
| `connect_timeout_ms` | number | `5000` | Connection timeout in milliseconds |
| `request_timeout_ms` | number | `10000` | Request timeout in milliseconds |
## Tenancy Configuration

The `[tenancy]` section enables multi-tenant features.

```toml
[tenancy]
enabled = true
```

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `false` | Enable multi-tenancy support |
## Environment Variable Substitution

Silo supports environment variable substitution in configuration values using shell-like syntax:

- `${VAR}`: expands to the value of `VAR`, or an empty string if not set
- `${VAR:-default}`: expands to the value of `VAR`, or `"default"` if not set
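Since this syntax mirrors POSIX shell parameter expansion, you can preview how a value will expand using your shell (the paths here are illustrative):

```shell
# Unset variable: the fallback after ":-" is used
unset DATABASE_PATH
echo "${DATABASE_PATH:-/var/lib/silo}/%shard%"
# -> /var/lib/silo/%shard%

# Set variable: its value wins
DATABASE_PATH=/data/silo
echo "${DATABASE_PATH:-/var/lib/silo}/%shard%"
# -> /data/silo/%shard%
```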
For example:

```toml
[database]
# If DATABASE_PATH is set to "/data/silo", this becomes "/data/silo/%shard%"
# If DATABASE_PATH is not set, this becomes "/var/lib/silo/%shard%"
path = "${DATABASE_PATH:-/var/lib/silo}/%shard%"
```

### Substituting pod IPs for advertised_grpc_addr in Kubernetes
You can use environment variable substitution in Kubernetes to inject values via the Downward API, which is required for configuring dynamic values that aren’t known in advance, like `advertised_grpc_addr`:

```toml
[coordination]
# Inject pod IP from the Kubernetes Downward API
advertised_grpc_addr = "${POD_IP}:7450"

# Use a default if the env var isn't set
cluster_prefix = "${CLUSTER_NAME:-silo-default}"
```

For example, you can pass the `POD_IP` environment variable to your Silo pod via the Downward API like so:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: silo
spec:
  containers:
    - name: silo
      image: ghcr.io/gadget-inc/silo:latest
      env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
      args:
        - "-c"
        - "/etc/silo/config.toml"
      volumeMounts:
        - name: config
          mountPath: /etc/silo
# ...
```