TOML Configuration Reference

Full schema reference for the TOML file consumed by yd serve --config <path>. The file has three top-level sections: [engine], [[tables]], and the optional [sidecar].


[engine]

Core engine settings. All keys are optional except where noted.

oltp_path

| Type | Default |
| --- | --- |
| string | "yoda.db" |

Filesystem path to the SQLite database used for OLTP writes. Created if it does not exist. WAL mode and PRAGMA synchronous=NORMAL are applied automatically.

toml
[engine]
oltp_path = "/var/lib/yoda/htap.db"

olap_backend

| Type | Default |
| --- | --- |
| string | "datafusion" |

Which OLAP engine to instantiate. Accepted values:

  • "datafusion" — Apache DataFusion (pure Rust, natively async). Always available in default builds.
  • "duckdb" — DuckDB (C++ bundled, requires --features duckdb-backend).

toml
[engine]
olap_backend = "datafusion"

olap_in_memory

| Type | Default |
| --- | --- |
| bool | true |

Keep the OLAP backend entirely in memory. For DataFusion, prefer the [engine.datafusion_storage] sub-table instead; this flag is primarily for the DuckDB backend. Set to false and provide olap_path for durable DuckDB storage.

olap_path

| Type | Default |
| --- | --- |
| string | (optional) |

Filesystem path for durable OLAP storage when olap_in_memory = false. For DuckDB this is a .duckdb file. Ignored for DataFusion when [engine.datafusion_storage] is set.
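
As a sketch of the durable DuckDB setup described above, the fragment below combines olap_backend, olap_in_memory, and olap_path. The file path is a hypothetical placeholder, and the binary is assumed to have been built with --features duckdb-backend.

```toml
[engine]
olap_backend   = "duckdb"                     # requires --features duckdb-backend
olap_in_memory = false                        # persist the OLAP mirror instead of keeping it in memory
olap_path      = "/var/lib/yoda/olap.duckdb"  # hypothetical path to a .duckdb file
```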

sync_interval_ms

| Type | Default |
| --- | --- |
| integer (ms) | 500 |

How often the background CDC sync loop polls _yoda_cdc_log and applies events to the OLAP mirror.

  • ≤ 50 — near real-time; adds measurable SQLite I/O pressure.
  • 100–500 — balanced; recommended for most workloads.
  • ≥ 1000 — low overhead; suitable for batch analytics.

toml
[engine]
sync_interval_ms = 250

sync_batch_size

| Type | Default |
| --- | --- |
| integer | 1000 |

Maximum number of CDC events consumed per sync cycle. Larger batches amortise per-transaction overhead and increase throughput at the cost of higher per-cycle latency.
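
To see how sync_batch_size interacts with sync_interval_ms, a rough ceiling on sustained CDC throughput can be estimated as sync_batch_size × (1000 / sync_interval_ms) events per second. This assumes each sync cycle consumes at most one batch and that cycles start once per interval. The descriptions above suggest this but do not guarantee it, so treat the numbers as an estimate, not a measured figure.

```toml
[engine]
sync_interval_ms = 250    # 4 sync cycles per second
sync_batch_size  = 2000   # at most 2000 CDC events per cycle
# Estimated ceiling under the assumption above:
#   2000 events/cycle * (1000 ms / 250 ms) = 8000 events/s
# With the defaults (500 ms, 1000 events): 1000 * 2 = 2000 events/s
```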

read_pool_size

| Type | Default |
| --- | --- |
| integer | 4 |

Number of read connections in the OLTP connection pool (round-robin). Increase for workloads with many concurrent OLTP reads.

sync_mode

| Type | Default |
| --- | --- |
| string | "destructive" |

How CDC events are applied to the OLAP mirror:

  • "destructive" (alias "mirror") — standard mirror semantics. UPDATE overwrites the row, DELETE removes it.
  • "temporal" (aliases "scd2", "scd_type_2") — SCD Type 2 append-only mode. Every change appends a new version with _yoda_valid_from, _yoda_valid_to, and _yoda_operation columns. Enables point-in-time queries.

See Sync modes for details.

toml
[engine]
sync_mode = "temporal"

rocksdb_cdc_path

| Type | Default | Feature |
| --- | --- | --- |
| string | (optional) | rocksdb-cdc |

Path to a RocksDB directory used as a durable CDC event buffer. When set, SQLite triggers still fire into _yoda_cdc_log, but a bridge drains them into RocksDB on each poll cycle. The sync engine then reads exclusively from RocksDB, which gives crash-durable event buffering and a CDC write path 5–7x faster than with SQLite triggers alone. Ignored in sidecar mode.

toml
[engine]
rocksdb_cdc_path = "/var/lib/yoda/cdc-log"

flight_port

| Type | Default | Feature |
| --- | --- | --- |
| integer | (optional) | flight-sql |

TCP port on which the Arrow Flight SQL gRPC server listens. When set, yd serve starts the Flight SQL endpoint at 0.0.0.0:<port>. Requires the binary to be compiled with --features flight-sql.

See FlightSQL for client examples.

toml
[engine]
flight_port = 50051

flight_auth_token

| Type | Default | Feature |
| --- | --- | --- |
| string | (optional) | flight-sql |

Bearer token that clients must supply in the authorization: Bearer <token> gRPC metadata header. Falls back to the YODA_FLIGHT_AUTH_TOKEN environment variable if this key is absent, so operators can avoid storing tokens on disk.

Prefer the environment variable

Store the token in YODA_FLIGHT_AUTH_TOKEN rather than in the TOML file to avoid leaking it via config-file access logs or version control.
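
A minimal sketch of the recommended setup: the config file enables the Flight SQL endpoint but carries no token, on the assumption that the operator exports YODA_FLIGHT_AUTH_TOKEN in the environment of the yd serve process.

```toml
[engine]
flight_port = 50051
# flight_auth_token is deliberately omitted here; the engine falls back to
# the YODA_FLIGHT_AUTH_TOKEN environment variable when this key is absent.
```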

log_format

| Type | Default |
| --- | --- |
| string | "text" |

Log format used in headless / stdout mode:

  • "text" — human-readable tracing output (default).
  • "json" — structured JSON log lines compatible with Loki, Datadog, and the ELK stack.

toml
[engine]
log_format = "json"

schema_registry_path

| Type | Default |
| --- | --- |
| string | (optional) |

Path to a JSON file where the schema registry is persisted between restarts. When set, HtapEngine::register_table writes the file atomically after each registration. On the next start, all previously registered tables are restored automatically — no need to re-declare them in [[tables]].

toml
[engine]
schema_registry_path = "/var/lib/yoda/registry.json"

metrics_port

| Type | Default | Feature |
| --- | --- | --- |
| integer | (optional) | metrics-exporter |

TCP port for the Prometheus metrics HTTP endpoint. When set, yd serve exposes counters and gauges from yoda-sync and yoda at http://0.0.0.0:<port>/metrics. Requires the binary to be compiled with --features metrics-exporter.

toml
[engine]
metrics_port = 9100

[engine.datafusion_storage]

Optional sub-table controlling where DataFusion persists Arrow data. When absent, DataFusion defaults to in-memory storage. Ignored when olap_backend = "duckdb".

mode

| Type | Default |
| --- | --- |
| string | "in_memory" (when section absent) |

Storage mode. Accepted values:

| Value | Aliases | Description | Requires |
| --- | --- | --- | --- |
| "in_memory" | "memory", "inmemory" | No persistence (default) | |
| "arrow_ipc" | "ipc" | Arrow IPC files on local disk | path |
| "parquet" | | Parquet files on local disk | path |
| "s3-parquet" | "s3_parquet" | Parquet on Amazon S3 | url, cloud-storage feature |
| "gcs-parquet" | "gcs_parquet" | Parquet on Google Cloud Storage | url, cloud-storage feature |

path

| Type | Default |
| --- | --- |
| string | (optional) |

Local filesystem path for arrow_ipc and parquet modes. Yoda creates the directory on first use.
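
For the arrow_ipc mode, which has no example elsewhere on this page, a fragment along these lines should work. The directory name is a hypothetical placeholder.

```toml
[engine.datafusion_storage]
mode = "arrow_ipc"             # "ipc" is an accepted alias
path = "/var/lib/yoda/arrow"   # hypothetical directory; created on first use
```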

url

| Type | Default |
| --- | --- |
| string | (optional) |

Object-store URL for cloud storage modes. Examples:

  • "s3://my-bucket/yoda-data" for s3-parquet
  • "gs://my-bucket/yoda-data" for gcs-parquet

toml
[engine.datafusion_storage]
mode = "parquet"
path = "/var/lib/yoda/data"

toml
[engine.datafusion_storage]
mode = "s3-parquet"
url  = "s3://analytics-bucket/htap"

See Configuration for cloud-storage credential setup.


[[tables]]

Declares HTAP tables. Each entry is an element of the [[tables]] array. At least one entry is required for the engine to replicate data.

name

| Type | Required |
| --- | --- |
| string | yes |

Table name. Must match the SQLite table name exactly and consist only of [A-Za-z0-9_] characters.

ddl

| Type | Default |
| --- | --- |
| string | (optional) |

CREATE TABLE statement executed on the OLTP layer at startup before registering the table. Use CREATE TABLE IF NOT EXISTS to make it idempotent.

toml
[[tables]]
name = "orders"
ddl  = "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, amount REAL)"

[[tables.columns]]

Array of column definitions. At least one column must have primary_key = true; the engine rejects tables with no primary key at startup.

name

| Type | Required |
| --- | --- |
| string | yes |

Column name. Must consist only of [A-Za-z0-9_] characters.

type

| Type | Required |
| --- | --- |
| string | yes |

Arrow data type. Accepted values:

| Value | Aliases | Arrow type |
| --- | --- | --- |
| "int64" | "integer", "bigint" | Int64 |
| "int32" | "int" | Int32 |
| "int16" | "smallint" | Int16 |
| "int8" | "tinyint" | Int8 |
| "uint64" | | UInt64 |
| "uint32" | | UInt32 |
| "utf8" | "text", "string", "varchar" | Utf8 |
| "float64" | "double", "real" | Float64 |
| "float32" | "float" | Float32 |
| "boolean" | "bool" | Boolean |
| "date" | "date32" | Date32 |
| "timestamp" | | Timestamp(Microsecond, None) |
| "binary" | "blob" | Binary |

nullable

| Type | Default |
| --- | --- |
| bool | false |

Whether the column allows NULL values.
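
The sketch below shows how type aliases and the nullable flag might look together in one table declaration. The table and column names are hypothetical. Each alias resolves to the Arrow type listed in the table above.

```toml
[[tables]]
name = "readings"              # hypothetical table name

  [[tables.columns]]
  name        = "id"
  type        = "bigint"       # alias for "int64"
  primary_key = true           # nullable defaults to false

  [[tables.columns]]
  name     = "recorded_at"
  type     = "timestamp"       # Timestamp(Microsecond, None)
  nullable = true

  [[tables.columns]]
  name     = "payload"
  type     = "blob"            # alias for "binary"
  nullable = true
```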

primary_key

| Type | Default |
| --- | --- |
| bool | false |

If true, includes this column in the table's primary key. Composite primary keys are supported by setting primary_key = true on multiple columns.
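
A composite key could be declared as follows. This is a sketch with a hypothetical table. The ddl string is ordinary SQLite DDL, and the two key columns each carry primary_key = true.

```toml
[[tables]]
name = "order_items"           # hypothetical table name
ddl  = "CREATE TABLE IF NOT EXISTS order_items (tenant_id INTEGER NOT NULL, order_id INTEGER NOT NULL, quantity INTEGER, PRIMARY KEY (tenant_id, order_id))"

  [[tables.columns]]
  name        = "tenant_id"
  type        = "int64"
  primary_key = true           # first part of the composite key

  [[tables.columns]]
  name        = "order_id"
  type        = "int64"
  primary_key = true           # second part of the composite key

  [[tables.columns]]
  name     = "quantity"
  type     = "int64"
  nullable = true
```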


[sidecar]

Optional section that switches Yoda into sidecar mode: it follows an external database via timestamp-based CDC instead of using local SQLite triggers. Requires the sidecar Cargo feature (--features sidecar).

See Sidecar mode for a conceptual overview.

source_path

| Type | Required |
| --- | --- |
| string | yes |

Connection path or DSN for the external database:

  • SQLite: local filesystem path, e.g. "/data/app.db"
  • PostgreSQL: connection string, e.g. "postgres://user:pass@host:5432/mydb"

source_type

| Type | Default |
| --- | --- |
| string | "sqlite" |

Source database type: "sqlite" or "postgres".

enable_local_oltp

| Type | Default |
| --- | --- |
| bool | false |

When true, a local SQLite engine (rusqlite) is also started at oltp_path, so you can perform local OLTP writes alongside the sidecar CDC source. Defaults to false for pure sidecar mode.
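
A sketch of a mixed setup, following an external SQLite file while also accepting local writes. The oltp_path value is a hypothetical placeholder, and the [[sidecar.tables]] entries are omitted for brevity.

```toml
[engine]
oltp_path = "/var/lib/yoda/local.db"   # hypothetical path for the local OLTP database

[sidecar]
source_path       = "/data/app.db"     # external SQLite database to follow
source_type       = "sqlite"
enable_local_oltp = true               # also start the local engine at oltp_path
```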

poll_batch_size

| Type | Default |
| --- | --- |
| integer | 500 |

Maximum number of rows fetched from the source database per poll cycle.

watermark_path

| Type | Default |
| --- | --- |
| string | (optional) |

Path to a RocksDB directory that persists the CDC watermark across restarts. When absent, the watermark is held in memory only, so polling starts again from the beginning after every restart. Requires the rocksdb-watermark feature on yoda-sidecar (automatically enabled when the sidecar top-level feature is on).

[sidecar.delete_detection]

Optional sub-table controlling how deleted rows in the source database are detected.

mode

| Value | Description |
| --- | --- |
| "disabled" | No delete detection (default when section absent) |
| "soft_delete" | Detects deletes via a boolean/flag column |
| "full_diff" | Periodically scans the source to detect missing rows |

column

| Type | When required |
| --- | --- |
| string | mode = "soft_delete" |

Name of the soft-delete flag column. A non-zero integer or non-empty string value indicates the row is deleted.

every_n_cycles

| Type | Default | When used |
| --- | --- | --- |
| integer | 60 | mode = "full_diff" |

How many poll cycles between full-diff scans.
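
For sources without a soft-delete column, full-diff detection might be configured as in this sketch. The value 120 is an arbitrary illustration, not a recommendation.

```toml
[sidecar.delete_detection]
mode           = "full_diff"
every_n_cycles = 120   # run a full-diff scan every 120 poll cycles (default is 60)
```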

[[sidecar.tables]]

Per-table configuration for the sidecar CDC consumer. Mirrors [[tables]] but describes the source database schema (not the OLAP target).

table_name

| Type | Required |
| --- | --- |
| string | yes |

Name of the table in the source database.

primary_key

| Type | Required |
| --- | --- |
| array of strings | yes |

Column names forming the primary key. Composite primary keys are supported.

toml
primary_key = ["tenant_id", "order_id"]

created_at_column

| Type | Default |
| --- | --- |
| string | "created_at" |

Column carrying the row's creation timestamp. Used with updated_at_column to distinguish INSERTs from UPDATEs: if created_at == updated_at, the event is classified as an INSERT; otherwise it is treated as an UPDATE.

updated_at_column

| Type | Default |
| --- | --- |
| string | "updated_at" |

Column carrying the row's last-update timestamp. Used as the watermark for incremental polling.

columns

| Type | Default |
| --- | --- |
| array of strings | [] (all columns) |

Explicit list of column names to replicate. When empty, all columns are replicated.
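
Putting the [[sidecar.tables]] keys together, an entry for a source table whose timestamp columns do not use the default names could look like this sketch. The table and column names are hypothetical.

```toml
[[sidecar.tables]]
table_name        = "invoices"                    # hypothetical source table
primary_key       = ["tenant_id", "invoice_id"]   # composite primary key
created_at_column = "inserted_at"                 # overrides the "created_at" default
updated_at_column = "modified_at"                 # overrides the "updated_at" default; used as the watermark
columns           = ["tenant_id", "invoice_id", "total", "inserted_at", "modified_at"]
```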


Complete examples

(a) Standard HTAP with DataFusion

toml
[engine]
oltp_path        = "app.db"
olap_backend     = "datafusion"
sync_interval_ms = 500
sync_mode        = "destructive"
read_pool_size   = 4
log_format       = "text"

[engine.datafusion_storage]
mode = "parquet"
path = "/var/lib/yoda/data"

[[tables]]
name = "users"
ddl  = "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)"

  [[tables.columns]]
  name        = "id"
  type        = "int64"
  nullable    = false
  primary_key = true

  [[tables.columns]]
  name     = "name"
  type     = "utf8"
  nullable = false

  [[tables.columns]]
  name     = "email"
  type     = "utf8"
  nullable = true

[[tables]]
name = "events"
ddl  = "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, user_id INTEGER, type TEXT, ts TEXT)"

  [[tables.columns]]
  name        = "id"
  type        = "int64"
  nullable    = false
  primary_key = true

  [[tables.columns]]
  name     = "user_id"
  type     = "int64"
  nullable = true

  [[tables.columns]]
  name     = "type"
  type     = "utf8"
  nullable = true

  [[tables.columns]]
  name     = "ts"
  type     = "utf8"
  nullable = true

(b) Sidecar mode polling a PostgreSQL source

toml
[engine]
oltp_path        = ":memory:"     # not used; OLTP is disabled
olap_backend     = "datafusion"
sync_interval_ms = 1000
sync_mode        = "temporal"     # keep full history
log_format       = "json"

[engine.datafusion_storage]
mode = "parquet"
path = "/var/lib/yoda/sidecar-data"

[sidecar]
source_path       = "postgres://app_user:[email protected]:5432/production"
source_type       = "postgres"
enable_local_oltp = false
poll_batch_size   = 1000
watermark_path    = "/var/lib/yoda/watermarks"

[sidecar.delete_detection]
mode   = "soft_delete"
column = "deleted_at"

[[sidecar.tables]]
table_name        = "orders"
primary_key       = ["id"]
created_at_column = "created_at"
updated_at_column = "updated_at"
columns           = ["id", "customer_id", "amount", "status", "created_at", "updated_at", "deleted_at"]

[[sidecar.tables]]
table_name        = "customers"
primary_key       = ["id"]
created_at_column = "created_at"
updated_at_column = "updated_at"

(c) HTAP + FlightSQL + temporal mode + Prometheus metrics

toml
[engine]
oltp_path            = "/data/htap.db"
olap_backend         = "datafusion"
sync_interval_ms     = 200
sync_mode            = "temporal"
read_pool_size       = 8
log_format           = "json"
schema_registry_path = "/data/registry.json"

# Arrow Flight SQL gRPC server (--features flight-sql)
flight_port = 50051
# flight_auth_token = "my-secret"   # or set YODA_FLIGHT_AUTH_TOKEN env var

# Prometheus metrics (--features metrics-exporter)
metrics_port = 9100

[engine.datafusion_storage]
mode = "parquet"
path = "/data/olap"

[[tables]]
name = "transactions"
ddl  = """
  CREATE TABLE IF NOT EXISTS transactions (
    id         INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL,
    amount     REAL    NOT NULL,
    currency   TEXT,
    ts         TEXT    NOT NULL
  )
"""

  [[tables.columns]]
  name        = "id"
  type        = "int64"
  nullable    = false
  primary_key = true

  [[tables.columns]]
  name     = "account_id"
  type     = "int64"
  nullable = false

  [[tables.columns]]
  name     = "amount"
  type     = "float64"
  nullable = false

  [[tables.columns]]
  name     = "currency"
  type     = "utf8"
  nullable = true

  [[tables.columns]]
  name     = "ts"
  type     = "utf8"
  nullable = false

Temporal columns are added automatically

When sync_mode = "temporal", Yoda appends _yoda_valid_from, _yoda_valid_to, and _yoda_operation columns to the OLAP table automatically. Do not declare them in [[tables.columns]].

Feature-gated keys

flight_port, flight_auth_token, and metrics_port are silently ignored if the corresponding feature flag (flight-sql, metrics-exporter) was not enabled at compile time. Build with the appropriate --features flags or use a pre-built binary that includes them.

Released under the Apache-2.0 License.