
Sidecar Mode

Sidecar mode lets Yoda follow an existing database (SQLite or PostgreSQL) and maintain a continuously updated OLAP mirror of it, optionally with full SCD Type 2 history. No schema changes are required on the source database. Each tracked table only needs an updated_at column, used as the polling watermark, and a created_at column, used to tell inserts from updates. An optional deleted_at column enables soft-delete detection.

Requires the sidecar Cargo feature:

toml
yoda = { version = "1", features = ["sidecar"] }

How It Works

Instead of installing SQLite triggers, the sidecar consumer polls the source database on each cycle:

sql
SELECT <columns>
FROM   <table>
WHERE  (updated_at, pk1, pk2, …) > (watermark_ts, last_pk1, last_pk2, …)
ORDER  BY updated_at, pk1, pk2, …
LIMIT  <poll_batch_size>

The watermark advances to the (updated_at, pk1, pk2, …) tuple of the last row seen. Composite primary keys are supported via SQL tuple comparison, so ties on updated_at are broken correctly.
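To see why the tuple comparison matters, here is a minimal in-memory sketch of one poll cycle. It assumes a single integer primary key and integer timestamps (for example epoch milliseconds); Row, Watermark, and poll are hypothetical names for illustration, not Yoda's API. Rust tuples compare lexicographically, the same way SQL row values do.

```rust
/// Hypothetical source row: integer updated_at plus a single-column PK.
#[derive(Debug, Clone)]
struct Row {
    updated_at: i64,
    pk: i64,
}

/// Watermark = (updated_at, pk) of the last row processed.
type Watermark = (i64, i64);

/// Simulates one poll cycle: keep rows strictly after the watermark,
/// ORDER BY (updated_at, pk), LIMIT `batch`, then advance the watermark
/// to the last row returned.
fn poll(rows: &[Row], watermark: Watermark, batch: usize) -> (Vec<Row>, Watermark) {
    let mut selected: Vec<Row> = rows
        .iter()
        // Tuple comparison, like WHERE (updated_at, pk) > (wm_ts, wm_pk)
        .filter(|r| (r.updated_at, r.pk) > watermark)
        .cloned()
        .collect();
    selected.sort_by_key(|r| (r.updated_at, r.pk));
    selected.truncate(batch);
    // No new rows: the watermark stays where it was.
    let next = selected
        .last()
        .map_or(watermark, |r| (r.updated_at, r.pk));
    (selected, next)
}
```

For example, if three rows share updated_at = 100 and a batch of two ends at (100, 2), a plain updated_at > 100 filter would skip the third row on the next cycle, while the tuple comparison (100, 3) > (100, 2) still selects it.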

INSERT vs UPDATE Heuristic

Because a WHERE updated_at > watermark query cannot inherently distinguish a first-ever insert from an update, the consumer uses:

  • created_at == updated_at → emit as CdcOperation::Insert
  • created_at != updated_at → emit as CdcOperation::Update

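This heuristic requires that the source table sets created_at only once, at row creation, writes the same value to updated_at in that same insert, and updates updated_at on every subsequent change. A row that is inserted and then updated before the next poll is seen only once, in its latest state, and is therefore emitted as an Update.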
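The heuristic reduces to a single comparison. A minimal sketch, using a local CdcOperation enum that mirrors only the two variants named above (Yoda's own enum may differ) and assuming integer timestamps:

```rust
/// Local stand-in mirroring the two variants named in this section.
#[derive(Debug, PartialEq, Eq)]
enum CdcOperation {
    Insert,
    Update,
}

/// Classify a polled row from its created_at / updated_at values.
/// Equal timestamps mean the row has not changed since it was created.
fn classify(created_at: i64, updated_at: i64) -> CdcOperation {
    if created_at == updated_at {
        CdcOperation::Insert
    } else {
        CdcOperation::Update
    }
}
```

Because the test is strict equality, a source that stamps the two columns with separate clock reads during an insert (so they differ by even one tick) will have its new rows reported as updates.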


Configuration

Rust

rust
use yoda::{
    HtapConfig, SidecarConfig, SidecarSource, SyncMode,
    TimestampCdcConfig, TimestampTableConfig, DeleteDetection,
};

let config = HtapConfig {
    // OLAP engine — receives the replicated data
    olap_in_memory: false,
    olap_path: Some("/var/lib/myapp/olap".to_string()),
    sync_mode: SyncMode::Temporal, // optional: keep full history
    sync_interval: Some(std::time::Duration::from_millis(500)),

    sidecar: Some(SidecarConfig {
        source: SidecarSource::Postgres(
            "host=db.example.com user=analytics dbname=production".to_string()
        ),
        timestamp_config: TimestampCdcConfig {
            tables: vec![
                TimestampTableConfig {
                    table_name: "users".to_string(),
                    primary_key: vec!["id".to_string()],
                    created_at_column: "created_at".to_string(),
                    updated_at_column: "updated_at".to_string(),
                    columns: vec![], // empty = SELECT *
                },
                TimestampTableConfig {
                    table_name: "orders".to_string(),
                    primary_key: vec!["order_id".to_string()],
                    created_at_column: "created_at".to_string(),
                    updated_at_column: "updated_at".to_string(),
                    columns: vec![
                        "order_id".to_string(),
                        "user_id".to_string(),
                        "total".to_string(),
                        "created_at".to_string(),
                        "updated_at".to_string(),
                    ],
                },
            ],
            poll_batch_size: 500,
            delete_detection: DeleteDetection::SoftDelete {
                column: "deleted_at".to_string(),
            },
        },
        enable_local_oltp: false, // pure sidecar — no local SQLite write path
        watermark_path: Some("/var/lib/myapp/watermark-db".to_string()),
    }),
    ..HtapConfig::default()
};

Python

python
import yoda

config = yoda.HtapConfig(
    olap_backend="datafusion",
    storage_mode="parquet",
    storage_path="/var/lib/myapp/olap",
    sync_mode="temporal",
    sidecar_source="host=db.example.com user=analytics dbname=production",
    sidecar_source_type="postgres",
    sidecar_tables=[
        yoda.TimestampTableConfig(
            table_name="users",
            primary_key=["id"],
            created_at_column="created_at",
            updated_at_column="updated_at",
        ),
    ],
    sidecar_poll_batch_size=500,
    sidecar_delete_detection="soft_delete:deleted_at",
    sidecar_enable_oltp=False,
)
engine = yoda.HtapEngine(config)

TimestampCdcConfig Fields

| Field | Type | Description |
|---|---|---|
| tables | Vec<TimestampTableConfig> | One entry per table to replicate. |
| poll_batch_size | u32 | Maximum number of rows fetched per table per cycle (the LIMIT in the polling query). Smaller values reduce memory pressure; larger values speed up the initial bulk sync. |
| delete_detection | DeleteDetection | Strategy for detecting deleted rows. See Delete Detection below. |
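As a rough sizing aid, the number of cycles an initial bulk sync needs is the backlog divided by poll_batch_size, rounded up. The sketch below assumes each table is polled once per sync_interval and that every poll during the backfill returns a full batch; both are simplifying assumptions, and the function names are hypothetical.

```rust
use std::time::Duration;

/// Cycles needed to drain `backlog_rows` when each cycle fetches at most
/// `poll_batch_size` rows (treated as at least 1 to avoid dividing by zero).
fn initial_sync_cycles(backlog_rows: u64, poll_batch_size: u32) -> u64 {
    let batch = u64::from(poll_batch_size.max(1));
    backlog_rows.div_ceil(batch)
}

/// Wall-clock estimate, assuming one poll per table per `sync_interval`.
fn initial_sync_estimate(backlog_rows: u64, poll_batch_size: u32, sync_interval: Duration) -> Duration {
    let cycles = initial_sync_cycles(backlog_rows, poll_batch_size);
    // Saturate instead of overflowing for extremely large backlogs.
    sync_interval.saturating_mul(u32::try_from(cycles).unwrap_or(u32::MAX))
}
```

With the example configuration (poll_batch_size: 500, sync_interval of 500 ms), a 1,000,000-row table would need 2,000 cycles, or roughly 1,000 seconds under these assumptions; a batch size of 5,000 cuts that to 200 cycles at the cost of larger per-cycle result sets.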

TimestampTableConfig Fields

| Field | Type | Description |
|---|---|---|
| table_name | String | Table name in the source database. |
| primary_key | Vec<String> | Primary key columns (at least one required). Used for watermark tie-breaking and CDC event keying. |
| created_at_column | String | Column set once at row creation; compared with updated_at for the INSERT/UPDATE heuristic. |
| updated_at_column | String | Column updated on every change; the primary watermark column. |
| columns | Vec<String> | Columns to SELECT. Empty means SELECT *. Provide an explicit list to reduce bandwidth or exclude irrelevant columns. |
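The polling query from How It Works can be assembled mechanically from these fields. The sketch below uses a hypothetical TableSpec struct that mirrors a subset of TimestampTableConfig; it emits PostgreSQL-style $n placeholders for the watermark values and does not quote identifiers, both of which a real implementation would need to handle for its target database.

```rust
/// Hypothetical stand-in for a subset of TimestampTableConfig.
struct TableSpec {
    table_name: String,
    primary_key: Vec<String>,
    updated_at_column: String,
    columns: Vec<String>, // empty = SELECT *
}

/// Build the keyset-pagination polling query for one table.
/// Watermark values are bound as $1..$n in (updated_at, pk...) order.
fn build_poll_query(spec: &TableSpec, poll_batch_size: u32) -> String {
    assert!(!spec.primary_key.is_empty(), "at least one primary key column is required");
    let select_list = if spec.columns.is_empty() {
        "*".to_string()
    } else {
        spec.columns.join(", ")
    };
    // Watermark tuple: updated_at first, then every primary key column.
    let mut key_cols = vec![spec.updated_at_column.clone()];
    key_cols.extend(spec.primary_key.iter().cloned());
    let key = key_cols.join(", ");
    let params: Vec<String> = (1..=key_cols.len()).map(|i| format!("${i}")).collect();
    format!(
        "SELECT {select_list} FROM {} WHERE ({key}) > ({}) ORDER BY {key} LIMIT {poll_batch_size}",
        spec.table_name,
        params.join(", ")
    )
}
```

For the orders table in the configuration example this yields SELECT order_id, user_id, total, created_at, updated_at FROM orders WHERE (updated_at, order_id) > ($1, $2) ORDER BY updated_at, order_id LIMIT 500.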

Delete Detection

| Variant | Behaviour |
|---|---|
| DeleteDetection::Disabled | Hard deletes are not detected. Rows deleted in the source remain in OLAP unchanged. Use when the source never hard-deletes rows or when stale data is acceptable. |
| DeleteDetection::SoftDelete { column } | A second query polls WHERE column IS NOT NULL AND column > watermark. Rows returned are emitted as Delete events. The column value is used as the event timestamp, so SCD Type 2 validity boundaries are correct. |
| DeleteDetection::FullDiff { every_n_cycles } | Not yet implemented; reserved for a future release. Configuring it logs a warning on each cycle and produces no delete events. |
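The soft-delete strategy can be sketched the same way. This version assumes a separate delete watermark held as a plain integer timestamp, which is one reading of the column > watermark predicate above; SoftRow, DeleteEvent, and poll_soft_deletes are hypothetical names. Each event carries the deleted_at value as its timestamp, which is what lets an SCD Type 2 history close the row's validity interval at the moment of deletion rather than at poll time.

```rust
/// Hypothetical source row: primary key plus a nullable deleted_at.
struct SoftRow {
    pk: i64,
    deleted_at: Option<i64>, // None models SQL NULL (row not deleted)
}

#[derive(Debug, PartialEq)]
struct DeleteEvent {
    pk: i64,
    timestamp: i64, // taken from deleted_at, not from the poll time
}

fn poll_soft_deletes(rows: &[SoftRow], watermark: i64) -> (Vec<DeleteEvent>, i64) {
    let mut events: Vec<DeleteEvent> = rows
        .iter()
        .filter_map(|r| match r.deleted_at {
            // WHERE deleted_at IS NOT NULL AND deleted_at > watermark
            Some(ts) if ts > watermark => Some(DeleteEvent { pk: r.pk, timestamp: ts }),
            _ => None,
        })
        .collect();
    events.sort_by_key(|e| (e.timestamp, e.pk));
    // Advance the delete watermark to the newest deletion seen.
    let next = events.last().map_or(watermark, |e| e.timestamp);
    (events, next)
}
```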

Watermark Persistence

By default (watermark_path = None), the polling watermark is kept in memory and is lost on process restart. On the next start, the consumer replays from (updated_at = 0, pk = min), which re-processes all historical rows.

Set watermark_path to a RocksDB directory to persist the watermark durably:

rust
watermark_path: Some("/var/lib/myapp/watermark-db".to_string()),

With persistence, the consumer resumes from the last seen (updated_at, pk…) tuple after a restart, avoiding a full replay. This requires the rocksdb-watermark sub-feature, which is enabled automatically when the sidecar feature is on.
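The resume behaviour can be illustrated without RocksDB. The sketch below substitutes a plain file written with std::fs for Yoda's RocksDB store, purely to show the semantics: a missing watermark means a replay from (updated_at = 0, pk = min), and a saved one resumes where the last cycle stopped. save_watermark and load_watermark are hypothetical, and a single integer primary key is assumed.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// (updated_at, pk) of the last row processed; single-column PK assumed.
type Watermark = (i64, i64);

/// Where a consumer with no saved state starts: replay all history.
const REPLAY_FROM_START: Watermark = (0, i64::MIN);

/// Persist by writing a temp file and renaming it over the old one; rename
/// is atomic on POSIX filesystems, so a crash mid-write leaves no torn file.
fn save_watermark(path: &Path, wm: Watermark) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, format!("{} {}", wm.0, wm.1))?;
    fs::rename(&tmp, path)
}

/// Load the saved watermark, or fall back to a full replay if none exists.
fn load_watermark(path: &Path) -> io::Result<Watermark> {
    let text = match fs::read_to_string(path) {
        Ok(text) => text,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(REPLAY_FROM_START),
        Err(e) => return Err(e),
    };
    let mut parts = text.split_whitespace().map(str::parse::<i64>);
    match (parts.next(), parts.next()) {
        (Some(Ok(ts)), Some(Ok(pk))) => Ok((ts, pk)),
        _ => Err(io::Error::new(io::ErrorKind::InvalidData, "corrupt watermark file")),
    }
}
```

In a scheme like this, the watermark should be saved only after the batch it covers has been applied to the OLAP side; saving it earlier risks skipping rows if the process dies in between.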


Pure Sidecar vs Hybrid

| Mode | enable_local_oltp | OLTP write path | Use case |
|---|---|---|---|
| Pure sidecar | false | Not available | Read-only OLAP layer on top of an existing application database |
| Hybrid | true | Local SQLite at oltp_path | Yoda as both a local write store and a sidecar follower |

In pure sidecar mode, calling execute() returns HtapError::OltpNotAvailable. SELECT queries fall through to OLAP automatically.


Source Requirements

  • Each tracked table must have a reliable updated_at column that is updated on every row modification. Rows with a stale or missing updated_at will be missed between poll cycles.
  • Composite primary keys are fully supported.
  • The source database must allow the polling connection to run SELECT queries on the tracked tables.
  • For PostgreSQL, the connection string follows the libpq format (host=… user=… dbname=…).


Released under the Apache-2.0 License.