diff --git a/docs/database-schema/database-schema-overview.md b/docs/database-schema/database-schema-overview.md
index 757d210..7dfefca 100644
--- a/docs/database-schema/database-schema-overview.md
+++ b/docs/database-schema/database-schema-overview.md
@@ -115,14 +115,15 @@ flowchart LR
AD -->|HTTPS-mTLS| CH
Cleanup -->|HTTPS-mTLS| CH
- BFsdex[SDEX Backfill
Local Rust CLI on workstation
ADR 0005] -->|local Postgres| LPG[(workstation Postgres)]
- BFamm[Soroban AMM Backfill
Local Rust CLI on workstation
ADR 0001] -->|local Postgres| LPG
- LPG -->|sdex-cloud-push,
HTTPS-mTLS| CH
- LCH[(local ClickHouse
populated by BE backfill-runner
Docker, workstation)] --> BFamm
+ BFsdex[SDEX Backfill
Local Rust CLI on workstation
ADR 0005] -->|write OHLCV
to local ClickHouse| LCHsdex[(local ClickHouse
SDEX backfill
Docker, workstation)]
+ LCHsdex -->|sdex-cloud-push,
HTTPS-mTLS| CH
+ LCH[(local ClickHouse
soroban_events input + prices.* mirror
Docker, workstation)] -->|read soroban_events| BFamm[Soroban AMM Backfill
Local Rust CLI on workstation
ADR 0001]
+ BFamm -->|write per-source OHLCV
to local prices.* mirror| LCH
+ LCH -->|amm-cloud-push,
HTTPS-mTLS| CH
classDef store fill:#e8f1ff,stroke:#3a6ea5,stroke-width:2px;
classDef external fill:#f3e8ff,stroke:#6a3a8a,stroke-width:1px;
- class CH,CH15,CH1h,CHRollups,S3,LPG,LCH store;
+ class CH,CH15,CH1h,CHRollups,S3,LCHsdex,LCH store;
class SNS external;
```
@@ -138,16 +139,16 @@ are pushed to the Hetzner cluster via separate post-backfill tools.
## 2. Database Tech Stack
-| Component | Technology |
-| ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| Database engine | **ClickHouse** on BE's shared Hetzner cluster (separate `prices` database, ADR 0007) |
-| Storage engines | `ReplacingMergeTree(version)` for OHLCV; `ReplacingMergeTree(updated_at)` for `current_prices` / `assets` / `backfill_progress`; `ReplacingMergeTree` for `oracle_prices` |
-| Rollups | Chain of CH materialised views: `price_ohlcv_1m → _15m → _1h → _4h → _1d → _1w → _1M` (replaces the OHLCV Rollup Lambda) |
-| Partitioning | `PARTITION BY toYYYYMM(timestamp)` on every OHLCV/oracle table; cleanup via `ALTER TABLE … DROP PARTITION` |
-| Database client (Rust) | [`clickhouse`](https://crates.io/crates/clickhouse) — async, native protocol over HTTPS-mTLS |
-| Schema tooling | Plain SQL DDL applied by the prices-api schema applier on first deploy; prices-api owns `prices.*` migrations unilaterally (ADR 0007 §3.7) |
-| Hosting | BE-managed Hetzner box behind Caddy:443; cross-cloud (AWS → Hetzner) hop, ~80–130 ms RTT mitigated by warm connection reuse and batched per-ledger writes |
-| Credentials | AWS Secrets Manager — per-env client cert + key for Caddy mTLS (2 secrets per env) |
+| Component | Technology |
+| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Database engine | **ClickHouse** on BE's shared Hetzner cluster (separate `prices` database, ADR 0007) |
+| Storage engines | `ReplacingMergeTree(version)` for OHLCV; `ReplacingMergeTree(updated_at)` for `current_prices` / `assets` / `backfill_progress`; `ReplacingMergeTree` for `oracle_prices` |
+| Rollups | Chain of CH materialised views: `price_ohlcv_1m → _15m → _1h → _4h → _1d → _1w → _1M` (replaces the OHLCV Rollup Lambda) |
+| Partitioning | `PARTITION BY toYYYYMM(timestamp)` on every OHLCV/oracle table; cleanup via `ALTER TABLE … DROP PARTITION` |
+| Database client (Rust) | [`clickhouse`](https://crates.io/crates/clickhouse) — async, native protocol over HTTPS-mTLS |
+| Schema tooling | Plain SQL DDL applied by the prices-api schema applier on first deploy; prices-api owns `prices.*` migrations unilaterally (ADR 0007 §3.7) |
+| Hosting | BE-managed Hetzner box behind Caddy:443; cross-cloud (AWS → Hetzner) hop, ~80–130 ms RTT mitigated by warm connection reuse and batched per-ledger writes |
+| Credentials | AWS Secrets Manager — per-env client `{cert,key,ca}` as a single JSON bundle secret per identity (one secret per identity per env, named by `MTLS_SECRET_NAME`; ADR 0007 / task 0063) |
**Why ClickHouse on a BE-shared cluster (ADR 0007):**
@@ -768,7 +769,7 @@ minutes. ADR 0005 made Stream 2 a local workstation CLI; ADR 0001 had
already done the same for Stream 1. Neither stream has a continuously-running
cloud-side process now, so none of those fields had a meaningful value to
write. Operators inspect live CLI progress (rate, ETA) via direct SQL on the
-workstation Postgres; the cloud row carries only the most recent push state.
+local workstation ClickHouse; the cloud row carries only the most recent push state.
**Freshness alarm (replaces heartbeat alarm).** A CloudWatch alarm watches
`sdex.last_push_at`. If it is older than the configured push-cadence
@@ -1092,10 +1093,10 @@ monthly partitions, plus heartbeat/status to `backfill_progress`.
### 7.1 Two-stream design (ADRs 0001, 0005, 0007)
-| Stream | Data location | Era | Method |
-| --------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| **SDEX trades** | `ClaimAtom` from the five trade-shaped op types in `LedgerCloseMeta` XDR | All-time (2015 → present, ~57M ledgers) | Local Rust CLI on operator workstation (anonymous reads against `s3://aws-public-blockchain`) → local Postgres → `sdex-cloud-push` lands rows in Hetzner CH `prices.*` (ADR 0005) |
-| **Soroban AMM swaps** (Soroswap, Aquarius, Phoenix) | `soroban_events` in **local** ClickHouse, populated upfront by BE's `backfill-runner --target=clickhouse` against the same public S3 archive | Soroban activation (Nov 2023) → present (~8.5M ledgers) | Local Rust CLI on operator workstation; one-shot completion push lands rows in Hetzner CH `prices.*` (ADR 0001) |
+| Stream | Data location | Era | Method |
+| --------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **SDEX trades** | `ClaimAtom` from the five trade-shaped op types in `LedgerCloseMeta` XDR | All-time (2015 → present, ~57M ledgers) | Local Rust CLI on operator workstation (anonymous reads against `s3://aws-public-blockchain`) → local ClickHouse → `sdex-cloud-push` lands rows in Hetzner CH `prices.*` (ADR 0005) |
+| **Soroban AMM swaps** (Soroswap, Aquarius, Phoenix) | `soroban_events` in **local** ClickHouse, populated upfront by BE's `backfill-runner --target=clickhouse` against the same public S3 archive | Soroban activation (Nov 2023) → present (~8.5M ledgers) | Local Rust CLI on operator workstation; one-shot completion push lands rows in Hetzner CH `prices.*` (ADR 0001) |
The Soroban AMM stream is handled first (Tranche 1). The operator runs BE's
`backfill-runner --target=clickhouse` to populate a local CH instance
@@ -1139,7 +1140,7 @@ Local ClickHouse (Docker) on operator workstation
│ `stellar-xdr` crate │
│ - Extracts token pair + amounts │
│ - Buckets to per-source 1-minute rows │
-│ - Writes to local Postgres (Docker) on workstation │
+│ - Writes to local CH prices.* mirror (Docker) │
└──────────────────────────────────────────────────────────┘
│
▼ one-shot completion push (only Hetzner-CH-touching step on Stream 1)
@@ -1155,12 +1156,12 @@ Hetzner ClickHouse `prices.*` (HTTPS-mTLS to Caddy:443)
flowchart LR
S3archive[(s3://aws-public-blockchain
Stellar public history)] -->|backfill-runner --target=clickhouse| LocalCH[(Local ClickHouse
Docker, workstation
soroban_events)]
LocalCH -->|signature='swap'
contract_id IN Soroswap/Aquarius/Phoenix
JOIN ledgers ON closed_at| CLI[soroban-amm-backfill
Local Rust CLI
ScVal decode via stellar-xdr]
- CLI -->|per-source 1-min rows| LocalPG[(workstation Postgres)]
- LocalPG -->|one-shot completion push
HTTPS-mTLS to Caddy:443| CH[(Hetzner ClickHouse
prices.price_ohlcv_*)]
+ CLI -->|per-source 1-min rows| LocalMirror[(local ClickHouse
prices.* mirror)]
+ LocalMirror -->|one-shot completion push
HTTPS-mTLS to Caddy:443| CH[(Hetzner ClickHouse
prices.price_ohlcv_*)]
CLI -->|status=completed,
last_push_at| BP[(prices.backfill_progress)]
classDef store fill:#e8f1ff,stroke:#3a6ea5,stroke-width:2px;
- class S3archive,LocalCH,LocalPG,CH,BP store;
+ class S3archive,LocalCH,LocalMirror,CH,BP store;
```
| Metric | Value | Notes |
@@ -1169,7 +1170,7 @@ flowchart LR
| Ledger range | ~48.5M–57M (Nov 2023 to present) | ~8.5M ledgers worth of events |
| Runtime | Local Rust CLI on operator workstation (`soroban-amm-backfill`) | No AWS infrastructure for the backfill itself; mirrors §7.4 Stream 2's local-CLI pattern |
| Workstation prep step | BE `backfill-runner --target=clickhouse` populates local CH | One-shot; runs against `s3://aws-public-blockchain` anonymous reads |
-| Sink during backfill | Local Postgres (Docker) on workstation | Hetzner ClickHouse is **not** written until the one-shot completion push |
+| Sink during backfill | Local ClickHouse `prices.*` mirror (Docker) on workstation | Hetzner ClickHouse is **not** written until the one-shot completion push |
| Estimated wall-clock | A few hours, dominated by `backfill-runner` archive ingestion | CH query + extraction + OHLCV write is fast against an indexed local store |
| Cloud-push cadence | One-shot completion push only | `prices.backfill_progress.soroban_amm` advances from `running` to `completed` in a single transition |
| Expected completion | During Tranche 1 (Week 2–3) | After the push, the local CH instance is torn down |
@@ -1189,12 +1190,12 @@ s3://aws-public-blockchain/v1.1/stellar/ledgers/pubnet/
│ - Filter + extract per task 0022's spec: │
│ 5 trade-shaped op types → ClaimAtom → TradeTick │
│ - Buckets to per-source 1-minute rows │
-│ - Per-ledger atomic txn: row UPSERTs + │
-│ backfill_progress checkpoint commit together │
+│ - Per-ledger checkpoint: INSERT rows, then │
+│ record ledger in backfill_sdex_ledgers │
└─────────────────────────────────────────────────────────┘
│
▼
- Local Postgres (Docker) on workstation
+ Local ClickHouse (Docker) on workstation
(operator-owned; backfill writes here, not Hetzner CH)
│
▼ `sdex-cloud-push` (separate post-backfill tool, HTTPS-mTLS)
@@ -1208,8 +1209,8 @@ s3://aws-public-blockchain/v1.1/stellar/ledgers/pubnet/
flowchart TD
Arch[(s3://aws-public-blockchain
Stellar public history
anonymous --no-sign-request)] -->|aws s3 sync| LocalDisk[(local .zst files)]
LocalDisk -->|xdr-parser crate
git Cargo dep| CLI[sdex-backfill
Local Rust CLI on workstation
~311 ledgers/s, ~1.12M/hr]
- CLI -->|filter 5 trade-shaped op types
extract ClaimAtom
bucket to per-source rows| LocalPG[(workstation Postgres)]
- LocalPG -->|sdex-cloud-push
tip-backward chunks
HTTPS-mTLS to Caddy:443| CH[(Hetzner ClickHouse
prices.price_ohlcv_*)]
+ CLI -->|filter 5 trade-shaped op types
extract ClaimAtom
bucket to per-source rows| LocalChStage[(local ClickHouse
price_ohlcv staging)]
+ LocalChStage -->|sdex-cloud-push
tip-backward chunks
HTTPS-mTLS to Caddy:443| CH[(Hetzner ClickHouse
prices.price_ohlcv_*)]
CLI -->|per-ledger atomic checkpoint| BPlocal[(local backfill_progress)]
CH -->|push updates row| BP[(prices.backfill_progress
sdex_archive)]
BP -->|last_push_at > tranche threshold| Alarm[CloudWatch Alarm
→ SNS email + Slack]
@@ -1218,7 +1219,7 @@ flowchart TD
classDef store fill:#e8f1ff,stroke:#3a6ea5,stroke-width:2px;
classDef alarm fill:#ffe5e5,stroke:#a53a3a,stroke-width:2px;
- class Arch,LocalDisk,LocalPG,CH,BP,BPlocal,CH1m store;
+ class Arch,LocalDisk,LocalChStage,CH,BP,BPlocal,CH1m store;
class Alarm alarm;
```
@@ -1227,18 +1228,19 @@ flowchart TD
| Total ledgers | ~57 million | Ledger 1 (Nov 2015) to current tip |
| Runtime | Local Rust CLI on operator workstation | No AWS infrastructure for the backfill itself; mirrors BE `backfill-runner` |
| Source | `s3://aws-public-blockchain/v1.1/stellar/ledgers/pubnet/` | Anonymous `--no-sign-request`; no AWS account needed to read |
-| Sink during backfill | Local Postgres (Docker) on workstation | Hetzner ClickHouse is **not** written during backfill — only by `sdex-cloud-push` |
+| Sink during backfill | Local ClickHouse (Docker) on workstation | Hetzner ClickHouse is **not** written during backfill — only by `sdex-cloud-push` |
| Measured CLI rate | ~311 ledgers/s (~1.12M ledgers/hour) | Per task 0022's measurement against the SDEX filter |
| Effective wall-clock (network-bound) | ~12–16 days continuous on one laptop | Archive sync is the bottleneck; CPU rarely saturates |
| Cloud-push cadence | Tip-backward chunks | The cloud `GET /backfill/status` view advances at push cadence, not CLI cadence |
| Expected completion | Full historical coverage extends past Tranche 3 | Tranche 3 acceptance is "progressing", not "complete" |
The `sdex-backfill` CLI is **resumable at per-ledger granularity**: each
-ledger's row UPSERTs and its `backfill_progress` checkpoint advance commit
-atomically in a single local-Postgres transaction. A crash mid-ledger leaves
-`current_ledger` pointing at the last fully-processed ledger; restart
-re-fetches and re-UPSERTs that ledger idempotently. Early ledgers (pre-2018)
-have very few DEX trades and process faster.
+processed ledger is recorded in the local `backfill_sdex_ledgers` checkpoint
+table after its rows are INSERTed. A crash mid-ledger leaves `current_ledger`
+pointing at the last fully-processed ledger; restart skips ledgers already
+recorded and re-inserts the in-flight ledger idempotently (re-inserted rows
+collapse under `ReplacingMergeTree(version)` at merge time). Early ledgers
+(pre-2018) have very few DEX trades and process faster.
### 7.4a Backfill state machine (`prices.backfill_progress.status`)
@@ -1404,9 +1406,10 @@ Scaled-up at high traffic (DB-relevant):
- **ClickHouse endpoint reachable only via mTLS through Caddy:443** on the
BE-managed Hetzner box. There is no other network surface to `prices.*`.
-- **mTLS material in AWS Secrets Manager** (per-env client cert + key, two
- secrets per env). Cert + key are loaded into the Lambda runtime on cold
- start and held in memory for the container's lifetime; never in env vars
+- **mTLS material in AWS Secrets Manager** (per-env client `{cert,key,ca}`
+ as a single JSON bundle secret per identity, named by `MTLS_SECRET_NAME`).
+ The bundle is loaded into the Lambda runtime on cold start and held in
+ memory for the container's lifetime; never in env vars
or source. Rotation: 1-year manual cadence (BE Cluster C agreement);
CloudWatch alarm on cert NotAfter approaching expiry; revocation = CA
rotation on compromise.
@@ -1801,7 +1804,7 @@ flowchart TB
%% ============================================================
%% AWS — SECRETS MANAGER (mTLS material)
%% ============================================================
- SM[("AWS Secrets Manager
per-env mTLS cert + key
(2 secrets per env)")]
+ SM[("AWS Secrets Manager
per-env mTLS {cert,key,ca}
(1 JSON bundle secret per identity)")]
class SM store
SM -.->|"loaded on cold start"| LIVE
@@ -1814,21 +1817,21 @@ flowchart TB
LocalCH[("Local ClickHouse
soroban_events
(Docker, torn down after push)")]
AMM["soroban-amm-backfill
Rust CLI (Stream 1)
ScVal decode via stellar-xdr"]
SDEX["sdex-backfill
Rust CLI (Stream 2)
~311 ledgers/s, ~1.12M/h"]
- LocalPG[("Workstation Postgres
price_ohlcv staging
(Docker)")]
+ LocalStage[("Local ClickHouse
prices.* staging
(Docker)")]
SDEXpush["sdex-cloud-push
tip-backward chunks"]
AMMpush["AMM completion push
one-shot"]
end
class BERun,AMM,SDEX,SDEXpush,AMMpush writer
- class LocalCH,LocalPG workstation
+ class LocalCH,LocalStage workstation
Archives -->|"backfill-runner --target=clickhouse"| BERun
BERun --> LocalCH
LocalCH --> AMM
Archives -->|"aws s3 sync"| SDEX
- AMM --> LocalPG
- SDEX --> LocalPG
- LocalPG --> SDEXpush
- LocalPG --> AMMpush
+ AMM --> LocalStage
+ SDEX --> LocalStage
+ LocalStage --> SDEXpush
+ LocalStage --> AMMpush
%% ============================================================
%% HETZNER — CADDY + CLICKHOUSE (BE-managed)
@@ -1954,13 +1957,13 @@ flowchart TB
**How to read the diagram**
- **Blue cylinders** are persistent stores (Hetzner CH tables, S3, archives,
- workstation Postgres, AWS Secrets Manager).
+ workstation ClickHouse, AWS Secrets Manager).
- **Green nodes** are writers (Lambdas + workstation CLIs + cloud-push tools).
- **Yellow nodes** are public API endpoints (readers).
- **Purple nodes** are external services and BE-managed infrastructure
(Reflector, SNS topic, Caddy).
-- **Orange nodes** are workstation-local components (Docker-hosted CH, local
- Postgres) — outside the AWS / Hetzner production surface.
+- **Orange nodes** are workstation-local components (Docker-hosted local
+ ClickHouse) — outside the AWS / Hetzner production surface.
- **Red node** is the CloudWatch alarm fed by `prices.backfill_progress.sdex.last_push_at`
(push freshness alarm; the heartbeat-style alarm from the prior design is
gone).
diff --git a/docs/prices-api-general-overview.md b/docs/prices-api-general-overview.md
index 9eb04fb..0f0bb00 100644
--- a/docs/prices-api-general-overview.md
+++ b/docs/prices-api-general-overview.md
@@ -163,7 +163,7 @@ to their own infrastructure at any time if needed.
| **Lambda — API handlers** | Public API | Individual functions per route group. Rust / axum via `lambda_runtime`, 256–512 MB, 15s timeout. No VPC; outbound HTTPS-mTLS to Caddy:443 |
| **API Gateway** | Public API entry point | REST API, usage plans, API key auth, rate limiting (100 req/s per key), request validation. Built-in response cache (0.5 GB) with per-endpoint TTLs |
| **EventBridge Scheduler** | Scheduled triggers | Cron/rate rules for all periodic Lambda workers |
-| **Secrets Manager** | Credentials & mTLS material | Per-env client cert + key for Caddy:443 mTLS (2 secrets per env); Soroswap/Aquarius API keys; oracle contract address |
+| **Secrets Manager** | Credentials & mTLS material | Per-env client `{cert,key,ca}` for Caddy:443 mTLS (single JSON bundle secret per identity, named by `MTLS_SECRET_NAME`); Soroswap/Aquarius API keys; oracle contract address |
| **CloudWatch + X-Ray** | Observability | API latency, error rates, ingestion lag, Lambda duration/concurrency, backfill progress; mTLS cert NotAfter alarm |
| **S3** (API docs) | Documentation hosting | OpenAPI spec + self-service onboarding portal, served via CloudFront |
@@ -462,7 +462,7 @@ stream, heartbeating into the cloud row every 15 minutes). ADR 0005 made
Stream 2 a local workstation CLI; ADR 0001 had already done the same for
Stream 1. Neither stream has a continuously-running cloud-side process, so
neither field has a meaningful value to write. Operators inspect live CLI
-progress (rate, ETA) via direct SQL on the workstation Postgres; the cloud
+progress (rate, ETA) via direct SQL on the local workstation ClickHouse; the cloud
DB carries only the most recent push state. If a future stream reintroduces a
continuous cloud-side writer, these columns can be added back at that time.
@@ -713,7 +713,7 @@ Per ADRs 0001 and 0005, both canonical streams run as workstation-local processe
cloud-side `prices.backfill_progress` row (ClickHouse, `ReplacingMergeTree(updated_at)`)
advances only when each stream's push step runs. The response therefore reflects the most
recent push state, not a live task heartbeat. Live CLI progress is visible to the operator
-via direct SQL on the workstation Postgres but is not surfaced to API consumers. A
+via direct SQL on the local workstation ClickHouse but is not surfaced to API consumers. A
CloudWatch alarm fires if `last_push_at` falls behind the configured push-cadence threshold
for the stream's tranche (operator-tunable; see §5.6 "GET /backfill/status Freshness").
@@ -810,8 +810,8 @@ table). The decoder writes the per-source bucket `vwap = volume_quote / volume_b
nothing more.
**mTLS write path.** The Lambda holds a `clickhouse` Rust client configured with the
-per-env client certificate + key loaded from AWS Secrets Manager (two secrets per env per
-ADR 0007 §3.5). Connections to Caddy:443 are warmed in the global init of the Lambda
+per-env client `{cert,key,ca}` loaded from AWS Secrets Manager (a single JSON bundle
+secret per identity, named by `MTLS_SECRET_NAME`, per ADR 0007 §3.5). Connections to Caddy:443 are warmed in the global init of the Lambda
runtime and reused across invocations to amortise TLS handshake cost — important because
the cross-cloud RTT (AWS → Hetzner, ~80-130 ms) dominates per-request latency if every
INSERT opens a fresh connection.
@@ -825,8 +825,8 @@ INSERT opens a fresh connection.
| **Asset Discovery** | EventBridge rate(1 hour) | Ledger account entries in `LedgerCloseMeta` | New classic asset issuances; new SEP-41 contract deployments → `prices.assets` |
| **Current Price Updater** | EventBridge rate(1 min) | `prices.price_ohlcv_1m` (after ingestion) | Cross-source VWAP per §5.5 → `prices.current_prices` |
| **Cleanup Worker** | EventBridge cron(02:00 UTC) | `prices.*` | `ALTER TABLE … DROP PARTITION` for expired month-partitions of `price_ohlcv_1m`, `_15m`, `oracle_prices` |
-| **SDEX Backfill CLI** (`sdex-backfill`, ADR 0005) | Local Rust CLI on operator workstation, run in tip-backward chunks during the project | `s3://aws-public-blockchain` (anonymous `--no-sign-request`) | Historical SDEX trades → per-source 1-min rows in **local Postgres** on the workstation |
-| **SDEX Cloud Push** (`sdex-cloud-push`, ADR 0005) | Operator-invoked between chunks; only Hetzner-CH-touching component on the Stream 2 path | Local Postgres `price_ohlcv` + `assets` | Streams accumulated rows to `prices.price_ohlcv_*` over HTTPS-mTLS to Caddy:443; advances `prices.backfill_progress` row for `sdex_archive` (`current_ledger`, `last_push_at`) |
+| **SDEX Backfill CLI** (`sdex-backfill`, ADR 0005) | Local Rust CLI on operator workstation, run in tip-backward chunks during the project | `s3://aws-public-blockchain` (anonymous `--no-sign-request`) | Historical SDEX trades → per-source 1-min rows in **local ClickHouse** on the workstation |
+| **SDEX Cloud Push** (`sdex-cloud-push`, ADR 0005) | Operator-invoked between chunks; only Hetzner-CH-touching component on the Stream 2 path | Local ClickHouse `price_ohlcv` + `assets` | Streams accumulated rows to `prices.price_ohlcv_*` over HTTPS-mTLS to Caddy:443; advances `prices.backfill_progress` row for `sdex_archive` (`current_ledger`, `last_push_at`) |
| **Soroban AMM Backfill CLI** (`soroban-amm-backfill`, ADR 0001) | One-shot Local Rust CLI on operator workstation, run once during Tranche 1 | Local ClickHouse `soroban_events` (populated upfront by BE's `backfill-runner --target=clickhouse`); ScVal decoding via the `stellar-xdr` crate | Historical Soroswap/Aquarius/Phoenix swaps → per-source 1-min rows; on completion runs the cloud push that lands all rows into `prices.*` on Hetzner CH and sets the `soroban_amm` `backfill_progress` row to `status='completed'` |
**Worker removed: OHLCV Rollup Lambda** (eliminated by ADR 0007 §3.4). The 1m → 15m → 1h →
@@ -896,7 +896,7 @@ which drives a two-stream backfill design:
| Stream | Data location | Era | Method |
| --------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| **SDEX trades** | `TransactionResultMeta` (`ClaimAtom` from the five trade-shaped op types) in `LedgerCloseMeta` XDR | All-time (2015 → present, ~57M ledgers) | Local Rust CLI on operator workstation (`s3://aws-public-blockchain` anonymous reads) → local Postgres → post-backfill cloud push to Hetzner ClickHouse `prices.*` (see ADR 0005) |
+| **SDEX trades** | `TransactionResultMeta` (`ClaimAtom` from the five trade-shaped op types) in `LedgerCloseMeta` XDR | All-time (2015 → present, ~57M ledgers) | Local Rust CLI on operator workstation (`s3://aws-public-blockchain` anonymous reads) → local ClickHouse → post-backfill cloud push to Hetzner ClickHouse `prices.*` (see ADR 0005) |
| **Soroban AMM swaps** (Soroswap, Aquarius, Phoenix) | `SorobanTransactionMeta.events` (CAP-67), inlined `topics_xdr` + `data_xdr` per row in BE's ClickHouse `soroban_events` (BE ADRs [0033](../../soroban-block-explorer/lore/2-adrs/0033_soroban-events-appearances-read-time-detail.md), [0044](../../soroban-block-explorer/lore/2-adrs/0044_clickhouse-pilot-parallel-store.md)) | Soroban activation (Nov 2023) → present (~8.5M ledgers) | Local Rust CLI on operator workstation, consuming a local ClickHouse instance populated upfront by BE's `backfill-runner --target=clickhouse` (see [ADR 0001](../lore/2-adrs/0001_stream1-clickhouse-sourced-amm-backfill.md)) → one-shot push to Hetzner ClickHouse `prices.*` |
The Soroban AMM stream is handled first (Tranche 1). A short BE-tooling prep step runs BE's
@@ -932,7 +932,7 @@ Local ClickHouse (Docker) on operator workstation
│ `stellar-xdr` crate (shared BE-authored library) │
│ - Extracts token pair + amounts │
│ - Buckets to 1-minute price_ohlcv (ADR 0003 PK shape) │
-│ - Writes to local Postgres (Docker) on workstation │
+│ - Writes to local CH prices.* mirror (Docker) │
└──────────────────────────────────────────────────────────┘
│
▼ one-shot completion push (only Hetzner-CH-touching step on Stream 1)
@@ -961,12 +961,12 @@ s3://aws-public-blockchain/v1.1/stellar/ledgers/pubnet/
│ - Filter + extract per task 0022's spec: │
│ 5 trade-shaped op types → ClaimAtom → TradeTick │
│ - Buckets to 1-minute price_ohlcv (ADR 0003 PK shape) │
-│ - Per-ledger atomic txn: price_ohlcv UPSERTs + │
-│ backfill_progress checkpoint commit together │
+│ - Per-ledger checkpoint: INSERT rows, then │
+│ record ledger in backfill_sdex_ledgers │
└─────────────────────────────────────────────────────────┘
│
▼
- Local Postgres (Docker) on workstation
+ Local ClickHouse (Docker) on workstation
(operator-owned; backfill writes here, not Hetzner CH)
│
▼ `sdex-cloud-push` (separate post-backfill tool, HTTPS-mTLS)
@@ -1008,7 +1008,7 @@ becomes a plain `xdr-parser = "X.Y.Z"` pin — no design change.)
| Runtime | Local Rust CLI on operator workstation (`soroban-amm-backfill`) | No AWS infrastructure for the backfill itself; mirrors §5.6 Stream 2's local-CLI pattern |
| Workstation prep step | BE `backfill-runner --target=clickhouse` populates local CH | One-shot; runs against `s3://aws-public-blockchain` anonymous reads — no AWS account required |
| Local CH footprint | Hundreds of GB pre-compression; substantially smaller post-ZSTD on disk | Sized for an operator workstation per ADR 0001 §Rationale (dev-laptop, not Fargate/EC2) |
-| Sink during backfill | Local Postgres (Docker) on workstation | Hetzner ClickHouse `prices.*` is **not** written until the one-shot completion push |
+| Sink during backfill | Local ClickHouse (Docker) on workstation | Hetzner ClickHouse `prices.*` is **not** written until the one-shot completion push |
| Estimated wall-clock (including CH prep) | A few hours, dominated by `backfill-runner` archive ingestion | CH query + extraction + OHLCV write is fast against an indexed local store |
| Cloud-push cadence | One-shot completion push only | `prices.backfill_progress` row for `soroban_amm` advances from `running` to `completed` in a single transition |
| Expected completion | During Tranche 1 (Week 2–3) | After the push, the local CH instance is torn down |
@@ -1020,7 +1020,7 @@ becomes a plain `xdr-parser = "X.Y.Z"` pin — no design change.)
| Total ledgers | ~57 million | Ledger 1 (Nov 2015) to current tip |
| Runtime | Local Rust CLI on operator workstation | No AWS infrastructure for the backfill itself; mirrors BE `backfill-runner` |
| Source | `s3://aws-public-blockchain/v1.1/stellar/ledgers/pubnet/` | Anonymous `--no-sign-request`; no AWS account needed to read |
-| Sink during backfill | Local Postgres (Docker) on workstation | Hetzner ClickHouse `prices.*` is **not** written during backfill — only by the post-backfill `sdex-cloud-push` step |
+| Sink during backfill | Local ClickHouse (Docker) on workstation | Hetzner ClickHouse `prices.*` is **not** written during backfill — only by the post-backfill `sdex-cloud-push` step |
| Measured CLI rate | ~311 ledgers/s (~1.12M ledgers/hour) | Per task 0022's measurement against the SDEX filter |
| Effective wall-clock (network-bound) | ~12–16 days continuous on one laptop | Archive sync is the bottleneck; CPU rarely saturates |
| Cloud-push cadence | Tip-backward chunks (e.g. `--start=tip-1.1M --end=tip` for the first Tranche 1 chunk, then older ranges) | The cloud `GET /backfill/status` view advances at push cadence, not CLI cadence |
@@ -1068,7 +1068,7 @@ because push cadence is driven by tip-backward chunk size, not by a continuous h
A laptop-side staleness check is **not** wired into AWS alarms — workstation uptime is an
operator-managed concern (BE accepts the same trade in BE ADR 0010). Operators inspect local
-CLI progress via direct SQL on the workstation Postgres.
+CLI progress via direct SQL on the local workstation ClickHouse.
---
@@ -1132,7 +1132,7 @@ on ClickHouse, that machinery is not part of the prices-api budget at any traffi
local CH instance
- **Trust:** BE-managed self-signed CA; prices-api receives per-env client cert + key
issued via BE's per-AWS-service issuance script
- - **Storage:** cert + key live in AWS Secrets Manager (2 secrets per env), loaded into the
+ - **Storage:** `{cert,key,ca}` live in AWS Secrets Manager as a single JSON bundle secret per identity (named by `MTLS_SECRET_NAME`), loaded into the
Lambda runtime on cold start, held in memory for the container's lifetime
- **Rotation:** 1-year manual rotation cadence (BE Cluster C agreement); CloudWatch
alarm on cert NotAfter approaching expiry; re-issuance is a single operator step on
diff --git a/lore/1-tasks/backlog/0017_FEATURE_local-clickhouse-for-prices-backfill.md b/lore/1-tasks/backlog/0017_FEATURE_local-clickhouse-for-prices-backfill.md
index 06f78ab..fce7a1f 100644
--- a/lore/1-tasks/backlog/0017_FEATURE_local-clickhouse-for-prices-backfill.md
+++ b/lore/1-tasks/backlog/0017_FEATURE_local-clickhouse-for-prices-backfill.md
@@ -3,94 +3,175 @@ id: "0017"
title: "Local ClickHouse instance setup and access for prices-api Tranche 1 backfill"
type: FEATURE
status: backlog
-related_adr: ["0001"]
-related_tasks: ["0015", "0018"]
-tags: [layer-infra, priority-high, effort-small, milestone-M1, infra, backfill, clickhouse, block-explorer]
+related_adr: ["0001", "0003", "0004", "0007"]
+related_tasks: ["0015", "0018", "0051", "0052", "0053", "0058", "0026", "0061"]
+tags: [layer-infra, priority-high, effort-small, milestone-M1, infra, backfill, clickhouse, hetzner, block-explorer]
milestone: 1
links:
- "../../2-adrs/0001_stream1-clickhouse-sourced-amm-backfill.md"
+ - "../../2-adrs/0007_live-data-sink-on-shared-hetzner-clickhouse.md"
+ - "../../../packages/prices-clickhouse/schema/init.sql"
- "../archive/0015_RESEARCH_redefine-backfill-with-be-clickhouse-events/notes/S-redesigned-backfill-recommendation.md"
- "../../../../soroban-block-explorer/lore/1-tasks/archive/0205_FEATURE_backfill-runner-clickhouse-target-flag.md"
- - "../../../../soroban-block-explorer/lore/1-tasks/active/0206_FEATURE_clickhouse-persist-real-inserts/README.md"
+ - "../../../../soroban-block-explorer/lore/1-tasks/archive/0206_FEATURE_clickhouse-persist-real-inserts/README.md"
+ - "../../../../soroban-block-explorer/lore/2-adrs/0045_clickhouse-local-backfill-then-mirror-to-hetzner-via-freeze-rsync-attach.md"
history:
- date: 2026-05-12
status: backlog
- who: okarcz
+ who: operator
note: >
Spawned from task 0015 closure. ADR 0001 commits Stream 1 to a
- local-CH-sourced backfill on a developer laptop (okarcz's). This
+ local-CH-sourced backfill on the operator's workstation. This
task is the operational landing: spin up CH locally, run BE's
backfill-runner against it, document the access mechanism that
- lets the prices-api Tranche 1 consumer query the laptop's CH.
+ lets the prices-api Tranche 1 consumer query the local CH.
+ - date: 2026-06-19
+ status: backlog
+ who: claude
+ note: >
+ Refreshed against post-ADR-0007 architecture. Local store is now
+ ClickHouse mirroring the Hetzner `prices.*` schema (NOT a local
+ Postgres) — same local-CH→Hetzner shape BE uses. Added the
+ archival/historical USD asset price requirement (volume_quote /
+ volume_quote_usd / close_usd + oracle_prices; tasks 0058/0026/0061)
+ and the mTLS cloud-push path via the 0052 client. Replaced personal
+ username with "operator".
---
# Local ClickHouse instance setup and access for prices-api Tranche 1 backfill
## Summary
-Stand up a local ClickHouse instance on okarcz's developer laptop,
-populated by running BE's `backfill-runner --target=clickhouse`
-(BE task 0205) for the Soroban-activation-onward ledger range
-(~ledger 48.5M to current tip, ~8.5M ledgers). Document the access
-mechanism that lets the prices-api Tranche 1 consumer (separate
-follow-up) query this CH instance to extract Soroban AMM swap events.
+Stand up a local ClickHouse instance on the operator's workstation that
+serves **two roles** for the Stream 1 historical AMM backfill:
+
+1. **Raw input** — populated by running BE's `backfill-runner
+ --target=clickhouse` (BE task 0205) for the Soroban-activation-onward
+ ledger range (~ledger 48.5M to current tip, ~8.5M ledgers), giving the
+ backfill CLI a local `soroban_events` source to query.
+2. **Local `prices.*` mirror** — the same `prices` schema that lives on
+ the Hetzner CH box, applied locally from our canonical
+ `packages/prices-clickhouse/schema/init.sql`. The Stream 1 backfill
+ (task 0053) aggregates decoded swaps into this local mirror, then runs
+ a one-shot mTLS push to Hetzner `prices.*`. **This replaces the earlier
+ "aggregate into a local Postgres" design** — we now mirror Hetzner in a
+ local ClickHouse, the same local-CH→Hetzner shape BE uses (BE ADR 0045).
-Tear-down trigger: once Tranche 1 backfill completes and the
-extracted OHLCV trade points are persisted in prices-api PostgreSQL.
+Document the access mechanism that lets the backfill CLI query the local
+CH. Tear-down trigger: once Tranche 1 backfill completes and the OHLCV
+rows are pushed to Hetzner `prices.*`.
## Context
-ADR 0001 commits prices-api Stream 1 to local-CH-sourced backfill.
-This task is the infrastructure side. It is gated on BE task 0206
-(real CH writer) reaching a quality bar that prices-api can consume —
-BE task 0117 (local backfill benchmark) is the proxy signal.
+ADR 0001 commits prices-api Stream 1 to local-CH-sourced backfill; this
+task is the infrastructure side. Two architectural shifts since the
+2026-05-12 draft change the shape:
+
+- **ADR 0007 (accepted 2026-05-20)** — the live data sink is BE's shared
+ Hetzner ClickHouse, **not** prices-owned RDS Postgres. All OHLCV / oracle
+ / asset / backfill-progress data lives in the `prices` database on the
+ Hetzner box. The backfill therefore stages into a *local CH mirror* of
+ `prices.*` and pushes to Hetzner — no Postgres anywhere in the path.
+- **ADR 0003/0004** — OHLCV rows are per-source
+ (`source ∈ {soroswap, aquarius, phoenix, sdex, …}`) on
+ `ReplacingMergeTree(version)`, PK `(timestamp, asset_id, quote_asset_id,
+ source)`. Cross-source merge happens at read time.
+
+The `prices.*` schema is already authored and shipped (tasks 0059/0060/0061)
+in `packages/prices-clickhouse/schema/` (`init.sql` + `seed.sql` +
+`rollups.sql` + `views.sql`), applied locally by the
+`packages/prices-clickhouse-init` binary — the exact same artifact that
+applies to Hetzner. The cloud push uses the shared mTLS client from task
+0052 (`packages/prices-clickhouse`, `aws-mtls` feature) to Caddy:443.
+
+The input side is gated on BE task 0206 (real CH writer) — **shipped**
+(archived), as are BE 0205 (the `--target=clickhouse` flag) and BE 0117
+(the local backfill benchmark, the storage/throughput proxy signal).
+
+### Archival / historical USD asset price requirement
+
+The `prices.*` OHLCV tables now carry USD columns that the historical
+backfill must account for:
+
+- `volume_quote Decimal(38,14)` — native quote-asset volume, restored by
+ task 0058. The backfill (0053) **must populate this** per bucket
+ (`Σ |quote_amount|`); it is the input to USD enrichment.
+- `volume_quote_usd Decimal(38,14) DEFAULT 0` and
+ `close_usd Decimal(38,14) DEFAULT 0` (task 0061) — filled by enrichment
+ as `oracle_usd × volume_quote` and `oracle_usd × close`.
+- `oracle_prices` table — the USD reference series enrichment ASOF-joins
+ against (tasks 0024/0026).
+
+For a **historical** backfill this needs **archival USD asset prices**:
+USD references for *past* timestamps back across the Soroban range, not
+just live oracle ticks. Whether `oracle_prices` actually carries history
+that far back — or whether an archival USD price source must be loaded
+into the local mirror before enrichment can run — is an **open question**
+(see Notes / spawn point). The local mirror must at minimum include the
+`oracle_prices` table and the USD columns so the enrichment path exists
+end-to-end; sourcing the archival USD series is tracked with 0026.
## Implementation
-- Docker compose service mirroring BE's task 0204 compose definition
- (CH version, port mapping, healthcheck, volume mount).
-- Run BE's `backfill-runner --target=clickhouse` against
- `` for the Soroban range. Estimate disk: BE's task
- 0117 benchmark is the input here.
-- Decide access mechanism for the prices-api consumer:
- - **Option a:** SSH tunnel from the consumer's host to laptop CH
- HTTP port (8123). Lowest friction, requires laptop online.
- - **Option b:** Cloudflare tunnel / tailscale-exposed CH for
- multi-machine prices-api workflow.
- - **Option c:** Read-only CH snapshot exported to S3, consumer
- reads from S3 via clickhouse-local.
-- Document the chosen mechanism in `lore/3-wiki/` (or inline in
- prices-api repo docs).
-- Capture the actual disk usage, completion time, and any
- population errors as a closing note for ADR 0001's "consequences"
- section.
+- Docker-compose CH service mirroring BE's compose definition (CH version,
+ port mapping `127.0.0.1:8123`, healthcheck, volume mount).
+- Apply **both** schemas to the local instance:
+ - BE's `init.sql` (`soroban_events` + friends) so `backfill-runner` has
+ its target tables — the raw input side.
+ - Our `packages/prices-clickhouse/schema/{init,seed,rollups,views}.sql`
+ via `prices-clickhouse-init` so the `prices.*` mirror matches Hetzner
+ exactly (USD columns + `oracle_prices` + per-source OHLCV +
+ `backfill_progress` seeded `soroban_amm`).
+- Run BE's `backfill-runner --target=clickhouse` against the public ledger
+ archive for the Soroban range to populate `soroban_events`. Disk estimate:
+ BE task 0117 benchmark is the input (~100–150 GB order-of-magnitude for
+ the Soroban-only range; the `prices.*` mirror itself is tiny, ~0.45 GB/yr).
+- Access mechanism for the backfill CLI: local plaintext HTTP
+ (`localhost:8123`) on the workstation — no mTLS for the local hop;
+ mTLS is only the Hetzner push (task 0052 client), not local reads.
+- Capture actual disk usage, completion time, and any population errors as
+ a closing note for ADR 0001's "consequences" section.
## Acceptance Criteria
-- [ ] Local CH instance running on okarcz's laptop with BE schema
- applied (idempotent `init.sql` from BE task 0204).
-- [ ] BE `backfill-runner --target=clickhouse` completes against
- the Soroban-activation-onward ledger range with no
- `parts_to_throw_insert` errors and zero parser-data loss
- (verified against BE task 0206's coverage contract).
-- [ ] Access mechanism chosen and documented; prices-api Tranche 1
- consumer (separate follow-up) can run a smoke query
+- [ ] Local CH instance running on the operator's workstation with **BE's
+ schema** applied (idempotent `init.sql` from BE task 0204) for the
+ `soroban_events` input.
+- [ ] Local CH also hosts the **`prices.*` mirror** of the Hetzner schema,
+ applied from `packages/prices-clickhouse/schema/*` via
+ `prices-clickhouse-init` — same artifact as the Hetzner apply,
+ including the USD columns (`volume_quote`, `volume_quote_usd`,
+ `close_usd`) and the `oracle_prices` table.
+- [ ] BE `backfill-runner --target=clickhouse` completes against the
+ Soroban-activation-onward ledger range with no `parts_to_throw_insert`
+ errors and zero parser-data loss (verified against BE task 0206's
+ coverage contract).
+- [ ] Access mechanism documented; the backfill CLI can run a smoke query
(`SELECT count() FROM soroban_events WHERE signature = 'swap'`)
- against the laptop CH.
-- [ ] Disk usage, run time, and any anomalies captured as a closing
- note appended to ADR 0001 or a `notes/G-backfill-run-log.md`.
-- [ ] Tear-down checklist documented (when to nuke the volume,
- how to do it cleanly).
+ against the local CH and write a smoke row into the local
+ `prices.price_ohlcv_1m` mirror.
+- [ ] Archival USD price availability assessed and recorded: whether
+ `oracle_prices` covers the historical range, or an archival USD price
+ source is required (hand-off to 0026 if so).
+- [ ] Disk usage, run time, and any anomalies captured as a closing note
+ appended to ADR 0001 or a `notes/G-backfill-run-log.md`.
+- [ ] Tear-down checklist documented (when to nuke the volume, how to do
+ it cleanly).
## Notes
-- BE's task 0206 must be merged before this task can run to completion.
- If it is still active when this task starts, coordinate with BE
- (fmazur) on whether a development-grade CH writer is good enough
- for prices-api's Tranche 1 consumption, or whether we wait for
- 0206's full landing.
-- Storage estimate: BE has not published a hard number for the
- Soroban-activation-onward window. Order-of-magnitude estimate
- from extrapolating ADR 0044's ~550 GB full-mainnet backfill is
- ~100–150 GB for the Soroban-only range; verify against BE task
- 0117 benchmark output.
+- **Local-only / prepare-only constraint**: the whole flow runs against
+ local Docker (CH on localhost); the only public network access is the
+ read-only `--no-sign-request` ledger fetch by `backfill-runner`. The
+ Hetzner push (0053) is a separate, approval-gated step — not part of
+ this setup task.
+- BE's task 0206 must be merged before this task can run to completion —
+ **done** (archived). If a future BE writer change lands, re-verify
+ against 0206's coverage contract.
+- **Stale references to fix elsewhere** (out of scope here, flag only):
+ task 0053 and `docs/database-schema/database-schema-overview.md` still
+ describe the backfill as aggregating into a *local Postgres*. Those need
+ the same local-CH-mirror correction this task adopts.
+- Storage estimate for `soroban_events`: BE has not published a hard number
+ for the Soroban-activation-onward window; order-of-magnitude ~100–150 GB,
+ verify against BE task 0117 benchmark output.
diff --git a/lore/1-tasks/backlog/0028_FEATURE_sdex-cloud-push.md b/lore/1-tasks/backlog/0028_FEATURE_sdex-cloud-push.md
index a85b82d..cee86f4 100644
--- a/lore/1-tasks/backlog/0028_FEATURE_sdex-cloud-push.md
+++ b/lore/1-tasks/backlog/0028_FEATURE_sdex-cloud-push.md
@@ -1,133 +1,158 @@
---
id: "0028"
-title: "SDEX cloud-push — stream local price_ohlcv + assets to cloud RDS after backfill"
+title: "SDEX cloud-push — stream local price_ohlcv + assets to Hetzner ClickHouse prices.* after backfill"
type: FEATURE
status: backlog
-related_adr: ["0003", "0005"]
-related_tasks: ["0011", "0012", "0027"]
-tags: [layer-indexing, priority-medium, effort-medium, milestone-M1, cloud-push, clickhouse, hetzner, postgres, sdex, stream-2, rust]
+related_adr: ["0003", "0005", "0007"]
+related_tasks: ["0012", "0027", "0052", "0063", "0051"]
+tags: [layer-indexing, priority-medium, effort-medium, milestone-M1, cloud-push, clickhouse, hetzner, mtls, sdex, stream-2, rust]
milestone: 1
links:
- "../active/0012_FEATURE_design-prices-owned-backfill-fargate/notes/G-sdex-backfill-local-design.md"
- "../../2-adrs/0005_stream2-sdex-local-workstation-backfill.md"
- "../../2-adrs/0003_price-ohlcv-pk-includes-quote-asset-id.md"
+ - "../../2-adrs/0007_live-data-sink-on-shared-hetzner-clickhouse.md"
- "../../../../soroban-block-explorer/lore/2-adrs/0040_multi-laptop-backfill-snapshot-merge-hazards.md"
history:
- date: 2026-05-14
status: backlog
- who: okarcz
+ who: operator
note: >
Spawned from task 0012 future work alongside the ADR 0005
pivot. Implements the post-backfill cloud-push step sketched
- in 0012 G-note §11. Blocked on task 0011 (cloud RDS exists)
- and task 0027 (local backfill data exists).
+ in 0012 G-note §11. Blocked on task 0027 (local backfill data
+ exists) and the Hetzner CH target being provisioned.
+ - date: 2026-06-19
+ status: backlog
+ who: claude
+ note: >
+ Refreshed to the post-ADR-0007 reality: source is the local
+ ClickHouse from 0027 (not local Postgres), target is the shared
+ Hetzner CH `prices.*` (not cloud RDS), push uses the 0052 mTLS
+ client, idempotency is ReplacingMergeTree(version) (not Postgres
+ ON CONFLICT / xmax). Repointed RDS blocker 0011 → provisioning
+ 0063 + schema 0051. Replaced personal username with "operator".
---
-# SDEX cloud-push — stream local `price_ohlcv` + `assets` to cloud RDS
+# SDEX cloud-push — stream local `price_ohlcv` + `assets` to Hetzner CH `prices.*`
## Summary
Lands a small Rust CLI (`sdex-cloud-push`) that streams the finalised
-prices tables (`price_ohlcv` + `assets`) from the operator's local
-Postgres (task 0027 output) to the cloud RDS instance (task 0011
-output). The push is idempotent and re-runnable; it resolves
-surrogate-id collisions on `assets` via natural-key matching so that
-the cloud DB can already contain rows written by the live-ingestion
-Lambda.
+prices tables (`price_ohlcv_1m` + `assets`) from the operator's local
+ClickHouse (task 0027 output) to the shared Hetzner ClickHouse `prices.*`
+database via the 0052 mTLS client (Caddy:443). The push is idempotent and
+re-runnable; it resolves surrogate-id collisions on `assets` via
+natural-key matching so the cloud DB can already contain rows written by
+the live-ingestion Lambda (0038).
## Context
-ADR 0005 (supersedes ADR 0002) commits Stream 2 SDEX backfill to a
-local workstation pattern; the cloud is exposed to API consumers via
-a separate push step described in §11 of task 0012's design G-note.
-This task is that push step.
+ADR 0005 (supersedes ADR 0002) commits Stream 2 SDEX backfill to a local
+workstation pattern; the data is exposed to API consumers via a separate
+push step described in §11 of task 0012's design G-note. This task is that
+push step. ADR 0007 then pivoted the live sink from RDS Postgres to BE's
+shared Hetzner ClickHouse — so both the local store (0027, already shipped
+on local ClickHouse) and the cloud target are now ClickHouse, and the push
+is a CH→CH copy over mTLS rather than a Postgres→RDS UPSERT.
-The shape mirrors a narrowed version of BE's `crates/db-merge`
-(BE ADR 0040): natural-key remap on the only FK-source table
-(`assets`), then a batched UPSERT into the downstream
-`price_ohlcv` keyed by `(timestamp, asset_id, quote_asset_id,
-granularity)` per ADR 0003. Two tables in scope, vs. BE's twelve —
-significantly simpler.
+The shape mirrors a narrowed version of BE's `crates/db-merge` (BE ADR
+0040): natural-key remap on the only FK-source table (`assets`), then a
+batched INSERT into `price_ohlcv_1m` keyed by `(timestamp, asset_id,
+quote_asset_id, source)` per ADR 0003/0004. Idempotency comes from
+`ReplacingMergeTree(version)` collapse, not row-level UPSERT.
Blocked on:
-- **Task 0011** — Cloud RDS must exist (CDK bootstrap landing).
-- **Task 0027** — Local backfill must have produced data to push.
+- **Task 0027** — local backfill must have produced data to push (done).
+- **Task 0063** — the `prices` database, scoped users, and per-env mTLS
+ cert/endpoint must be provisioned on the Hetzner box.
+- **Task 0051** — the target `prices.*` schema must be applied.
+- **Task 0052** — the shared mTLS CH client used for the cloud hop.
## Implementation
-1. **`sdex-cloud-push` bin crate** alongside `sdex-backfill` in the
- Cargo workspace established by task 0027. Reuses the `db` lib
- crate's sqlx pool.
+1. **`sdex-cloud-push` bin crate** alongside `sdex-backfill` in the Cargo
+ workspace established by task 0027. Reads the local plaintext CH with
+ the raw `clickhouse` crate; writes the Hetzner target with the 0052
+ shared mTLS client (`packages/prices-clickhouse`, `aws-mtls` feature).
-2. **CLI flags** per 0012 G-note §11.1:
+2. **CLI flags** (CH-flavoured update of 0012 G-note §11.1):
```bash
sdex-cloud-push \
- --source-url postgres://...local... \
- --target-url postgres://...cloud... \
- --tables price_ohlcv,assets \
- --since-ledger # optional; defaults to all
+ --source-ch-url http://localhost:8123 \
+ --target-db prices \
+ --tables price_ohlcv_1m,assets \
+ --since-ledger # optional; defaults to all
```
- - `--source-url` reads `DATABASE_URL_LOCAL` env.
- - `--target-url` reads `DATABASE_URL_CLOUD` env.
- - `--tables` defaults to `assets,price_ohlcv`.
- - `--since-ledger` filters local rows by `MIN(ledger)` derived
- from `price_ohlcv` row's source ledger range. Optional.
-
-3. **Assets remap** per 0012 G-note §11.2:
- - For each local `assets` row, look up the cloud row by its
- natural key (the same unique constraint columns the
- live-ingestion Lambda upserts on).
+ - `--source-ch-url` reads the local CH (no mTLS).
+ - The target endpoint + per-env cert come from the 0052 client
+ (`MTLS_SECRET_NAME` → AWS Secrets Manager `{cert,key,ca}` bundle).
+ - `--tables` defaults to `assets,price_ohlcv_1m`.
+ - `--since-ledger` filters local rows by the source ledger range.
+
+3. **Assets remap** per 0012 G-note §11.2 (still required because 0027
+ assigns local surrogate ids `max+1`, which can diverge from cloud ids
+ the live Lambda already wrote):
+ - For each local `assets` row, look up the cloud row by its natural key
+ (the same unique columns the live-ingestion path keys on).
- Build a `local_id → cloud_id` map in memory.
- - INSERT new assets into cloud, capturing the returned `id`s
- into the map.
+ - INSERT genuinely-new assets into the cloud table.
-4. **`price_ohlcv` push:**
- - Stream rows from local in batches (5-10k rows / round-trip).
+4. **`price_ohlcv_1m` push:**
+ - Stream rows from local CH in batches (5–10k rows / round-trip).
- Rewrite `asset_id` and `quote_asset_id` via the map from step 3.
- - `INSERT … ON CONFLICT (timestamp, asset_id, quote_asset_id, granularity)
- DO UPDATE SET …` matching the whole-row replacement contract
- from task 0022 decode-and-bucket §5.4.
+ - INSERT into Hetzner `prices.price_ohlcv_1m`. `version` (per ADR 0004)
+ carries the dedup key; `ReplacingMergeTree(version)` collapses any
+ overlap with live-ingested or previously-pushed rows on background
+ merge. No `ON CONFLICT` — CH has no row-level UPSERT.
-5. **Idempotency:** the tool must be safely re-runnable. A re-run
- should be a no-op when local and cloud are in sync. Test:
- run the push twice in a row; second run produces no row changes
- in cloud (verifiable via `xmax` or row-count diff).
+5. **Idempotency:** the tool must be safely re-runnable. A re-run is a
+ no-op once local and cloud are in sync — same `version`s collapse.
+ Test: run the push twice; `SELECT count() … FINAL` is identical before
+ and after the second run.
6. **Observability:** mirror `sdex-backfill`'s stdout-JSON tracing
pattern. Stable event names:
- - `push_started`, `assets_remapped` (with counts: new vs existing),
+ - `push_started`, `assets_remapped` (counts: new vs existing),
- `price_ohlcv_batch` (per-batch summary),
- `push_complete` (total counts + duration).
7. **Runbook section** added to `docs/runbooks/backfill-sdex.md`
- (created by task 0027): Cloud push step, when to run, what to
- verify post-push.
+ (created by task 0027): cloud-push step, when to run, what to verify
+ post-push.
-8. **Smoke test:** spin up a local "cloud stand-in" Postgres via
- docker-compose, run `sdex-backfill` against a 10k-ledger range
- to populate local, then run `sdex-cloud-push` against the
- stand-in. Diff `SELECT COUNT(*), MIN(timestamp), MAX(timestamp)
- FROM price_ohlcv` between source and target — should match.
+8. **Smoke test:** spin up a local "cloud stand-in" ClickHouse via
+ docker-compose (plaintext, no mTLS), run `sdex-backfill` against a
+ 10k-ledger range to populate the source CH, then run `sdex-cloud-push`
+ against the stand-in. Diff `SELECT count(), min(timestamp),
+ max(timestamp) FROM price_ohlcv_1m FINAL` between source and target —
+ should match.
## Acceptance Criteria
- [ ] `sdex-cloud-push` bin crate added to the workspace.
-- [ ] `--source-url` / `--target-url` / `--tables` / `--since-ledger`
- CLI flags implemented per 0012 G-note §11.1.
+- [ ] `--source-ch-url` / `--target-db` / `--tables` / `--since-ledger`
+ CLI flags implemented; target cert/endpoint resolved via the 0052
+ client (`MTLS_SECRET_NAME` bundle).
- [ ] `assets` natural-key remap correctly handles three cases:
- (a) new row → INSERT + capture id, (b) existing row → reuse
- cloud id, (c) re-run after partial failure → idempotent.
-- [ ] `price_ohlcv` batched UPSERT preserves whole-row replacement
- semantics (task 0022 §5.4) on the new PK shape (ADR 0003).
-- [ ] Smoke test passes: backfill 10k-ledger range to local, push to
- stand-in, row counts and aggregates match between source and target.
-- [ ] Re-running the push on synced source+target is a no-op
- (verified by post-run row-count diff).
+ (a) new row → INSERT + capture id, (b) existing row → reuse cloud
+ id, (c) re-run after partial failure → idempotent.
+- [ ] `price_ohlcv_1m` batched INSERT lands in Hetzner `prices.*` with
+ the correct PK/`version` shape (ADR 0003/0004); duplicates collapse
+ under `ReplacingMergeTree(version)`.
+- [ ] Smoke test passes: backfill 10k-ledger range to local CH, push to
+ the CH stand-in, row counts and aggregates match between source and
+ target (`… FINAL`).
+- [ ] Re-running the push on a synced source+target is a no-op (verified
+ by `count() FINAL` diff).
- [ ] Runbook section in `docs/runbooks/backfill-sdex.md` covers
first-push and subsequent-push operator workflows.
## Blocked by
-- **0011** — Cloud RDS must exist (CDK bootstrap landing).
-- **0027** — Local backfill must produce data to push.
+- **0027** — local backfill must produce data to push (done).
+- **0063** — Hetzner `prices` DB, scoped users, and per-env mTLS
+ cert/endpoint provisioned (supersedes the old RDS dependency on 0011).
+- **0051** — target `prices.*` schema applied.
+- **0052** — shared mTLS CH client for the cloud hop.
diff --git a/lore/1-tasks/backlog/0050_FEATURE_be-side-prep-sns-mtls-prices-db-provisioning/README.md b/lore/1-tasks/backlog/0050_FEATURE_be-side-prep-sns-mtls-prices-db-provisioning/README.md
index 80d2fea..8b5b0e1 100644
--- a/lore/1-tasks/backlog/0050_FEATURE_be-side-prep-sns-mtls-prices-db-provisioning/README.md
+++ b/lore/1-tasks/backlog/0050_FEATURE_be-side-prep-sns-mtls-prices-db-provisioning/README.md
@@ -87,10 +87,11 @@ to land before any prices-api ingestion or schema work can begin:
change.
2. **mTLS client cert issuance** (§5.2, §7, ADR 0007 §3.5): BE
runs the self-signed CA and the per-AWS-service issuance
- script. Prices-api receives one client cert + key per env
- (dev / staging / prod), to be stored in AWS Secrets Manager
- (2 secrets per env). 1-year manual rotation cadence (BE
- Cluster C agreement). The issuance script invocation is the
+ script. Prices-api receives one client `{cert,key,ca}` per env
+ (dev / staging / prod), to be stored in AWS Secrets Manager as
+ a single JSON bundle secret per identity (named by
+ `MTLS_SECRET_NAME`, per task 0063). 1-year manual rotation
+ cadence (BE Cluster C agreement). The issuance script invocation is the
only BE-side operator step per cert lifecycle.
3. **`prices` database + user + quota + profile** (§3 intro,
§11.1, ADR 0007 §3.5): BE creates the empty `prices` database
diff --git a/lore/1-tasks/backlog/0053_FEATURE_soroban-amm-backfill-cli-stream-1-impl.md b/lore/1-tasks/backlog/0053_FEATURE_soroban-amm-backfill-cli-stream-1-impl.md
index 712bda4..ad870a0 100644
--- a/lore/1-tasks/backlog/0053_FEATURE_soroban-amm-backfill-cli-stream-1-impl.md
+++ b/lore/1-tasks/backlog/0053_FEATURE_soroban-amm-backfill-cli-stream-1-impl.md
@@ -4,7 +4,7 @@ title: "Soroban AMM Backfill CLI (`soroban-amm-backfill`) — Stream 1 implement
type: FEATURE
status: backlog
related_adr: ["0001", "0003", "0004", "0007"]
-related_tasks: ["0017", "0034", "0037", "0048", "0052", "0051"]
+related_tasks: ["0017", "0034", "0037", "0048", "0052", "0051", "0058", "0026"]
tags: [layer-indexing, priority-high, effort-large, milestone-M1, stream-1, rust, cli, workstation, clickhouse, soroban, amm, soroswap, aquarius, phoenix]
milestone: 1
links:
@@ -21,16 +21,27 @@ links:
history:
- date: 2026-05-21
status: backlog
- who: okarcz
+ who: operator
note: >
Spawned during Tranche 1 task-set creation. ADR 0001 commits
Stream 1 to a local-CH-sourced workstation CLI; 0017 covers
the CH instance setup; 0037 covers the dispatch kernel; 0034
covers Phoenix WASM tolerance; 0048 carries the decoder spec.
No task owns the actual `soroban-amm-backfill` binary —
- decode loop, bucket to 1-min OHLCV, write to local Postgres,
- run the one-shot completion push to Hetzner CH. This task
- fills the gap.
+ decode loop, bucket to 1-min OHLCV, write to the local prices.*
+ CH mirror, run the one-shot completion push to Hetzner CH. This
+ task fills the gap.
+ - date: 2026-06-19
+ status: backlog
+ who: claude
+ note: >
+ Corrected the local staging store from a local Postgres to a
+ local ClickHouse mirror of the Hetzner `prices.*` schema (same
+ local-CH→Hetzner shape BE uses) — Summary, Context, Steps
+ 1/5/6/7/8, acceptance criteria, deps. Added volume_quote (0058)
+ population + USD columns left DEFAULT 0 for the enrichment pass
+ (0026). Repointed provisioning blocker 0050→0063. Replaced
+ personal username with "operator".
---
# Soroban AMM Backfill CLI (`soroban-amm-backfill`)
@@ -42,8 +53,10 @@ binary that reads the operator's local ClickHouse `soroban_events`
(populated upfront by BE's `backfill-runner --target=clickhouse`
per 0017), decodes Soroswap / Aquarius / Phoenix swap events via
the `stellar-xdr` crate using the 0048 decoder spec, buckets the
-results into per-source 1-min OHLCV rows in a local Postgres, and
-runs a one-shot completion push to Hetzner ClickHouse `prices.*`
+results into per-source 1-min OHLCV rows in a **local ClickHouse
+mirror of the Hetzner `prices.*` schema** (the same local-CH→Hetzner
+shape BE uses, not a local Postgres), and runs a one-shot completion
+push to Hetzner ClickHouse `prices.*`
that lands the historical rows and flips
`prices.backfill_progress.soroban_amm` to `status='completed'`.
@@ -65,10 +78,13 @@ on the operator's workstation. The runtime flow is:
3. Each event's `topics_xdr` + `data_xdr` are decoded into a
`TradeTick` per the 0048 decoder spec.
4. Ticks are bucketed to 1-min OHLCV per ADR 0003 PK shape
- (timestamp, asset_id, quote_asset_id, source) and written to
- a local Postgres (Docker) on the workstation.
-5. On completion, the cloud-push step streams the local
- `price_ohlcv_*` rows into Hetzner CH `prices.*` via the 0052
+ (timestamp, asset_id, quote_asset_id, source) and INSERTed into
+ a local ClickHouse mirror of `prices.*` (Docker — the same local
+ CH that holds `soroban_events`; schema applied from
+ `packages/prices-clickhouse/schema/*` per 0017). `version` is set
+ per ADR 0004 so ReplacingMergeTree collapses re-runs.
+5. On completion, the cloud-push step copies the local `prices.*`
+ mirror rows into Hetzner CH `prices.*` via the 0052
shared mTLS client, then flips `backfill_progress.soroban_amm`
to `status='completed'`. The local CH instance is torn down
post-push.
@@ -83,11 +99,11 @@ Total wall-clock: a few hours, dominated by 0017's
Add `packages/soroban-amm-backfill/` as a binary crate.
Dependencies:
-- `clickhouse` (raw, not the 0052 wrapper — reads from the local
- plaintext CH, no mTLS).
+- `clickhouse` (raw, not the 0052 wrapper — reads `soroban_events`
+ from and writes the `prices.*` mirror to the local plaintext CH,
+ no mTLS).
- 0052 shared CH client for the cloud-push step (mTLS to Caddy).
- `stellar-xdr` — official SDF Rust XDR types for ScVal decoding.
-- `sqlx` (Postgres) — local Postgres sink.
- `clap` — CLI argument parsing.
- `tracing` + `tracing-subscriber` — structured logs.
@@ -96,7 +112,7 @@ CLI surface:
```
soroban-amm-backfill
--local-ch-url URL (default http://localhost:8123)
- --local-pg-url URL (default postgres://localhost/prices_backfill)
+ --mirror-db NAME (local prices.* mirror DB in the local CH; default 'prices')
--start-ledger SEQ
--end-ledger SEQ
--venues VENUES (comma-separated: soroswap,aquarius,phoenix; default all)
@@ -158,19 +174,26 @@ For each `TradeTick`:
2. Compute the per-tick price and per-tick `volume_quote /
volume_base` per 0048 §3.
3. Group ticks into the `(floor_minute(closed_at), asset_id,
- quote_asset_id, source)` bucket and merge into local Postgres
- `price_ohlcv_1m` using ADR 0004's incremental-merge semantics
- (preserve open, overwrite close, GREATEST(high), LEAST(low),
- sum volumes + trade_count, recompute vwap).
-4. Pre-roll higher granularities (15m, 1h, 4h, 1d, 1w, 1M) in
- the same pass so the cloud-push step writes already-aggregated
- rows directly to the target tables (matches design doc §3.2
+ quote_asset_id, source)` bucket and finalize each bucket
+ in-memory per ADR 0004 (preserve open, overwrite close,
+ GREATEST(high), LEAST(low), sum volumes + trade_count, recompute
+ vwap). Populate `volume_quote` with native quote-asset volume
+ (per 0058); leave `volume_quote_usd` and `close_usd` at their
+ `DEFAULT 0` — the USD enrichment pass fills them later from
+ `oracle_prices` (0026; see archival-USD open question in 0017).
+ INSERT the finalized rows into the local CH
+ `prices.price_ohlcv_1m` mirror; ReplacingMergeTree(version)
+ collapses any re-run.
+4. Pre-roll higher granularities (15m, 1h, 4h, 1d, 1w, 1M) in the
+ same pass (math mirrors the `rollups.sql` MV chain) and INSERT
+ them into their local CH mirror tables, so the cloud-push step
+ copies already-aggregated rows (matches design doc §3.2
"Backfill scripts produce already-aggregated rows").
### Step 6: Cloud-push (`push` subcommand)
-Streams all `price_ohlcv_*` rows + new `assets` rows from local
-Postgres to Hetzner CH `prices.*` via the 0052 mTLS client:
+Copies all `price_ohlcv_*` rows + new `assets` rows from the local
+CH `prices.*` mirror to Hetzner CH `prices.*` via the 0052 mTLS client:
1. Open a CH connection per granularity table.
2. Stream rows in chunks (e.g. 10k rows per INSERT) using the
@@ -188,9 +211,10 @@ Postgres to Hetzner CH `prices.*` via the 0052 mTLS client:
- Unit: decoder paths covered by 0037 / 0048; here, test the
bucketing + pre-roll math for at least two venue-pair scenarios.
-- Integration: end-to-end against a Docker CH + Docker Postgres,
- seeded with a small recorded `soroban_events` fixture; assert
- the produced 1-min rows match a hand-computed gold file.
+- Integration: end-to-end against a Docker CH holding both the
+ `soroban_events` input and the `prices.*` mirror, seeded with a
+ small recorded `soroban_events` fixture; assert the produced
+ 1-min rows match a hand-computed gold file.
- Smoke: cloud-push step against a local Docker CH (no mTLS) +
a stubbed `prices.backfill_progress` row.
@@ -201,7 +225,7 @@ A short `RUNBOOK.md` documenting the end-to-end run sequence:
1. Run 0017's CH prep (`backfill-runner --target=clickhouse`).
2. Apply 0051's schema to Hetzner CH `prices` if not done.
3. Run `soroban-amm-backfill decode --start-ledger=… --end-ledger=…`.
-4. Inspect local Postgres row counts; spot-check via SQL.
+4. Inspect local CH `prices.*` mirror row counts; spot-check via SQL.
5. Run `soroban-amm-backfill push --target-ch-url=…`.
6. Confirm `GET /backfill/status` shows
`soroban_amm.status: "completed"` (0055).
@@ -210,11 +234,13 @@ A short `RUNBOOK.md` documenting the end-to-end run sequence:
## Acceptance Criteria
- [ ] `packages/soroban-amm-backfill` binary builds and runs
- end-to-end against a Docker local CH + Docker local Postgres
+ end-to-end against a Docker local CH (`soroban_events` input +
+ `prices.*` mirror)
- [ ] Decoder paths produce `TradeTick` records that match the
0048 spec gold-file fixtures for all three venues
-- [ ] Pre-rolled higher-granularity rows in local Postgres are
- consistent with the 1-min rows under the §3.2 MV semantics
+- [ ] Pre-rolled higher-granularity rows in the local CH `prices.*`
+ mirror are consistent with the 1-min rows under the §3.2 MV
+ semantics
(`argMin(open)`, `max(high)`, `min(low)`, `argMax(close)`,
`sum(volumes)`)
- [ ] `push` subcommand streams local rows to Hetzner CH
@@ -245,8 +271,9 @@ A short `RUNBOOK.md` documenting the end-to-end run sequence:
workflow can run before 0052 lands.
- **0051** — the target Hetzner CH `prices.*` schema must exist
before the push lands data.
-- **0050** — the Hetzner CH credentials and endpoint must be
- provisioned before the push lands.
+- **0063** — the `prices` database, scoped users, and per-env mTLS
+ cert/endpoint must be provisioned on the Hetzner box before the
+ push lands (0050 is now narrowed to SNS only).
## Out of scope
diff --git a/lore/1-tasks/blocked/0052_FEATURE_clickhouse-mtls-client-shared-crate.md b/lore/1-tasks/blocked/0052_FEATURE_clickhouse-mtls-client-shared-crate.md
index bca10ad..5cd320d 100644
--- a/lore/1-tasks/blocked/0052_FEATURE_clickhouse-mtls-client-shared-crate.md
+++ b/lore/1-tasks/blocked/0052_FEATURE_clickhouse-mtls-client-shared-crate.md
@@ -95,8 +95,9 @@ dep is pinned at `0.13` with only the `inserter` feature — **no TLS**.
What is missing for talking to the production Hetzner box (ADR 0007
§3.5 / design §5.2 "mTLS write path"):
-1. Load the per-env client cert + key from AWS Secrets Manager
- (two secrets per env per ADR 0007 §3.5).
+1. Load the per-env client `{cert,key,ca}` from AWS Secrets Manager
+ (a single JSON bundle secret per identity, named by
+ `MTLS_SECRET_NAME`, per ADR 0007 §3.5 / task 0063).
2. Establish a warm TLS connection during Lambda global init so
the ~80–130 ms cross-cloud RTT for TLS handshake is amortised
across invocations.