From b460ee783a4804e2a21645fce7bf9115667a3176 Mon Sep 17 00:00:00 2001 From: karczuRF Date: Fri, 19 Jun 2026 10:53:14 +0200 Subject: [PATCH 1/3] docs(lore-0017): refresh backfill tasks to local ClickHouse mirror Align the Stream 1/2 backfill tasks with the post-ADR-0007 reality: the local staging store is a ClickHouse mirror of the Hetzner prices.* schema (same local-CH -> Hetzner shape BE uses), not a local Postgres. - 0017: dual-role local CH (soroban_events input + prices.* mirror); add the archival/historical USD asset price requirement (volume_quote / volume_quote_usd / close_usd + oracle_prices). - 0053: Postgres -> local CH mirror across summary/steps/AC; volume_quote per 0058, USD columns left DEFAULT 0 for enrichment (0026); repoint provisioning blocker 0050 -> 0063. - 0028: SDEX cloud-push source is the 0027 local CH (not Postgres), target is Hetzner CH prices.* (not RDS), push via 0052 mTLS client, ReplacingMergeTree(version) idempotency. Replace personal username with "operator" throughout. 
--- ...RE_local-clickhouse-for-prices-backfill.md | 199 ++++++++++++------ .../backlog/0028_FEATURE_sdex-cloud-push.md | 175 ++++++++------- ..._soroban-amm-backfill-cli-stream-1-impl.md | 93 +++++--- 3 files changed, 300 insertions(+), 167 deletions(-) diff --git a/lore/1-tasks/backlog/0017_FEATURE_local-clickhouse-for-prices-backfill.md b/lore/1-tasks/backlog/0017_FEATURE_local-clickhouse-for-prices-backfill.md index 06f78ab..fce7a1f 100644 --- a/lore/1-tasks/backlog/0017_FEATURE_local-clickhouse-for-prices-backfill.md +++ b/lore/1-tasks/backlog/0017_FEATURE_local-clickhouse-for-prices-backfill.md @@ -3,94 +3,175 @@ id: "0017" title: "Local ClickHouse instance setup and access for prices-api Tranche 1 backfill" type: FEATURE status: backlog -related_adr: ["0001"] -related_tasks: ["0015", "0018"] -tags: [layer-infra, priority-high, effort-small, milestone-M1, infra, backfill, clickhouse, block-explorer] +related_adr: ["0001", "0003", "0004", "0007"] +related_tasks: ["0015", "0018", "0051", "0052", "0053", "0058", "0026", "0061"] +tags: [layer-infra, priority-high, effort-small, milestone-M1, infra, backfill, clickhouse, hetzner, block-explorer] milestone: 1 links: - "../../2-adrs/0001_stream1-clickhouse-sourced-amm-backfill.md" + - "../../2-adrs/0007_live-data-sink-on-shared-hetzner-clickhouse.md" + - "../../../packages/prices-clickhouse/schema/init.sql" - "../archive/0015_RESEARCH_redefine-backfill-with-be-clickhouse-events/notes/S-redesigned-backfill-recommendation.md" - "../../../../soroban-block-explorer/lore/1-tasks/archive/0205_FEATURE_backfill-runner-clickhouse-target-flag.md" - - "../../../../soroban-block-explorer/lore/1-tasks/active/0206_FEATURE_clickhouse-persist-real-inserts/README.md" + - "../../../../soroban-block-explorer/lore/1-tasks/archive/0206_FEATURE_clickhouse-persist-real-inserts/README.md" + - "../../../../soroban-block-explorer/lore/2-adrs/0045_clickhouse-local-backfill-then-mirror-to-hetzner-via-freeze-rsync-attach.md" history: - date: 
2026-05-12 status: backlog - who: okarcz + who: operator note: > Spawned from task 0015 closure. ADR 0001 commits Stream 1 to a - local-CH-sourced backfill on a developer laptop (okarcz's). This + local-CH-sourced backfill on the operator's workstation. This task is the operational landing: spin up CH locally, run BE's backfill-runner against it, document the access mechanism that - lets the prices-api Tranche 1 consumer query the laptop's CH. + lets the prices-api Tranche 1 consumer query the local CH. + - date: 2026-06-19 + status: backlog + who: claude + note: > + Refreshed against post-ADR-0007 architecture. Local store is now + ClickHouse mirroring the Hetzner `prices.*` schema (NOT a local + Postgres) — same local-CH→Hetzner shape BE uses. Added the + archival/historical USD asset price requirement (volume_quote / + volume_quote_usd / close_usd + oracle_prices; tasks 0058/0026/0061) + and the mTLS cloud-push path via the 0052 client. Replaced personal + username with "operator". --- # Local ClickHouse instance setup and access for prices-api Tranche 1 backfill ## Summary -Stand up a local ClickHouse instance on okarcz's developer laptop, -populated by running BE's `backfill-runner --target=clickhouse` -(BE task 0205) for the Soroban-activation-onward ledger range -(~ledger 48.5M to current tip, ~8.5M ledgers). Document the access -mechanism that lets the prices-api Tranche 1 consumer (separate -follow-up) query this CH instance to extract Soroban AMM swap events. +Stand up a local ClickHouse instance on the operator's workstation that +serves **two roles** for the Stream 1 historical AMM backfill: + +1. **Raw input** — populated by running BE's `backfill-runner + --target=clickhouse` (BE task 0205) for the Soroban-activation-onward + ledger range (~ledger 48.5M to current tip, ~8.5M ledgers), giving the + backfill CLI a local `soroban_events` source to query. +2. 
**Local `prices.*` mirror** — the same `prices` schema that lives on + the Hetzner CH box, applied locally from our canonical + `packages/prices-clickhouse/schema/init.sql`. The Stream 1 backfill + (task 0053) aggregates decoded swaps into this local mirror, then runs + a one-shot mTLS push to Hetzner `prices.*`. **This replaces the earlier + "aggregate into a local Postgres" design** — we now mirror Hetzner in a + local ClickHouse, the same local-CH→Hetzner shape BE uses (BE ADR 0045). -Tear-down trigger: once Tranche 1 backfill completes and the -extracted OHLCV trade points are persisted in prices-api PostgreSQL. +Document the access mechanism that lets the backfill CLI query the local +CH. Tear-down trigger: once Tranche 1 backfill completes and the OHLCV +rows are pushed to Hetzner `prices.*`. ## Context -ADR 0001 commits prices-api Stream 1 to local-CH-sourced backfill. -This task is the infrastructure side. It is gated on BE task 0206 -(real CH writer) reaching a quality bar that prices-api can consume — -BE task 0117 (local backfill benchmark) is the proxy signal. +ADR 0001 commits prices-api Stream 1 to local-CH-sourced backfill; this +task is the infrastructure side. Two architectural shifts since the +2026-05-12 draft change the shape: + +- **ADR 0007 (accepted 2026-05-20)** — the live data sink is BE's shared + Hetzner ClickHouse, **not** prices-owned RDS Postgres. All OHLCV / oracle + / asset / backfill-progress data lives in the `prices` database on the + Hetzner box. The backfill therefore stages into a *local CH mirror* of + `prices.*` and pushes to Hetzner — no Postgres anywhere in the path. +- **ADR 0003/0004** — OHLCV rows are per-source + (`source ∈ {soroswap, aquarius, phoenix, sdex, …}`) on + `ReplacingMergeTree(version)`, PK `(timestamp, asset_id, quote_asset_id, + source)`. Cross-source merge happens at read time. 
+ +The `prices.*` schema is already authored and shipped (tasks 0059/0060/0061) +in `packages/prices-clickhouse/schema/` (`init.sql` + `seed.sql` + +`rollups.sql` + `views.sql`), applied locally by the +`packages/prices-clickhouse-init` binary — the exact same artifact that +applies to Hetzner. The cloud push uses the shared mTLS client from task +0052 (`packages/prices-clickhouse`, `aws-mtls` feature) to Caddy:443. + +The input side is gated on BE task 0206 (real CH writer) — **shipped** +(archived), as are BE 0205 (the `--target=clickhouse` flag) and BE 0117 +(the local backfill benchmark, the storage/throughput proxy signal). + +### Archival / historical USD asset price requirement + +The `prices.*` OHLCV tables now carry USD columns that the historical +backfill must account for: + +- `volume_quote Decimal(38,14)` — native quote-asset volume, restored by + task 0058. The backfill (0053) **must populate this** per bucket + (`Σ |quote_amount|`); it is the input to USD enrichment. +- `volume_quote_usd Decimal(38,14) DEFAULT 0` and + `close_usd Decimal(38,14) DEFAULT 0` (task 0061) — filled by enrichment + as `oracle_usd × volume_quote` and `oracle_usd × close`. +- `oracle_prices` table — the USD reference series enrichment ASOF-joins + against (tasks 0024/0026). + +For a **historical** backfill this needs **archival USD asset prices**: +USD references for *past* timestamps back across the Soroban range, not +just live oracle ticks. Whether `oracle_prices` actually carries history +that far back — or whether an archival USD price source must be loaded +into the local mirror before enrichment can run — is an **open question** +(see Notes / spawn point). The local mirror must at minimum include the +`oracle_prices` table and the USD columns so the enrichment path exists +end-to-end; sourcing the archival USD series is tracked with 0026. 
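The enrichment rule above computes `volume_quote_usd = oracle_usd × volume_quote` and `close_usd = oracle_usd × close`, with `oracle_usd` ASOF-joined from `oracle_prices`. It can be sketched in plain Rust as a hypothetical illustration; this is not the 0026 implementation. The sketch makes these assumptions:

- the `OraclePoint` / `OhlcvRow` types and the `asof` / `enrich` helpers are invented here;
- `f64` stands in for `Decimal(38,14)` for brevity;
- "ASOF" means the latest oracle point at or before the bucket timestamp.

A bucket with no earlier oracle point keeps its `DEFAULT 0` values and is counted. That count is exactly the archival-coverage gap the open question is about.

```rust
/// Hypothetical shape of one `oracle_prices` point: USD price of the quote asset.
#[derive(Debug, Clone, Copy)]
struct OraclePoint {
    timestamp: i64, // unix seconds
    usd: f64,
}

/// The enrichment-relevant subset of a `price_ohlcv_1m` row (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct OhlcvRow {
    timestamp: i64,        // bucket start, unix seconds
    close: f64,            // in quote-asset units
    volume_quote: f64,     // native quote volume (0058)
    volume_quote_usd: f64, // DEFAULT 0 until enriched
    close_usd: f64,        // DEFAULT 0 until enriched
}

/// ASOF lookup: USD price of the latest point with `timestamp <= t`.
/// `series` must be sorted ascending by timestamp.
fn asof(series: &[OraclePoint], t: i64) -> Option<f64> {
    // Number of points at or before `t`; the last of them is the ASOF match.
    let n = series.partition_point(|p| p.timestamp <= t);
    n.checked_sub(1).map(|i| series[i].usd)
}

/// Fill the USD columns in place. Rows with no oracle history keep their
/// 0 defaults; the return value counts them (the archival-coverage gap).
fn enrich(rows: &mut [OhlcvRow], series: &[OraclePoint]) -> usize {
    let mut missing = 0;
    for row in rows.iter_mut() {
        match asof(series, row.timestamp) {
            Some(oracle_usd) => {
                row.volume_quote_usd = oracle_usd * row.volume_quote;
                row.close_usd = oracle_usd * row.close;
            }
            None => missing += 1,
        }
    }
    missing
}

fn main() {
    let series = [
        OraclePoint { timestamp: 60, usd: 0.5 },
        OraclePoint { timestamp: 180, usd: 2.0 },
    ];
    let row = |ts: i64| OhlcvRow {
        timestamp: ts,
        close: 4.0,
        volume_quote: 100.0,
        volume_quote_usd: 0.0,
        close_usd: 0.0,
    };
    let mut rows = vec![row(0), row(120), row(180)];
    let missing = enrich(&mut rows, &series);
    println!("un-enrichable rows: {}", missing);
    println!("{:?}", rows);
}
```

Counting un-enrichable rows over the full Soroban range is one inexpensive way to test whether `oracle_prices` reaches back far enough before committing to an archival USD source.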
## Implementation -- Docker compose service mirroring BE's task 0204 compose definition - (CH version, port mapping, healthcheck, volume mount). -- Run BE's `backfill-runner --target=clickhouse` against - `` for the Soroban range. Estimate disk: BE's task - 0117 benchmark is the input here. -- Decide access mechanism for the prices-api consumer: - - **Option a:** SSH tunnel from the consumer's host to laptop CH - HTTP port (8123). Lowest friction, requires laptop online. - - **Option b:** Cloudflare tunnel / tailscale-exposed CH for - multi-machine prices-api workflow. - - **Option c:** Read-only CH snapshot exported to S3, consumer - reads from S3 via clickhouse-local. -- Document the chosen mechanism in `lore/3-wiki/` (or inline in - prices-api repo docs). -- Capture the actual disk usage, completion time, and any - population errors as a closing note for ADR 0001's "consequences" - section. +- Docker-compose CH service mirroring BE's compose definition (CH version, + port mapping `127.0.0.1:8123`, healthcheck, volume mount). +- Apply **both** schemas to the local instance: + - BE's `init.sql` (`soroban_events` + friends) so `backfill-runner` has + its target tables — the raw input side. + - Our `packages/prices-clickhouse/schema/{init,seed,rollups,views}.sql` + via `prices-clickhouse-init` so the `prices.*` mirror matches Hetzner + exactly (USD columns + `oracle_prices` + per-source OHLCV + + `backfill_progress` seeded `soroban_amm`). +- Run BE's `backfill-runner --target=clickhouse` against the public ledger + archive for the Soroban range to populate `soroban_events`. Disk estimate: + BE task 0117 benchmark is the input (~100–150 GB order-of-magnitude for + the Soroban-only range; the `prices.*` mirror itself is tiny, ~0.45 GB/yr). +- Access mechanism for the backfill CLI: local plaintext HTTP + (`localhost:8123`) on the workstation — no mTLS for the local hop; + mTLS is only the Hetzner push (task 0052 client), not local reads. 
+- Capture actual disk usage, completion time, and any population errors as + a closing note for ADR 0001's "consequences" section. ## Acceptance Criteria -- [ ] Local CH instance running on okarcz's laptop with BE schema - applied (idempotent `init.sql` from BE task 0204). -- [ ] BE `backfill-runner --target=clickhouse` completes against - the Soroban-activation-onward ledger range with no - `parts_to_throw_insert` errors and zero parser-data loss - (verified against BE task 0206's coverage contract). -- [ ] Access mechanism chosen and documented; prices-api Tranche 1 - consumer (separate follow-up) can run a smoke query +- [ ] Local CH instance running on the operator's workstation with **BE's + schema** applied (idempotent `init.sql` from BE task 0204) for the + `soroban_events` input. +- [ ] Local CH also hosts the **`prices.*` mirror** of the Hetzner schema, + applied from `packages/prices-clickhouse/schema/*` via + `prices-clickhouse-init` — same artifact as the Hetzner apply, + including the USD columns (`volume_quote`, `volume_quote_usd`, + `close_usd`) and the `oracle_prices` table. +- [ ] BE `backfill-runner --target=clickhouse` completes against the + Soroban-activation-onward ledger range with no `parts_to_throw_insert` + errors and zero parser-data loss (verified against BE task 0206's + coverage contract). +- [ ] Access mechanism documented; the backfill CLI can run a smoke query (`SELECT count() FROM soroban_events WHERE signature = 'swap'`) - against the laptop CH. -- [ ] Disk usage, run time, and any anomalies captured as a closing - note appended to ADR 0001 or a `notes/G-backfill-run-log.md`. -- [ ] Tear-down checklist documented (when to nuke the volume, - how to do it cleanly). + against the local CH and write a smoke row into the local + `prices.price_ohlcv_1m` mirror. 
+- [ ] Archival USD price availability assessed and recorded: whether + `oracle_prices` covers the historical range, or an archival USD price + source is required (hand-off to 0026 if so). +- [ ] Disk usage, run time, and any anomalies captured as a closing note + appended to ADR 0001 or a `notes/G-backfill-run-log.md`. +- [ ] Tear-down checklist documented (when to nuke the volume, how to do + it cleanly). ## Notes -- BE's task 0206 must be merged before this task can run to completion. - If it is still active when this task starts, coordinate with BE - (fmazur) on whether a development-grade CH writer is good enough - for prices-api's Tranche 1 consumption, or whether we wait for - 0206's full landing. -- Storage estimate: BE has not published a hard number for the - Soroban-activation-onward window. Order-of-magnitude estimate - from extrapolating ADR 0044's ~550 GB full-mainnet backfill is - ~100–150 GB for the Soroban-only range; verify against BE task - 0117 benchmark output. +- **Local-only / prepare-only constraint**: the whole flow runs against + local Docker (CH on localhost); the only public network access is the + read-only `--no-sign-request` ledger fetch by `backfill-runner`. The + Hetzner push (0053) is a separate, approval-gated step and is not part + of this setup task. +- BE's task 0206 must be merged before this task can run to completion; + this is **done** (archived). If a future BE writer change lands, + re-verify against 0206's coverage contract. +- **Stale references to fix elsewhere** (out of scope here, flag only): + `docs/database-schema/database-schema-overview.md` still describes the + backfill as aggregating into a *local Postgres* and needs the same + local-CH-mirror correction this task adopts (task 0053 has already been + updated to the local-CH mirror). +- Storage estimate for `soroban_events`: BE has not published a hard number + for the Soroban-activation-onward window; extrapolating ADR 0044's ~550 GB + full-mainnet backfill gives an order-of-magnitude ~100–150 GB. Verify + against BE task 0117 benchmark output.
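The local access path is ClickHouse's plain HTTP interface on `localhost:8123`. It accepts a SQL query as a POST body and answers in `TabSeparated` by default. A std-only smoke check of the acceptance-criteria query might look like the sketch below. It is hypothetical in several ways:

- the real CLI uses the raw `clickhouse` crate (per 0053);
- `build_request`, `parse_count` and `smoke_count` are invented names;
- it assumes the server answers an HTTP/1.0 request with a plain (non-chunked) body and then closes the connection.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build a raw HTTP/1.0 POST for ClickHouse's HTTP interface; the SQL is the body.
fn build_request(host: &str, sql: &str) -> String {
    format!(
        "POST / HTTP/1.0\r\nHost: {}\r\nContent-Type: text/plain\r\nContent-Length: {}\r\n\r\n{}",
        host,
        sql.len(),
        sql
    )
}

/// Extract a single TabSeparated number (e.g. "12345\n") from a 200 response.
fn parse_count(response: &str) -> Option<u64> {
    let (head, body) = response.split_once("\r\n\r\n")?;
    let status_line = head.lines().next()?;
    // Only trust the body of a 200 response; errors carry a message instead.
    if !status_line.starts_with("HTTP/1.") || !status_line.contains(" 200 ") {
        return None;
    }
    body.trim().parse().ok()
}

/// Send one query to the local plaintext CH (no mTLS on the local hop).
fn smoke_count(addr: &str, sql: &str) -> std::io::Result<Option<u64>> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(build_request(addr, sql).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(parse_count(&response))
}

fn main() {
    let sql = "SELECT count() FROM soroban_events WHERE signature = 'swap'";
    match smoke_count("127.0.0.1:8123", sql) {
        Ok(Some(n)) => println!("swap events: {}", n),
        Ok(None) => eprintln!("unexpected or non-200 response"),
        Err(e) => eprintln!("local CH not reachable: {}", e),
    }
}
```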
diff --git a/lore/1-tasks/backlog/0028_FEATURE_sdex-cloud-push.md b/lore/1-tasks/backlog/0028_FEATURE_sdex-cloud-push.md index a85b82d..cee86f4 100644 --- a/lore/1-tasks/backlog/0028_FEATURE_sdex-cloud-push.md +++ b/lore/1-tasks/backlog/0028_FEATURE_sdex-cloud-push.md @@ -1,133 +1,158 @@ --- id: "0028" -title: "SDEX cloud-push — stream local price_ohlcv + assets to cloud RDS after backfill" +title: "SDEX cloud-push — stream local price_ohlcv + assets to Hetzner ClickHouse prices.* after backfill" type: FEATURE status: backlog -related_adr: ["0003", "0005"] -related_tasks: ["0011", "0012", "0027"] -tags: [layer-indexing, priority-medium, effort-medium, milestone-M1, cloud-push, clickhouse, hetzner, postgres, sdex, stream-2, rust] +related_adr: ["0003", "0005", "0007"] +related_tasks: ["0012", "0027", "0052", "0063", "0051"] +tags: [layer-indexing, priority-medium, effort-medium, milestone-M1, cloud-push, clickhouse, hetzner, mtls, sdex, stream-2, rust] milestone: 1 links: - "../active/0012_FEATURE_design-prices-owned-backfill-fargate/notes/G-sdex-backfill-local-design.md" - "../../2-adrs/0005_stream2-sdex-local-workstation-backfill.md" - "../../2-adrs/0003_price-ohlcv-pk-includes-quote-asset-id.md" + - "../../2-adrs/0007_live-data-sink-on-shared-hetzner-clickhouse.md" - "../../../../soroban-block-explorer/lore/2-adrs/0040_multi-laptop-backfill-snapshot-merge-hazards.md" history: - date: 2026-05-14 status: backlog - who: okarcz + who: operator note: > Spawned from task 0012 future work alongside the ADR 0005 pivot. Implements the post-backfill cloud-push step sketched - in 0012 G-note §11. Blocked on task 0011 (cloud RDS exists) - and task 0027 (local backfill data exists). + in 0012 G-note §11. Blocked on task 0027 (local backfill data + exists) and the Hetzner CH target being provisioned. 
+ - date: 2026-06-19 + status: backlog + who: claude + note: > + Refreshed to the post-ADR-0007 reality: source is the local + ClickHouse from 0027 (not local Postgres), target is the shared + Hetzner CH `prices.*` (not cloud RDS), push uses the 0052 mTLS + client, idempotency is ReplacingMergeTree(version) (not Postgres + ON CONFLICT / xmax). Repointed RDS blocker 0011 → provisioning + 0063 + schema 0051. Replaced personal username with "operator". --- -# SDEX cloud-push — stream local `price_ohlcv` + `assets` to cloud RDS +# SDEX cloud-push — stream local `price_ohlcv` + `assets` to Hetzner CH `prices.*` ## Summary Lands a small Rust CLI (`sdex-cloud-push`) that streams the finalised -prices tables (`price_ohlcv` + `assets`) from the operator's local -Postgres (task 0027 output) to the cloud RDS instance (task 0011 -output). The push is idempotent and re-runnable; it resolves -surrogate-id collisions on `assets` via natural-key matching so that -the cloud DB can already contain rows written by the live-ingestion -Lambda. +prices tables (`price_ohlcv_1m` + `assets`) from the operator's local +ClickHouse (task 0027 output) to the shared Hetzner ClickHouse `prices.*` +database via the 0052 mTLS client (Caddy:443). The push is idempotent and +re-runnable; it resolves surrogate-id collisions on `assets` via +natural-key matching so the cloud DB can already contain rows written by +the live-ingestion Lambda (0038). ## Context -ADR 0005 (supersedes ADR 0002) commits Stream 2 SDEX backfill to a -local workstation pattern; the cloud is exposed to API consumers via -a separate push step described in §11 of task 0012's design G-note. -This task is that push step. +ADR 0005 (supersedes ADR 0002) commits Stream 2 SDEX backfill to a local +workstation pattern; the data is exposed to API consumers via a separate +push step described in §11 of task 0012's design G-note. This task is that +push step. 
ADR 0007 then pivoted the live sink from RDS Postgres to BE's +shared Hetzner ClickHouse — so both the local store (0027, already shipped +on local ClickHouse) and the cloud target are now ClickHouse, and the push +is a CH→CH copy over mTLS rather than a Postgres→RDS UPSERT. -The shape mirrors a narrowed version of BE's `crates/db-merge` -(BE ADR 0040): natural-key remap on the only FK-source table -(`assets`), then a batched UPSERT into the downstream -`price_ohlcv` keyed by `(timestamp, asset_id, quote_asset_id, -granularity)` per ADR 0003. Two tables in scope, vs. BE's twelve — -significantly simpler. +The shape mirrors a narrowed version of BE's `crates/db-merge` (BE ADR +0040): natural-key remap on the only FK-source table (`assets`), then a +batched INSERT into `price_ohlcv_1m` keyed by `(timestamp, asset_id, +quote_asset_id, source)` per ADR 0003/0004. Idempotency comes from +`ReplacingMergeTree(version)` collapse, not row-level UPSERT. Blocked on: -- **Task 0011** — Cloud RDS must exist (CDK bootstrap landing). -- **Task 0027** — Local backfill must have produced data to push. +- **Task 0027** — local backfill must have produced data to push (done). +- **Task 0063** — the `prices` database, scoped users, and per-env mTLS + cert/endpoint must be provisioned on the Hetzner box. +- **Task 0051** — the target `prices.*` schema must be applied. +- **Task 0052** — the shared mTLS CH client used for the cloud hop. ## Implementation -1. **`sdex-cloud-push` bin crate** alongside `sdex-backfill` in the - Cargo workspace established by task 0027. Reuses the `db` lib - crate's sqlx pool. +1. **`sdex-cloud-push` bin crate** alongside `sdex-backfill` in the Cargo + workspace established by task 0027. Reads the local plaintext CH with + the raw `clickhouse` crate; writes the Hetzner target with the 0052 + shared mTLS client (`packages/prices-clickhouse`, `aws-mtls` feature). -2. **CLI flags** per 0012 G-note §11.1: +2. 
**CLI flags** (CH-flavoured update of 0012 G-note §11.1): ```bash sdex-cloud-push \ - --source-url postgres://...local... \ - --target-url postgres://...cloud... \ - --tables price_ohlcv,assets \ - --since-ledger # optional; defaults to all + --source-ch-url http://localhost:8123 \ + --target-db prices \ + --tables price_ohlcv_1m,assets \ + --since-ledger # optional; defaults to all ``` - - `--source-url` reads `DATABASE_URL_LOCAL` env. - - `--target-url` reads `DATABASE_URL_CLOUD` env. - - `--tables` defaults to `assets,price_ohlcv`. - - `--since-ledger` filters local rows by `MIN(ledger)` derived - from `price_ohlcv` row's source ledger range. Optional. - -3. **Assets remap** per 0012 G-note §11.2: - - For each local `assets` row, look up the cloud row by its - natural key (the same unique constraint columns the - live-ingestion Lambda upserts on). + - `--source-ch-url` reads the local plaintext CH (task 0027 output; no + mTLS). + - `--target-db` names the Hetzner `prices` database, reached via + Caddy:443. + - The target endpoint + per-env cert come from the 0052 client + (`MTLS_SECRET_NAME` → AWS Secrets Manager `{cert,key,ca}` bundle). + - `--tables` defaults to `assets,price_ohlcv_1m`. + - `--since-ledger` filters local rows by the source ledger range. + +3. **Assets remap** per 0012 G-note §11.2 (still required because 0027 + assigns local surrogate ids `max+1`, which can diverge from cloud ids + the live Lambda already wrote): + - For each local `assets` row, look up the cloud row by its natural key + (the same unique columns the live-ingestion path keys on). - Build a `local_id → cloud_id` map in memory. - - INSERT new assets into cloud, capturing the returned `id`s - into the map. + - INSERT genuinely-new assets into the cloud table and record their ids + in the map. ClickHouse has no `RETURNING`, so each new asset's cloud id + must be known before the INSERT rather than read back from it. -4. **`price_ohlcv` push:** - - Stream rows from local in batches (5-10k rows / round-trip). +4. **`price_ohlcv_1m` push:** + - Stream rows from local CH in batches (5–10k rows / round-trip).
- - `INSERT … ON CONFLICT (timestamp, asset_id, quote_asset_id, granularity) - DO UPDATE SET …` matching the whole-row replacement contract - from task 0022 decode-and-bucket §5.4. + - INSERT into Hetzner `prices.price_ohlcv_1m`. `version` (per ADR 0004) + carries the dedup key; `ReplacingMergeTree(version)` collapses any + overlap with live-ingested or previously-pushed rows on background + merge. No `ON CONFLICT` — CH has no row-level UPSERT. -5. **Idempotency:** the tool must be safely re-runnable. A re-run - should be a no-op when local and cloud are in sync. Test: - run the push twice in a row; second run produces no row changes - in cloud (verifiable via `xmax` or row-count diff). +5. **Idempotency:** the tool must be safely re-runnable. A re-run is a + no-op once local and cloud are in sync — same `version`s collapse. + Test: run the push twice; `SELECT count() … FINAL` is identical before + and after the second run. 6. **Observability:** mirror `sdex-backfill`'s stdout-JSON tracing pattern. Stable event names: - - `push_started`, `assets_remapped` (with counts: new vs existing), + - `push_started`, `assets_remapped` (counts: new vs existing), - `price_ohlcv_batch` (per-batch summary), - `push_complete` (total counts + duration). 7. **Runbook section** added to `docs/runbooks/backfill-sdex.md` - (created by task 0027): Cloud push step, when to run, what to - verify post-push. + (created by task 0027): cloud-push step, when to run, what to verify + post-push. -8. **Smoke test:** spin up a local "cloud stand-in" Postgres via - docker-compose, run `sdex-backfill` against a 10k-ledger range - to populate local, then run `sdex-cloud-push` against the - stand-in. Diff `SELECT COUNT(*), MIN(timestamp), MAX(timestamp) - FROM price_ohlcv` between source and target — should match. +8. 
**Smoke test:** spin up a local "cloud stand-in" ClickHouse via + docker-compose (plaintext, no mTLS), run `sdex-backfill` against a + 10k-ledger range to populate the source CH, then run `sdex-cloud-push` + against the stand-in. Diff `SELECT count(), min(timestamp), + max(timestamp) FROM price_ohlcv_1m FINAL` between source and target; + the results should match. ## Acceptance Criteria - [ ] `sdex-cloud-push` bin crate added to the workspace. -- [ ] `--source-url` / `--target-url` / `--tables` / `--since-ledger` - CLI flags implemented per 0012 G-note §11.1. +- [ ] `--source-ch-url` / `--target-db` / `--tables` / `--since-ledger` + CLI flags implemented; target cert/endpoint resolved via the 0052 + client (`MTLS_SECRET_NAME` bundle). - [ ] `assets` natural-key remap correctly handles three cases: - (a) new row → INSERT + capture id, (b) existing row → reuse - cloud id, (c) re-run after partial failure → idempotent. + (a) new row → INSERT and record its cloud id in the map, (b) existing + row → reuse cloud id, (c) re-run after partial failure → idempotent. +- [ ] `price_ohlcv_1m` batched INSERT lands in Hetzner `prices.*` with + the correct PK/`version` shape (ADR 0003/0004); duplicates collapse + under `ReplacingMergeTree(version)`. +- [ ] Smoke test passes: backfill 10k-ledger range to local CH, push to + the CH stand-in, row counts and aggregates match between source and + target (`… FINAL`). +- [ ] Re-running the push on a synced source+target is a no-op (verified + by `count() FINAL` diff). - [ ] Runbook section in `docs/runbooks/backfill-sdex.md` covers first-push and subsequent-push operator workflows.
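The assets remap (step 3) and the `ReplacingMergeTree(version)` idempotency contract (steps 4 and 5) can be modelled in memory to make the three acceptance-criteria cases concrete. This is a hypothetical sketch, not the `sdex-cloud-push` code. The following are all assumptions:

- the natural keys;
- the `Row` / `Pk` shapes;
- allocating new cloud ids as `max + 1`, mirroring what 0027 does locally.

`final_view` only emulates what `SELECT … FINAL` returns.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical natural key of an asset (e.g. code + issuer, or a contract id).
type NaturalKey = String;

/// `price_ohlcv_1m` key per ADR 0003/0004: (timestamp, asset_id, quote_asset_id, source).
type Pk = (i64, u64, u64, &'static str);

/// Minimal stand-in for a `price_ohlcv_1m` row (only the fields the sketch needs).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    pk: Pk,
    version: u64,
    close: f64,
}

/// Map local surrogate ids to cloud ids by natural key. Returns the
/// local->cloud map plus the (cloud_id, key) rows that must be INSERTed.
/// New ids are allocated client-side as cloud max + 1 (an assumption).
fn remap_assets(
    local: &[(u64, NaturalKey)],
    cloud: &HashMap<NaturalKey, u64>,
) -> (HashMap<u64, u64>, Vec<(u64, NaturalKey)>) {
    let mut next_id = cloud.values().copied().max().unwrap_or(0) + 1;
    let mut map = HashMap::new();
    let mut new_rows = Vec::new();
    for (local_id, key) in local {
        let cloud_id = match cloud.get(key) {
            // (b) already in cloud (e.g. written by the live Lambda): reuse its id.
            Some(&id) => id,
            // (a) genuinely new: allocate an id and queue the row for INSERT.
            None => {
                let id = next_id;
                next_id += 1;
                new_rows.push((id, key.clone()));
                id
            }
        };
        map.insert(*local_id, cloud_id);
    }
    (map, new_rows)
}

/// Append local rows with asset ids rewritten (a CH INSERT, no UPSERT).
/// Panics if a row references an asset missing from `map`.
fn push(cloud_table: &mut Vec<Row>, local_rows: &[Row], map: &HashMap<u64, u64>) {
    for r in local_rows {
        let (ts, asset, quote, source) = r.pk;
        let pk = (ts, map[&asset], map[&quote], source);
        cloud_table.push(Row { pk, ..r.clone() });
    }
}

/// Emulate `SELECT ... FINAL` on ReplacingMergeTree(version): one row per key,
/// the highest version wins; on a version tie the later-inserted row wins.
fn final_view(table: &[Row]) -> BTreeMap<Pk, Row> {
    let mut out: BTreeMap<Pk, Row> = BTreeMap::new();
    for r in table {
        let keep_existing = out.get(&r.pk).map_or(false, |e| e.version > r.version);
        if !keep_existing {
            out.insert(r.pk, r.clone());
        }
    }
    out
}

fn main() {
    // Hypothetical natural keys; local ids deliberately diverge from cloud ids.
    let cloud_assets: HashMap<NaturalKey, u64> =
        HashMap::from([("XLM".to_string(), 1), ("USDC".to_string(), 2)]);
    let local_assets = vec![(1, "USDC".to_string()), (2, "XLM".to_string()), (3, "AQUA".to_string())];
    let (map, new_rows) = remap_assets(&local_assets, &cloud_assets);
    println!("map={:?} new={:?}", map, new_rows);

    let local_rows = vec![Row { pk: (0, 2, 1, "soroswap"), version: 1, close: 0.1 }];
    let mut cloud_table = Vec::new();
    push(&mut cloud_table, &local_rows, &map);
    push(&mut cloud_table, &local_rows, &map); // re-run
    println!("physical={} final={}", cloud_table.len(), final_view(&cloud_table).len());
}
```

Pushing the same batch twice leaves two physical rows but one logical row under `FINAL`. That is why the idempotency check compares `count() FINAL` rather than raw counts. Allocating ids client-side is only safe if nothing else, such as the live Lambda, allocates concurrently. How ids are actually coordinated is left to the task.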
## Blocked by -- **0011** — Cloud RDS must exist (CDK bootstrap landing). -- **0027** — Local backfill must produce data to push. +- **0027** — local backfill must produce data to push (done). +- **0063** — Hetzner `prices` DB, scoped users, and per-env mTLS + cert/endpoint provisioned (supersedes the old RDS dependency on 0011). +- **0051** — target `prices.*` schema applied. +- **0052** — shared mTLS CH client for the cloud hop. diff --git a/lore/1-tasks/backlog/0053_FEATURE_soroban-amm-backfill-cli-stream-1-impl.md b/lore/1-tasks/backlog/0053_FEATURE_soroban-amm-backfill-cli-stream-1-impl.md index 712bda4..ad870a0 100644 --- a/lore/1-tasks/backlog/0053_FEATURE_soroban-amm-backfill-cli-stream-1-impl.md +++ b/lore/1-tasks/backlog/0053_FEATURE_soroban-amm-backfill-cli-stream-1-impl.md @@ -4,7 +4,7 @@ title: "Soroban AMM Backfill CLI (`soroban-amm-backfill`) — Stream 1 implement type: FEATURE status: backlog related_adr: ["0001", "0003", "0004", "0007"] -related_tasks: ["0017", "0034", "0037", "0048", "0052", "0051"] +related_tasks: ["0017", "0034", "0037", "0048", "0052", "0051", "0058", "0026"] tags: [layer-indexing, priority-high, effort-large, milestone-M1, stream-1, rust, cli, workstation, clickhouse, soroban, amm, soroswap, aquarius, phoenix] milestone: 1 links: @@ -21,16 +21,27 @@ links: history: - date: 2026-05-21 status: backlog - who: okarcz + who: operator note: > Spawned during Tranche 1 task-set creation. ADR 0001 commits Stream 1 to a local-CH-sourced workstation CLI; 0017 covers the CH instance setup; 0037 covers the dispatch kernel; 0034 covers Phoenix WASM tolerance; 0048 carries the decoder spec. No task owns the actual `soroban-amm-backfill` binary — - decode loop, bucket to 1-min OHLCV, write to local Postgres, - run the one-shot completion push to Hetzner CH. This task - fills the gap. + decode loop, bucket to 1-min OHLCV, write to the local prices.* + CH mirror, run the one-shot completion push to Hetzner CH. This + task fills the gap. 
+ - date: 2026-06-19 + status: backlog + who: claude + note: > + Corrected the local staging store from a local Postgres to a + local ClickHouse mirror of the Hetzner `prices.*` schema (same + local-CH→Hetzner shape BE uses) — Summary, Context, Steps + 1/5/6/7/8, acceptance criteria, deps. Added volume_quote (0058) + population + USD columns left DEFAULT 0 for the enrichment pass + (0026). Repointed provisioning blocker 0050→0063. Replaced + personal username with "operator". --- # Soroban AMM Backfill CLI (`soroban-amm-backfill`) @@ -42,8 +53,10 @@ binary that reads the operator's local ClickHouse `soroban_events` (populated upfront by BE's `backfill-runner --target=clickhouse` per 0017), decodes Soroswap / Aquarius / Phoenix swap events via the `stellar-xdr` crate using the 0048 decoder spec, buckets the -results into per-source 1-min OHLCV rows in a local Postgres, and -runs a one-shot completion push to Hetzner ClickHouse `prices.*` +results into per-source 1-min OHLCV rows in a **local ClickHouse +mirror of the Hetzner `prices.*` schema** (the same local-CH→Hetzner +shape BE uses, not a local Postgres), and runs a one-shot completion +push to Hetzner ClickHouse `prices.*` that lands the historical rows and flips `prices.backfill_progress.soroban_amm` to `status='completed'`. @@ -65,10 +78,13 @@ on the operator's workstation. The runtime flow is: 3. Each event's `topics_xdr` + `data_xdr` are decoded into a `TradeTick` per the 0048 decoder spec. 4. Ticks are bucketed to 1-min OHLCV per ADR 0003 PK shape - (timestamp, asset_id, quote_asset_id, source) and written to - a local Postgres (Docker) on the workstation. -5. 
On completion, the cloud-push step streams the local - `price_ohlcv_*` rows into Hetzner CH `prices.*` via the 0052 + (timestamp, asset_id, quote_asset_id, source) and INSERTed into + a local ClickHouse mirror of `prices.*` (Docker — the same local + CH that holds `soroban_events`; schema applied from + `packages/prices-clickhouse/schema/*` per 0017). `version` is set + per ADR 0004 so ReplacingMergeTree collapses re-runs. +5. On completion, the cloud-push step copies the local `prices.*` + mirror rows into Hetzner CH `prices.*` via the 0052 shared mTLS client, then flips `backfill_progress.soroban_amm` to `status='completed'`. The local CH instance is torn down post-push. @@ -83,11 +99,11 @@ Total wall-clock: a few hours, dominated by 0017's Add `packages/soroban-amm-backfill/` as a binary crate. Dependencies: -- `clickhouse` (raw, not the 0052 wrapper — reads from the local - plaintext CH, no mTLS). +- `clickhouse` (raw, not the 0052 wrapper — reads `soroban_events` + from and writes the `prices.*` mirror to the local plaintext CH, + no mTLS). - 0052 shared CH client for the cloud-push step (mTLS to Caddy). - `stellar-xdr` — official SDF Rust XDR types for ScVal decoding. -- `sqlx` (Postgres) — local Postgres sink. - `clap` — CLI argument parsing. - `tracing` + `tracing-subscriber` — structured logs. @@ -96,7 +112,7 @@ CLI surface: ``` soroban-amm-backfill --local-ch-url URL (default http://localhost:8123) - --local-pg-url URL (default postgres://localhost/prices_backfill) + --mirror-db NAME (local prices.* mirror DB in the local CH; default 'prices') --start-ledger SEQ --end-ledger SEQ --venues VENUES (comma-separated: soroswap,aquarius,phoenix; default all) @@ -158,19 +174,26 @@ For each `TradeTick`: 2. Compute the per-tick price and per-tick `volume_quote / volume_base` per 0048 §3. 3. 
Group ticks into the `(floor_minute(closed_at), asset_id, - quote_asset_id, source)` bucket and merge into local Postgres - `price_ohlcv_1m` using ADR 0004's incremental-merge semantics - (preserve open, overwrite close, GREATEST(high), LEAST(low), - sum volumes + trade_count, recompute vwap). -4. Pre-roll higher granularities (15m, 1h, 4h, 1d, 1w, 1M) in - the same pass so the cloud-push step writes already-aggregated - rows directly to the target tables (matches design doc §3.2 + quote_asset_id, source)` bucket and finalize each bucket + in-memory per ADR 0004 (preserve open, overwrite close, + GREATEST(high), LEAST(low), sum volumes + trade_count, recompute + vwap). Populate `volume_quote` with native quote-asset volume + (per 0058); leave `volume_quote_usd` and `close_usd` at their + `DEFAULT 0` — the USD enrichment pass fills them later from + `oracle_prices` (0026; see archival-USD open question in 0017). + INSERT the finalized rows into the local CH + `prices.price_ohlcv_1m` mirror; ReplacingMergeTree(version) + collapses any re-run. +4. Pre-roll higher granularities (15m, 1h, 4h, 1d, 1w, 1M) in the + same pass (math mirrors the `rollups.sql` MV chain) and INSERT + them into their local CH mirror tables, so the cloud-push step + copies already-aggregated rows (matches design doc §3.2 "Backfill scripts produce already-aggregated rows"). ### Step 6: Cloud-push (`push` subcommand) -Streams all `price_ohlcv_*` rows + new `assets` rows from local -Postgres to Hetzner CH `prices.*` via the 0052 mTLS client: +Copies all `price_ohlcv_*` rows + new `assets` rows from the local +CH `prices.*` mirror to Hetzner CH `prices.*` via the 0052 mTLS client: 1. Open a CH connection per granularity table. 2. Stream rows in chunks (e.g. 10k rows per INSERT) using the @@ -188,9 +211,10 @@ Postgres to Hetzner CH `prices.*` via the 0052 mTLS client: - Unit: decoder paths covered by 0037 / 0048; here, test the bucketing + pre-roll math for at least two venue-pair scenarios. 
-- Integration: end-to-end against a Docker CH + Docker Postgres, - seeded with a small recorded `soroban_events` fixture; assert - the produced 1-min rows match a hand-computed gold file. +- Integration: end-to-end against a Docker CH holding both the + `soroban_events` input and the `prices.*` mirror, seeded with a + small recorded `soroban_events` fixture; assert the produced + 1-min rows match a hand-computed gold file. - Smoke: cloud-push step against a local Docker CH (no mTLS) + a stubbed `prices.backfill_progress` row. @@ -201,7 +225,7 @@ A short `RUNBOOK.md` documenting the end-to-end run sequence: 1. Run 0017's CH prep (`backfill-runner --target=clickhouse`). 2. Apply 0051's schema to Hetzner CH `prices` if not done. 3. Run `soroban-amm-backfill decode --start-ledger=… --end-ledger=…`. -4. Inspect local Postgres row counts; spot-check via SQL. +4. Inspect local CH `prices.*` mirror row counts; spot-check via SQL. 5. Run `soroban-amm-backfill push --target-ch-url=…`. 6. Confirm `GET /backfill/status` shows `soroban_amm.status: "completed"` (0055). 
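The 0053 bucketing and pre-roll steps above combine two operations. First, ticks are merged into 1-minute buckets under ADR 0004's rules: preserve open, overwrite close, GREATEST(high), LEAST(low), sum volumes and trade_count, recompute vwap. Second, those rows are rolled up with `argMin(open)`, `max(high)`, `min(low)`, `argMax(close)` and `sum(volumes)`. The std-only Rust sketch below illustrates both under several simplifying assumptions:

- **Hypothetical names.** `Tick`, `Bucket`, `absorb`, `bucket_1m` and `pre_roll` are illustrative names.
- **One venue/pair.** The sketch handles a single venue/pair; the real key also carries `asset_id`, `quote_asset_id` and `source`.
- **Amounts.** Amounts are `f64` for brevity, and `volume_base > 0` is assumed.
- **Granularities.** Fixed-width epoch flooring only fits the fixed UTC granularities (15m, 1h, 4h, 1d). Weeks and months need calendar-aware alignment.

```rust
use std::collections::BTreeMap;

/// A decoded trade for one venue/pair (hypothetical simplification).
#[derive(Clone, Copy, Debug)]
struct Tick {
    closed_at: i64,   // ledger close time, Unix seconds
    price: f64,       // quote per base
    volume_base: f64, // assumed > 0
    volume_quote: f64,
}

/// One OHLCV row; `start` is the bucket start in Unix seconds.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Bucket {
    start: i64,
    open: f64,
    high: f64,
    low: f64,
    close: f64,
    volume_base: f64,
    volume_quote: f64,
    trade_count: u64,
}

impl Bucket {
    fn from_tick(start: i64, t: &Tick) -> Self {
        Bucket {
            start,
            open: t.price,
            high: t.price,
            low: t.price,
            close: t.price,
            volume_base: t.volume_base,
            volume_quote: t.volume_quote,
            trade_count: 1,
        }
    }

    /// Fold in a *later* row or tick: keep open, overwrite close,
    /// widen high/low, add volumes and trade_count.
    fn absorb(&mut self, later: &Bucket) {
        self.close = later.close;
        self.high = self.high.max(later.high);
        self.low = self.low.min(later.low);
        self.volume_base += later.volume_base;
        self.volume_quote += later.volume_quote;
        self.trade_count += later.trade_count;
    }

    /// vwap is recomputed from the summed volumes rather than stored.
    fn vwap(&self) -> f64 {
        self.volume_quote / self.volume_base
    }
}

/// Floor a Unix timestamp to a fixed-width UTC bucket.
fn floor_to(ts: i64, width_secs: i64) -> i64 {
    ts - ts.rem_euclid(width_secs)
}

/// Ticks -> 1-minute rows. Sorting first makes "keep open, overwrite close"
/// mean the first and last trade of each minute.
fn bucket_1m(ticks: &[Tick]) -> Vec<Bucket> {
    let mut sorted = ticks.to_vec();
    sorted.sort_by_key(|t| t.closed_at);
    let mut out: BTreeMap<i64, Bucket> = BTreeMap::new();
    for t in &sorted {
        let start = floor_to(t.closed_at, 60);
        let row = Bucket::from_tick(start, t);
        out.entry(start).and_modify(|b| b.absorb(&row)).or_insert(row);
    }
    out.into_values().collect()
}

/// Finer rows -> coarser rows with the same fold; over time-sorted input this
/// yields argMin(open), max(high), min(low), argMax(close), sum(volumes).
fn pre_roll(rows: &[Bucket], width_secs: i64) -> Vec<Bucket> {
    let mut sorted = rows.to_vec();
    sorted.sort_by_key(|b| b.start);
    let mut out: BTreeMap<i64, Bucket> = BTreeMap::new();
    for r in &sorted {
        let start = floor_to(r.start, width_secs);
        let row = Bucket { start, ..*r };
        out.entry(start).and_modify(|b| b.absorb(&row)).or_insert(row);
    }
    out.into_values().collect()
}

fn main() {
    let ticks = [
        Tick { closed_at: 30, price: 12.0, volume_base: 1.0, volume_quote: 12.0 },
        Tick { closed_at: 5, price: 10.0, volume_base: 1.0, volume_quote: 10.0 },
        Tick { closed_at: 70, price: 11.0, volume_base: 2.0, volume_quote: 22.0 },
        Tick { closed_at: 905, price: 9.0, volume_base: 1.0, volume_quote: 9.0 },
    ];
    let m15 = pre_roll(&bucket_1m(&ticks), 15 * 60);
    println!("{m15:?} first vwap={}", m15[0].vwap());
}
```

Both steps share one fold here. Within the sketch, the pre-rolled rows are therefore consistent with the 1-minute rows by construction, which is the property the 0053 acceptance criterion checks for.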
@@ -210,11 +234,13 @@ A short `RUNBOOK.md` documenting the end-to-end run sequence: ## Acceptance Criteria - [ ] `packages/soroban-amm-backfill` binary builds and runs - end-to-end against a Docker local CH + Docker local Postgres + end-to-end against a Docker local CH (soroban_events input + + prices.* mirror) - [ ] Decoder paths produce `TradeTick` records that match the 0048 spec gold-file fixtures for all three venues -- [ ] Pre-rolled higher-granularity rows in local Postgres are - consistent with the 1-min rows under the §3.2 MV semantics +- [ ] Pre-rolled higher-granularity rows in the local CH prices.* + mirror are consistent with the 1-min rows under the §3.2 MV + semantics (`argMin(open)`, `max(high)`, `min(low)`, `argMax(close)`, `sum(volumes)`) - [ ] `push` subcommand streams local rows to Hetzner CH @@ -245,8 +271,9 @@ A short `RUNBOOK.md` documenting the end-to-end run sequence: workflow can run before 0052 lands. - **0051** — the target Hetzner CH `prices.*` schema must exist before the push lands data. -- **0050** — the Hetzner CH credentials and endpoint must be - provisioned before the push lands. +- **0063** — the `prices` database, scoped users, and per-env mTLS + cert/endpoint must be provisioned on the Hetzner box before the + push lands (0050 narrowed to SNS only). ## Out of scope From 10bcf2c4a5bc705ae0b17c7c6e906e11fc899e96 Mon Sep 17 00:00:00 2001 From: karczuRF Date: Fri, 19 Jun 2026 10:53:27 +0200 Subject: [PATCH 2/3] docs(lore-0063): reconcile mTLS secret refs to single bundle Update the stale two-secret cert/key assumption to the decided shape: a single {cert,key,ca} JSON bundle secret per identity per env, named by MTLS_SECRET_NAME (0063 Design Decision #3 / 0052's client). - 0052: the cert-load step reads one bundle secret, not two. - 0050: prices-api receives one {cert,key,ca} bundle per env. 
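The single-bundle secret shape in this commit can be consumed as in the std-only Rust sketch below. It assumes the Secrets Manager value has already been fetched and deserialised from its `{cert,key,ca}` JSON, for example with `serde_json` (not shown). Only the `MTLS_SECRET_NAME` variable and the three-field shape come from this patch. `MtlsBundle`, `BundleError`, `resolve_secret_name` and the PEM-header check are hypothetical illustrations, not 0052's API.

```rust
use std::env;

/// Hypothetical in-memory form of the single `{cert,key,ca}` bundle secret,
/// after the JSON value has been deserialised.
#[derive(Debug, Clone)]
struct MtlsBundle {
    cert: String, // client certificate (PEM)
    key: String,  // client private key (PEM)
    ca: String,   // CA certificate (PEM)
}

#[derive(Debug, PartialEq)]
enum BundleError {
    MissingSecretName,
    NotPem(&'static str),
}

/// One secret per identity per env: its name comes from MTLS_SECRET_NAME.
fn resolve_secret_name(raw: Option<String>) -> Result<String, BundleError> {
    match raw {
        Some(name) if !name.trim().is_empty() => Ok(name),
        _ => Err(BundleError::MissingSecretName),
    }
}

/// Cheap sanity check that every field of the bundle looks like a PEM block.
fn validate(bundle: &MtlsBundle) -> Result<(), BundleError> {
    for (field, value) in [("cert", &bundle.cert), ("key", &bundle.key), ("ca", &bundle.ca)] {
        if !value.trim_start().starts_with("-----BEGIN ") {
            return Err(BundleError::NotPem(field));
        }
    }
    Ok(())
}

fn main() {
    match resolve_secret_name(env::var("MTLS_SECRET_NAME").ok()) {
        Ok(name) => println!("would fetch bundle secret {name}"),
        Err(e) => println!("cannot resolve secret name: {e:?}"),
    }
    // Placeholder PEM headers only; no real key material.
    let demo = MtlsBundle {
        cert: "-----BEGIN CERTIFICATE-----\nplaceholder".to_string(),
        key: "-----BEGIN PRIVATE KEY-----\nplaceholder".to_string(),
        ca: "-----BEGIN CERTIFICATE-----\nplaceholder".to_string(),
    };
    println!("demo bundle valid: {:?}", validate(&demo));
}
```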
--- .../README.md | 9 +++++---- .../0052_FEATURE_clickhouse-mtls-client-shared-crate.md | 5 +++-- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/lore/1-tasks/backlog/0050_FEATURE_be-side-prep-sns-mtls-prices-db-provisioning/README.md b/lore/1-tasks/backlog/0050_FEATURE_be-side-prep-sns-mtls-prices-db-provisioning/README.md index 80d2fea..8b5b0e1 100644 --- a/lore/1-tasks/backlog/0050_FEATURE_be-side-prep-sns-mtls-prices-db-provisioning/README.md +++ b/lore/1-tasks/backlog/0050_FEATURE_be-side-prep-sns-mtls-prices-db-provisioning/README.md @@ -87,10 +87,11 @@ to land before any prices-api ingestion or schema work can begin: change. 2. **mTLS client cert issuance** (§5.2, §7, ADR 0007 §3.5): BE runs the self-signed CA and the per-AWS-service issuance - script. Prices-api receives one client cert + key per env - (dev / staging / prod), to be stored in AWS Secrets Manager - (2 secrets per env). 1-year manual rotation cadence (BE - Cluster C agreement). The issuance script invocation is the + script. Prices-api receives one client `{cert,key,ca}` per env + (dev / staging / prod), to be stored in AWS Secrets Manager as + a single JSON bundle secret per identity (named by + `MTLS_SECRET_NAME`, per task 0063). 1-year manual rotation + cadence (BE Cluster C agreement). The issuance script invocation is the only BE-side operator step per cert lifecycle. 3. **`prices` database + user + quota + profile** (§3 intro, §11.1, ADR 0007 §3.5): BE creates the empty `prices` database diff --git a/lore/1-tasks/blocked/0052_FEATURE_clickhouse-mtls-client-shared-crate.md b/lore/1-tasks/blocked/0052_FEATURE_clickhouse-mtls-client-shared-crate.md index bca10ad..5cd320d 100644 --- a/lore/1-tasks/blocked/0052_FEATURE_clickhouse-mtls-client-shared-crate.md +++ b/lore/1-tasks/blocked/0052_FEATURE_clickhouse-mtls-client-shared-crate.md @@ -95,8 +95,9 @@ dep is pinned at `0.13` with only the `inserter` feature — **no TLS**. 
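0052's warm-connection goal is to establish the TLS connection during Lambda global init and reuse it across invocations. That way the ~80–130 ms cross-cloud handshake is paid once per container rather than once per request, which maps onto a process-wide, lazily initialised value. The std-only Rust sketch below uses `std::sync::OnceLock` (stable since Rust 1.70). `WarmClient`, `connect`, the endpoint URL and the handshake counter are hypothetical stand-ins for the real mTLS `clickhouse` client; nothing here performs actual TLS.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts simulated handshakes so the reuse is observable.
static HANDSHAKES: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for an mTLS-configured ClickHouse client.
#[derive(Debug)]
struct WarmClient {
    endpoint: String,
}

impl WarmClient {
    /// Simulates the expensive part (bundle load + TLS handshake).
    fn connect(endpoint: &str) -> Self {
        HANDSHAKES.fetch_add(1, Ordering::SeqCst);
        WarmClient { endpoint: endpoint.to_string() }
    }

    fn insert(&self, rows: usize) -> String {
        format!("INSERT {rows} rows via {}", self.endpoint)
    }
}

/// Process-wide client: created on first use, then reused by every invocation.
static CLIENT: OnceLock<WarmClient> = OnceLock::new();

fn client() -> &'static WarmClient {
    // Hypothetical endpoint; the real one is the BE-managed Caddy:443.
    CLIENT.get_or_init(|| WarmClient::connect("https://caddy.example:443"))
}

/// One simulated Lambda invocation.
fn handle_invocation(rows: usize) -> String {
    client().insert(rows)
}

fn main() {
    let _ = client(); // "global init": warm before the first event arrives
    for rows in [10, 20, 30] {
        println!("{}", handle_invocation(rows));
    }
    println!("handshakes: {}", HANDSHAKES.load(Ordering::SeqCst));
}
```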
What is missing for talking to the production Hetzner box (ADR 0007 §3.5 / design §5.2 "mTLS write path"): -1. Load the per-env client cert + key from AWS Secrets Manager - (two secrets per env per ADR 0007 §3.5). +1. Load the per-env client `{cert,key,ca}` from AWS Secrets Manager + (a single JSON bundle secret per identity, named by + `MTLS_SECRET_NAME`, per ADR 0007 §3.5 / task 0063). 2. Establish a warm TLS connection during Lambda global init so the ~80–130 ms cross-cloud RTT for TLS handshake is amortised across invocations. From 64ce75607600d82bae3d1512d280ede528a9a82a Mon Sep 17 00:00:00 2001 From: karczuRF Date: Fri, 19 Jun 2026 10:53:39 +0200 Subject: [PATCH 3/3] docs: align overview docs to CH-mirror backfill and single-bundle mTLS MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sweep both architecture overviews for staleness contradicting shipped reality (0027 shipped local ClickHouse; ADR 0007 pivoted off RDS). - Backfill sink: local Postgres -> local ClickHouse across the §7 two-stream section, data-flow diagrams, ASCII flow boxes, the workstation subgraph, resumability prose, and diagram legends. - Credentials: "2 secrets per env" -> single {cert,key,ca} JSON bundle secret per identity (named by MTLS_SECRET_NAME). Intentional historical notes ("prior Postgres design", "RDS removed", changelog rows) left intact. --- .../database-schema-overview.md | 103 +++++++++--------- docs/prices-api-general-overview.md | 32 +++--- 2 files changed, 69 insertions(+), 66 deletions(-) diff --git a/docs/database-schema/database-schema-overview.md b/docs/database-schema/database-schema-overview.md index 757d210..7dfefca 100644 --- a/docs/database-schema/database-schema-overview.md +++ b/docs/database-schema/database-schema-overview.md @@ -115,14 +115,15 @@ flowchart LR AD -->|HTTPS-mTLS| CH Cleanup -->|HTTPS-mTLS| CH - BFsdex[SDEX Backfill
<br/>Local Rust CLI on workstation<br/>
ADR 0005] -->|local Postgres| LPG[(workstation Postgres)] - BFamm[Soroban AMM Backfill
<br/>Local Rust CLI on workstation<br/>ADR 0001] -->|local Postgres| LPG - LPG -->|sdex-cloud-push,<br/>HTTPS-mTLS| CH - LCH[(local ClickHouse<br/>populated by BE backfill-runner<br/>Docker, workstation)] --> BFamm + BFsdex[SDEX Backfill<br/>Local Rust CLI on workstation<br/>ADR 0005] -->|write OHLCV<br/>to local ClickHouse| LCHsdex[(local ClickHouse<br/>SDEX backfill<br/>Docker, workstation)] + LCHsdex -->|sdex-cloud-push,<br/>HTTPS-mTLS| CH + LCH[(local ClickHouse<br/>soroban_events input + prices.* mirror<br/>Docker, workstation)] -->|read soroban_events| BFamm[Soroban AMM Backfill<br/>Local Rust CLI on workstation<br/>ADR 0001] + BFamm -->|write per-source OHLCV<br/>to local prices.* mirror| LCH + LCH -->|amm-cloud-push,<br/>
HTTPS-mTLS| CH classDef store fill:#e8f1ff,stroke:#3a6ea5,stroke-width:2px; classDef external fill:#f3e8ff,stroke:#6a3a8a,stroke-width:1px; - class CH,CH15,CH1h,CHRollups,S3,LPG,LCH store; + class CH,CH15,CH1h,CHRollups,S3,LCHsdex,LCH store; class SNS external; ``` @@ -138,16 +139,16 @@ are pushed to the Hetzner cluster via separate post-backfill tools. ## 2. Database Tech Stack -| Component | Technology | -| ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Database engine | **ClickHouse** on BE's shared Hetzner cluster (separate `prices` database, ADR 0007) | -| Storage engines | `ReplacingMergeTree(version)` for OHLCV; `ReplacingMergeTree(updated_at)` for `current_prices` / `assets` / `backfill_progress`; `ReplacingMergeTree` for `oracle_prices` | -| Rollups | Chain of CH materialised views: `price_ohlcv_1m → _15m → _1h → _4h → _1d → _1w → _1M` (replaces the OHLCV Rollup Lambda) | -| Partitioning | `PARTITION BY toYYYYMM(timestamp)` on every OHLCV/oracle table; cleanup via `ALTER TABLE … DROP PARTITION` | -| Database client (Rust) | [`clickhouse`](https://crates.io/crates/clickhouse) — async, native protocol over HTTPS-mTLS | -| Schema tooling | Plain SQL DDL applied by the prices-api schema applier on first deploy; prices-api owns `prices.*` migrations unilaterally (ADR 0007 §3.7) | -| Hosting | BE-managed Hetzner box behind Caddy:443; cross-cloud (AWS → Hetzner) hop, ~80–130 ms RTT mitigated by warm connection reuse and batched per-ledger writes | -| Credentials | AWS Secrets Manager — per-env client cert + key for Caddy mTLS (2 secrets per env) | +| Component | Technology | +| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Database engine | **ClickHouse** on 
BE's shared Hetzner cluster (separate `prices` database, ADR 0007) | +| Storage engines | `ReplacingMergeTree(version)` for OHLCV; `ReplacingMergeTree(updated_at)` for `current_prices` / `assets` / `backfill_progress`; `ReplacingMergeTree` for `oracle_prices` | +| Rollups | Chain of CH materialised views: `price_ohlcv_1m → _15m → _1h → _4h → _1d → _1w → _1M` (replaces the OHLCV Rollup Lambda) | +| Partitioning | `PARTITION BY toYYYYMM(timestamp)` on every OHLCV/oracle table; cleanup via `ALTER TABLE … DROP PARTITION` | +| Database client (Rust) | [`clickhouse`](https://crates.io/crates/clickhouse) — async, native protocol over HTTPS-mTLS | +| Schema tooling | Plain SQL DDL applied by the prices-api schema applier on first deploy; prices-api owns `prices.*` migrations unilaterally (ADR 0007 §3.7) | +| Hosting | BE-managed Hetzner box behind Caddy:443; cross-cloud (AWS → Hetzner) hop, ~80–130 ms RTT mitigated by warm connection reuse and batched per-ledger writes | +| Credentials | AWS Secrets Manager — per-env client `{cert,key,ca}` as a single JSON bundle secret per identity (one secret per identity per env, named by `MTLS_SECRET_NAME`; ADR 0007 / task 0063) | **Why ClickHouse on a BE-shared cluster (ADR 0007):** @@ -768,7 +769,7 @@ minutes. ADR 0005 made Stream 2 a local workstation CLI; ADR 0001 had already done the same for Stream 1. Neither stream has a continuously-running cloud-side process now, so none of those fields had a meaningful value to write. Operators inspect live CLI progress (rate, ETA) via direct SQL on the -workstation Postgres; the cloud row carries only the most recent push state. +local workstation ClickHouse; the cloud row carries only the most recent push state. **Freshness alarm (replaces heartbeat alarm).** A CloudWatch alarm watches `sdex.last_push_at`. If it is older than the configured push-cadence @@ -1092,10 +1093,10 @@ monthly partitions, plus heartbeat/status to `backfill_progress`. 
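The freshness alarm described above reduces to one comparison between `sdex.last_push_at` and an operator-tuned threshold. The minimal Rust sketch below assumes Unix-second timestamps. `push_is_stale`, the choice to treat a missing push as stale, and the 3-day threshold in `main` are illustrative assumptions, not documented settings. In the design itself the check is a CloudWatch alarm, not application code.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// True when the latest push is older than `max_gap`.
/// `last_push_at` and `now` are Unix seconds; `None` means no push recorded yet,
/// which this sketch treats as stale (an assumption, not a documented rule).
fn push_is_stale(last_push_at: Option<u64>, now: u64, max_gap: Duration) -> bool {
    match last_push_at {
        None => true,
        // saturating_sub keeps a slightly-future timestamp (clock skew) from underflowing
        Some(ts) => now.saturating_sub(ts) > max_gap.as_secs(),
    }
}

fn main() {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_secs();
    let threshold = Duration::from_secs(3 * 24 * 3600); // illustrative value only
    let four_days_ago = Some(now - 4 * 24 * 3600);
    println!("stale: {}", push_is_stale(four_days_ago, now, threshold));
}
```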
### 7.1 Two-stream design (ADRs 0001, 0005, 0007) -| Stream | Data location | Era | Method | -| --------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| **SDEX trades** | `ClaimAtom` from the five trade-shaped op types in `LedgerCloseMeta` XDR | All-time (2015 → present, ~57M ledgers) | Local Rust CLI on operator workstation (anonymous reads against `s3://aws-public-blockchain`) → local Postgres → `sdex-cloud-push` lands rows in Hetzner CH `prices.*` (ADR 0005) | -| **Soroban AMM swaps** (Soroswap, Aquarius, Phoenix) | `soroban_events` in **local** ClickHouse, populated upfront by BE's `backfill-runner --target=clickhouse` against the same public S3 archive | Soroban activation (Nov 2023) → present (~8.5M ledgers) | Local Rust CLI on operator workstation; one-shot completion push lands rows in Hetzner CH `prices.*` (ADR 0001) | +| Stream | Data location | Era | Method | +| --------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **SDEX trades** | `ClaimAtom` from the five trade-shaped op types in `LedgerCloseMeta` XDR | All-time (2015 → present, ~57M ledgers) | Local Rust CLI on operator workstation (anonymous reads against `s3://aws-public-blockchain`) → local ClickHouse → `sdex-cloud-push` lands rows in Hetzner CH 
`prices.*` (ADR 0005) | +| **Soroban AMM swaps** (Soroswap, Aquarius, Phoenix) | `soroban_events` in **local** ClickHouse, populated upfront by BE's `backfill-runner --target=clickhouse` against the same public S3 archive | Soroban activation (Nov 2023) → present (~8.5M ledgers) | Local Rust CLI on operator workstation; one-shot completion push lands rows in Hetzner CH `prices.*` (ADR 0001) | The Soroban AMM stream is handled first (Tranche 1). The operator runs BE's `backfill-runner --target=clickhouse` to populate a local CH instance @@ -1139,7 +1140,7 @@ Local ClickHouse (Docker) on operator workstation │ `stellar-xdr` crate │ │ - Extracts token pair + amounts │ │ - Buckets to per-source 1-minute rows │ -│ - Writes to local Postgres (Docker) on workstation │ +│ - Writes to local CH prices.* mirror (Docker) │ └──────────────────────────────────────────────────────────┘ │ ▼ one-shot completion push (only Hetzner-CH-touching step on Stream 1) @@ -1155,12 +1156,12 @@ Hetzner ClickHouse `prices.*` (HTTPS-mTLS to Caddy:443) flowchart LR S3archive[(s3://aws-public-blockchain
<br/>Stellar public history)] -->|backfill-runner --target=clickhouse| LocalCH[(Local ClickHouse<br/>Docker, workstation<br/>soroban_events)] LocalCH -->|signature='swap'<br/>contract_id IN Soroswap/Aquarius/Phoenix<br/>JOIN ledgers ON closed_at| CLI[soroban-amm-backfill<br/>Local Rust CLI<br/>ScVal decode via stellar-xdr] - CLI -->|per-source 1-min rows| LocalPG[(workstation Postgres)] - LocalPG -->|one-shot completion push<br/>HTTPS-mTLS to Caddy:443| CH[(Hetzner ClickHouse<br/>prices.price_ohlcv_*)] + CLI -->|per-source 1-min rows| LocalMirror[(local ClickHouse<br/>prices.* mirror)] + LocalMirror -->|one-shot completion push<br/>HTTPS-mTLS to Caddy:443| CH[(Hetzner ClickHouse<br/>prices.price_ohlcv_*)] CLI -->|status=completed,<br/>
last_push_at| BP[(prices.backfill_progress)] classDef store fill:#e8f1ff,stroke:#3a6ea5,stroke-width:2px; - class S3archive,LocalCH,LocalPG,CH,BP store; + class S3archive,LocalCH,LocalMirror,CH,BP store; ``` | Metric | Value | Notes | @@ -1169,7 +1170,7 @@ flowchart LR | Ledger range | ~48.5M–57M (Nov 2023 to present) | ~8.5M ledgers worth of events | | Runtime | Local Rust CLI on operator workstation (`soroban-amm-backfill`) | No AWS infrastructure for the backfill itself; mirrors §7.4 Stream 2's local-CLI pattern | | Workstation prep step | BE `backfill-runner --target=clickhouse` populates local CH | One-shot; runs against `s3://aws-public-blockchain` anonymous reads | -| Sink during backfill | Local Postgres (Docker) on workstation | Hetzner ClickHouse is **not** written until the one-shot completion push | +| Sink during backfill | Local ClickHouse `prices.*` mirror (Docker) on workstation | Hetzner ClickHouse is **not** written until the one-shot completion push | | Estimated wall-clock | A few hours, dominated by `backfill-runner` archive ingestion | CH query + extraction + OHLCV write is fast against an indexed local store | | Cloud-push cadence | One-shot completion push only | `prices.backfill_progress.soroban_amm` advances from `running` to `completed` in a single transition | | Expected completion | During Tranche 1 (Week 2–3) | After the push, the local CH instance is torn down | @@ -1189,12 +1190,12 @@ s3://aws-public-blockchain/v1.1/stellar/ledgers/pubnet/ │ - Filter + extract per task 0022's spec: │ │ 5 trade-shaped op types → ClaimAtom → TradeTick │ │ - Buckets to per-source 1-minute rows │ -│ - Per-ledger atomic txn: row UPSERTs + │ -│ backfill_progress checkpoint commit together │ +│ - Per-ledger checkpoint: INSERT rows, then │ +│ record ledger in backfill_sdex_ledgers │ └─────────────────────────────────────────────────────────┘ │ ▼ - Local Postgres (Docker) on workstation + Local ClickHouse (Docker) on workstation (operator-owned; backfill 
writes here, not Hetzner CH) │ ▼ `sdex-cloud-push` (separate post-backfill tool, HTTPS-mTLS) @@ -1208,8 +1209,8 @@ s3://aws-public-blockchain/v1.1/stellar/ledgers/pubnet/ flowchart TD Arch[(s3://aws-public-blockchain
<br/>Stellar public history<br/>anonymous --no-sign-request)] -->|aws s3 sync| LocalDisk[(local .zst files)] LocalDisk -->|xdr-parser crate<br/>git Cargo dep| CLI[sdex-backfill<br/>Local Rust CLI on workstation<br/>~311 ledgers/s, ~1.12M/hr] - CLI -->|filter 5 trade-shaped op types<br/>extract ClaimAtom<br/>bucket to per-source rows| LocalPG[(workstation Postgres)] - LocalPG -->|sdex-cloud-push<br/>tip-backward chunks<br/>HTTPS-mTLS to Caddy:443| CH[(Hetzner ClickHouse<br/>prices.price_ohlcv_*)] + CLI -->|filter 5 trade-shaped op types<br/>extract ClaimAtom<br/>bucket to per-source rows| LocalChStage[(local ClickHouse<br/>price_ohlcv staging)] + LocalChStage -->|sdex-cloud-push<br/>tip-backward chunks<br/>HTTPS-mTLS to Caddy:443| CH[(Hetzner ClickHouse<br/>prices.price_ohlcv_*)] CLI -->|per-ledger atomic checkpoint| BPlocal[(local backfill_progress)] CH -->|push updates row| BP[(prices.backfill_progress<br/>sdex_archive)] BP -->|last_push_at > tranche threshold| Alarm[CloudWatch Alarm<br/>
→ SNS email + Slack] @@ -1218,7 +1219,7 @@ flowchart TD classDef store fill:#e8f1ff,stroke:#3a6ea5,stroke-width:2px; classDef alarm fill:#ffe5e5,stroke:#a53a3a,stroke-width:2px; - class Arch,LocalDisk,LocalPG,CH,BP,BPlocal,CH1m store; + class Arch,LocalDisk,LocalChStage,CH,BP,BPlocal,CH1m store; class Alarm alarm; ``` @@ -1227,18 +1228,19 @@ flowchart TD | Total ledgers | ~57 million | Ledger 1 (Nov 2015) to current tip | | Runtime | Local Rust CLI on operator workstation | No AWS infrastructure for the backfill itself; mirrors BE `backfill-runner` | | Source | `s3://aws-public-blockchain/v1.1/stellar/ledgers/pubnet/` | Anonymous `--no-sign-request`; no AWS account needed to read | -| Sink during backfill | Local Postgres (Docker) on workstation | Hetzner ClickHouse is **not** written during backfill — only by `sdex-cloud-push` | +| Sink during backfill | Local ClickHouse (Docker) on workstation | Hetzner ClickHouse is **not** written during backfill — only by `sdex-cloud-push` | | Measured CLI rate | ~311 ledgers/s (~1.12M ledgers/hour) | Per task 0022's measurement against the SDEX filter | | Effective wall-clock (network-bound) | ~12–16 days continuous on one laptop | Archive sync is the bottleneck; CPU rarely saturates | | Cloud-push cadence | Tip-backward chunks | The cloud `GET /backfill/status` view advances at push cadence, not CLI cadence | | Expected completion | Full historical coverage extends past Tranche 3 | Tranche 3 acceptance is "progressing", not "complete" | The `sdex-backfill` CLI is **resumable at per-ledger granularity**: each -ledger's row UPSERTs and its `backfill_progress` checkpoint advance commit -atomically in a single local-Postgres transaction. A crash mid-ledger leaves -`current_ledger` pointing at the last fully-processed ledger; restart -re-fetches and re-UPSERTs that ledger idempotently. Early ledgers (pre-2018) -have very few DEX trades and process faster. 
+processed ledger is recorded in the local `backfill_sdex_ledgers` checkpoint +table after its rows are INSERTed. A crash mid-ledger leaves `current_ledger` +pointing at the last fully-processed ledger; restart skips ledgers already +recorded and re-inserts the in-flight ledger idempotently (re-inserted rows +collapse under `ReplacingMergeTree(version)`). Early ledgers (pre-2018) have +very few DEX trades and process faster. ### 7.4a Backfill state machine (`prices.backfill_progress.status`) @@ -1404,9 +1406,10 @@ Scaled-up at high traffic (DB-relevant): - **ClickHouse endpoint reachable only via mTLS through Caddy:443** on the BE-managed Hetzner box. There is no other network surface to `prices.*`. -- **mTLS material in AWS Secrets Manager** (per-env client cert + key, two - secrets per env). Cert + key are loaded into the Lambda runtime on cold - start and held in memory for the container's lifetime; never in env vars +- **mTLS material in AWS Secrets Manager** (per-env client `{cert,key,ca}` + as a single JSON bundle secret per identity, named by `MTLS_SECRET_NAME`). + The bundle is loaded into the Lambda runtime on cold start and held in + memory for the container's lifetime; never in env vars or source. Rotation: 1-year manual cadence (BE Cluster C agreement); CloudWatch alarm on cert NotAfter approaching expiry; revocation = CA rotation on compromise. @@ -1801,7 +1804,7 @@ flowchart TB %% ============================================================ %% AWS — SECRETS MANAGER (mTLS material) %% ============================================================ - SM[("AWS Secrets Manager
<br/>per-env mTLS cert + key<br/>(2 secrets per env)")] + SM[("AWS Secrets Manager<br/>per-env mTLS {cert,key,ca}<br/>(1 JSON bundle secret per identity)")] class SM store SM -.->|"loaded on cold start"| LIVE @@ -1814,21 +1817,21 @@ flowchart TB LocalCH[("Local ClickHouse<br/>soroban_events<br/>(Docker, torn down after push)")] AMM["soroban-amm-backfill<br/>Rust CLI (Stream 1)<br/>ScVal decode via stellar-xdr"] SDEX["sdex-backfill<br/>Rust CLI (Stream 2)<br/>~311 ledgers/s, ~1.12M/h"] - LocalPG[("Workstation Postgres<br/>price_ohlcv staging<br/>(Docker)")] + LocalStage[("Local ClickHouse<br/>prices.* staging<br/>(Docker)")] SDEXpush["sdex-cloud-push<br/>tip-backward chunks"] AMMpush["AMM completion push<br/>
one-shot"] end class BERun,AMM,SDEX,SDEXpush,AMMpush writer - class LocalCH,LocalPG workstation + class LocalCH,LocalStage workstation Archives -->|"backfill-runner --target=clickhouse"| BERun BERun --> LocalCH LocalCH --> AMM Archives -->|"aws s3 sync"| SDEX - AMM --> LocalPG - SDEX --> LocalPG - LocalPG --> SDEXpush - LocalPG --> AMMpush + AMM --> LocalStage + SDEX --> LocalStage + LocalStage --> SDEXpush + LocalStage --> AMMpush %% ============================================================ %% HETZNER — CADDY + CLICKHOUSE (BE-managed) @@ -1954,13 +1957,13 @@ flowchart TB **How to read the diagram** - **Blue cylinders** are persistent stores (Hetzner CH tables, S3, archives, - workstation Postgres, AWS Secrets Manager). + workstation ClickHouse, AWS Secrets Manager). - **Green nodes** are writers (Lambdas + workstation CLIs + cloud-push tools). - **Yellow nodes** are public API endpoints (readers). - **Purple nodes** are external services and BE-managed infrastructure (Reflector, SNS topic, Caddy). -- **Orange nodes** are workstation-local components (Docker-hosted CH, local - Postgres) — outside the AWS / Hetzner production surface. +- **Orange nodes** are workstation-local components (Docker-hosted local + ClickHouse) — outside the AWS / Hetzner production surface. - **Red node** is the CloudWatch alarm fed by `prices.backfill_progress.sdex.last_push_at` (push freshness alarm; the heartbeat-style alarm from the prior design is gone). diff --git a/docs/prices-api-general-overview.md b/docs/prices-api-general-overview.md index 9eb04fb..0f0bb00 100644 --- a/docs/prices-api-general-overview.md +++ b/docs/prices-api-general-overview.md @@ -163,7 +163,7 @@ to their own infrastructure at any time if needed. | **Lambda — API handlers** | Public API | Individual functions per route group. Rust / axum via `lambda_runtime`, 256–512 MB, 15s timeout. 
No VPC; outbound HTTPS-mTLS to Caddy:443 | | **API Gateway** | Public API entry point | REST API, usage plans, API key auth, rate limiting (100 req/s per key), request validation. Built-in response cache (0.5 GB) with per-endpoint TTLs | | **EventBridge Scheduler** | Scheduled triggers | Cron/rate rules for all periodic Lambda workers | -| **Secrets Manager** | Credentials & mTLS material | Per-env client cert + key for Caddy:443 mTLS (2 secrets per env); Soroswap/Aquarius API keys; oracle contract address | +| **Secrets Manager** | Credentials & mTLS material | Per-env client `{cert,key,ca}` for Caddy:443 mTLS (single JSON bundle secret per identity, named by `MTLS_SECRET_NAME`); Soroswap/Aquarius API keys; oracle contract address | | **CloudWatch + X-Ray** | Observability | API latency, error rates, ingestion lag, Lambda duration/concurrency, backfill progress; mTLS cert NotAfter alarm | | **S3** (API docs) | Documentation hosting | OpenAPI spec + self-service onboarding portal, served via CloudFront | @@ -462,7 +462,7 @@ stream, heartbeating into the cloud row every 15 minutes). ADR 0005 made Stream 2 a local workstation CLI; ADR 0001 had already done the same for Stream 1. Neither stream has a continuously-running cloud-side process, so neither field has a meaningful value to write. Operators inspect live CLI -progress (rate, ETA) via direct SQL on the workstation Postgres; the cloud +progress (rate, ETA) via direct SQL on the local workstation ClickHouse; the cloud DB carries only the most recent push state. If a future stream reintroduces a continuous cloud-side writer, these columns can be added back at that time. @@ -713,7 +713,7 @@ Per ADRs 0001 and 0005, both canonical streams run as workstation-local processe cloud-side `prices.backfill_progress` row (ClickHouse, `ReplacingMergeTree(updated_at)`) advances only when each stream's push step runs. The response therefore reflects the most recent push state, not a live task heartbeat. 
Live CLI progress is visible to the operator -via direct SQL on the workstation Postgres but is not surfaced to API consumers. A +via direct SQL on the local workstation ClickHouse but is not surfaced to API consumers. A CloudWatch alarm fires if `last_push_at` falls behind the configured push-cadence threshold for the stream's tranche (operator-tunable; see §5.6 "GET /backfill/status Freshness"). @@ -810,8 +810,8 @@ table). The decoder writes the per-source bucket `vwap = volume_quote / volume_b nothing more. **mTLS write path.** The Lambda holds a `clickhouse` Rust client configured with the -per-env client certificate + key loaded from AWS Secrets Manager (two secrets per env per -ADR 0007 §3.5). Connections to Caddy:443 are warmed in the global init of the Lambda +per-env client `{cert,key,ca}` loaded from AWS Secrets Manager (a single JSON bundle +secret per identity, named by `MTLS_SECRET_NAME`, per ADR 0007 §3.5). Connections to Caddy:443 are warmed in the global init of the Lambda runtime and reused across invocations to amortise TLS handshake cost — important because the cross-cloud RTT (AWS → Hetzner, ~80-130 ms) dominates per-request latency if every INSERT opens a fresh connection. @@ -825,8 +825,8 @@ INSERT opens a fresh connection. 
| **Asset Discovery** | EventBridge rate(1 hour) | Ledger account entries in `LedgerCloseMeta` | New classic asset issuances; new SEP-41 contract deployments → `prices.assets` | | **Current Price Updater** | EventBridge rate(1 min) | `prices.price_ohlcv_1m` (after ingestion) | Cross-source VWAP per §5.5 → `prices.current_prices` | | **Cleanup Worker** | EventBridge cron(02:00 UTC) | `prices.*` | `ALTER TABLE … DROP PARTITION` for expired month-partitions of `price_ohlcv_1m`, `_15m`, `oracle_prices` | -| **SDEX Backfill CLI** (`sdex-backfill`, ADR 0005) | Local Rust CLI on operator workstation, run in tip-backward chunks during the project | `s3://aws-public-blockchain` (anonymous `--no-sign-request`) | Historical SDEX trades → per-source 1-min rows in **local Postgres** on the workstation | -| **SDEX Cloud Push** (`sdex-cloud-push`, ADR 0005) | Operator-invoked between chunks; only Hetzner-CH-touching component on the Stream 2 path | Local Postgres `price_ohlcv` + `assets` | Streams accumulated rows to `prices.price_ohlcv_*` over HTTPS-mTLS to Caddy:443; advances `prices.backfill_progress` row for `sdex_archive` (`current_ledger`, `last_push_at`) | +| **SDEX Backfill CLI** (`sdex-backfill`, ADR 0005) | Local Rust CLI on operator workstation, run in tip-backward chunks during the project | `s3://aws-public-blockchain` (anonymous `--no-sign-request`) | Historical SDEX trades → per-source 1-min rows in **local ClickHouse** on the workstation | +| **SDEX Cloud Push** (`sdex-cloud-push`, ADR 0005) | Operator-invoked between chunks; only Hetzner-CH-touching component on the Stream 2 path | Local ClickHouse `price_ohlcv` + `assets` | Streams accumulated rows to `prices.price_ohlcv_*` over HTTPS-mTLS to Caddy:443; advances `prices.backfill_progress` row for `sdex_archive` (`current_ledger`, `last_push_at`) | | **Soroban AMM Backfill CLI** (`soroban-amm-backfill`, ADR 0001) | One-shot Local Rust CLI on operator workstation, run once during Tranche 1 | Local ClickHouse 
`soroban_events` (populated upfront by BE's `backfill-runner --target=clickhouse`); ScVal decoding via the `stellar-xdr` crate | Historical Soroswap/Aquarius/Phoenix swaps → per-source 1-min rows; on completion runs the cloud push that lands all rows into `prices.*` on Hetzner CH and sets the `soroban_amm` `backfill_progress` row to `status='completed'` | **Worker removed: OHLCV Rollup Lambda** (eliminated by ADR 0007 §3.4). The 1m → 15m → 1h → @@ -896,7 +896,7 @@ which drives a two-stream backfill design: | Stream | Data location | Era | Method | | --------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| **SDEX trades** | `TransactionResultMeta` (`ClaimAtom` from the five trade-shaped op types) in `LedgerCloseMeta` XDR | All-time (2015 → present, ~57M ledgers) | Local Rust CLI on operator workstation (`s3://aws-public-blockchain` anonymous reads) → local Postgres → post-backfill cloud push to Hetzner ClickHouse `prices.*` (see ADR 0005) | +| **SDEX trades** | `TransactionResultMeta` (`ClaimAtom` from the five trade-shaped op types) in `LedgerCloseMeta` XDR | All-time (2015 → present, ~57M ledgers) | Local Rust CLI on operator workstation (`s3://aws-public-blockchain` anonymous reads) → local ClickHouse → post-backfill cloud push to Hetzner ClickHouse `prices.*` (see ADR 0005) | | **Soroban AMM swaps** (Soroswap, Aquarius, 
Phoenix) | `SorobanTransactionMeta.events` (CAP-67), inlined `topics_xdr` + `data_xdr` per row in BE's ClickHouse `soroban_events` (BE ADRs [0033](../../soroban-block-explorer/lore/2-adrs/0033_soroban-events-appearances-read-time-detail.md), [0044](../../soroban-block-explorer/lore/2-adrs/0044_clickhouse-pilot-parallel-store.md)) | Soroban activation (Nov 2023) → present (~8.5M ledgers) | Local Rust CLI on operator workstation, consuming a local ClickHouse instance populated upfront by BE's `backfill-runner --target=clickhouse` (see [ADR 0001](../lore/2-adrs/0001_stream1-clickhouse-sourced-amm-backfill.md)) → one-shot push to Hetzner ClickHouse `prices.*` | The Soroban AMM stream is handled first (Tranche 1). A short BE-tooling prep step runs BE's @@ -932,7 +932,7 @@ Local ClickHouse (Docker) on operator workstation │ `stellar-xdr` crate (shared BE-authored library) │ │ - Extracts token pair + amounts │ │ - Buckets to 1-minute price_ohlcv (ADR 0003 PK shape) │ -│ - Writes to local Postgres (Docker) on workstation │ +│ - Writes to local CH prices.* mirror (Docker) │ └──────────────────────────────────────────────────────────┘ │ ▼ one-shot completion push (only Hetzner-CH-touching step on Stream 1) @@ -961,12 +961,12 @@ s3://aws-public-blockchain/v1.1/stellar/ledgers/pubnet/ │ - Filter + extract per task 0022's spec: │ │ 5 trade-shaped op types → ClaimAtom → TradeTick │ │ - Buckets to 1-minute price_ohlcv (ADR 0003 PK shape) │ -│ - Per-ledger atomic txn: price_ohlcv UPSERTs + │ -│ backfill_progress checkpoint commit together │ +│ - Per-ledger checkpoint: INSERT rows, then │ +│ record ledger in backfill_sdex_ledgers │ └─────────────────────────────────────────────────────────┘ │ ▼ - Local Postgres (Docker) on workstation + Local ClickHouse (Docker) on workstation (operator-owned; backfill writes here, not Hetzner CH) │ ▼ `sdex-cloud-push` (separate post-backfill tool, HTTPS-mTLS) @@ -1008,7 +1008,7 @@ becomes a plain `xdr-parser = "X.Y.Z"` pin — no design change.) 
| Runtime | Local Rust CLI on operator workstation (`soroban-amm-backfill`) | No AWS infrastructure for the backfill itself; mirrors §5.6 Stream 2's local-CLI pattern | | Workstation prep step | BE `backfill-runner --target=clickhouse` populates local CH | One-shot; runs against `s3://aws-public-blockchain` anonymous reads — no AWS account required | | Local CH footprint | Hundreds of GB pre-compression; substantially smaller post-ZSTD on disk | Sized for an operator workstation per ADR 0001 §Rationale (dev-laptop, not Fargate/EC2) | -| Sink during backfill | Local Postgres (Docker) on workstation | Hetzner ClickHouse `prices.*` is **not** written until the one-shot completion push | +| Sink during backfill | Local ClickHouse (Docker) on workstation | Hetzner ClickHouse `prices.*` is **not** written until the one-shot completion push | | Estimated wall-clock (including CH prep) | A few hours, dominated by `backfill-runner` archive ingestion | CH query + extraction + OHLCV write is fast against an indexed local store | | Cloud-push cadence | One-shot completion push only | `prices.backfill_progress` row for `soroban_amm` advances from `running` to `completed` in a single transition | | Expected completion | During Tranche 1 (Week 2–3) | After the push, the local CH instance is torn down | @@ -1020,7 +1020,7 @@ becomes a plain `xdr-parser = "X.Y.Z"` pin — no design change.) 
| Total ledgers | ~57 million | Ledger 1 (Nov 2015) to current tip | | Runtime | Local Rust CLI on operator workstation | No AWS infrastructure for the backfill itself; mirrors BE `backfill-runner` | | Source | `s3://aws-public-blockchain/v1.1/stellar/ledgers/pubnet/` | Anonymous `--no-sign-request`; no AWS account needed to read | -| Sink during backfill | Local Postgres (Docker) on workstation | Hetzner ClickHouse `prices.*` is **not** written during backfill — only by the post-backfill `sdex-cloud-push` step | +| Sink during backfill | Local ClickHouse (Docker) on workstation | Hetzner ClickHouse `prices.*` is **not** written during backfill — only by the post-backfill `sdex-cloud-push` step | | Measured CLI rate | ~311 ledgers/s (~1.12M ledgers/hour) | Per task 0022's measurement against the SDEX filter | | Effective wall-clock (network-bound) | ~12–16 days continuous on one laptop | Archive sync is the bottleneck; CPU rarely saturates | | Cloud-push cadence | Tip-backward chunks (e.g. `--start=tip-1.1M --end=tip` for the first Tranche 1 chunk, then older ranges) | The cloud `GET /backfill/status` view advances at push cadence, not CLI cadence | @@ -1068,7 +1068,7 @@ because push cadence is driven by tip-backward chunk size, not by a continuous h A laptop-side staleness check is **not** wired into AWS alarms — workstation uptime is an operator-managed concern (BE accepts the same trade in BE ADR 0010). Operators inspect local -CLI progress via direct SQL on the workstation Postgres. +CLI progress via direct SQL on the local workstation ClickHouse. 
--- @@ -1132,7 +1132,7 @@ on ClickHouse, that machinery is not part of the prices-api budget at any traffi local CH instance - **Trust:** BE-managed self-signed CA; prices-api receives per-env client cert + key issued via BE's per-AWS-service issuance script - - **Storage:** cert + key live in AWS Secrets Manager (2 secrets per env), loaded into the + - **Storage:** `{cert,key,ca}` live in AWS Secrets Manager as a single JSON bundle secret per identity (named by `MTLS_SECRET_NAME`), loaded into the Lambda runtime on cold start, held in memory for the container's lifetime - **Rotation:** 1-year manual rotation cadence (BE Cluster C agreement); CloudWatch alarm on cert NotAfter approaching expiry; re-issuance is a single operator step on