Antalya-26.3 Added support for TTL EXPORT#1810
Conversation
|
With the new commit 04206 (syntax check) is now passing |
0f90fd6 to
1d0c1a7
Compare
Tests describe the contract for the upcoming `TTL ... EXPORT TO db.table` action. They are added before the C++ implementation so they double as the acceptance criteria. Stateless (tests/queries/0_stateless): - 04206_ttl_export_partition_syntax: parser/metadata round-trip and rejection of (a) two EXPORT TTLs to the same destination and (b) EXPORT TTL on a table without a partition key. - 04207_ttl_export_partition_basic: happy path, plus an in-line assertion that a future-dated partition is not exported. - 04208_ttl_export_partition_skip_already_exported: re-triggering after a partition has been exported does not duplicate it. Integration (tests/integration/test_ttl_export_partition): - test_basic_to_iceberg, test_only_one_replica_submits, test_failure_and_backoff, test_serial_across_partitions, test_replica_restart_mid_export, test_modify_ttl_picks_up_with_materialize, test_disabled_replica, test_dedup_via_high_water_mark. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce parser, AST, and metadata plumbing for `TTL ... EXPORT TO db.table`. No background scheduler or per-part TTL info yet — those land in follow-up commits. The clause is recognised, round-trips through `SHOW CREATE TABLE`, and the resulting `TTLDescription` is collected into `TTLTableDescription`'s new `export_ttl` list (exposed via `StorageInMemoryMetadata::getExportTTLs`). Validation in `TTLTableDescription::getTTLForTableFromAST`: * reject two `EXPORT` clauses to the same destination, * reject `EXPORT` TTL on a table with no partition key. The destination-specific override of `TTLDescription::result_column` (`"_export_" + db + "." + table`) is required so that future per-part TTL info (keyed by `result_column`) keeps separate clocks per destination. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stores per-part TTL info under `MergeTreeDataPartTTLInfos::export_ttl`,
keyed by `TTLDescription::result_column`. The map is:
* populated at write time in `MergeTreeDataWriter` for every TTL
returned by `getExportTTLs`,
* recomputed during `MATERIALIZE TTL` and merge-time TTL recompute via
`TTLCalcTransform` / `TTLTransform` (a new `TTLUpdateField::EXPORT_TTL`
finalizes into the right map),
* serialized in JSON under the `"export"` key (mirroring the
`recompression` entry),
* propagated across merges through the existing `update` aggregation,
* surfaced through `hasAnyNonFinishedTTLs` and `checkAllTTLCalculated`
so old parts that predate the TTL are flagged for `MATERIALIZE TTL`.
Adds the partition-wide helper `getPartitionExportTTLMax`: returns the
max `export_ttl.max` across all parts of a partition, or `nullopt` if
any part is missing the entry (with optional `missing_parts_out` for
the scheduler to log). Deliberate: no on-the-fly evaluation — the user
runs `ALTER TABLE ... MATERIALIZE TTL` to backfill, same UX as moves
and recompression TTLs.
Also pulls `export_ttl` into `hasAnyTableTTL` / `hasOnlyRowsTTL` and the
`getColumnDependencies` TTL-column-set walk.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Align the TTL syntax with `ALTER ... EXPORT PARTITION TO TABLE`: the keyword is now `EXPORT TO TABLE <db.table>` instead of `EXPORT TO <db.table>`. Parser, formatter, exception messages, and tests updated to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an `ExportOrigin` enum (`alter` | `ttl`) to the manifest body so
manifests submitted manually (`ALTER ... EXPORT PARTITION`) can be told
apart from manifests submitted by the upcoming TTL scheduler. Surfaced
as `system.replicated_partition_exports.export_origin
Enum8('alter' = 0, 'ttl' = 1)`. Existing manifests in ZooKeeper that
don't carry the field read back as `alter` for backwards compatibility.
`ttl`-origin manifests are skipped by manifest-TTL eviction: the
background cleanup in `ExportPartitionManifestUpdatingTask` and the
overwrite path in `StorageReplicatedMergeTree::exportPartitionToTable`
both refuse to consider them expired. The existing
`export_merge_tree_partition_force_export` setting still overrides via
the unchanged gate.
The write site keeps the default `ExportOrigin::alter`; the TTL
submitter writes `ExportOrigin::ttl` in a follow-up commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a new query-level setting `export_merge_tree_partition_mark_as_ttl` (default false). When set on `ALTER ... EXPORT PARTITION`, the resulting manifest is written with `export_origin = ttl` (same as what the TTL scheduler will write in a follow-up). The TTL scheduler always sets this implicitly when it submits. Enforces the "at most one ttl-origin manifest per (src, dest)" invariant at submission time: when a ttl-origin manifest is being created, scan siblings under `<zk_root>/exports/` for an existing ttl-origin marker at a different `partition_id`. If found at `P_old`, reject the submission as a back-fill (`new < P_old`) unless `export_merge_tree_partition_force_export` is set; otherwise best-effort `tryRemoveRecursive` of the old marker before creating the new one. Same-key collisions continue to be handled by the existing block. A plain `alter` over a ttl marker at a different partition is allowed without friction — alter manifests coexist with the ttl marker, and the TTL scheduler will filter by `export_origin = ttl` when reading its own state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extracts the readable-vs-insertable column diff and the partition-key AST compare from `StorageReplicatedMergeTree::exportPartitionToTable` into `ExportPartitionUtils::verifyExportDestinationCompatibility`, and calls it from `TTLTableDescription::getTTLForTableFromAST` for every `TTLMode::EXPORT` clause when not attaching. The destination is resolved through `DatabaseCatalog::getTable`, matching the manual `ALTER ... EXPORT PARTITION` flow (throws `UNKNOWN_TABLE` if missing). The check is skipped under `is_attach=true` because the destination table may not yet be loaded at server startup; submission-time validation in `exportPartitionToTable` still covers that path. Iceberg destinations skip the partition-key AST compare here; the existing `verifyIcebergPartitionCompatibility` runs against the runtime iceberg metadata at submission time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces `TTLExportScheduler`, a per-`StorageReplicatedMergeTree` background driver that submits partition exports for tables with `TTL ... EXPORT TO TABLE db.table`. The scheduler is stateless across restarts: it reads the latest `export_origin = ttl` manifest from ZooKeeper on every tick and acts on its status — no manifest → submit the smallest eligible partition; PENDING → wait; COMPLETED → walk forward to `partition_id > completed`; FAILED → resubmit with `force_export=1` after per-partition exponential backoff; KILLED → idle with a `LOG_WARNING` carrying the recovery recipe. `submit` classifies outcomes as `Submitted | Transient | Failure` so ZK CAS races and `UNKNOWN_TABLE` (destination dropped post-DDL) do not bump backoff, while genuine submission errors do. Adds three table-level settings used by the scheduler: `export_merge_tree_partition_ttl_poll_interval_seconds` (default 5), `export_merge_tree_partition_ttl_min_backoff_seconds` (default 1), `export_merge_tree_partition_ttl_max_backoff_seconds` (default 60). The scheduler is not yet wired into the background task pool; that follows in a separate commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Declare the scheduler and its background task next to the other `export_merge_tree_partition_*` task holders. Under the `allow_experimental_export_merge_tree_partition` server gate: - Construct the scheduler and create the `TTLExport` task, logging any exceptions from `run` via `tryLogCurrentException`. - `ReplicatedMergeTreeRestartingThread::tryStartup` activates the task alongside the other export tasks; `partialShutdown` deactivates it. - `alter` calls `ttl_export_task->schedule` when any `MODIFY TTL` command is in the alter so newly added EXPORT TTLs take effect immediately. - `TTLExportScheduler::run` reschedules itself with `export_merge_tree_partition_ttl_poll_interval_seconds * jitter25` on the polling path. Early returns (shutdown, readonly, no EXPORT TTL, experimental gate off) intentionally skip the reschedule — deactivation, the `alter` hook, and server-level config drive those paths instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1d0c1a7 to
f7d8a27
Compare
In `exportPartitionToTable`, the cross-partition swap used to delete the
existing ttl-origin marker before the `tryMulti` that creates the new
manifest. Any throw between the delete and the multi (parts.empty,
pending mutations, iceberg compatibility, or a ZK error mid-multi) left
the scheduler with no ttl marker at all, so the next tick of
`TTLExportScheduler` would treat the table as fresh and restart from the
oldest expired partition.
Now:
- The sibling cache walk collects every stale ttl-origin entry for the
destination, not just the first one encountered. The freshest by
`create_time` is used for the back-fill check; the others are kept
for cleanup. This makes the walk deterministic (no break-on-first-
match in unspecified iteration order) and self-healing for any
stragglers a previous cleanup may have missed.
- The `tryRemoveRecursive` of those stale markers runs after the
`tryMulti` succeeds. Failures during validation or in the multi
itself now leave the existing marker intact, so the scheduler keeps
its high-water mark.
- The post-multi cleanup is best-effort; a ZK error there at worst
leaves dead nodes, which the next ttl submission will reap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `export_merge_tree_partition_mark_as_ttl` to the Antalya 26.3 session block, and the three TTL EXPORT scheduler MergeTree settings (`export_merge_tree_partition_ttl_poll_interval_seconds`, `export_merge_tree_partition_ttl_min_backoff_seconds`, `export_merge_tree_partition_ttl_max_backoff_seconds`) to the 26.3 MergeTree block, so `02995_new_settings_history` passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
instead of context's `current_database` This should fix the stateless tests.
|
|
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1ebd61019e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Partition IDs are arbitrary strings derived from the partition expression; lex comparison can permanently strand newer partitions when widths differ (`PARTITION BY year` with values 9, 10, 11 lex-orders as "10", "11", "9", so after exporting "9" the scheduler would skip "10" and "11" forever). `pickPartition` now forward-walks by expiration max, with the floor's expiration max recomputed from the source's current parts. The floor's exact `partition_id` is still excluded by string equality so we don't re-pick the partition we just completed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous representation flattened `dst_database` and `dst_table` into a single `destination_name` string joined by a dot. That made the boundary ambiguous for quoted table names that legitimately contain a dot — e.g. ``TTL ... EXPORT TO TABLE `a.b` `` round-tripped through the formatter as `` `a`.`b` `` (database `a`, table `b`), changing semantics on metadata reattach and sending exports to the wrong destination via `QualifiedTableName::parseFromString` in the scheduler and the DDL-time schema-compat check. Mirror the convention already used by `ASTAlterQuery` for the matching `ALTER ... EXPORT PARTITION ... TO TABLE` syntax: store `destination_database` and `destination_name` as two separate strings on `ASTTTLElement` and `TTLDescription`. The parser fills both via `parseDatabaseAndTableName`; the formatter emits `backQuoteIfNeed(db) + "." + backQuoteIfNeed(table)` when qualified, else just the table. Both `parseFromString` call sites use the fields directly. Per-part `export_ttl` map is keyed by `TTLDescription::getExportKey`, which uses `backQuoteIfNeed` for both parts so qualified `(a, b)` and unqualified single-table `a.b` produce distinct keys. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 48097f2db9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
`exportPartitionToTable` rejected a `mark_as_ttl=1` submission when the new `partition_id` was lex-less than the existing ttl marker's `partition_id`. With variable-width partition IDs (e.g. `PARTITION BY year` yielding "9", "10"), the scheduler's natural forward step picks "10" after completing "9", the guard throws BAD_ARGUMENTS because `"10" < "9"` lex, the scheduler classifies that as `Failure`, bumps backoff, and retries with the same partition — stalling TTL exports for that destination. Mirror the `pickPartition` fix: locate the EXPORT TTL targeting this destination, compute the expiration max for both the new partition and the existing marker's partition from the source's current parts, and throw only when the new expiration max is strictly less than the old. Skip the guard when no matching TTL is found (orphaned marker after the TTL was dropped) or when either partition's parts are gone (DELETE TTL already cleaned up). `export_merge_tree_partition_force_export` continues to bypass the whole guard, including legitimate backward moves. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`TTLExportScheduler` is only constructed and activated by `StorageReplicatedMergeTree`; the base `MergeTreeData::exportPartitionToTable` throws `NOT_IMPLEMENTED`. Without a DDL-time guard, `CREATE TABLE ... ENGINE = MergeTree ... TTL ... EXPORT TO TABLE` and the matching `ALTER` succeeded but no background exporter ever submitted manifests — silent no-op. Throw `NOT_IMPLEMENTED` at parse time: - CREATE: in `registerStorageMergeTree.cpp` right after `getTTLForTableFromAST`, using the local `replicated` bool. - ALTER: in `MergeTreeData::checkAlterIsPossible`, using virtual `supportsReplication`. The check can't live in `checkTTLExpressions` because that method is also called from `MergeTreeData`'s constructor, where virtual dispatch returns the base-class value. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the full `std::map<String, DataPartsVector>` build + two lookups with a single pass that fills two `DataPartsVector`s for the new and the existing-marker partitions directly. `getPartitionExportTTLMax` returns `nullopt` for an empty vector, so the `find`/`end` guards collapse into the function call. No semantic change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dd3a2ac2ef
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
`StorageInMemoryMetadata::getExportTTLs` returns `TTLDescriptions` by value. The range-based for at the matching-TTL search bound the loop to that temporary, and the temporary's lifetime ends at the closing brace of the range-for statement. The `matching_ttl` pointer set inside the loop then dangled by the time `getPartitionExportTTLMax` dereferenced it below — use-after-free on every `mark_as_ttl=1` submission that exercised the backward-marker guard. Hoist the returned container into a local so it outlives the pointer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ALTER ... EXPORT PARTITION ... TO TABLE` resolves `to_database` at submit
time, and `CREATE MATERIALIZED VIEW ... TO db.t` resolves the TO target at
parse time via `setCurrentDatabase` in `InterpreterCreateQuery`. Two
different conventions exist:
- CREATE MV target: filled with the user's session `current_database`.
- ALTER expressions: filled with the source table's database via
`AddDefaultDatabaseVisitor` in `InterpreterAlterQuery`.
TTL EXPORT had no analogous step, so the AST stored whatever the user
wrote, and downstream code had to re-resolve at every use. A previous
attempt to resolve only inside `TTLDescription::getTTLFromAST` produced
inconsistent map keys for the per-part `export_ttl` info between
CREATE-time and ATTACH-time `TTLDescription` instances, which broke the
`test_replica_restart_mid_export` integration test.
Follow the existing patterns: teach `AddDefaultDatabaseVisitor` about
`ASTTTLElement` so ALTER fills `destination_database` with the source
table's database, and add an analogous walk to
`InterpreterCreateQuery::createTable` so CREATE fills it with the user's
`current_database`. From this point on, every code path sees a
fully-qualified destination in the AST and downstream `TTLDescription`,
and the per-part map key (`TTLDescription::getExportKey`) is consistent
across CREATE / ATTACH / restart.
The 04206 syntax test extracts the formatted TTL clause from
`system.tables.create_table_query`, which now contains the qualified
destination; strip the `currentDatabase()` prefix so the reference output
stays stable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 153b4a1677
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The scheduler previously recovered the floor partition's expiration max by looking up the marker's `partition_id` in the source's currently active parts. With the canonical paired-TTL recipe — `EXPORT after 7 days, DELETE after 30 days` — the marker's partition is dropped by retention as a matter of course. At that point the lookup misses, `floor_max` stays unset, the ordering filter in `TTLExportScheduler::pickPartition` is silently skipped, and any older expired partition with surviving parts becomes eligible. The result is a silent replay and a backwards step for the `(source, destination)` stream. Add `export_ttl_max` to `ExportReplicatedMergeTreePartitionManifest` and populate it at submit time from the matching `EXPORT TTL` clause and the exact parts being exported. The contract is: a manifest with `export_origin = ttl` always carries a non-zero `export_ttl_max`, and together with `status = COMPLETED` it guarantees that the partition has been successfully exported up to that expiration max. `pickPartition` now reads the marker's stored watermark directly — no source-parts lookup, so the ordering survives the marker's partition being dropped. Enforcing the invariant at the write site lets the submit-time backward-marker guard fold its work too: the new partition's max comes from the same `getPartitionExportTTLMax` call that populates the manifest, and the existing marker's max comes from its stored `export_ttl_max` (read out of the in-memory `export_merge_tree_partition_task_entries` cache). The duplicate matching-TTL lookup and the separate `old_parts/new_parts` bucketing scan are gone. Two new throws at the write site surface bad inputs explicitly: `mark_as_ttl=1` without a matching `EXPORT TTL` clause, and a partition whose parts pre-date the TTL (the user is told to run `ALTER TABLE ... MATERIALIZE TTL`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f6cbe79cb2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The watermark-based forward walk used a strict-less comparison on `expiration_max` alone. When two partitions share the same partition-wide expiration max — which happens whenever the EXPORT TTL expression is not monotonic by partition key, e.g. `PARTITION BY user_id` or hash partitioning — the scheduler would ping-pong indefinitely: - Marker at `(A, T)`; `pickPartition` sees `B` with max `T`, `T < T` is false → `B` is eligible → submit `B`. - Marker becomes `(B, T)`; same logic flips → submit `A` again. - The submit-time backward-marker guard used the same one-key check, so it did not stop the cycle either, and the stale-marker cleanup removed each prior manifest at the end of the new submission. Each cycle replays data to the destination. Tie-break the walk by `partition_id` lex order as a secondary key: - `TTLExportScheduler::pickPartition` rejects partitions with `(*max_ttl, pid) <= (floor->expiration_max, floor->partition_id)` and picks the smallest tuple as the best candidate. - `StorageReplicatedMergeTree::exportPartitionToTable` mirrors the comparison in the backward-marker guard. Partition IDs are arbitrary strings (lex `"10" < "9"`), so they cannot serve as the primary key for forward progress — but as a deterministic tie-breaker for equal expiration_max they're fine, since the ordering among tied partitions has no semantic meaning. Any stable order will do; lex is the cheapest. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ation The local `alter` path already schedules `ttl_export_task` when a `MODIFY TTL` adds an EXPORT TTL, but replicas that did not initiate the ALTER apply the metadata change through `setTableStructure` / `checkAndFindDiff` and never enter that branch. `TTLExportScheduler::run` returns without rescheduling when there is no EXPORT TTL, so a replica that started up without one and later receives one via replication stays dormant indefinitely. In a failover where the initiating replica goes down, TTL exports stop cluster-wide. Hook `setTableStructure` to call `ttl_export_task->schedule()` exactly on the transition from "no EXPORT TTL" to "has EXPORT TTL" — the canonical landing point for replicated metadata changes, and the precise check avoids spurious wakes on unrelated replicated ALTERs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
|
Had to argue with the AI a bit about what is considered an issue (and spent $10 just on this audit alone 🥲), but eventually got a clean report AI audit note: This review comment was generated by AI (gpt-5.3-codex). Audit update for PR #1810 ( No confirmed defects in reviewed scope. Coverage summary:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0469250927
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| || entry.manifest.destination_table != dest_table) | ||
| continue; | ||
|
|
||
| if (!latest || entry.manifest.create_time > latest->create_time) |
There was a problem hiding this comment.
Select TTL marker by durable order, not wall-clock time
This picks the current TTL marker using manifest.create_time, which comes from each replica’s local time(nullptr) and is not globally ordered. When stale ttl-origin manifests coexist (which this change explicitly allows as a transient state after best-effort cleanup), clock skew can make an older marker look newer, so the scheduler may follow the wrong status/partition (e.g., idling on a stale KILLED marker or replaying older partitions) and lose forward progress guarantees. The marker should be chosen by a replica-independent durable order (for example the persisted (export_ttl_max, partition_id) watermark or ZooKeeper ordering metadata), not per-host wall clock.
Useful? React with 👍 / 👎.
arthurpassos
left a comment
There was a problem hiding this comment.
I'll continue it tomorrow morning
| addSettingsChanges(merge_tree_settings_changes_history, "26.3", | ||
| { | ||
| {"export_merge_tree_partition_ttl_poll_interval_seconds", 5, 5, "New setting for the TTL EXPORT scheduler poll interval."}, | ||
| {"export_merge_tree_partition_ttl_min_backoff_seconds", 1, 1, "New setting for the TTL EXPORT scheduler minimum backoff."}, |
There was a problem hiding this comment.
Do you plan to merge it with the backoff policy?
There was a problem hiding this comment.
Yeah, I was planning to remove the backoff policy from the TTL scheduler itself and rely strictly on export partition backoff. Since it's not yet a thing, I thought about keeping it for a while. But considering the fact it actually introduces settings and changes in settings history, I think I'll drop it now
|
|
||
| namespace ExportPartitionUtils | ||
| { | ||
| void verifyExportDestinationCompatibility( |
There was a problem hiding this comment.
Haven't checked the rest of the code yet, but we must also check for partition expression compatbility for data lakes
There was a problem hiding this comment.
Yeah true, thanks. I haven't moved that part of the code yet. Will do!
| std::optional<time_t> getPartitionExportTTLMax( | ||
| const TTLDescription & desc, | ||
| const DataPartsVector & parts_in_partition, | ||
| std::vector<String> * missing_parts_out = nullptr); |
There was a problem hiding this comment.
I searched for occurences of getPartitionExportTTLMax and none seem to actually use the last argument. Can't we just remove it?
There was a problem hiding this comment.
Thanks, good catch!
| std::optional<TTLExportScheduler::TtlMarker> TTLExportScheduler::findTtlMarker( | ||
| const String & dest_database, const String & dest_table) | ||
| { | ||
| std::lock_guard lock(storage.export_merge_tree_partition_mutex); |
There was a problem hiding this comment.
oh boy, everywhere we r grabbing this lock. I am so freakin afraid of a deadlock or contention problems
The schema-validation extraction in 77df5e6 moved the column-diff and the non-data-lake partition-key compare to DDL time, but left iceberg partition compatibility verified only at submission time. A user who declares an `EXPORT TO TABLE` TTL against an Iceberg destination with an incompatible partition spec gets the rejection only when the scheduler eventually fires, which can be days or weeks after the ALTER. Add `verifyIcebergPartitionCompatibilityAtDestination` to `ExportPartitionUtils` (fetch the destination iceberg metadata + delegate to the existing `verifyIcebergPartitionCompatibility`) and call it from `TTLTableDescription::getTTLForTableFromAST` for data-lake destinations, gated on `!is_attach` like the sibling column-diff check. The submission-time call at `StorageReplicatedMergeTree.cpp:8638` is left in place as the moving-target defense: Iceberg supports schema and partition spec evolution (`ALTER TABLE ... ADD/DROP PARTITION FIELD` from Spark/Trino), and TTL EXPORT runs on a horizon where such evolution is realistic. The post-DDL drift case would otherwise slip past — Iceberg's optimistic concurrency only catches divergence between submission and commit, not between DDL and submission, so a partition-spec change before the scheduler fires could land data under the wrong partition layout with no error. The `EXPORT PART` path in `MergeTreeData.cpp:6744` is unchanged: it has no DDL-time hook (manual single-part export), and submission is the only entry point so there is no DDL→submit gap to bridge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1. Stateless test that checks that altering source table to an incompatible schema fails on DDL-time check 2. Integration test that checks that altering _destination_ table to an incompatible schema leads to export partition-level error and doesn't crash the server
…a-26.3/ttl-export-partition
Fixes #1793
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
ReplicatedMergeTree gains support for TTL EXPORT TO TABLE
Documentation entry for user-facing changes
...
CI/CD Options
Exclude tests:
Regression jobs to run: