Antalya 26.3: Fix empty partition_key and sorting_key in system.tables for Iceberg tables without data snapshots #1819

Open
il9ue wants to merge 1 commit into Altinity:antalya-26.3 from il9ue:fix/antalya/1235-glue-partition-sorting-keys

@il9ue il9ue commented May 21, 2026

Closes #1235.

Summary

SELECT partition_key, sorting_key FROM system.tables returned empty strings for Iceberg tables that had no data snapshot. This was reliably observable for tables accessed via the Glue catalog (since Glue's metadata_location more frequently points at a snapshot-free metadata file), but also reproduced for any empty Iceberg table regardless of catalog (REST, Glue, or direct IcebergS3).

Root cause

IcebergMetadata::partitionKey() and IcebergMetadata::sortingKey() (introduced in #959, refined in #1026, ported to 25.8 in #1095) gated their work on the existence of a data snapshot:

auto [actual_data_snapshot, actual_table_state_snapshot] = getRelevantState(context);
if (!actual_data_snapshot)
    return std::nullopt;

This is semantically wrong. Partition spec and sort order are table-level properties recorded at the top level of the Iceberg metadata file (default-spec-id, default-sort-order-id, partition-specs, sort-orders) and exist independently of whether any data snapshot has been written. Code inspection of getState() confirms that actual_table_state_snapshot is fully populated (schema_id, metadata_file_path, metadata_version) regardless of whether a snapshot exists; only actual_table_state_snapshot.snapshot_id is std::nullopt, and that field is never read by getPartitionKey() or getSortingKey().

The gate therefore suppressed valid table-level metadata. The fix removes it.
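To make the before/after behavior concrete, here is a minimal sketch of the two shapes. `TableMetadata`, `partitionKeyGated`, and `partitionKeyUngated` are hypothetical stand-ins invented for illustration; they are not the actual ClickHouse types or signatures, and the real functions read from the parsed Iceberg metadata rather than a plain struct.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified model of an Iceberg table's metadata.
// Not the real ClickHouse type; for illustration only.
struct TableMetadata
{
    std::string partition_spec;            // table-level property, present even for empty tables
    std::optional<long> current_snapshot;  // empty until the first data write
};

// Old shape: the result is gated on a data snapshot existing,
// so an empty table reports no partition key at all.
std::optional<std::string> partitionKeyGated(const TableMetadata & meta)
{
    if (!meta.current_snapshot)
        return std::nullopt;
    return meta.partition_spec;
}

// New shape: the table-level property is returned regardless of
// whether any snapshot has been written.
std::optional<std::string> partitionKeyUngated(const TableMetadata & meta)
{
    return meta.partition_spec;
}
```

For a table constructed as `TableMetadata{"bucket(16, id)", std::nullopt}`, the gated variant yields `std::nullopt` (surfacing as an empty string in `system.tables`), while the ungated variant yields the partition spec.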

Change list

  • src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp
    • partitionKey(): removed the if (!actual_data_snapshot) early return.
    • sortingKey(): removed the if (!actual_data_snapshot) early return.
  • src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp
    • getSortingKeyDescriptionFromMetadata(): added a defensive has() guard for sort-orders and default-sort-order-id. This is a pre-existing null-deref that was previously unreachable in practice (always behind the snapshot gate); after removing the gate, empty Iceberg V1 tables without sort-orders would have hit it. The guard mirrors the shape already present in getSortingKeyDisplayStringFromMetadata.
  • Test added (see below).

No header changes. No StorageSystemTables.cpp changes — the existing null/exception guards added for #1210 (Glue segfault) remain untouched.
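The shape of the defensive guard in `getSortingKeyDescriptionFromMetadata()` can be sketched roughly as follows. `MetadataObject` is a hypothetical map-backed stand-in for the parsed metadata object, and `sortOrdersOrNothing` is an invented helper name; the actual function builds a key description rather than returning a string, so this only shows the check-before-access pattern.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a parsed JSON metadata object.
// The real code operates on the parsed Iceberg metadata, not on this struct.
struct MetadataObject
{
    std::map<std::string, std::string> fields;

    bool has(const std::string & key) const { return fields.count(key) != 0; }
    const std::string & get(const std::string & key) const { return fields.at(key); }
};

// Returns the raw sort-orders payload, or nullopt when the metadata file
// (for example an Iceberg V1 file) does not carry the fields at all.
std::optional<std::string> sortOrdersOrNothing(const MetadataObject & metadata)
{
    // Check both keys before touching either, so a missing field
    // produces an empty result instead of a failed lookup.
    if (!metadata.has("sort-orders") || !metadata.has("default-sort-order-id"))
        return std::nullopt;
    return metadata.get("sort-orders");
}
```

With this pattern, a metadata object that lacks `sort-orders` produces an empty result, which matches the empty `sorting_key` the PR expects for such tables.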

Behavior preservation

  • REST path output for already-working (non-empty) tables is byte-identical, including the #1026 formatting (proper spaces, no NULLS FIRST/NULLS LAST).
  • Tables with no partitioning still return empty string (getPartitionKeyStringFromMetadata already guards on missing partition-specs).
  • Non-Iceberg data lakes (Hudi, Paimon, Delta Lake) are untouched — the virtual defaults in IDataLakeMetadata are unchanged.
  • The defensive guards added for #1210 (segfault when getting info from Glue catalog) in StorageSystemTables.cpp remain in place.

Out of scope

Glue's metadata_location pointer can lag schema-evolution events, which could cause partition_key / sorting_key to reflect a stale spec. This is orthogonal to the snapshot gate and is not addressed by this PR.

Test plan

New regression test reproduces the root cause described above without needing any catalog mock: it creates an Iceberg table with a non-trivial partition spec and sort order, then asserts that system.tables.partition_key and system.tables.sorting_key are non-empty before any data is inserted.

Existing test_system_tables_partition_sorting_keys in tests/integration/test_storage_iceberg_with_spark/test_system_iceberg_metadata.py continues to pass with byte-identical output.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fixed system.tables.partition_key and system.tables.sorting_key returning empty strings for Iceberg tables that have no data snapshot, including all empty tables and (more frequently) tables accessed via the Glue catalog. Also added a defensive guard against Iceberg V1 metadata files missing sort-orders.

Documentation entry for user-facing changes

Not required — this is a bug fix to existing system.tables columns; no new user-facing surface.

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

    system.tables for Iceberg tables without data snapshots

    Changelog category: Bug Fix
    Changelog entry: Fixed `system.tables.partition_key` and
    `system.tables.sorting_key` returning empty strings for
    Iceberg tables that have no data snapshot, including all
    empty tables and (more frequently) tables accessed via the
    Glue catalog. The snapshot-existence gate in
    IcebergMetadata::partitionKey() / sortingKey() was
    semantically wrong: partition spec and sort order are
    table-level properties recorded at the top level of the
    Iceberg metadata file (`default-spec-id`,
    `default-sort-order-id`) and exist independently of
    whether any data snapshot has been written. Also adds a
    defensive guard in getSortingKeyDescriptionFromMetadata
    against Iceberg V1 metadata files missing `sort-orders`,
    which becomes reachable for empty tables after this fix.
    Closes ClickHouse#1235.