Skip to content

Difficult to track/debug export info #1824

@DimensionWieldr

Description

@DimensionWieldr

It is difficult to debug issues with exports because information is scattered across various system tables and storage locations.

For example, to see parquet files for a certain table on S3, I need to know its storage location and use something like the following to list them (applies to metadata.json files as well where **.parquet needs to be changed to **.metadata.json:

SELECT _path
FROM s3('s3://protvjoe-1lfyj7kp-iceberg/default/destination12**.parquet', One)
ORDER BY _path ASC

And for information about each written iceberg snapshot, I query system.iceberg_history:

SELECT *
FROM system.iceberg_history
WHERE `table` = 'default.destination12'
ORDER BY made_current_at ASC

For per-partition info (export status, source replica, etc.):

SELECT *
FROM system.replicated_partition_exports
WHERE (source_database = 'default') AND (source_table = 'source14') AND (destination_database = 'export-test') AND (destination_table = 'default.destination14')
ORDER BY partition_id ASC

And there's also system.part_log.

The point is, it is confusing for users to debug anything related to exports. It feels a bit like a scavenger hunt for information. I'm not sure what the best solution would be for this, but it would be nice to expose table, parquet, and metadata file locations in some way that doesn't flood system tables with too much info, but also doesn't involve guessing if there was an issue with storage setup, ice version, or something else by counting parquet files.

Maybe someone has an idea about how to better organize information?

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions