It is difficult to debug issues with exports because information is scattered across various system tables and storage locations.
For example, to see parquet files for a certain table on S3, I need to know its storage location and use something like the following to list them (applies to metadata.json files as well where **.parquet needs to be changed to **.metadata.json:
SELECT _path
FROM s3('s3://protvjoe-1lfyj7kp-iceberg/default/destination12**.parquet', One)
ORDER BY _path ASC
And for information about each written iceberg snapshot, I query system.iceberg_history:
SELECT *
FROM system.iceberg_history
WHERE `table` = 'default.destination12'
ORDER BY made_current_at ASC
For per-partition info (export status, source replica, etc.):
SELECT *
FROM system.replicated_partition_exports
WHERE (source_database = 'default') AND (source_table = 'source14') AND (destination_database = 'export-test') AND (destination_table = 'default.destination14')
ORDER BY partition_id ASC
And there's also system.part_log.
The point is, it is confusing for users to debug anything related to exports. It feels a bit like a scavenger hunt for information. I'm not sure what the best solution would be for this, but it would be nice to expose table, parquet, and metadata file locations in some way that doesn't flood system tables with too much info, but also doesn't involve guessing if there was an issue with storage setup, ice version, or something else by counting parquet files.
Maybe someone has an idea about how to better organize information?
It is difficult to debug issues with exports because information is scattered across various system tables and storage locations.
For example, to see parquet files for a certain table on S3, I need to know its storage location and use something like the following to list them (applies to metadata.json files as well where **.parquet needs to be changed to **.metadata.json:
And for information about each written iceberg snapshot, I query system.iceberg_history:
For per-partition info (export status, source replica, etc.):
And there's also system.part_log.
The point is, it is confusing for users to debug anything related to exports. It feels a bit like a scavenger hunt for information. I'm not sure what the best solution would be for this, but it would be nice to expose table, parquet, and metadata file locations in some way that doesn't flood system tables with too much info, but also doesn't involve guessing if there was an issue with storage setup, ice version, or something else by counting parquet files.
Maybe someone has an idea about how to better organize information?