Explore integration with Icechunk data engine

My vision for this package is that it would work seamlessly with a local and/or remote high-performance data catalog and store (i.e., a data engine). Presently, [Icechunk](https://icechunk.io/en/latest/), a cloud-native transactional tensor storage engine, is the most promising option; it was recently open-sourced by Earthmover as the source code behind their Arraylake services.

An ideal workflow would be:
- User requests a dataset from a well-known data repository for a specific area of interest.
  - These well-known data repositories will be cataloged here in a YAML file, and optionally referenced with [Kerchunk](https://fsspec.github.io/kerchunk/index.html) or [VirtualiZarr](https://virtualizarr.readthedocs.io/).
- This package first checks if the specific dataset has already been fetched and saved to a local Icechunk instance.
- If not, it fetches the specific dataset from the source repository, saving it locally in its native format.
- If the user expects to reuse the data, they can choose to convert the dataset into an analysis-ready, cloud-optimized (ARCO) Zarr v3 dataset within Icechunk.
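
To make the check-then-fetch steps above concrete, here is a minimal, standard-library-only sketch. Everything in it is an assumption for illustration, not this package's API: the `CATALOG` dict stands in for the YAML catalog, a plain cache directory stands in for the local Icechunk instance, and the `Request`, `get_dataset`, and `fetch` names are hypothetical. The optional conversion to Zarr v3 within Icechunk is deliberately left out.

```python
import hashlib
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Hypothetical stand-in for the YAML catalog of well-known data repositories.
CATALOG = {
    "example-dem": {"url": "https://example.org/dem", "format": "tif"},
}

BBox = tuple[float, float, float, float]  # (west, south, east, north)


@dataclass(frozen=True)
class Request:
    """A dataset name plus an area of interest."""

    dataset: str
    bbox: BBox

    def key(self) -> str:
        # Stable short hash so the same request always maps to the same file.
        payload = json.dumps({"dataset": self.dataset, "bbox": self.bbox}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()[:16]


def get_dataset(
    request: Request,
    cache_dir: Path,
    fetch: Callable[[str, BBox], bytes],
) -> tuple[Path, bool]:
    """Return (local path, was_cached), fetching only on a cache miss."""
    entry = CATALOG[request.dataset]
    target = cache_dir / f"{request.dataset}-{request.key()}.{entry['format']}"
    if target.exists():
        # Already fetched and saved locally: skip the download.
        return target, True
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Save the data as-is, in its native format.
    target.write_bytes(fetch(entry["url"], request.bbox))
    return target, False


def fake_fetch(url: str, bbox: BBox) -> bytes:
    # Placeholder for a real download from the source repository.
    return f"{url}|{bbox}".encode()


with tempfile.TemporaryDirectory() as tmp:
    req = Request("example-dem", (-122.5, 37.0, -122.0, 37.5))
    _, first_cached = get_dataset(req, Path(tmp), fake_fetch)
    _, second_cached = get_dataset(req, Path(tmp), fake_fetch)
```

Passing `fetch` in as a parameter keeps the caching logic independent of any particular source repository. In a real integration, the existence check would presumably query the Icechunk repository rather than the filesystem.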

Explore integration with Icechunk data engine #5
