Skip to content

MRB-720 Add option to read ICON baselines directly from GRIBs#153

Merged
dnerini merged 8 commits into
mainfrom
MRB-720-Read-baselines-directly-from-GRIB
May 21, 2026
Merged

MRB-720 Add option to read ICON baselines directly from GRIBs#153
dnerini merged 8 commits into
mainfrom
MRB-720-Read-baselines-directly-from-GRIB

Conversation

@dnerini
Copy link
Copy Markdown
Member

@dnerini dnerini commented May 17, 2026

This PR adds the option to read ICON-CH1/2-EPS surface GRIB files directly from the operational archive. It also removes the legacy zarr reader for baselines and consequently all cosmo-based config files.

Results

Quick test shows no difference in results between the existing zarr and the new grib readers
image

Performance-wise, it doesn't seem to make a big difference, which I find a bit odd, so I'll need to have a closer look.

2026-05-13 23:12:00,289 - data_input - INFO - Loading baseline forecasts from ICON GRIB archive...
2026-05-13 23:12:00,292 - data_input - INFO - Reading ICON archive from /store_new/mch/msopr/osm/ICON-CH1-EPS/FCST25/25030100_638
2026-05-13 23:12:39,291 - __main__ - INFO - Loaded forecast data in 39.007409 seconds
2026-05-13 23:12:00,284 - data_input - INFO - Loading baseline forecasts from zarr dataset...
2026-05-13 23:13:17,642 - __main__ - INFO - Loaded forecast data in 77.357972 seconds
image

Open questions

Follow-up PRs

  • extend method to read any arbitrary member
  • extend method to read the pre-computed median
  • extend method to compute ensemble mean

@dnerini dnerini requested review from frazane, jonasbhend and teobuz and removed request for frazane May 17, 2026 19:00
@dnerini dnerini changed the title Add option to read ICON baselines directly from GRIBs MRB-720 Add option to read ICON baselines directly from GRIBs May 18, 2026
@jonasbhend
Copy link
Copy Markdown
Contributor

@dnerini thanks heaps for this. here's my two cents:

  1. yes, I would deprecate meteodata-lab. From what I can tell we only use it to find the ICON grid lat/lon on balfrin, which probably could be hard-coded / configured.
  2. don't know how hard it would be to raise to earthkit-data v1, if this is rather straight-forward why not.
  3. My suspicion is that zarr is really slow due to the lack of consolidated metadata. If consolidated metadata are indeed available, this is indeed a bit curious.
  4. Should we also remove the dataset creation scripts, or do you want to leave these in for now?

Copy link
Copy Markdown
Contributor

@jonasbhend jonasbhend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dnerini

One final thought. I would expect that there is quite some redundancy between load_fct_data_from_grib and load_baseline_from_grib. In essence the only thing that should be necessary is to generalize the way grib files are found, then one could use load_fct_data_from_grib for both ML forecasts and baseline gribs. I would much prefer to do it that way to facilitate future developments and reduce redundancy. Or is there a blocker I am overlooking?

dnerini added 2 commits May 19, 2026 22:05
And consequently remove legacy cosmo configs that are no longer supported
@dnerini
Copy link
Copy Markdown
Member Author

dnerini commented May 19, 2026

note that commit 8619fea removes all legacy cosmo configs

Comment thread src/data_input/__init__.py Outdated
@jonasbhend
Copy link
Copy Markdown
Contributor

jonasbhend commented May 20, 2026

@dnerini very nice, I like it a lot! I suggest to tackle harmonization of data_input across experiments and showcases in a separate PR to benefit from the work being done here.

@dnerini dnerini marked this pull request as ready for review May 20, 2026 11:05
@dnerini
Copy link
Copy Markdown
Member Author

dnerini commented May 20, 2026

Need to fix some tests, but after that I would maybe suggest to merge and leave the TODOs to future PRs

@dnerini dnerini requested review from cosunae and jonasbhend May 21, 2026 06:36
@jonasbhend
Copy link
Copy Markdown
Contributor

Need to fix some tests, but after that I would maybe suggest to merge and leave the TODOs to future PRs

fine with me. Just out of curiosity, does the current implementation already support reading ensembles?

@dnerini
Copy link
Copy Markdown
Member Author

dnerini commented May 21, 2026

Just out of curiosity, does the current implementation already support reading ensembles?

not yet, no, let's add that in a separate PR

Copy link
Copy Markdown
Contributor

@jonasbhend jonasbhend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. I didn't have time to run the example notebook. Let me know if I should give it a spin.

@frazane
Copy link
Copy Markdown
Contributor

frazane commented May 21, 2026

Performance-wise, it doesn't seem to make a big difference, which I find a bit odd, so I'll need to have a closer look.

It's really hard to do a proper comparison, especially when submitting jobs on Balfrin, because scheduling is not optimal and your jobs can randomly contend resources with others. Did you try running a quick comparison on the login node?

@dnerini dnerini merged commit ce3591a into main May 21, 2026
4 checks passed
@dnerini dnerini deleted the MRB-720-Read-baselines-directly-from-GRIB branch May 21, 2026 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants