Draft
18 commits
5f38216
Add prototype paramdb api that allows conversion of ECMWF shortname t…
awarde96 Mar 19, 2026
d64ad19
Add tests for new paramDB api
awarde96 Mar 19, 2026
c2b7ee8
Merge branch 'develop' into feature/paramdb-api
awarde96 Mar 19, 2026
a8b8e31
Move generated yaml files from pymetkit to share
awarde96 Apr 8, 2026
dabc222
Add initial code for caching params when using online mode
awarde96 Apr 8, 2026
192e642
Allow users to pass in a yaml file to be used as paramDB instead of o…
awarde96 Apr 8, 2026
9c374d6
Address Copilot PR #185 review comments
awarde96 Apr 10, 2026
63f51f7
Allow context when converting from shortname to id, using center or t…
awarde96 May 18, 2026
0276e49
Merge branch 'develop' into feature/paramdb-api
awarde96 May 22, 2026
5bfdf4e
Allow use of origin to avoid parameter id clashes from shortname, def…
awarde96 May 22, 2026
8998c5f
Merge branch 'feature/paramdb-api' of github.com:ecmwf/metkit into fe…
awarde96 May 22, 2026
a5370ef
Add schema for parameter entries, add workflow for clashing variables…
awarde96 Jun 1, 2026
abbc236
feat: symlink parameter_metadata.yaml from share/metkit into package
HCookie Jun 12, 2026
53d5ccc
Make ParamDB importable when C library is unavailable; add benchmark …
awarde96 Jun 16, 2026
81f5c18
Suppress noisy print() on missing symbols; collect count into CFFIMod…
awarde96 Jun 16, 2026
9885271
Add offline vs online per-method comparison table to benchmark when u…
awarde96 Jun 16, 2026
e622f7f
Add json version of parameters for quicker loading, update paramdb to…
awarde96 Jun 17, 2026
917d33b
Merge pull request #225 from ecmwf/feature/paramdb-yaml-symlink
awarde96 Jun 17, 2026
8 changes: 6 additions & 2 deletions pyproject.toml
@@ -24,7 +24,10 @@ requires-python = ">=3.10"
dependencies = [
"cffi",
"metkitlib",
"findlibs"
"findlibs",
"pyyaml",
"requests",
"platformdirs",
]

[tool.setuptools.dynamic]
@@ -39,7 +42,8 @@ zip-safe = false
[tool.setuptools.package-data]
"pymetkit" = [
"VERSION",
"metkit_c.h"
"metkit_c.h",
"parameter_metadata.yaml"
]

[project.optional-dependencies]
200 changes: 186 additions & 14 deletions python/pymetkit/README.md
@@ -1,28 +1,200 @@
# pymetkit

This repository contains an Python interface to the MetKit library for parsing MARS requests.
Python interface to the MetKit library for parsing MARS requests and looking up
ECMWF parameter metadata.

## Example
---

The function for parsing a MARS request is `metkit.parse_mars_request` which accepts a string or file-like object
as inputs. A list of `metkit.mars.Request` instances are returned, which is a dictionary containing the keys and
values in the MARS request and the attribute `verb` for the verb in the MARS request.
## MARS request parsing

### From String
```
from metkit import parse_mars_request
`parse_mars_request` accepts a string or file-like object and returns a list of
`MarsRequest` instances.

### From a string

```python
from pymetkit import parse_mars_request

request_str = "retrieve,class=od,date=20240124,time=12,param=129,step=12,target=test.grib"
requests = parse_mars_request(requests)
requests = parse_mars_request(request_str)

print(requests[0])
# verb: retrieve, request: {'class': ['od'], 'date': ['20240124'], 'time': ['1200'], 'param': ['129'], 'step': ['12'], 'target': ['test.grib'], 'domain': ['g'], 'expver': ['0001'], 'levelist': ['1000', '850', '700', '500', '400', '300'], 'levtype': ['pl'], 'stream': ['oper'], 'type': ['an']}
# verb: retrieve, request: {'class': ['od'], 'date': ['20240124'], ...}
```

### From File
If the MARS request is contained inside a file, e.g. test_requests.txt:
```
from metkit import parse_mars_request
### From a file

```python
from pymetkit import parse_mars_request

requests = parse_mars_request(open("test_requests.txt", "r"))
```

---

## ParamDB

`ParamDB` provides parameter ID ↔ shortname ↔ longname lookups backed by the
ECMWF parameter database.

### Quick start (offline, bundled data)

```python
from pymetkit import ParamDB

db = ParamDB()

db.shortname_to_param_id("t") # → 130
db.param_id_to_shortname(130) # → "t"
db.shortname_to_longname("t") # → "Temperature"
db.param_id_to_longname(130) # → "Temperature"
db.get_units(130) # → "K"
db.get_metadata(130) # → full metadata dict
```

### Collision resolution

Some shortnames appear in more than one GRIB table or originating centre.
Pass `table=`, `origin=`, or `access=` to disambiguate:

```python
# Default: prefers dissemination params → ECMWF origin → lowest id
db.shortname_to_param_id("tp") # → 228

# Explicit table override
db.shortname_to_param_id("tp", table=228) # → 228228

# Explicit origin (98 = ECMWF)
db.shortname_to_param_id("t", origin=98) # → 130

# Access filter
db.shortname_to_param_id("tp", access="dissemination") # → 228

# Inspect all candidates for a colliding shortname
db.get_all_by_shortname("tp")
# [{'id': 228, 'shortname': 'tp', ...}, {'id': 228228, 'shortname': 'tp', ...}]

db.shortname_has_collisions("tp") # → True
db.shortname_has_collisions("t") # → False (only one candidate)
```
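The default preference order described above (dissemination parameters, then ECMWF origin, then lowest ID) can be sketched in plain Python. This is an illustration only, not the `ParamDB` implementation: `resolve_shortname` and the sample metadata values are hypothetical, and the `table=` override is left out.

```python
ECMWF_ORIGIN = 98  # WMO originating-centre ID for ECMWF


def resolve_shortname(candidates, origin=None, access=None):
    """Pick one param ID from entries that share a shortname (hypothetical helper)."""
    pool = list(candidates)
    # Explicit filters narrow the candidate pool first.
    if origin is not None:
        pool = [c for c in pool if origin in c.get("origin_ids", [])]
    if access is not None:
        pool = [c for c in pool if access in c.get("access_ids", [])]
    if not pool:
        raise KeyError("no parameter matches the given filters")

    def rank(entry):
        # False sorts before True, so preferred entries come first.
        return (
            "dissemination" not in entry.get("access_ids", []),
            ECMWF_ORIGIN not in entry.get("origin_ids", []),
            entry["id"],
        )

    return min(pool, key=rank)["id"]


# Made-up metadata for the two "tp" candidates shown above.
candidates = [
    {"id": 228228, "shortname": "tp", "origin_ids": [98], "access_ids": []},
    {"id": 228, "shortname": "tp", "origin_ids": [98], "access_ids": ["dissemination"]},
]
print(resolve_shortname(candidates))  # 228
```

Ranking with a tuple key keeps the three tie-breakers in one place, so changing the preference order only means reordering the tuple.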

### Online mode (live API + local cache)

```python
from datetime import timedelta

from pymetkit import ParamDB

db = ParamDB(mode="online")                                # fetches from codes.ecmwf.int
db = ParamDB(mode="online", cache_ttl=timedelta(hours=6))  # custom TTL
db = ParamDB(mode="online", cache_path="/tmp/myapp")       # custom cache dir
```
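How a TTL-based cache decides whether to refetch can be sketched as follows. This assumes freshness is judged by the cache file's modification time, which may not match what `ParamDB` does internally; `cache_is_fresh` and the file name `params.json` are hypothetical.

```python
import tempfile
import time
from datetime import timedelta
from pathlib import Path


def cache_is_fresh(cache_file, ttl, now=None):
    """Return True if cache_file exists and is younger than ttl (hypothetical helper)."""
    path = Path(cache_file)
    if not path.exists():
        return False
    now = time.time() if now is None else now
    age_seconds = now - path.stat().st_mtime
    return age_seconds < ttl.total_seconds()


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "params.json"  # hypothetical cache file name
    ttl = timedelta(hours=6)
    print(cache_is_fresh(cache_file, ttl))  # False: nothing cached yet
    cache_file.write_text("[]")
    print(cache_is_fresh(cache_file, ttl))  # True: just written
    seven_hours_on = time.time() + timedelta(hours=7).total_seconds()
    print(cache_is_fresh(cache_file, ttl, now=seven_hours_on))  # False: older than the TTL
```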

---

## Custom parameter YAML

You can extend or replace the bundled database with your own YAML file.

### Loading a custom YAML

```python
db = ParamDB(yaml_path="my_params.yaml")
```

The file is loaded instead of the bundled `parameter_metadata.yaml`;
`ParamDB` does not merge files automatically. To use custom parameters
alongside the bundled ones, concatenate your YAML list with the bundled data
before passing it in.
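Concatenating the two lists by hand can be sketched like this. The helper `merge_entries` is hypothetical and works on already-loaded lists of dicts; refusing duplicate IDs is a design choice of this sketch, not documented `ParamDB` behaviour.

```python
def merge_entries(bundled, custom):
    """Return bundled + custom entries, refusing duplicate IDs (hypothetical helper)."""
    seen = {entry["id"] for entry in bundled}
    merged = list(bundled)
    for entry in custom:
        if entry["id"] in seen:
            raise ValueError(f"param id {entry['id']} is already defined")
        seen.add(entry["id"])
        merged.append(entry)
    return merged


# Stand-ins for the parsed contents of the bundled and custom YAML files.
bundled = [{"id": 130, "shortname": "t", "longname": "Temperature"}]
custom = [{"id": 900001, "shortname": "myvar", "longname": "My Custom Variable"}]

merged = merge_entries(bundled, custom)
print([entry["id"] for entry in merged])  # [130, 900001]
```

The merged list could then be written out with `yaml.safe_dump` and the resulting file passed as `yaml_path`.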

### YAML schema

Each entry is a YAML mapping. The fields are:

| Field | Type | Required | Default | Description |
|--------------|----------------|----------|-------------|--------------------------------------------------|
| `id` | integer | ✓ | | Unique numeric param ID |
| `shortname` | string | ✓ | | Short identifier (e.g. `"t"`, `"myvar"`) |
| `longname` | string | ✓ | | Human-readable description |
| `units` | string | | `"unknown"` | Physical units (e.g. `"K"`, `"m s**-1"`) |
| `origin_ids` | list of int | | `[]` | WMO originating centre IDs (98 = ECMWF, 0 = WMO) |
| `access_ids` | list of string | | `[]` | Access tags (e.g. `"dissemination"`, `"research"`) |

Extra fields are allowed and are preserved in `get_metadata()` output.

The following raw API spellings are also accepted and normalised automatically:
`shortName`, `short_name`, `longName`, `long_name`, `name`.
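The normalisation step can be pictured as a key-renaming pass followed by filling in the defaults from the table above. This is a sketch, not the library's code: `normalise_entry` is hypothetical, and mapping `name` to `longname` is an assumption about how that spelling is treated.

```python
ALIASES = {
    "shortName": "shortname",
    "short_name": "shortname",
    "longName": "longname",
    "long_name": "longname",
    "name": "longname",  # assumption: `name` is treated as the long name
}

DEFAULTS = {"units": "unknown", "origin_ids": [], "access_ids": []}


def normalise_entry(raw):
    """Rename alias keys and fill in schema defaults (hypothetical helper)."""
    entry = {}
    for key, value in raw.items():
        # The first spelling seen for a field wins; unknown keys pass through.
        entry.setdefault(ALIASES.get(key, key), value)
    for key, default in DEFAULTS.items():
        if entry.get(key) is None:
            # Copy list defaults so entries never share one list object.
            entry[key] = list(default) if isinstance(default, list) else default
    return entry


print(normalise_entry({"id": 900001, "shortName": "myvar", "name": "My Custom Variable"}))
```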

**Avoid ID collisions** with the official ECMWF database by using IDs above
`900000` for your own parameters.

### Minimal example

```yaml
# my_params.yaml
- id: 900001
  shortname: myvar
  longname: My Custom Variable

- id: 900002
  shortname: myflux
  longname: My Custom Surface Flux
  units: W m**-2
  origin_ids: [98]
  access_ids: [research]
```

A fully-annotated starter file is provided at
`share/metkit/custom_param_example.yaml`.

### Validating entries against the schema

Use `ParameterEntry` (a Pydantic v2 model) to validate your YAML before loading:

```python
import pydantic
import yaml
from pymetkit import ParameterEntry

with open("my_params.yaml") as f:
    entries = yaml.safe_load(f)

for raw in entries:
    try:
        ParameterEntry.model_validate(raw)
    except pydantic.ValidationError as exc:
        print(f"Invalid entry (id={raw.get('id')}): {exc}")
```

`ParameterEntry.model_validate` raises `pydantic.ValidationError` if:
- `id`, `shortname`, or `longname` is missing or empty
- `id` cannot be coerced to an integer
- `origin_ids` contains non-integer values

Valid entries are coerced silently (e.g. a string `"130"` for `id` becomes
`130`, `None` for `units` becomes `"unknown"`).
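The rules listed above can be approximated without Pydantic, for example in a quick pre-commit check. The function below is a rough, hypothetical stand-in for `ParameterEntry`, not its implementation, and it is more lenient in places (for instance, `int()` accepts floats).

```python
def check_entry(raw):
    """Validate and coerce one entry following the rules above (hypothetical helper)."""
    entry = dict(raw)
    # Required fields must be present and non-empty.
    for field in ("id", "shortname", "longname"):
        if entry.get(field) in (None, ""):
            raise ValueError(f"{field} is missing or empty")
    # `id` must be coercible to an integer.
    try:
        entry["id"] = int(entry["id"])
    except (TypeError, ValueError):
        raise ValueError(f"id {entry['id']!r} is not an integer") from None
    # `origin_ids` may only contain integers (bool is excluded on purpose).
    origin_ids = entry.get("origin_ids") or []
    if not all(isinstance(o, int) and not isinstance(o, bool) for o in origin_ids):
        raise ValueError("origin_ids must contain only integers")
    entry["origin_ids"] = list(origin_ids)
    # A missing or null `units` falls back to the documented default.
    if entry.get("units") is None:
        entry["units"] = "unknown"
    return entry


print(check_entry({"id": "130", "shortname": "t", "longname": "Temperature", "units": None}))
```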

### Machine-readable JSON schema

A JSON Schema file is published at `share/metkit/parameter_entry_schema.json`.
You can use it with any JSON Schema–compliant validator:

```python
import json
import jsonschema
import yaml

with open("share/metkit/parameter_entry_schema.json") as f:
    schema = json.load(f)
with open("my_params.yaml") as f:
    entries = yaml.safe_load(f)

for entry in entries:
    # Raises jsonschema.ValidationError on the first invalid entry.
    jsonschema.validate(instance=entry, schema=schema)
```

The schema file is regenerated automatically when you run
`generate_parameter_metadata.py`.

VS Code users: add a `# yaml-language-server: $schema=...` comment at the top
of your YAML file to get inline validation and autocompletion:

```yaml
# yaml-language-server: $schema=../../share/metkit/parameter_entry_schema.json
- id: 900001
  shortname: myvar
  longname: My Custom Variable
```