This repository contains experimental tooling for transforming legacy WMDR 1.0 XML records into a simplified WMDR10 JSON representation and then into the draft WMDR2 v0.2.4.2 JSON representation.
The WMDR2 output is a facility-centric full record encoded as a GeoJSON-like Feature. WMDR2-specific content is stored in properties. Output files use the .json extension.
- OGC API - Records - Part 1: Core
- WMO-No. 1192 WIGOS Metadata Standard
- WMDR 1.0 schemas
- WMDR2 draft UML aligned to OMS
The workflow has two active stages.
python convert_wmdr10_xml_to_wmdr10_json.pyor explicitly:
python convert_wmdr10_xml_to_wmdr10_json.py \
--config config.yaml \
--source resources/wmdr10_xml_examples \
--target resources/wmdr10_json_examplesThis stage simplifies the XML representation while preserving relevant WMDR content. XML/GML bookkeeping identifiers are stripped from container and descriptive objects, but source identifiers are preserved for referenceable WMDR entities such as deployments, contacts, equipment, or instruments when present.
python convert_wmdr10_json_to_wmdr2_json.pyor explicitly:
python convert_wmdr10_json_to_wmdr2_json.py \
--config config.yaml \
--source resources/wmdr10_json_examples \
--target results/wmdr2_json_examplesThis stage writes one WMDR2 full-record JSON file per facility. The converter accepts observationSeries input and also accepts older intermediate inputs that still use observations; output is always written as properties.observationSeries.
Each WMDR2 output file is a facility-centric JSON Feature.
{
"type": "Feature",
"id": "facility:0-20000-0-06725",
"geometry": {
"type": "Point",
"coordinates": [7.8232, 46.4204, 1540]
},
"temporalGeometry": {
"type": "MovingPoint",
"coordinates": [
[7.823197, 46.420453, 1538],
[7.8232, 46.4204, 1540]
],
"dates": ["2000-08-17", "2024-01-17"],
"methods": [[], ["gps"]]
},
"time": {
"interval": ["2000-08-17", "2025-05-28"]
},
"conformsTo": [
"http://wigos.wmo.int/spec/wmdr/2/conf/core"
],
"properties": {
"type": "facility",
"title": "Blatten",
"created": "2025-07-29T00:00:00Z",
"updated": "2025-07-29T00:00:00Z",
"facilitySets": ["facilitySet:gaw"],
"keywords": ["0-20000-0-06725", "Blatten"],
"programAffiliation": [],
"territory": [],
"environment": [],
"observationSeries": [],
"deployments": [],
"reporting": [],
"schedules": [],
"instruments": []
}
}type: alwaysFeature.id: facility identifier, usually based on the WIGOS station identifier.geometry: current GeoJSON point geometry, derived from the most recent known coordinates.temporalGeometry: optional WMDR2MovingPointcoordinate history. It is the only temporal object that uses alignedcoordinates,dates, and optionalmethodsarrays.time: facility lifecycle interval using date resolution only. Unknown bounds are represented with...conformsTo: declares the WMDR2 core conformance class. The only allowed value ishttp://wigos.wmo.int/spec/wmdr/2/conf/core.properties: contains facility-level metadata and reusable registries for observation series, deployments, reporting definitions, schedules, instruments, contacts, environment, territory, and programme affiliation.
externalIds is not emitted because it repeats the feature id.
temporalGeometry is the special trajectory object and remains an aligned-array MovingPoint.
"temporalGeometry": {
"type": "MovingPoint",
"coordinates": [
[7.823197, 46.420453, 1538],
[7.8232, 46.4204, 1540]
],
"dates": ["2000-08-17", "2024-01-17"],
"methods": [[], ["gps"]]
}Other time-varying concepts use arrays of dated objects. Examples include environment, programAffiliation, territory, deployments, reporting, observingProcedures, and officialStatus.
A facility record references one or more logical facility groupings with facilitySets.
"facilitySets": ["facilitySet:oscar-station-cluster-5"]The referenced facility-set records are kept in a separate facility-set catalogue and are validated by schemas/wmdr2-facility-sets.schema.json. A facility set is useful when several facilities need to be treated as one logical entity, for example when a dataset or observing record spans nearby sites or when downstream discovery metadata should point to a group rather than enumerate each facility separately.
A real example is OSCAR/Surface station-cluster report 5, which groups facilities in the Arosa and Davos area. In WMDR2 this can be represented as a facility set to highlight the continuity of the total column ozone observations that started in Arosa and are being continued in Davos.
{
"facilitySets": [
{
"id": "facilitySet:oscar-station-cluster-5",
"title": "Arosa/Davos total column ozone facilities",
"description": "Grouping of facilities in either the Arosa or Davos area, and to highlight the link between the total column ozone observations started in Arosa and being continued in Davos.",
"links": [
{
"rel": "canonical",
"type": "text/html",
"title": "OSCAR/Surface station cluster report 5",
"href": "https://oscar.wmo.int/surface/#/search/stationClusterReport/5"
}
]
}
]
}The facility-set identifier is the value used by facility records. The external OSCAR/Surface URL remains a link on the facility-set catalogue entry. The singular facilitySet property is obsolete.
Facility-level programme affiliation is represented by properties.programAffiliation, an array of dated objects.
"programAffiliation": [
{
"date": "2000-08-17",
"programAffiliation": "GOSGeneral",
"reportingStatus": "operational",
"programSpecificFacilityId": "GOS-06725"
},
{
"date": "2022-09-08",
"programAffiliation": "GBON",
"reportingStatus": "operational"
}
]programSpecificFacilityId is retained when present.
Facility territory history is represented by properties.territory.
"territory": [
{
"date": "2000-08-17",
"territory": "CHE"
}
]Facility environmental context is represented by properties.environment, an array of dated Environment objects.
"environment": [
{
"date": "1980-01-01",
"climateZone": "Cfb",
"surfaceCover": "urbanBuiltup"
},
{
"date": "1990-01-01",
"population": [1000.0, null],
"perimeter_km": [10.0, 50.0]
},
{
"date": "1991-01-01",
"surfaceRoughness": "rough"
},
{
"date": "..",
"topographyBathymetry": {
"localTopography": "flat",
"relativeElevation": "middle",
"topographicContext": "rises",
"altitudeOrDepth": "veryHighAltitude"
}
}
]Obsolete names such as historicalEnvironment, temporalPopulation, temporalPopulationDensities, and temporalTopographyBathymetry are not emitted.
Observation-series records are stored under properties.observationSeries.
{
"id": "observationSeries:12006",
"title": "domain: atmosphere; geometry: point; variable: 12006",
"time": {"interval": ["2016-04-29", ".."]},
"observedProperty": 12006,
"observedGeometry": "point",
"observedFeature": {
"domain": "atmosphere",
"domainFeature": "near-surface-air",
"featureName": "2 m air"
},
"applicationArea": ["weatherForecasting"],
"programAffiliation": ["GAWregional"],
"sourceOfObservation": "automaticReading",
"referenceSurface": "localGround",
"representativeness": "local",
"verticalDistanceFromReferenceSurface": {
"value": 2.0,
"uom": "m"
},
"observingMethods": [
{"date": "1980-01-01", "observingMethod": 266},
{"date": "2001-01-01", "observingMethod": 267}
],
"officialStatus": [
{
"date": "2020-01-01",
"officialStatus": "primary"
}
],
"deployments": ["deployment:dep-1"],
"reporting": [
{
"date": "2020-01-01",
"strategy": "unknown",
"reporting": "reporting:hourly-level1-open",
"uom": "K"
}
],
"observingProcedures": [
{
"date": "2020-01-01",
"strategy": "continuous",
"observingSchedules": ["schedule_8fd3e0f1094a"]
}
]
}Important conventions:
- The identifier prefix is
observationSeries:. observedFeature.domainis used for the broad domain such asatmosphere,terrestrial,ocean, orearth.observedFeature.domainFeatureandobservedFeature.featureNameare optional enrichment fields.- Observation-series programme affiliation uses
programAffiliation, notprogramAffiliations. officialStatusis an array of dated objects. WMDR10 booleanofficialStatusvalues map toprimaryfortrueandadditionalforfalse.deploymentsis an array of references toproperties.deployments[*].id.reportingis an array of datedReportingProcedureobjects that reference reusable reporting definitions.observingMethodsis the dated observing-method history for the observation series. Each entry hasdateand mandatoryobservingMethod; use a nil-reason object when the method is unknown.observingProceduresis an array of datedObservingProcedureobjects. Each observing procedure carries astrategyand references one or more reusable schedules throughobservingSchedules.
Obsolete observation-series names such as observations, observedVariable, observedDomain, domain, domainName, historicalDeployments, historicalReporting, historicalOfficialStatus, observingSchedules, and programAffiliations are not emitted.
Reporting definitions are reusable facility-level objects stored under properties.reporting.
"reporting": [
{
"id": "reporting:hourly-level1-open",
"internationalExchange": true,
"temporalAggregate": "PT1H",
"levelOfData": "level1",
"dataPolicy": {
"dataPolicy": "noLimitation",
"attribution": {
"originator": {
"role": null
}
}
},
"timeliness": "PT30M"
}
]Observation-series reporting history is stored in observationSeries[*].reporting as dated ReportingProcedure objects.
"reporting": [
{
"date": "2020-01-01",
"strategy": "unknown",
"reporting": "reporting:hourly-level1-open",
"uom": "K"
}
]This split allows one reporting definition to be reused by multiple observation series. Observation-series-specific values such as uom and links remain on the dated ReportingProcedure object. The v0.2.4.2 model requires a reporting procedure strategy; the converter uses a source strategy when one is available and otherwise emits unknown.
Deployments are reusable, dated instrument-instance/state objects stored under properties.deployments.
"deployments": [
{
"id": "deployment:dep-1",
"date": "2020-01-01",
"instrument": "instrument:aws-001-temperature-sensor",
"serialNumber": "SN-001",
"operatingStatus": "operational",
"exposure": "good",
"geometry": {
"type": "Point",
"coordinates": [7.0, 46.0, 502]
}
}
]The same deployment may be referenced by several observation series.
"observationSeries": [
{
"id": "observationSeries:12006",
"deployments": ["deployment:dep-1"]
},
{
"id": "observationSeries:12001",
"deployments": ["deployment:dep-1"]
}
]Important conventions:
deployment.instrumentis a scalar reference to one instrument identifier, not a one-element array.deployment.dateis required.serialNumber,operatingStatus,exposure, andgeometrydescribe the dated deployment state.- Deployment records do not carry
title,type,manufacturer, ormodelproperties. - Obsolete names such as
temporalSerialNumber,serialNumbers,temporalOfficialStatus, andtemporalGeometryare not emitted on deployments.
Instruments are reusable catalogue objects stored under properties.instruments.
"instruments": [
{
"id": "instrument:aws-001-temperature-sensor",
"title": "Temperature sensor",
"description": "Automatic air-temperature sensor.",
"manufacturer": "Vaisala",
"model": "HMP155",
"observingMethods": [266],
"verticalRange": {
"min": 0,
"max": 30
},
"observedProperty": [12006],
"observedGeometry": "point"
}
]Manufacturer, model, optional title, optional description, optional vertical range, and optional instrument capability information belong on the instrument. Serial number belongs on the deployment. Instrument observingMethods is a compact list of method values only when the catalogue instrument type is known and the method capability is known:
"observingMethods": [266, 267]Do not create an instrument catalogue entry merely to carry an observing method. If make/model are unknown and the serial number is not documented, the observing method can be represented on the observation series alone.
The authoritative observing-method information for users is the dated history on ObservationSeries:
"observingMethods": [
{"date": "1980-01-01", "observingMethod": 266},
{"date": "2001-01-01", "observingMethod": 267}
]This answers the important question of how the observation series was generated and when the method changed. In v0.2.4.2 the observingMethod value in each history entry is mandatory. When the method is unknown for a known period, use a nil-reason object:
"observingMethods": [
{"date": "1980-01-01", "observingMethod": {"nilReason": "unknown"}}
]ObservingMethod is not carried by Deployment. A deployment is the act/state of placing an instrument instance and making it contribute to one or more observation series. The same deployment can support two observation series that use different methods for different observed properties. ObservingProcedure is also not used to carry the observing method; it is the dated procedure/strategy object that links an observation series to one or more observing schedules.
Schedules are reusable JSCalendar-like objects stored under properties.schedules. Observation series use them through dated ObservingProcedure objects under observationSeries[*].observingProcedures.
"schedules": [
{
"@type": "Event",
"uid": "schedule_8fd3e0f1094a",
"start": "0001-01-01T00:00:00",
"timeZone": "UTC",
"duration": "PT1H",
"recurrenceRules": [
{
"@type": "RecurrenceRule",
"frequency": "hourly"
}
],
"wmi.int:samplingFrequency": "PT1H",
"wmo.int:aggregationInterval": "PT1H"
}
]The v0.2.4.2 XMI names the sampling extension wmi.int:samplingFrequency; the converter follows that spelling. Aggregation interval is emitted as wmo.int:aggregationInterval. The older nested wmo.int:aggregation object is not emitted.
"observingProcedures": [
{
"date": "2020-01-01",
"strategy": "continuous",
"observingSchedules": ["schedule_8fd3e0f1094a"]
}
]ObservingProcedure.strategy is required by the model. The converter uses source samplingStrategy / observingStrategy when available and otherwise emits unknown.
Contacts are stored in properties.contacts. A contact may include an id, organization, name, position, emails, phones, links, and roles.
{
"id": "contact:owner:rmi",
"organization": "Royal Meteorological Institute of Belgium",
"roles": ["owner"]
}When catalogue post-processing is enabled, contacts and instruments can be externalized to catalogue files. The rewritten facility records retain minimal inline contact references and remove inline properties.instruments. Deployment instrument references remain scalar strings.
The WMDR2 JSON output stores compact code-list values where possible, not full code-list URLs.
{
"observedProperty": 12006,
"observedFeature": {"domain": "atmosphere"},
"facilityType": "landFixed",
"wmoRegion": "europe"
}Validation against WMO code lists is expected to be handled by validators that know which code list applies to each property.
keywords are retained as lightweight discovery text only when configured. If the converter section has no discovery block, built-in defaults emit facility keywords from identifier and name, and deployment keywords from selected instrument/deployment fields.
As soon as a discovery block is present in config.yaml, it is authoritative: omitted buckets and empty lists suppress extraction.
convert_wmdr10_json_to_wmdr2_json:
source: resources/wmdr10_json_examples
target: results/wmdr2_json_examples
discovery:
facility:
keywords: []
links: []
observation:
keywords: []
links: []
deployment:
keywords: []
links: []To retain the default facility keywords explicitly, use:
convert_wmdr10_json_to_wmdr2_json:
discovery:
facility:
keywords: [identifier, name]themes are intentionally not emitted in the current WMDR2 core representation.
The active schema files live under schemas/.
schemas/
wmdr2-common.schema.json
wmdr2-record-feature.schema.json
wmdr2-facility-sets.schema.json
Run schema tests with:
pytest -q tests/test_wmdr2_schemas.pyRun all tests with:
pytest -qThe current WMDR2 workflow no longer uses the previous Records Part 1 GeoJSON conversion path. The following root-level files can be removed if they are not referenced by local branches or pending work:
convert_wmdr10_json_to_records_part1.pysettings.geojsonversion.geojson- root-level
wmdr2-common.schema.json - root-level
wmdr2-feature-collection.schema.json - root-level
wmdr2-record-feature.schema.json
Do not remove the active schema files under schemas/.