release(v1.5.3): Open-Meteo forecast-join correctness (#67) + research() docs (#52)#71
Merged
Conversation
- Resolve the latent GFS precipitation twin bug by adding a `_pick_record` helper to forecast_nwp.py. It filters duplicates by prioritizing instantaneous records and breaking ties using lowest `record_no`. - Implement `cloud_cover_pct`, `visibility_m`, and `cloud_ceiling_m` for GFS and HRRR. - Define and integrate physics-bounds QC rules for the three new fields. - Regenerate schema.forecast_nwp.v1 JSON schemas. - Add comprehensive test coverage in test_qc_rules_nwp.py, test_forecast_nwp.py, and test_forecast_nwp_multi_cycle.py. - Include planning artifacts and research briefs in .briefs/ directory.
Reverted accidental inclusion of all extras in root dependency list. The [nwp] extra pulls eccodes which is not available in CI without the full extra install, causing test detection guards to pass while the actual eccodes binary fails to load.
Phase 20+ Open-Meteo rows carry a derived issued_at, so the old issued_at-presence split in build_pairs_row() misrouted them into the IEM MOS aggregation path — silently nulling forecast temps and polluting IEM run-selection when both sources are combined. Discriminate by the authoritative source field instead (open_meteo* -> Open-Meteo, else IEM), matching the contract research._fetch_open_meteo_range already documents. Also teach _aggregate_fcst_temps_openmeteo to fall back to a pre-converted temperature_f so source-discriminated rows from the research() wrapper (which emits temperature_f, not temperature_c) aggregate correctly instead of returning null. TDD: 3 new regression tests in test_pairs.py cover the derived-issued_at classification, IEM-still-preferred, and the research-wrapper temperature_f shape.
…#52) A user asked why research() returns daily means and how to access hourly station observations. Clarify in the Returns section that research() yields one daily settlement-summary row per date (obs_* are window aggregates), not sub-daily rows, and point to weather.obs() and the Sprint 0.5+ raw_metar roadmap.
Codex review caught a regression: research._fetch_open_meteo_range emits OM rows carrying pop_6hr_pct / qpf_6hr_in (not precipitation_probability_pct and no QPF read). Pre-#67 those rows flowed through the IEM branch, which set both fields; after source-routing they hit the OM branch, which only read precipitation_probability_pct and never set QPF — silently nulling precip columns for research(forecast_source="open_meteo"). OM branch now accepts the pop_6hr_pct alias (explicit None-checks keep a valid 0.0) and sums qpf_6hr_in over the window, matching IEM semantics. +2 regression tests.
…odex P2) Codex flagged that the pure source-prefix split regressed the previously documented Open-Meteo shape (no source, no issued_at, temperature_c) to the IEM branch -> null temps. Extract _is_open_meteo_record(): source prefixed open_meteo OR (no source AND no issued_at). Real IEM rows always carry an issued_at, so a record missing both can only be legacy OM. +1 regression test.
… codex P2) Codex flagged leakage-safety provenance loss: Phase 20+ OM rows carry a derived issued_at; pre-#67 the IEM branch set fcst_issued via _select_best_run, but after source routing the OM branch left fcst_issued_at None for forecast_source="open_meteo". Restore it as the most-recent OM issued_at at-or-before market close — never leaking a run issued after settlement. +2 regression tests.
…P1 leakage) Codex P1: my provenance fix filtered issued_at only for the displayed timestamp; the OM aggregation still summed window rows issued AFTER market close, leaking their temp/POP/QPF into the training pair (lookahead). Pre-#67 _select_best_run applied this cutoff; the OM branch did not. Now filter window_om by issued_at <= market_close before aggregating temps/POP/QPF/ issued_at. Legacy source-less rows (no issued_at) are kept — nothing to leak. Strengthened test asserts the after-close temp is excluded, not just hidden.
…h() docs (#52) Bump all three packages 1.5.2 -> 1.5.3. - #67 (#69): build_pairs_row() now discriminates OM/IEM forecasts by source instead of issued_at presence, fixing silent null temps + IEM run-selection pollution for Phase 20+ multi-source callers; closes a lookahead-leakage path for after-close OM runs. Parity gate unaffected. - #52 (#70): research() docstring clarifies daily-row return granularity.
|
✅ Docs-required check: PASS API-surface change includes docs updates — no reminder needed. API-surface files changed: Docs files changed: |
|
Parity ticket gate: PASSED See |
, codex P2) Codex P2 on #66: _fetch_open_meteo_range clamped the fetch span to the request's [start, end], so a subrange request wrote an incomplete monthly forecast partition; a later same-month window then read it as a hit and silently dropped the uncached days. Fetch on full-month boundaries (clamped only to today to avoid future dates) so every written elapsed-month partition is complete. Current UTC month is never read-served nor written, so its partialness stays harmless. +1 regression test.
… P2) Codex P2 on #68: scripts/export_schemas.py exported schema.forecast_nwp.v1 but the TS codegen SCHEMA_FILES list omitted it, so pnpm codegen never emitted a ForecastNwpV1 type/validator and TS consumers stayed unaware of the new public NWP columns (dual-SDK parity gap). Add the schema to the TS codegen list and regenerate: new generated forecast_nwp.v1.ts + ajv validator + barrel/format-map exports. Codegen + core validator tests green.
…-join fix (#67) + docs (#52) Minor release (features added -> 1.6.0, not a patch). Bump all three packages 1.5.3 -> 1.6.0; regenerate uv.lock (codex P2 on #71: lock was stale at 1.5.2/1.5.3). CHANGELOG: fold the never-published 1.5.3 entry into 1.6.0 and add the #63 NWP-fields and #64 OM-cache/throttle feature entries. Bundles PRs #67/#52 (already in merged-vision) + #66/#64 + #68/#63, each with its codex P2 resolved on this integration branch.
…#64, codex P2) Codex P2 on updated #71: the >14-day Previous-Runs chunking path concatenates per-chunk DataFrames, and pd.concat drops df.attrs (same for the Single-Runs boolean clip). Chunked/clipped frames then lacked attrs[source]/[retrieved_at], failing validate_dataframe's source_attr_required for long windows. Restamp the combined frame's provenance from the per-chunk frames (latest retrieved_at). +1 regression test on a 3-chunk window.
, codex P2) Codex P2 on updated #71: the #63 disambiguation silently picked lowest record_no for ANY ambiguous (variable, level), including multiple DISTINCT aggregation windows with no instantaneous record — populating a column from an arbitrary window (silently-wrong NWP data). Restore the loud-fail guard issue #63 intended: _pick_record resolves only the known cases (instantaneous preferred; identical-window twins like the GFS APCP pair) and returns None otherwise, so _extract_records raises GribIntegrityError. Corrected the PR-68 test that had encoded the silently-pick behavior + added a raise-path test.
…dow (#64, codex P2) Codex P2 on updated #71: the weight-aware polite delay scaled num_days by the requested window even for Single-Runs, which ignores start_date/end_date and returns a fixed ~168h horizon from run= in one call. An exact-cycle multi-month or year request therefore slept tens of seconds after a single response. Use a fixed 7-day horizon for the Single-Runs cost; Previous-Runs chunks unchanged. +1 regression test (year window -> single 0.2s sleep, not ~5s).
#63, codex P2) Codex P2 on updated #71: the new ForecastNwpV1 TS export shipped but the four packages-ts package.json files still reported 1.5.2, so release-ts-preflight would reject a vts-1.6.0 publish and npm consumers couldn't receive the new type. Bump @mostlyrightmd/{core,weather,markets} + meta to 1.6.0 (workspace:* cross-deps need no change) and note the dual PyPI+npm bump in the CHANGELOG.
… codex P1) Codex P1: research() now calls fetch_open_meteo(variables=...), a kwarg added in weather 1.6.0. The research extra still allowed weather>=1.0.0, so a core-1.6.0 + weather-1.5.x install would TypeError on research(forecast_source="open_meteo") before fetching. Bump the floor to >=1.6.0,<2.0 and relock.
…uns (#64, codex P2) Codex P2: only Previous-Runs was chunked, but seamless/live also bill by start_date/end_date, so a >14-day seamless window still went out as one high-weight call — the exact rate-limit behavior #64 fixes. Chunk every endpoint except Single-Runs (which sends run= for a fixed horizon). +1 test on a 30-day seamless window.
…codex P1) Codex P1: narrowing the research-extra weather pin to >=1.6.0,<2.0 broke test_core_research_extra_pins_weather_to_active_major, which only accepted >=1.0.0,<2.0. Update the contract to require an active-major (<2.0) pin with a >=1.6.0 floor (the fetch_open_meteo variables= kwarg requirement), accepting any 1.x>=6 floor.
…are (#67, codex P2) Codex P2: rows from fetch_open_meteo(...).to_dict('records') carry pandas Timestamp valid_at/issued_at. The source-routed OM branch compares them to ISO string window bounds -> TypeError for direct callers of build_pairs_row (the research() wrapper stringifies, but the internal builder should be robust). Add _to_iso_z() and normalize the OM rows up front (shallow copies; caller dicts untouched). +1 regression test with pd.Timestamp fields.
…ex P2)
Codex P2: raw fetch_open_meteo(...).to_dict('records') rows name the Celsius
temp column temp_c (not temperature_c/temperature_f). The source-routed OM
aggregation read only the latter two, returning null highs/lows for direct
callers. Accept temp_c (°C->F) alongside temperature_c and the temperature_f
fallback. Updated the Timestamp regression test to use the canonical temp_c.
This was referenced Jun 6, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Integration PR from
merged-vision→mainfor the v1.5.3 patch release. Bundles two merged feature PRs.What's in 1.5.3
Fixed (#67):
build_pairs_row()split forecast records byissued_atpresence, but Phase 20+ Open-Meteo rows carry a derivedissued_atand were misrouted into the IEM MOS aggregation path — silently nulling forecast temps and polluting IEM run-selection for multi-source callers. Now split by the authoritativesourcefield, with:issued_at-less rows kept as Open-Meteo (backward compat),temperature_ffallback +pop_6hr_pct/qpf_6hr_in+fcst_issued_atprovenance preserved through the source-routed path,Docs (#52):
research()Returnsdocstring clarifies the daily-row return granularity.Verification
uv run pytest -m "not live"— green on integratedmerged-vision(incl. the HARD parity gatetests/test_parity.py).uv build— all three wheels build at1.5.3(mostlyrightmd,mostlyrightmd-weather,mostlyrightmd-markets).Version
1.5.2→1.5.3across all threepyproject.tomlfiles; CHANGELOG entry added.TS Parity
#67 is a Python-internal forecast-join fix (no public column change); the TS twin's
buildPairsRowshould get a parity ticket per CROSS-SDK-SYNC.md. #70 is docs-only (N/A).