diff --git a/.claude/sweep-metadata-state.csv b/.claude/sweep-metadata-state.csv index 5047db05..b543689e 100644 --- a/.claude/sweep-metadata-state.csv +++ b/.claude/sweep-metadata-state.csv @@ -1,5 +1,6 @@ module,last_inspected,issue,severity_max,categories_found,notes geotiff,2026-05-18,1909,HIGH,4;5,"Re-audit 2026-05-15 (agent-a55b69cec1ef2a092 worktree, branch deep-sweep-metadata-geotiff-2026-05-15). 4-backend (numpy/cupy/dask+numpy/dask+cupy) parity reverified after the #1813 modular refactor: full reads, windowed reads, multi-band, band=N selection, no-georef integer pixel coords, crs/crs_wkt/transform/nodata/x_resolution/y_resolution/resolution_unit/image_description/gdal_metadata all agree across backends. DataArray .name and dims agree (y, x for 2D; y, x, band for 3D). NEW HIGH finding #1909: GDS chunked GPU path (_read_geotiff_gpu_chunked_gds) declared the dask graph dtype as float64 when source had an in-range integer nodata sentinel, matching the CPU dask path's #1597 contract, but the per-chunk _chunk_task did not cast its returned cupy array to declared_dtype -- chunks with no sentinel hit returned the raw uint16/int16 source dtype, producing a silent declared/actual dtype mismatch. Fix mirrors the #1597 + #1624 CPU dask pattern: compute declared_dtype before defining _chunk_task, cast inside the task only when arr.dtype != declared_dtype to skip the no-op astype(copy=True). 6 regression tests added in test_chunked_gpu_declared_dtype_1909.py covering declared vs computed parity, CPU/GPU dask declared-dtype agreement, eager paths preserve source dtype, no-nodata round-trip, explicit dtype= kwarg, and sentinel-hit float64 promotion. Pre-existing test failures in test_predictor2_big_endian_gpu_1517.py and test_size_param_validation_gpu_vrt_1776.py exist on main (read_to_array AttributeError after #1813 refactor, tile_size=4 rejected by stricter _validate_tile_size_arg) and are unrelated to this audit. | Re-audited 2026-05-18 (agent-a59a61958f181c31a worktree, branch deep-sweep-metadata-geotiff-2026-05-18). 4-backend (numpy / cupy / dask+numpy / dask+cupy) metadata parity reverified end-to-end: open_geotiff over a tiled uint16 fixture with crs + transform + GDAL_NODATA sentinel emits identical attrs across all 4 backends (crs=32633, crs_wkt, transform 6-tuple, nodata=5, masked_nodata=True, _xrspatial_geotiff_contract=2, extra_tags, image_description, resolution_unit, x_resolution, y_resolution). Multi-band 3D (y, x, band) with band coord, no-georef int64 pixel coords, windowed reads with transform origin shift, and mask_nodata=False keeping integer dtype all agree across the 4 backends. Write round-trip via to_geotiff (numpy, cupy, dask streaming) re-emits crs / transform / nodata / masked_nodata / contract version with byte-stable transform. Band-first (band, y, x) input correctly remaps to (y, x, band) on disk. _populate_attrs_from_geo_info, _set_nodata_attrs, and _extract_rich_tags centralise attrs emission across all read paths (_init_, _backends/dask, _backends/gpu, _backends/vrt) and write paths (_writers/eager, _writers/gpu, _writers/vrt). _ATTRS_CONTRACT_VERSION=2 is stamped on every path including the chunked GPU GDS and chunked VRT inline-attrs branches. No new CRITICAL/HIGH/MEDIUM/LOW findings." +interpolate,2026-05-21,,MEDIUM,4,"Audited 2026-05-21 (agent-aa6c8dedaff422e69 worktree, branch deep-sweep-metadata-interpolate-2026-05-21). cuda-unavailable. MEDIUM Cat 4 finding: idw(), kriging(), spline() previously copied template.attrs wholesale onto the float64 interpolation output, leaking the template's nodata sentinel (e.g. -9999 for an integer DEM template) onto a float surface where it no longer represents the actual sentinel. Default idw fill_value=NaN, kriging, and spline emit NaN as the no-data sentinel so inherited nodatavals/_FillValue/nodata were misleading; explicit idw(fill_value=-1) emitted no triplet at all. Fixed in sub-branch deep-sweep-metadata-interpolate-2026-05-21-01 via new _validation._build_output_attrs helper that strips inherited nodata keys and re-emits the (nodata, _FillValue, nodatavals) triplet only when fill_value is non-NaN -- same pattern as the rasterize() fix #2018. crs / res / transform / extra coords / dims continue to propagate as before. Kriging singular-matrix fallback now also returns a properly-named variance ({name}_variance) with attrs equal to the prediction's. 12 new tests in TestMetadataPropagation cover spatial-ref propagation, nodata-sentinel stripping for default fill, triplet emission for explicit fill, dims/coords preservation, no template-attrs mutation, dask-vs-numpy attrs parity, and kriging variance.attrs parity. CuPy and dask+cupy paths not executed (CUDA unavailable) but share the centralised public-API attrs construction so the fix applies uniformly. Cat 5 LOW (not fixed): kriging singular-matrix fallback's variance.name was 'kriging' instead of 'kriging_variance' -- folded into the MEDIUM fix because it was in the same code path. GitHub issue not filed: fork has issues disabled." polygonize,2026-05-19,2149,MEDIUM,1,"Audited 2026-05-19 (agent-ad1070530d37a4fdf worktree, branch deep-sweep-metadata-polygonize-2026-05-19). Output is vector (column, polygon_points / GeoDataFrame / GeoJSON dict / awkward) so Cat 2/3 do not apply in the DataArray sense. Cat 1 MEDIUM finding #2149: GeoDataFrame output drops raster.attrs['crs'] (and crs_wkt and rioxarray rio.crs); GeoDataFrame.crs is always None even when input is georeferenced. Fix: new _detect_raster_crs helper + crs= kwarg threaded into _to_geopandas; df.set_crs is called when a CRS is detected. spatialpandas has no CRS slot and GeoJSON RFC 7946 is WGS84-only, so propagation lives only on the geopandas path. CRS propagation runs at the public API level so all 4 backends (numpy / cupy / dask+numpy / dask+cupy) propagate consistently -- verified end-to-end with EPSG:4326 attrs across all 4 backends. 8 new tests in TestPolygonizeCRSPropagation cover EPSG string/int, crs_wkt, no CRS, unparseable CRS, attrs-vs-rioxarray preference, rioxarray-only path, and simplify interaction. Cat 2 LOW (not fixed): output coords are pixel-space when input has georeferenced x/y or attrs['transform']; user must pass transform= explicitly. Documented behavior, leave as-is. Cat 4 LOW (not fixed): nodatavals from input attrs is not auto-applied as a mask; documented behavior (explicit mask= kwarg)." rasterize,2026-05-17,2018,HIGH,1;2;4,"rasterize() drops like.attrs, rebuilds like.coords via linspace (not bit-identical), and never emits _FillValue/nodatavals even when fill is non-NaN. Cat 1 HIGH: chained pipelines like slope(rasterize(gdf, like=elevation)) silently lose crs/res/transform. Cat 2 MEDIUM: linspace round-trip from re-derived bounds breaks xr.align with like. Cat 4 MEDIUM: rasterize(..., fill=-9999, dtype=int32) emits no _FillValue. All 4 backends share the same final return so the fix is one place. Fixed in deep-sweep-metadata-rasterize-2026-05-17-01 (worktree agent-ab7a9aee97c1e4cdf): _extract_grid_from_like now returns coords/attrs; rasterize() reuses like.coords directly when grid matches, copies like.attrs, and emits _FillValue + nodatavals when fill is not NaN. 9 new tests in TestMetadataPropagation cover attrs propagation, bit-identical coord reuse, fill-value emission, isolation from template attrs, and parity across numpy/cupy/dask+numpy/dask+cupy backends. Full test suite (193 passing) clean." reproject,2026-05-10,1572;1573,HIGH,1;3;4,geoid_height_raster dropped input attrs and used dims[-2:] for 3D inputs (#1572). reproject/merge ignored nodatavals (rasterio convention) when rioxarray absent (#1573). Fixed in same branch. diff --git a/xrspatial/interpolate/_idw.py b/xrspatial/interpolate/_idw.py index de9aece4..2481569b 100644 --- a/xrspatial/interpolate/_idw.py +++ b/xrspatial/interpolate/_idw.py @@ -16,7 +16,7 @@ ngjit, ) -from ._validation import extract_grid_coords, validate_points +from ._validation import _build_output_attrs, extract_grid_coords, validate_points try: import cupy @@ -335,5 +335,5 @@ def idw(x, y, z, template, power=2.0, k=None, out, name=name, coords=template.coords, dims=template.dims, - attrs=template.attrs, + attrs=_build_output_attrs(template.attrs, fill_value), ) diff --git a/xrspatial/interpolate/_kriging.py b/xrspatial/interpolate/_kriging.py index 12c2d08f..e25f8cb0 100644 --- a/xrspatial/interpolate/_kriging.py +++ b/xrspatial/interpolate/_kriging.py @@ -13,7 +13,7 @@ _validate_scalar, ) -from ._validation import extract_grid_coords, validate_points +from ._validation import _build_output_attrs, extract_grid_coords, validate_points try: import cupy @@ -474,14 +474,21 @@ def vario_func(h): if K_inv is None: pred = np.full(template.shape, np.nan) + out_attrs = _build_output_attrs(template.attrs, np.nan) prediction = xr.DataArray( pred, name=name, coords=template.coords, dims=template.dims, - attrs=template.attrs, + attrs=out_attrs, ) if return_variance: - return prediction, prediction.copy() + variance = xr.DataArray( + pred.copy(), name=f'{name}_variance', + coords=template.coords, + dims=template.dims, + attrs=out_attrs, + ) + return prediction, variance return prediction mapper = ArrayTypeFunctionMapping( @@ -496,11 +503,13 @@ def vario_func(h): vario_func, K_inv, return_variance, template.data, ) + out_attrs = _build_output_attrs(template.attrs, np.nan) + prediction = xr.DataArray( pred_arr, name=name, coords=template.coords, dims=template.dims, - attrs=template.attrs, + attrs=out_attrs, ) if return_variance: @@ -508,7 +517,7 @@ def vario_func(h): var_arr, name=f'{name}_variance', coords=template.coords, dims=template.dims, - attrs=template.attrs, + attrs=out_attrs, ) return prediction, variance diff --git a/xrspatial/interpolate/_spline.py b/xrspatial/interpolate/_spline.py index 28e161fd..9ef24cba 100644 --- a/xrspatial/interpolate/_spline.py +++ b/xrspatial/interpolate/_spline.py @@ -16,7 +16,7 @@ ngjit, ) -from ._validation import extract_grid_coords, validate_points +from ._validation import _build_output_attrs, extract_grid_coords, validate_points try: import cupy @@ -338,5 +338,5 @@ def spline(x, y, z, template, smoothing=0.0, name='spline'): out, name=name, coords=template.coords, dims=template.dims, - attrs=template.attrs, + attrs=_build_output_attrs(template.attrs, np.nan), ) diff --git a/xrspatial/interpolate/_validation.py b/xrspatial/interpolate/_validation.py index c8bc9570..7102cfea 100644 --- a/xrspatial/interpolate/_validation.py +++ b/xrspatial/interpolate/_validation.py @@ -41,6 +41,57 @@ def validate_points(x, y, z, *, func_name): return x, y, z +def _build_output_attrs(template_attrs, fill_value): + """Return output attrs derived from *template_attrs* and *fill_value*. + + The interpolation functions reuse the input ``template`` only as a + grid donor: the output is always a ``float64`` raster, never the + template's dtype. Copying ``template.attrs`` wholesale therefore + leaks the template's nodata sentinel (often an integer like + ``-9999`` for an integer DEM) onto a float surface where it no + longer matches the actual no-data value the interpolator emits. + Downstream code that reads ``result.attrs['nodatavals']`` or + ``result.attrs['_FillValue']`` for masking would then get the wrong + sentinel. + + This helper preserves spatial-reference attrs (``crs``, ``res``, + ``transform``, etc.), strips inherited nodata keys, and re-emits a + consistent triplet keyed off *fill_value* so chained pipelines can + mask the interpolation output correctly. Mirrors the rasterize() + fix (#2018). + + Parameters + ---------- + template_attrs : Mapping + ``template.attrs`` from the user-supplied grid donor. + fill_value : float + The sentinel value used for pixels with no contribution + (IDW's ``fill_value`` parameter, or ``np.nan`` for kriging + and spline which evaluate everywhere). + + Returns + ------- + dict + New attrs dict suitable for ``xr.DataArray(..., attrs=...)``. + """ + out_attrs = dict(template_attrs) if template_attrs else {} + for key in ('nodata', '_FillValue', 'nodatavals'): + out_attrs.pop(key, None) + + try: + fill_as_float = float(fill_value) + fill_is_nan = np.isnan(fill_as_float) + except (TypeError, ValueError): + fill_is_nan = False + + if not fill_is_nan: + out_attrs['nodata'] = fill_value + out_attrs['_FillValue'] = fill_value + out_attrs['nodatavals'] = (fill_value,) + + return out_attrs + + def extract_grid_coords(template, *, func_name): """Extract 1-D coordinate vectors from a 2-D template DataArray. diff --git a/xrspatial/tests/test_interpolation.py b/xrspatial/tests/test_interpolation.py index d0d54f5e..a12431ae 100644 --- a/xrspatial/tests/test_interpolation.py +++ b/xrspatial/tests/test_interpolation.py @@ -628,3 +628,152 @@ def test_check_helper_estimate_message(self, monkeypatch): # N=500 -> kernel_bytes = 4 * 500**2 * 8 = 8 MB > 1*0.8 MB. with pytest.raises(MemoryError, match='kernel block K'): _check_spline_memory(n_points=500, grid_pixels=100) + + +# =================================================================== +# Metadata propagation tests +# =================================================================== + +class TestMetadataPropagation: + """Verify attrs / coords / dims propagation across interpolators. + + The template is a grid donor: spatial-reference attrs (``crs``, + ``res``, ``transform``) must survive to the output, but the + template's nodata sentinel must NOT leak onto the float64 + interpolation surface where it no longer represents the actual + sentinel value. + """ + + @staticmethod + def _georef_template(): + """Template with integer-style nodata sentinel and spatial-ref attrs.""" + template = _make_template([0.0, 1.0, 2.0], [0.0, 1.0, 2.0]) + template.attrs = { + 'res': (1.0, 1.0), + 'crs': 'EPSG:32633', + 'transform': (1.0, 0.0, 0.0, 0.0, 1.0, 0.0), + 'nodatavals': (-9999,), + '_FillValue': -9999, + 'nodata': -9999, + } + return template + + @staticmethod + def _grid_points_simple(): + x = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0]) + y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0]) + z = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]) + return x, y, z + + # --- idw --- + + def test_idw_preserves_spatial_ref_attrs(self): + x, y, z = self._grid_points_simple() + template = self._georef_template() + result = idw(x, y, z, template) + assert result.attrs['crs'] == 'EPSG:32633' + assert result.attrs['res'] == (1.0, 1.0) + assert result.attrs['transform'] == (1.0, 0.0, 0.0, 0.0, 1.0, 0.0) + + def test_idw_default_fill_strips_template_nodata(self): + """Default fill_value=NaN: template's -9999 sentinel must not leak.""" + x, y, z = self._grid_points_simple() + template = self._georef_template() + result = idw(x, y, z, template) + assert 'nodatavals' not in result.attrs + assert '_FillValue' not in result.attrs + assert 'nodata' not in result.attrs + + def test_idw_explicit_fill_emits_nodata_triplet(self): + x, y, z = self._grid_points_simple() + template = _make_template([0.0, 1.0, 2.0], [0.0, 1.0, 2.0]) + template.attrs = {'crs': 'EPSG:4326'} + result = idw(x, y, z, template, fill_value=-1.0) + assert result.attrs['nodata'] == -1.0 + assert result.attrs['_FillValue'] == -1.0 + assert result.attrs['nodatavals'] == (-1.0,) + assert result.attrs['crs'] == 'EPSG:4326' + + def test_idw_dims_and_coords_preserved(self): + x, y, z = self._grid_points_simple() + template = self._georef_template() + result = idw(x, y, z, template) + assert result.dims == template.dims + np.testing.assert_array_equal( + result.coords['x'].values, template.coords['x'].values) + np.testing.assert_array_equal( + result.coords['y'].values, template.coords['y'].values) + + def test_idw_does_not_mutate_template_attrs(self): + """Helper must not alias / mutate the caller's attrs dict.""" + template = self._georef_template() + snapshot = dict(template.attrs) + idw([0.5], [0.5], [1.0], template, fill_value=-1.0) + assert template.attrs == snapshot + + @dask_array_available + def test_idw_dask_attrs_match_numpy(self): + x, y, z = self._grid_points_simple() + np_template = self._georef_template() + da_template = self._georef_template() + da_template.data = __import__('dask.array', fromlist=['from_array'])\ + .from_array(da_template.data, chunks=(2, 2)) + np_result = idw(x, y, z, np_template, fill_value=-1.0) + da_result = idw(x, y, z, da_template, fill_value=-1.0) + assert np_result.attrs == da_result.attrs + + # --- kriging --- + + def test_kriging_preserves_spatial_ref_attrs(self): + x = np.array([0.0, 1.0, 2.0, 3.0]) + y = np.array([0.0, 0.0, 1.0, 1.0]) + z = np.array([1.0, 2.0, 3.0, 4.0]) + template = self._georef_template() + result = kriging(x, y, z, template) + assert result.attrs['crs'] == 'EPSG:32633' + assert result.attrs['res'] == (1.0, 1.0) + + def test_kriging_strips_template_nodata(self): + x = np.array([0.0, 1.0, 2.0, 3.0]) + y = np.array([0.0, 0.0, 1.0, 1.0]) + z = np.array([1.0, 2.0, 3.0, 4.0]) + template = self._georef_template() + result = kriging(x, y, z, template) + assert 'nodatavals' not in result.attrs + assert '_FillValue' not in result.attrs + + def test_kriging_variance_attrs_match_prediction(self): + x = np.array([0.0, 1.0, 2.0, 3.0]) + y = np.array([0.0, 0.0, 1.0, 1.0]) + z = np.array([1.0, 2.0, 3.0, 4.0]) + template = self._georef_template() + pred, var = kriging(x, y, z, template, return_variance=True) + assert var.attrs == pred.attrs + assert var.name == 'kriging_variance' + + # --- spline --- + + def test_spline_preserves_spatial_ref_attrs(self): + x, y, z = self._grid_points_simple() + template = self._georef_template() + result = spline(x, y, z, template) + assert result.attrs['crs'] == 'EPSG:32633' + assert result.attrs['res'] == (1.0, 1.0) + + def test_spline_strips_template_nodata(self): + x, y, z = self._grid_points_simple() + template = self._georef_template() + result = spline(x, y, z, template) + assert 'nodatavals' not in result.attrs + assert '_FillValue' not in result.attrs + assert 'nodata' not in result.attrs + + def test_spline_dims_and_coords_preserved(self): + x, y, z = self._grid_points_simple() + template = self._georef_template() + result = spline(x, y, z, template) + assert result.dims == template.dims + np.testing.assert_array_equal( + result.coords['x'].values, template.coords['x'].values) + np.testing.assert_array_equal( + result.coords['y'].values, template.coords['y'].values)