Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .claude/sweep-metadata-state.csv
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
module,last_inspected,issue,severity_max,categories_found,notes
geotiff,2026-05-18,1909,HIGH,4;5,"Re-audit 2026-05-15 (agent-a55b69cec1ef2a092 worktree, branch deep-sweep-metadata-geotiff-2026-05-15). 4-backend (numpy/cupy/dask+numpy/dask+cupy) parity reverified after the #1813 modular refactor: full reads, windowed reads, multi-band, band=N selection, no-georef integer pixel coords, crs/crs_wkt/transform/nodata/x_resolution/y_resolution/resolution_unit/image_description/gdal_metadata all agree across backends. DataArray .name and dims agree (y, x for 2D; y, x, band for 3D). NEW HIGH finding #1909: GDS chunked GPU path (_read_geotiff_gpu_chunked_gds) declared the dask graph dtype as float64 when source had an in-range integer nodata sentinel, matching the CPU dask path's #1597 contract, but the per-chunk _chunk_task did not cast its returned cupy array to declared_dtype -- chunks with no sentinel hit returned the raw uint16/int16 source dtype, producing a silent declared/actual dtype mismatch. Fix mirrors the #1597 + #1624 CPU dask pattern: compute declared_dtype before defining _chunk_task, cast inside the task only when arr.dtype != declared_dtype to skip the no-op astype(copy=True). 6 regression tests added in test_chunked_gpu_declared_dtype_1909.py covering declared vs computed parity, CPU/GPU dask declared-dtype agreement, eager paths preserve source dtype, no-nodata round-trip, explicit dtype= kwarg, and sentinel-hit float64 promotion. Pre-existing test failures in test_predictor2_big_endian_gpu_1517.py and test_size_param_validation_gpu_vrt_1776.py exist on main (read_to_array AttributeError after #1813 refactor, tile_size=4 rejected by stricter _validate_tile_size_arg) and are unrelated to this audit. | Re-audited 2026-05-18 (agent-a59a61958f181c31a worktree, branch deep-sweep-metadata-geotiff-2026-05-18). 4-backend (numpy / cupy / dask+numpy / dask+cupy) metadata parity reverified end-to-end: open_geotiff over a tiled uint16 fixture with crs + transform + GDAL_NODATA sentinel emits identical attrs across all 4 backends (crs=32633, crs_wkt, transform 6-tuple, nodata=5, masked_nodata=True, _xrspatial_geotiff_contract=2, extra_tags, image_description, resolution_unit, x_resolution, y_resolution). Multi-band 3D (y, x, band) with band coord, no-georef int64 pixel coords, windowed reads with transform origin shift, and mask_nodata=False keeping integer dtype all agree across the 4 backends. Write round-trip via to_geotiff (numpy, cupy, dask streaming) re-emits crs / transform / nodata / masked_nodata / contract version with byte-stable transform. Band-first (band, y, x) input correctly remaps to (y, x, band) on disk. _populate_attrs_from_geo_info, _set_nodata_attrs, and _extract_rich_tags centralise attrs emission across all read paths (_init_, _backends/dask, _backends/gpu, _backends/vrt) and write paths (_writers/eager, _writers/gpu, _writers/vrt). _ATTRS_CONTRACT_VERSION=2 is stamped on every path including the chunked GPU GDS and chunked VRT inline-attrs branches. No new CRITICAL/HIGH/MEDIUM/LOW findings."
interpolate,2026-05-21,,MEDIUM,4,"Audited 2026-05-21 (agent-aa6c8dedaff422e69 worktree, branch deep-sweep-metadata-interpolate-2026-05-21). cuda-unavailable. MEDIUM Cat 4 finding: idw(), kriging(), spline() previously copied template.attrs wholesale onto the float64 interpolation output, leaking the template's nodata sentinel (e.g. -9999 for an integer DEM template) onto a float surface where it no longer represents the actual sentinel. Default idw fill_value=NaN, kriging, and spline emit NaN as the no-data sentinel so inherited nodatavals/_FillValue/nodata were misleading; explicit idw(fill_value=-1) emitted no triplet at all. Fixed in sub-branch deep-sweep-metadata-interpolate-2026-05-21-01 via new _validation._build_output_attrs helper that strips inherited nodata keys and re-emits the (nodata, _FillValue, nodatavals) triplet only when fill_value is non-NaN -- same pattern as the rasterize() fix #2018. crs / res / transform / extra coords / dims continue to propagate as before. Kriging singular-matrix fallback now also returns a properly-named variance ({name}_variance) with attrs equal to the prediction's. 12 new tests in TestMetadataPropagation cover spatial-ref propagation, nodata-sentinel stripping for default fill, triplet emission for explicit fill, dims/coords preservation, no template-attrs mutation, dask-vs-numpy attrs parity, and kriging variance.attrs parity. CuPy and dask+cupy paths not executed (CUDA unavailable) but share the centralised public-API attrs construction so the fix applies uniformly. Cat 5 LOW (not fixed): kriging singular-matrix fallback's variance.name was 'kriging' instead of 'kriging_variance' -- folded into the MEDIUM fix because it was in the same code path. GitHub issue not filed: fork has issues disabled."
polygonize,2026-05-19,2149,MEDIUM,1,"Audited 2026-05-19 (agent-ad1070530d37a4fdf worktree, branch deep-sweep-metadata-polygonize-2026-05-19). Output is vector (column, polygon_points / GeoDataFrame / GeoJSON dict / awkward) so Cat 2/3 do not apply in the DataArray sense. Cat 1 MEDIUM finding #2149: GeoDataFrame output drops raster.attrs['crs'] (and crs_wkt and rioxarray rio.crs); GeoDataFrame.crs is always None even when input is georeferenced. Fix: new _detect_raster_crs helper + crs= kwarg threaded into _to_geopandas; df.set_crs is called when a CRS is detected. spatialpandas has no CRS slot and GeoJSON RFC 7946 is WGS84-only, so propagation lives only on the geopandas path. CRS propagation runs at the public API level so all 4 backends (numpy / cupy / dask+numpy / dask+cupy) propagate consistently -- verified end-to-end with EPSG:4326 attrs across all 4 backends. 8 new tests in TestPolygonizeCRSPropagation cover EPSG string/int, crs_wkt, no CRS, unparseable CRS, attrs-vs-rioxarray preference, rioxarray-only path, and simplify interaction. Cat 2 LOW (not fixed): output coords are pixel-space when input has georeferenced x/y or attrs['transform']; user must pass transform= explicitly. Documented behavior, leave as-is. Cat 4 LOW (not fixed): nodatavals from input attrs is not auto-applied as a mask; documented behavior (explicit mask= kwarg)."
rasterize,2026-05-17,2018,HIGH,1;2;4,"rasterize() drops like.attrs, rebuilds like.coords via linspace (not bit-identical), and never emits _FillValue/nodatavals even when fill is non-NaN. Cat 1 HIGH: chained pipelines like slope(rasterize(gdf, like=elevation)) silently lose crs/res/transform. Cat 2 MEDIUM: linspace round-trip from re-derived bounds breaks xr.align with like. Cat 4 MEDIUM: rasterize(..., fill=-9999, dtype=int32) emits no _FillValue. All 4 backends share the same final return so the fix is one place. Fixed in deep-sweep-metadata-rasterize-2026-05-17-01 (worktree agent-ab7a9aee97c1e4cdf): _extract_grid_from_like now returns coords/attrs; rasterize() reuses like.coords directly when grid matches, copies like.attrs, and emits _FillValue + nodatavals when fill is not NaN. 9 new tests in TestMetadataPropagation cover attrs propagation, bit-identical coord reuse, fill-value emission, isolation from template attrs, and parity across numpy/cupy/dask+numpy/dask+cupy backends. Full test suite (193 passing) clean."
reproject,2026-05-10,1572;1573,HIGH,1;3;4,geoid_height_raster dropped input attrs and used dims[-2:] for 3D inputs (#1572). reproject/merge ignored nodatavals (rasterio convention) when rioxarray absent (#1573). Fixed in same branch.
4 changes: 2 additions & 2 deletions xrspatial/interpolate/_idw.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
ngjit,
)

from ._validation import extract_grid_coords, validate_points
from ._validation import _build_output_attrs, extract_grid_coords, validate_points

try:
import cupy
Expand Down Expand Up @@ -335,5 +335,5 @@ def idw(x, y, z, template, power=2.0, k=None,
out, name=name,
coords=template.coords,
dims=template.dims,
attrs=template.attrs,
attrs=_build_output_attrs(template.attrs, fill_value),
)
19 changes: 14 additions & 5 deletions xrspatial/interpolate/_kriging.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@
_validate_scalar,
)

from ._validation import extract_grid_coords, validate_points
from ._validation import _build_output_attrs, extract_grid_coords, validate_points

try:
import cupy
Expand Down Expand Up @@ -474,14 +474,21 @@ def vario_func(h):

if K_inv is None:
pred = np.full(template.shape, np.nan)
out_attrs = _build_output_attrs(template.attrs, np.nan)
prediction = xr.DataArray(
pred, name=name,
coords=template.coords,
dims=template.dims,
attrs=template.attrs,
attrs=out_attrs,
)
if return_variance:
return prediction, prediction.copy()
variance = xr.DataArray(
pred.copy(), name=f'{name}_variance',
coords=template.coords,
dims=template.dims,
attrs=out_attrs,
)
return prediction, variance
return prediction

mapper = ArrayTypeFunctionMapping(
Expand All @@ -496,19 +503,21 @@ def vario_func(h):
vario_func, K_inv, return_variance, template.data,
)

out_attrs = _build_output_attrs(template.attrs, np.nan)

prediction = xr.DataArray(
pred_arr, name=name,
coords=template.coords,
dims=template.dims,
attrs=template.attrs,
attrs=out_attrs,
)

if return_variance:
variance = xr.DataArray(
var_arr, name=f'{name}_variance',
coords=template.coords,
dims=template.dims,
attrs=template.attrs,
attrs=out_attrs,
)
return prediction, variance

Expand Down
4 changes: 2 additions & 2 deletions xrspatial/interpolate/_spline.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
ngjit,
)

from ._validation import extract_grid_coords, validate_points
from ._validation import _build_output_attrs, extract_grid_coords, validate_points

try:
import cupy
Expand Down Expand Up @@ -338,5 +338,5 @@ def spline(x, y, z, template, smoothing=0.0, name='spline'):
out, name=name,
coords=template.coords,
dims=template.dims,
attrs=template.attrs,
attrs=_build_output_attrs(template.attrs, np.nan),
)
51 changes: 51 additions & 0 deletions xrspatial/interpolate/_validation.py
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,57 @@ def validate_points(x, y, z, *, func_name):
return x, y, z


def _build_output_attrs(template_attrs, fill_value):
"""Return output attrs derived from *template_attrs* and *fill_value*.

The interpolation functions reuse the input ``template`` only as a
grid donor: the output is always a ``float64`` raster, never the
template's dtype. Copying ``template.attrs`` wholesale therefore
leaks the template's nodata sentinel (often an integer like
``-9999`` for an integer DEM) onto a float surface where it no
longer matches the actual no-data value the interpolator emits.
Downstream code that reads ``result.attrs['nodatavals']`` or
``result.attrs['_FillValue']`` for masking would then get the wrong
sentinel.

This helper preserves spatial-reference attrs (``crs``, ``res``,
``transform``, etc.), strips inherited nodata keys, and re-emits a
consistent triplet keyed off *fill_value* so chained pipelines can
mask the interpolation output correctly. Mirrors the rasterize()
fix (#2018).

Parameters
----------
template_attrs : Mapping
``template.attrs`` from the user-supplied grid donor.
fill_value : float
The sentinel value used for pixels with no contribution
(IDW's ``fill_value`` parameter, or ``np.nan`` for kriging
and spline which evaluate everywhere).

Returns
-------
dict
New attrs dict suitable for ``xr.DataArray(..., attrs=...)``.
"""
out_attrs = dict(template_attrs) if template_attrs else {}
for key in ('nodata', '_FillValue', 'nodatavals'):
out_attrs.pop(key, None)

try:
fill_as_float = float(fill_value)
fill_is_nan = np.isnan(fill_as_float)
except (TypeError, ValueError):
fill_is_nan = False

if not fill_is_nan:
out_attrs['nodata'] = fill_value
out_attrs['_FillValue'] = fill_value
out_attrs['nodatavals'] = (fill_value,)

return out_attrs


def extract_grid_coords(template, *, func_name):
"""Extract 1-D coordinate vectors from a 2-D template DataArray.

Expand Down
149 changes: 149 additions & 0 deletions xrspatial/tests/test_interpolation.py
Original file line number Diff line number Diff line change
Expand Up @@ -628,3 +628,152 @@ def test_check_helper_estimate_message(self, monkeypatch):
# N=500 -> kernel_bytes = 4 * 500**2 * 8 = 8 MB > 1*0.8 MB.
with pytest.raises(MemoryError, match='kernel block K'):
_check_spline_memory(n_points=500, grid_pixels=100)


# ===================================================================
# Metadata propagation tests
# ===================================================================

class TestMetadataPropagation:
"""Verify attrs / coords / dims propagation across interpolators.

The template is a grid donor: spatial-reference attrs (``crs``,
``res``, ``transform``) must survive to the output, but the
template's nodata sentinel must NOT leak onto the float64
interpolation surface where it no longer represents the actual
sentinel value.
"""

@staticmethod
def _georef_template():
"""Template with integer-style nodata sentinel and spatial-ref attrs."""
template = _make_template([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
template.attrs = {
'res': (1.0, 1.0),
'crs': 'EPSG:32633',
'transform': (1.0, 0.0, 0.0, 0.0, 1.0, 0.0),
'nodatavals': (-9999,),
'_FillValue': -9999,
'nodata': -9999,
}
return template

@staticmethod
def _grid_points_simple():
x = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0])
z = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
return x, y, z

# --- idw ---

def test_idw_preserves_spatial_ref_attrs(self):
x, y, z = self._grid_points_simple()
template = self._georef_template()
result = idw(x, y, z, template)
assert result.attrs['crs'] == 'EPSG:32633'
assert result.attrs['res'] == (1.0, 1.0)
assert result.attrs['transform'] == (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)

def test_idw_default_fill_strips_template_nodata(self):
"""Default fill_value=NaN: template's -9999 sentinel must not leak."""
x, y, z = self._grid_points_simple()
template = self._georef_template()
result = idw(x, y, z, template)
assert 'nodatavals' not in result.attrs
assert '_FillValue' not in result.attrs
assert 'nodata' not in result.attrs

def test_idw_explicit_fill_emits_nodata_triplet(self):
x, y, z = self._grid_points_simple()
template = _make_template([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
template.attrs = {'crs': 'EPSG:4326'}
result = idw(x, y, z, template, fill_value=-1.0)
assert result.attrs['nodata'] == -1.0
assert result.attrs['_FillValue'] == -1.0
assert result.attrs['nodatavals'] == (-1.0,)
assert result.attrs['crs'] == 'EPSG:4326'

def test_idw_dims_and_coords_preserved(self):
x, y, z = self._grid_points_simple()
template = self._georef_template()
result = idw(x, y, z, template)
assert result.dims == template.dims
np.testing.assert_array_equal(
result.coords['x'].values, template.coords['x'].values)
np.testing.assert_array_equal(
result.coords['y'].values, template.coords['y'].values)

def test_idw_does_not_mutate_template_attrs(self):
"""Helper must not alias / mutate the caller's attrs dict."""
template = self._georef_template()
snapshot = dict(template.attrs)
idw([0.5], [0.5], [1.0], template, fill_value=-1.0)
assert template.attrs == snapshot

@dask_array_available
def test_idw_dask_attrs_match_numpy(self):
x, y, z = self._grid_points_simple()
np_template = self._georef_template()
da_template = self._georef_template()
da_template.data = __import__('dask.array', fromlist=['from_array'])\
.from_array(da_template.data, chunks=(2, 2))
np_result = idw(x, y, z, np_template, fill_value=-1.0)
da_result = idw(x, y, z, da_template, fill_value=-1.0)
assert np_result.attrs == da_result.attrs

# --- kriging ---

def test_kriging_preserves_spatial_ref_attrs(self):
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
z = np.array([1.0, 2.0, 3.0, 4.0])
template = self._georef_template()
result = kriging(x, y, z, template)
assert result.attrs['crs'] == 'EPSG:32633'
assert result.attrs['res'] == (1.0, 1.0)

def test_kriging_strips_template_nodata(self):
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
z = np.array([1.0, 2.0, 3.0, 4.0])
template = self._georef_template()
result = kriging(x, y, z, template)
assert 'nodatavals' not in result.attrs
assert '_FillValue' not in result.attrs

def test_kriging_variance_attrs_match_prediction(self):
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
z = np.array([1.0, 2.0, 3.0, 4.0])
template = self._georef_template()
pred, var = kriging(x, y, z, template, return_variance=True)
assert var.attrs == pred.attrs
assert var.name == 'kriging_variance'

# --- spline ---

def test_spline_preserves_spatial_ref_attrs(self):
x, y, z = self._grid_points_simple()
template = self._georef_template()
result = spline(x, y, z, template)
assert result.attrs['crs'] == 'EPSG:32633'
assert result.attrs['res'] == (1.0, 1.0)

def test_spline_strips_template_nodata(self):
x, y, z = self._grid_points_simple()
template = self._georef_template()
result = spline(x, y, z, template)
assert 'nodatavals' not in result.attrs
assert '_FillValue' not in result.attrs
assert 'nodata' not in result.attrs

def test_spline_dims_and_coords_preserved(self):
x, y, z = self._grid_points_simple()
template = self._georef_template()
result = spline(x, y, z, template)
assert result.dims == template.dims
np.testing.assert_array_equal(
result.coords['x'].values, template.coords['x'].values)
np.testing.assert_array_equal(
result.coords['y'].values, template.coords['y'].values)