- "One problem with the dataset is that it contains many duplicate records for the same flood event. While Google did perform some spatio-temporal aggregation, the dataset still has overlapping records from the same flood event that have different geographic extents (captured from different articles) and/or slightly varying dates. Our goal is to count the total number of unique flood events aggregated for each grid. Such duplicates show up as spatially intersecting polygons with `start_date` values within a few days of each other. We use a vectorized `STRtree` bulk query to find all such candidate pairs efficiently."
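To make the duplicate criterion concrete, here is a minimal standard-library sketch of the pairing logic. It is not the notebook's implementation: a brute-force bounding-box overlap check stands in for the `STRtree` bulk query, and the sample records, the `MAX_DAYS_APART` threshold, the helper names, and the union-find grouping used to count unique events are all assumptions made for illustration.

```python
from datetime import date
from itertools import combinations

# Hypothetical records: (bounding box (minx, miny, maxx, maxy), start_date).
records = [
    ((0.0, 0.0, 2.0, 2.0), date(2020, 7, 1)),
    ((1.0, 1.0, 3.0, 3.0), date(2020, 7, 3)),      # overlaps record 0, two days later
    ((1.5, 1.5, 2.5, 2.5), date(2020, 9, 15)),     # overlaps both, but months later
    ((10.0, 10.0, 11.0, 11.0), date(2020, 7, 2)),  # same week, far away
]

MAX_DAYS_APART = 5  # assumed value for "within a few days"


def boxes_intersect(a, b):
    """Return True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_pairs(records, max_days):
    """Brute-force stand-in for the spatial index query.

    Returns index pairs (i, j), i < j, whose boxes intersect and whose
    start dates differ by at most max_days.
    """
    pairs = []
    for (i, (box_i, d_i)), (j, (box_j, d_j)) in combinations(enumerate(records), 2):
        if boxes_intersect(box_i, box_j) and abs((d_i - d_j).days) <= max_days:
            pairs.append((i, j))
    return pairs


def count_unique_events(n, pairs):
    """Merge paired records with union-find and count the resulting groups."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})


pairs = candidate_pairs(records, MAX_DAYS_APART)
unique_events = count_unique_events(len(records), pairs)
```

The brute-force loop compares every pair of records, which is quadratic in the number of records. A spatial index avoids that cost by returning only the pairs whose extents can intersect, so the date filter runs on a much smaller candidate set.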