[GH-3031] Box3D spatial join: index ST_Intersects / ST_Contains on Box3D #3032
Pull request overview
This PR updates Sedona Spark SQL’s join planner to route Box3D-on-Box3D ST_Intersects / ST_Contains joins through the existing indexed (2D R-tree) spatial join pipeline by projecting Box3D inputs to their XY footprints and refining candidate pairs using the original Box3D predicate as a post-filter.
Changes:
- Plan Box3D `ST_Intersects`/`ST_Contains` joins via `JoinQueryDetector` and re-check Z overlap/containment through `extraCondition`.
- Teach `TraitJoinQueryBase.shapeToGeometry` how to materialize Box3D as an XY rectangle (with inverted-bound validation on all three axes).
- Add `Box3DJoinSuite` and remove the prior “falls back to row-by-row” regression test that no longer applies.
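The filter-and-refine idea behind these changes can be sketched in plain Scala. This is an illustration only, not the PR's code: `Box3`, `xyOverlaps`, `intersects3D` and `join` are hypothetical names, and a nested loop stands in for the R-tree probe.

```scala
object Box3DFilterRefine {
  // Hypothetical stand-in for a Box3D value; not Sedona's class.
  final case class Box3(xmin: Double, ymin: Double, zmin: Double,
                        xmax: Double, ymax: Double, zmax: Double)

  // Closed-interval overlap on one axis: touching edges count as overlapping.
  private def overlaps(aMin: Double, aMax: Double, bMin: Double, bMax: Double): Boolean =
    aMin <= bMax && bMin <= aMax

  // Filter step: everything a 2D index can decide from the XY footprints alone.
  def xyOverlaps(a: Box3, b: Box3): Boolean =
    overlaps(a.xmin, a.xmax, b.xmin, b.xmax) && overlaps(a.ymin, a.ymax, b.ymin, b.ymax)

  // Refine step: the full 3D predicate, re-applied to each surviving candidate.
  def intersects3D(a: Box3, b: Box3): Boolean =
    xyOverlaps(a, b) && overlaps(a.zmin, a.zmax, b.zmin, b.zmax)

  // Nested loops stand in for the index probe; only the two-phase shape matters here.
  def join(left: Seq[Box3], right: Seq[Box3]): Seq[(Box3, Box3)] =
    for {
      l <- left
      r <- right
      if xyOverlaps(l, r)   // candidate pair from the XY filter
      if intersects3D(l, r) // Z re-check drops pairs that only match in XY
    } yield (l, r)
}
```

A box that shares an XY footprint with the right side but sits in a disjoint Z range passes `xyOverlaps` and is then dropped by `intersects3D`, which is the role the discriminating row plays in the new suite.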
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/JoinQueryDetector.scala | Adds Box3D join detection and post-filter refinement wiring for ST_Intersects / ST_Contains. |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/TraitJoinQueryBase.scala | Projects Box3D to XY rectangles for the join index pipeline and validates bounds. |
| spark/common/src/test/scala/org/apache/sedona/sql/Box3DJoinSuite.scala | New planner/executor tests ensuring indexed plans and Z-axis refinement correctness. |
| spark/common/src/test/scala/org/apache/sedona/sql/Box3DIntersectsContainsSuite.scala | Removes the obsolete “row-by-row fallback” join regression test. |
right,
leftShape,
rightShape,
SpatialPredicate.INTERSECTS,
Fixed in latest push: ST_Contains arm now uses SpatialPredicate.COVERS. The R-tree refine prunes XY-overlapping-but-not-covering candidates before they reach the Z recheck.
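Why a covering test is the tighter pre-filter for containment can be seen in a small sketch. The names below are hypothetical, and the containment semantics (closed intervals, boundaries included) are an assumption rather than something this thread confirms.

```scala
object Box3DContainsPrefilter {
  // Hypothetical stand-in for a Box3D value; not Sedona's class.
  final case class Box3(xmin: Double, ymin: Double, zmin: Double,
                        xmax: Double, ymax: Double, zmax: Double)

  // Weaker XY filter: the footprints merely overlap (closed intervals).
  def xyIntersects(a: Box3, b: Box3): Boolean =
    a.xmin <= b.xmax && b.xmin <= a.xmax && a.ymin <= b.ymax && b.ymin <= a.ymax

  // Tighter XY filter: a's footprint fully covers b's footprint.
  def xyCovers(a: Box3, b: Box3): Boolean =
    a.xmin <= b.xmin && b.xmax <= a.xmax && a.ymin <= b.ymin && b.ymax <= a.ymax

  // Assumed 3D containment: covered on every axis, boundaries included.
  def contains3D(a: Box3, b: Box3): Boolean =
    xyCovers(a, b) && a.zmin <= b.zmin && b.zmax <= a.zmax
}
```

Both filters admit every pair that `contains3D` accepts, so either is a valid superset. The covering filter additionally rejects pairs whose footprints only partially overlap, leaving the Z re-check to handle pairs that are covered in XY but not in Z.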
// overlap in XY but not Z are filtered out. SpatialPredicate.INTERSECTS is used for
// both arms because the R-tree's planar containment ignores Z anyway; the post-filter
// does the actual containment check.
Comment block above the two arms rewritten to describe both predicates: INTERSECTS for ST_Intersects, COVERS for ST_Contains.
 * Z-axis refine step. L4 is the discriminating row: same XY footprint as L1/L2 so the 2D R-tree
 * pairs it with every right in the (0..10) XY cluster, but its Z range is far above every
 * right's Z so the refine must reject all those candidates.
Fixed: the scaladoc now says L4 is "Z-disjoint from every right (above R1/R2's Z, below R4's Z)" rather than "far above every right's Z".
 * - L4=(0..10, 0..10, 50..60): XY in the cluster, Z far above. Every L4-R* candidate the
 *   R-tree emits has to be killed by the Z refine.
Fixed: L4 bullet now reads "XY in the cluster, Z disjoint from every right side."
assert(joined.queryExecution.executedPlan.collectFirst { case _: BroadcastIndexJoinExec =>
  true
}.isDefined)
// 4 true intersections; (L4, R4) is excluded by the Z refine even though XY overlaps.
Fixed: the inline comment now reads "The L4 row produces three XY-overlapping candidates (L4-R1, L4-R2, L4-R4) that the 2D R-tree emits and the Z refine rejects." That matches what the fixture actually exercises.
… on Box3D

Reuses the existing 2D R-tree pipeline by projecting each Box3D to its XY footprint at the join boundary. The Z axis is re-checked per surviving candidate by folding the original Box3D predicate back into `extraCondition`, so candidates that overlap in XY but not Z are filtered out by the per-pair refine step in TraitJoinQueryExec.

- JoinQueryDetector: replaced the two Box3D short-circuits with planned `JoinQueryDetection`s. New `isBox3DPair` helper; the spatial predicate is `SpatialPredicate.INTERSECTS` for both arms (the R-tree's planar containment ignores Z anyway; the post-filter does the actual check). The original `ST_Intersects` / `ST_Contains` expression is ANDed back into `extraCondition` so the Z axis is rechecked per pair.
- TraitJoinQueryBase.shapeToGeometry: added a Box3DUDT case that materialises the XY footprint as a JTS rectangle (Constructors.polygonFromEnvelope). Validates ordered bounds on all three axes so inverted-bound input throws IllegalArgumentException, matching the scalar Box3D predicate contract.

Tests:
- New Box3DJoinSuite mirrors Box2DJoinSuite. Test fixtures include a discriminating row (L4) with the same XY footprint as L1/L2 but a Z range far above every right side, so every candidate it produces via the 2D R-tree is killed by the Z refine. Covers broadcast index join, range join, argument-order symmetry, closed-interval edge touching, and inverted-bound throw.
- Removed the prior "falls back to row-by-row" regression test from Box3DIntersectsContainsSuite; that contract no longer holds, the join is now indexed.
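The bound validation described above can be sketched without JTS. This is a minimal illustration, assuming a hypothetical `Box3` value and a plain `Rect` in place of the JTS polygon; it is not the body of `shapeToGeometry`.

```scala
object Box3DFootprint {
  // Hypothetical stand-ins; the real code works with Box3DUDT rows and a JTS rectangle.
  final case class Box3(xmin: Double, ymin: Double, zmin: Double,
                        xmax: Double, ymax: Double, zmax: Double)
  final case class Rect(xmin: Double, ymin: Double, xmax: Double, ymax: Double)

  // Projects a box to its XY footprint after checking all three axes.
  // Z is validated even though it does not appear in the result, so an
  // inverted Z range fails here rather than passing silently into the index.
  def xyFootprint(b: Box3): Rect = {
    // require throws IllegalArgumentException when its condition is false.
    require(b.xmin <= b.xmax, s"inverted X bounds: ${b.xmin} > ${b.xmax}")
    require(b.ymin <= b.ymax, s"inverted Y bounds: ${b.ymin} > ${b.ymax}")
    require(b.zmin <= b.zmax, s"inverted Z bounds: ${b.zmin} > ${b.zmax}")
    Rect(b.xmin, b.ymin, b.xmax, b.ymax)
  }
}
```

Checking Z at projection time keeps the join path's failure behaviour aligned with the scalar predicate, which would otherwise be the only place an inverted Z range is noticed.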
Wires ST_3DDWithin into the existing distance-join pipeline using the same XY-projection trick as apache#3032 (Box3D ST_Intersects / ST_Contains). Correctness rests on the inequality |A_XY - B_XY|_2 <= |A - B|_3D, so expanding each XY rectangle by `distance` and probing the 2D R-tree gives a valid superset filter; the per-pair condition then enforces the 3D distance via the original predicate.

- JoinQueryDetector: new case for `ST_3DDWithin(Seq(left, right, d))` producing a JoinQueryDetection with SpatialPredicate.INTERSECTS (R-tree's distance-expanded envelope pass), `condition` (the full original join condition) as the per-pair filter, and `distance = Some(d)` so the executor builds an expanded envelope. Routes both overloads through the same plan: Geometry inputs land on the JTS envelope, Box3D inputs on the XY footprint materialised by apache#3032. Explain output adds an "ST_3DDWithin" label.
- OptimizableJoinCondition: add `ST_3DDWithin` to the whitelist of distance-join predicates.

Tests:
- New Box3DDWithinJoinSuite covers broadcast and non-broadcast distance joins, the closed-interval threshold edge, and a discriminating row XY-overlapping with the left side but Z=99 away; the R-tree pairs it via XY-expansion, the 3D refine rejects it. Geometry-input case (POINT Z) exercises the same plan via the Geometry overload.
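The inequality the commit message relies on follows from dropping a non-negative term. A short derivation for two points, with coordinate symbols $a_x, a_y, a_z$ and $b_x, b_y, b_z$ introduced here for illustration:

```latex
\begin{aligned}
\lVert A - B \rVert_{3D}^{2}
  &= (a_x - b_x)^2 + (a_y - b_y)^2 + (a_z - b_z)^2 \\
  &\ge (a_x - b_x)^2 + (a_y - b_y)^2
   = \lVert A_{XY} - B_{XY} \rVert_{2}^{2},
\end{aligned}
\qquad\text{hence}\qquad
\lVert A - B \rVert_{3D} \le d
\;\Longrightarrow\;
\lVert A_{XY} - B_{XY} \rVert_{2} \le d .
```

For extended shapes the same argument applies to the closest pair of points: their XY projections are at most $d$ apart, so the footprints are within $d$ of each other and the distance-expanded XY rectangle cannot miss a true match.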
Did you read the Contributor Guide?
Is this PR related to a ticket?
What changes were proposed in this PR?
Box3D-on-Box3D `ST_Intersects`/`ST_Contains` joins were short-circuited in `JoinQueryDetector` (introduced in #3028) and ran as O(n × m) row-by-row evaluation. This PR wires them into the indexed path.

Approach: reuse the existing 2D R-tree pipeline by projecting each Box3D to its XY footprint, and re-check the Z axis per surviving candidate via the original predicate carried as an extra condition. No new index, no new partitioner.

- `JoinQueryDetector`: replaced the two Box3D short-circuits with `JoinQueryDetection`s analogous to the Box2D arms. New `isBox3DPair` helper. The spatial predicate is `SpatialPredicate.INTERSECTS` for both `ST_Intersects` and `ST_Contains` over Box3D (the R-tree's planar containment ignores Z anyway; the post-filter does the actual check). The original `ST_Intersects`/`ST_Contains` expression is ANDed back into `extraCondition` so the Z axis is rechecked per pair by `TraitJoinQueryExec`'s per-pair filter.
- `TraitJoinQueryBase.shapeToGeometry`: added a `Box3DUDT` case that materialises the XY footprint as a JTS rectangle via `Constructors.polygonFromEnvelope(xmin, ymin, xmax, ymax)`. Validates ordered bounds on all three axes so inverted-bound input throws `IllegalArgumentException`, matching the scalar Box3D predicate contract.

Cost note: per-candidate refine is six `getDouble` calls on an in-memory `InternalRow` + six comparisons + two `Box3D` allocations. The `GeographyJoinShape` `userData` pattern was considered for caching the parsed Box3D, but skipped: Geography needs that because S2 deserialise is expensive; Box3D deserialise is essentially free.

How was this patch tested?

- `Box3DJoinSuite` mirrors `Box2DJoinSuite`. The fixture includes a discriminating row (L4) with the same XY footprint as L1/L2 but a Z range far above every right side, so every candidate the 2D R-tree emits for L4 is killed by the Z refine. Covers broadcast index join, range join, argument-order symmetry, closed-interval edge touching, and inverted-bound throw.
- Removed the prior "falls back to row-by-row" regression test from `Box3DIntersectsContainsSuite`; that contract no longer holds, the join is indexed now.

Did this PR include necessary documentation updates?