[GH-3031] Box3D spatial join: index ST_Intersects / ST_Contains on Box3D #3032

Merged
jiayuasu merged 1 commit into apache:master from jiayuasu:feature/box3d-join-planner on Jun 9, 2026

@jiayuasu jiayuasu commented Jun 7, 2026

Member

Did you read the Contributor Guide?

  • Yes

Is this PR related to a ticket?

  • Yes — closes #3031; follow-up to the Box3D EPIC (#2973).

What changes were proposed in this PR?

Box3D-on-Box3D ST_Intersects / ST_Contains joins were short-circuited in JoinQueryDetector (introduced in #3028) and ran as O(n × m) row-by-row evaluation. This PR wires them into the indexed path.

Approach: reuse the existing 2D R-tree pipeline by projecting each Box3D to its XY footprint, and re-check the Z axis per surviving candidate via the original predicate carried as an extra condition. No new index, no new partitioner.

  • JoinQueryDetector: replaced the two Box3D short-circuits with JoinQueryDetections analogous to the Box2D arms, and added an isBox3DPair helper. The index-level spatial predicate is SpatialPredicate.INTERSECTS for ST_Intersects and SpatialPredicate.COVERS for ST_Contains; either way the R-tree sees only the XY footprint and ignores Z, so the post-filter does the actual 3D check. The original ST_Intersects / ST_Contains expression is ANDed back into extraCondition so the Z axis is rechecked per pair by TraitJoinQueryExec's per-pair filter.
  • TraitJoinQueryBase.shapeToGeometry — added a Box3DUDT case that materialises the XY footprint as a JTS rectangle via Constructors.polygonFromEnvelope(xmin, ymin, xmax, ymax). Validates ordered bounds on all three axes so inverted-bound input throws IllegalArgumentException, matching the scalar Box3D predicate contract.
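
The filter-and-refine idea behind this approach can be sketched in plain Scala. This is only an illustration under stated assumptions: `Box3`, `Box3DJoinSketch`, and every method name below are hypothetical and are not Sedona classes, and a nested loop stands in for the 2D R-tree probe.

```scala
// Hypothetical stand-in for a Box3D value; NOT Sedona's actual class.
final case class Box3(
    xmin: Double, ymin: Double, zmin: Double,
    xmax: Double, ymax: Double, zmax: Double) {
  // Ordered-bounds check on all three axes; require throws IllegalArgumentException.
  require(xmin <= xmax && ymin <= ymax && zmin <= zmax, "inverted Box3D bounds")
}

object Box3DJoinSketch {
  // Coarse filter: closed-interval overlap of the XY footprints only.
  // This is the question a 2D index can answer; Z is ignored here.
  def xyOverlaps(a: Box3, b: Box3): Boolean =
    a.xmin <= b.xmax && b.xmin <= a.xmax &&
      a.ymin <= b.ymax && b.ymin <= a.ymax

  // Exact predicate: closed-interval intersection on all three axes.
  def intersects3D(a: Box3, b: Box3): Boolean =
    xyOverlaps(a, b) && a.zmin <= b.zmax && b.zmin <= a.zmax

  // Exact predicate: a contains b on all three axes.
  def contains3D(a: Box3, b: Box3): Boolean =
    a.xmin <= b.xmin && b.xmax <= a.xmax &&
      a.ymin <= b.ymin && b.ymax <= a.ymax &&
      a.zmin <= b.zmin && b.zmax <= a.zmax

  // Filter-and-refine join. The nested loop is a placeholder for an index probe;
  // the second guard plays the role of the per-pair extra condition.
  def join(left: Seq[Box3], right: Seq[Box3])(
      exact: (Box3, Box3) => Boolean): Seq[(Box3, Box3)] =
    for {
      l <- left
      r <- right
      if xyOverlaps(l, r) // candidate from the XY footprint
      if exact(l, r)      // Z is rechecked by the original predicate
    } yield (l, r)
}
```

Because any pair that intersects or contains in 3D also overlaps in XY, the coarse filter never drops a true match; it only admits extra candidates, such as two boxes stacked far apart in Z, that the refine step then rejects.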

Cost note: per-candidate refine is six getDouble calls on an in-memory InternalRow + six comparisons + two Box3D allocations. The GeographyJoinShape userData pattern was considered for caching the parsed Box3D, but skipped — Geography needs that because S2 deserialise is expensive; Box3D deserialise is essentially free.

How was this patch tested?

  • New Box3DJoinSuite mirrors Box2DJoinSuite. The fixture includes a discriminating row (L4) with the same XY footprint as L1/L2 but a Z range disjoint from every right side (above R1/R2's Z, below R4's Z), so every candidate the 2D R-tree emits for L4 is killed by the Z refine. Covers broadcast index join, range join, argument-order symmetry, closed-interval edge touching, and inverted-bound throw.
  • Removed the prior "falls back to row-by-row" regression test from Box3DIntersectsContainsSuite — that contract no longer holds; the join is indexed now.
  • Full Box3D/Box2D/predicate/pushdown/dataFrameAPI sweep on spark-3.5: 331 tests pass.

Did this PR include necessary documentation updates?

  • No — Box3D documentation is tracked separately under the Box3D EPIC. This PR's behaviour change (planner now picks an indexed plan for Box3D joins) will be reflected when that lands.

@jiayuasu jiayuasu force-pushed the feature/box3d-join-planner branch from f7c7d11 to 771480b Compare June 7, 2026 07:16
@jiayuasu jiayuasu marked this pull request as draft June 7, 2026 07:25
@jiayuasu jiayuasu requested a review from Copilot June 7, 2026 07:25

Copilot AI left a comment

Contributor

Pull request overview

This PR updates Sedona Spark SQL’s join planner to route Box3D-on-Box3D ST_Intersects / ST_Contains joins through the existing indexed (2D R-tree) spatial join pipeline by projecting Box3D inputs to their XY footprints and refining candidate pairs using the original Box3D predicate as a post-filter.

Changes:

  • Plan Box3D ST_Intersects / ST_Contains joins via JoinQueryDetector and re-check Z overlap/containment through extraCondition.
  • Teach TraitJoinQueryBase.shapeToGeometry how to materialize Box3D as an XY rectangle (with inverted-bound validation on all three axes).
  • Add Box3DJoinSuite and remove the prior “falls back to row-by-row” regression test that no longer applies.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File: Description

  • spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/JoinQueryDetector.scala: Adds Box3D join detection and post-filter refinement wiring for ST_Intersects / ST_Contains.
  • spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/TraitJoinQueryBase.scala: Projects Box3D to XY rectangles for the join index pipeline and validates bounds.
  • spark/common/src/test/scala/org/apache/sedona/sql/Box3DJoinSuite.scala: New planner/executor tests ensuring indexed plans and Z-axis refinement correctness.
  • spark/common/src/test/scala/org/apache/sedona/sql/Box3DIntersectsContainsSuite.scala: Removes the obsolete “row-by-row fallback” join regression test.

right,
leftShape,
rightShape,
SpatialPredicate.INTERSECTS,

Member Author

Fixed in latest push: ST_Contains arm now uses SpatialPredicate.COVERS. The R-tree refine prunes XY-overlapping-but-not-covering candidates before they reach the Z recheck.
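
A small sketch of why a covers-style XY prefilter is both safe and tighter for containment, assuming closed-interval boxes. `CoversPrefilterSketch`, `Box`, and the method names are hypothetical illustrations, not Sedona's implementation.

```scala
object CoversPrefilterSketch {
  // Hypothetical axis-aligned 3D box; not a Sedona type.
  final case class Box(
      xmin: Double, ymin: Double, zmin: Double,
      xmax: Double, ymax: Double, zmax: Double)

  // XY footprints overlap (closed intervals).
  def xyIntersects(a: Box, b: Box): Boolean =
    a.xmin <= b.xmax && b.xmin <= a.xmax &&
      a.ymin <= b.ymax && b.ymin <= a.ymax

  // XY footprint of a covers the XY footprint of b.
  def xyCovers(a: Box, b: Box): Boolean =
    a.xmin <= b.xmin && b.xmax <= a.xmax &&
      a.ymin <= b.ymin && b.ymax <= a.ymax

  // Z range of a contains the Z range of b.
  def zContains(a: Box, b: Box): Boolean =
    a.zmin <= b.zmin && b.zmax <= a.zmax

  // Full 3D containment is exactly XY covers plus the Z recheck.
  def contains3D(a: Box, b: Box): Boolean =
    xyCovers(a, b) && zContains(a, b)
}
```

3D containment implies XY covering, so a covers prefilter loses no true matches. A pair that merely overlaps in XY passes an intersects prefilter but fails the covers one, so it never reaches the Z recheck.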

Comment on lines +287 to +289
// overlap in XY but not Z are filtered out. SpatialPredicate.INTERSECTS is used for
// both arms because the R-tree's planar containment ignores Z anyway; the post-filter
// does the actual containment check.

Member Author

Comment block above the two arms rewritten to describe both predicates: INTERSECTS for ST_Intersects, COVERS for ST_Contains.

Comment on lines +31 to +33
* Z-axis refine step. L4 is the discriminating row: same XY footprint as L1/L2 so the 2D R-tree
* pairs it with every right in the (0..10) XY cluster, but its Z range is far above every
* right's Z so the refine must reject all those candidates.

Member Author

Fixed: the scaladoc now says L4 is "Z-disjoint from every right (above R1/R2's Z, below R4's Z)" rather than "far above every right's Z".

Comment on lines +37 to +38
* - L4=(0..10, 0..10, 50..60) — XY in the cluster, Z far above. Every L4-R* candidate the
* R-tree emits has to be killed by the Z refine.

Member Author

Fixed: L4 bullet now reads "XY in the cluster, Z disjoint from every right side."

assert(joined.queryExecution.executedPlan.collectFirst { case _: BroadcastIndexJoinExec =>
true
}.isDefined)
// 4 true intersections — (L4, R4) is excluded by the Z refine even though XY overlaps.

Member Author

Fixed: the inline comment now reads "The L4 row produces three XY-overlapping candidates (L4-R1, L4-R2, L4-R4) that the 2D R-tree emits and the Z refine rejects." That matches what the fixture actually exercises.

@jiayuasu jiayuasu force-pushed the feature/box3d-join-planner branch from 771480b to 2865412 Compare June 7, 2026 20:46
… on Box3D

Reuses the existing 2D R-tree pipeline by projecting each Box3D to its
XY footprint at the join boundary. The Z axis is re-checked per
surviving candidate by folding the original Box3D predicate back into
`extraCondition`, so candidates that overlap in XY but not Z are
filtered out by the per-pair refine step in TraitJoinQueryExec.

- JoinQueryDetector: replaced the two Box3D short-circuits with planned
  `JoinQueryDetection`s. New `isBox3DPair` helper; the spatial predicate
  is `SpatialPredicate.INTERSECTS` for both arms (the R-tree's planar
  containment ignores Z anyway; the post-filter does the actual check).
  The original `ST_Intersects` / `ST_Contains` expression is ANDed back
  into `extraCondition` so the Z axis is rechecked per pair.
- TraitJoinQueryBase.shapeToGeometry: added a Box3DUDT case that
  materialises the XY footprint as a JTS rectangle (Constructors.
  polygonFromEnvelope). Validates ordered bounds on all three axes so
  inverted-bound input throws IllegalArgumentException, matching the
  scalar Box3D predicate contract.

Tests:
- New Box3DJoinSuite mirrors Box2DJoinSuite. Test fixtures include a
  discriminating row (L4) with the same XY footprint as L1/L2 but a
  Z range far above every right side, so every candidate it produces
  via the 2D R-tree is killed by the Z refine. Covers broadcast index
  join, range join, argument-order symmetry, closed-interval edge
  touching, and inverted-bound throw.
- Removed the prior "falls back to row-by-row" regression test from
  Box3DIntersectsContainsSuite — that contract no longer holds, the
  join is now indexed.
@jiayuasu jiayuasu force-pushed the feature/box3d-join-planner branch from 2865412 to 87eb033 Compare June 7, 2026 22:25
@jiayuasu jiayuasu marked this pull request as ready for review June 8, 2026 17:12
@jiayuasu jiayuasu added this to the sedona-1.9.1 milestone Jun 9, 2026
@jiayuasu jiayuasu merged commit a7fff26 into apache:master Jun 9, 2026
46 of 48 checks passed
jiayuasu added a commit to jiayuasu/sedona that referenced this pull request Jun 9, 2026
Wires ST_3DDWithin into the existing distance-join pipeline using the
same XY-projection trick as apache#3032 (Box3D ST_Intersects / ST_Contains).
Correctness rests on the inequality |A_XY - B_XY|_2 <= |A - B|_3D, so
expanding each XY rectangle by `distance` and probing the 2D R-tree
gives a valid superset filter; the per-pair condition then enforces
the 3D distance via the original predicate.

- JoinQueryDetector: new case for `ST_3DDWithin(Seq(left, right, d))`
  producing a JoinQueryDetection with SpatialPredicate.INTERSECTS
  (R-tree's distance-expanded envelope pass), `condition` (the full
  original join condition) as the per-pair filter, and
  `distance = Some(d)` so the executor builds an expanded envelope.
  Routes both overloads through the same plan — Geometry inputs land
  on the JTS envelope, Box3D inputs on the XY footprint materialised
  by apache#3032. Explain output adds an "ST_3DDWithin" label.
- OptimizableJoinCondition: add `ST_3DDWithin` to the whitelist of
  distance-join predicates.

Tests:
- New Box3DDWithinJoinSuite covers broadcast and non-broadcast distance
  joins, the closed-interval threshold edge, and a discriminating row
  XY-overlapping with the left side but Z=99 away — the R-tree pairs
  it via XY-expansion, the 3D refine rejects it. Geometry-input case
  (POINT Z) exercises the same plan via the Geometry overload.
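
The superset argument in this commit message can be illustrated with a short sketch. Dropping the non-negative Z term can only shrink a Euclidean distance, so any pair within `d` in 3D is also within `d` in XY, and an XY rectangle expanded by `d` must reach it. Everything below is an assumption for illustration: `DWithinSketch`, `Box`, and the box-to-box distance (minimum gap per axis) are hypothetical, and Sedona's actual ST_3DDWithin semantics may differ.

```scala
object DWithinSketch {
  // Hypothetical axis-aligned 3D box; not a Sedona type.
  final case class Box(
      xmin: Double, ymin: Double, zmin: Double,
      xmax: Double, ymax: Double, zmax: Double)

  // Gap between two closed intervals; 0.0 when they overlap or touch.
  private def gap(aMin: Double, aMax: Double, bMin: Double, bMax: Double): Double =
    math.max(0.0, math.max(aMin - bMax, bMin - aMax))

  // Coarse filter: a's XY rectangle expanded by d on every side overlaps b's XY rectangle.
  def xyCandidate(a: Box, b: Box, d: Double): Boolean =
    gap(a.xmin, a.xmax, b.xmin, b.xmax) <= d &&
      gap(a.ymin, a.ymax, b.ymin, b.ymax) <= d

  // Refine: assumed 3D minimum distance between the boxes is at most d (closed threshold).
  def within3D(a: Box, b: Box, d: Double): Boolean = {
    val gx = gap(a.xmin, a.xmax, b.xmin, b.xmax)
    val gy = gap(a.ymin, a.ymax, b.ymin, b.ymax)
    val gz = gap(a.zmin, a.zmax, b.zmin, b.zmax)
    math.sqrt(gx * gx + gy * gy + gz * gz) <= d
  }

  // Filter-and-refine distance join; the nested loop stands in for the R-tree probe.
  def join(left: Seq[Box], right: Seq[Box], d: Double): Seq[(Box, Box)] =
    for {
      l <- left
      r <- right
      if xyCandidate(l, r, d)
      if within3D(l, r, d)
    } yield (l, r)
}
```

A box directly above another but 99 units away in Z passes `xyCandidate` for any non-negative `d` and is then rejected by `within3D` whenever `d` is below 99, which is the shape of the discriminating row described in the tests.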

Development

Successfully merging this pull request may close these issues.

Spatial join planner: index ST_Intersects / ST_Contains on Box3D inputs

2 participants