Details
Type: New Feature
Priority: Major
Status: Resolved
Resolution: Fixed
Description
Apache Sedona optimizes every join that has a spatial predicate in its join condition, including equi-joins that also carry a spatial predicate. For example, the following query is planned as a RangeJoin:
df1.join(df2, df1("id1") === df2("id2") && ST_Contains(df1("geom"), df2("geom")))
For this query, it may be more efficient to run a sort-merge join or a hash join on the equi-condition df1.id1 = df2.id2. The same problem arises when users want to perform a spatial join on the S2 cell IDs of both geometries and then use a spatial predicate to filter out false positives.
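To show why the equi-condition matters in the cell-ID case, here is a minimal sketch of the pattern: hash-join on cell IDs first, then apply the exact predicate to drop false positives. It is plain Scala over in-memory collections, not Sedona code, and it assumes a uniform grid with a hypothetical `cellSize` as a stand-in for S2 cells. The names `Point`, `Rect`, `cellOf`, `cellsOf` and `CellJoinSketch` are illustrative only.

```scala
// Sketch: equi-join on cell IDs, then refine with an exact containment test.
// Assumption: a uniform grid stands in for S2 cells (illustration only).
final case class Point(id: Int, x: Double, y: Double)

final case class Rect(id: Int, minX: Double, minY: Double, maxX: Double, maxY: Double) {
  // Exact spatial predicate, playing the role of ST_Contains
  def contains(p: Point): Boolean =
    p.x >= minX && p.x <= maxX && p.y >= minY && p.y <= maxY
}

object CellJoinSketch {
  val cellSize = 10.0 // hypothetical grid resolution

  // The single grid cell that contains a coordinate
  def cellOf(x: Double, y: Double): (Int, Int) =
    (math.floor(x / cellSize).toInt, math.floor(y / cellSize).toInt)

  // Every grid cell overlapped by a rectangle (distinct cells)
  def cellsOf(r: Rect): Seq[(Int, Int)] = {
    val (x0, y0) = cellOf(r.minX, r.minY)
    val (x1, y1) = cellOf(r.maxX, r.maxY)
    for (cx <- x0 to x1; cy <- y0 to y1) yield (cx, cy)
  }

  // Hash join on the cell ID (the equi-condition), then filter false positives.
  // Each point maps to exactly one cell, so every (rect, point) pair appears at most once.
  def join(rects: Seq[Rect], points: Seq[Point]): Seq[(Int, Int)] = {
    val pointsByCell: Map[(Int, Int), Seq[Point]] = points.groupBy(p => cellOf(p.x, p.y))
    for {
      r    <- rects
      cell <- cellsOf(r)
      p    <- pointsByCell.getOrElse(cell, Seq.empty)
      if r.contains(p) // exact predicate removes candidates that only share a cell
    } yield (r.id, p.id)
  }
}
```

For example, a point at (18, 18) shares a cell with a rectangle spanning (0, 0) to (15, 15), so it becomes a candidate, but the containment check rejects it. The candidate step is a plain equality join, which is exactly the shape Spark's sort-merge and hash joins handle well.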
We propose adding a configuration named sedona.join.optimizationmode to SedonaConf. It can be set to one of the following values:
- all: optimize all joins that have spatial predicates in their join conditions. This is the current behavior of Apache Sedona.
- none: disable spatial join optimization.
- nonequi: enable spatial join optimization only for non-equi joins. This will be the default mode.
When sedona.join.optimizationmode is set to nonequi, Sedona will not optimize the equi-join above, so Spark can plan it as a sort-merge join or hash join on the equi-condition.
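The decision the three modes describe can be modeled as a small pure function. This is a hypothetical sketch, not Sedona's planner code: `JoinOptimizationMode`, `NoneMode`, `SpatialJoinPlanning` and `useSpatialJoin` are invented names, and the two boolean inputs assume the planner has already determined whether the join condition contains a spatial predicate and an equi-condition.

```scala
// Hypothetical model of the proposed sedona.join.optimizationmode setting.
// Names are illustrative; they are not Sedona's internal classes.
sealed trait JoinOptimizationMode

object JoinOptimizationMode {
  case object All extends JoinOptimizationMode
  case object NoneMode extends JoinOptimizationMode // named to avoid shadowing scala.None
  case object NonEqui extends JoinOptimizationMode

  // The proposal makes nonequi the default
  val Default: JoinOptimizationMode = NonEqui

  // Parse the configured string value, case-insensitively
  def parse(value: String): JoinOptimizationMode = value.trim.toLowerCase match {
    case "all"     => All
    case "none"    => NoneMode
    case "nonequi" => NonEqui
    case other =>
      throw new IllegalArgumentException(s"Unknown sedona.join.optimizationmode: $other")
  }
}

object SpatialJoinPlanning {
  import JoinOptimizationMode._

  // Should this join be planned as a spatial join (e.g. a RangeJoin)?
  def useSpatialJoin(
      mode: JoinOptimizationMode,
      hasSpatialPredicate: Boolean,
      hasEquiCondition: Boolean): Boolean =
    hasSpatialPredicate && (mode match {
      case All      => true               // current behavior
      case NoneMode => false              // never optimize
      case NonEqui  => !hasEquiCondition  // leave equi-joins to Spark
    })
}
```

Under nonequi, the example query above (equi-condition plus ST_Contains) yields `false`, while a join whose only condition is ST_Contains still yields `true`. In a running session, the value would presumably be supplied like other Sedona parameters, for example through `spark.conf.set("sedona.join.optimizationmode", "nonequi")`; treat that call as an assumption about how the new key is read.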