  1. Apache Sedona
  2. SEDONA-262

Don't optimize equi-join by default, add an option to configure when to optimize spatial joins

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 1.4.0

    Description

      Apache Sedona optimizes every join that has a spatial predicate in its join condition, including equi-joins that also carry a spatial predicate. For example, the following query will be optimized as a RangeJoin:

      df1.join(df2, df1("id1") === df2("id2") && ST_Contains(df1("geom"), df2("geom")))
      

      However, it may be more efficient to run this query as a sort-merge join or hash join on the equi-condition df1.id1 = df2.id2, with the spatial predicate applied as a filter. The same problem arises when users perform a spatial join on the S2 cell IDs of both geometries and then use a spatial predicate to filter out false positives.
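
      The S2-based pattern mentioned above can be sketched without Spark. This is a toy model under stated assumptions: a uniform integer grid stands in for S2 cells, and Point, Rect, and CellJoinSketch are hypothetical names, not Sedona APIs. It shows why the join is an equi-join on cell IDs whose candidates still need an exact spatial check.

```scala
// Toy model: integer grid cells stand in for S2 cell IDs (assumption for illustration).
final case class Point(id: Int, x: Double, y: Double)

final case class Rect(id: Int, minX: Double, minY: Double, maxX: Double, maxY: Double) {
  // Exact spatial predicate, playing the role of ST_Contains.
  def contains(p: Point): Boolean =
    p.x >= minX && p.x <= maxX && p.y >= minY && p.y <= maxY
}

object CellJoinSketch {
  val cellSize = 1.0

  // ID of the grid cell that contains (x, y).
  def cellOf(x: Double, y: Double): (Long, Long) =
    (math.floor(x / cellSize).toLong, math.floor(y / cellSize).toLong)

  // All grid cells overlapped by a rectangle (analogous to a cell covering).
  def cover(r: Rect): Seq[(Long, Long)] = {
    val (x0, y0) = cellOf(r.minX, r.minY)
    val (x1, y1) = cellOf(r.maxX, r.maxY)
    for (cx <- x0 to x1; cy <- y0 to y1) yield (cx, cy)
  }

  // Returns (rectId, pointId) pairs where the rectangle contains the point.
  def join(rects: Seq[Rect], points: Seq[Point]): Seq[(Int, Int)] = {
    // Build side of a hash join: cell ID -> rectangles covering that cell.
    val byCell: Map[(Long, Long), Seq[Rect]] =
      rects
        .flatMap(r => cover(r).map(c => c -> r))
        .groupBy(_._1)
        .map { case (c, pairs) => c -> pairs.map(_._2) }

    // Probe on cell equality (the equi-condition), then drop false
    // positives with the exact predicate.
    for {
      p <- points
      r <- byCell.getOrElse(cellOf(p.x, p.y), Seq.empty)
      if r.contains(p)
    } yield (r.id, p.id)
  }
}
```

      A point that shares a cell with a rectangle but lies outside it is a candidate after the cell lookup and is only removed by the final contains check, which is the "filter false positives" step described above.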

      We propose adding a configuration option to SedonaConf named sedona.join.optimizationmode. It can be set to one of the following values:

      • all: optimize all joins that have a spatial predicate in the join condition. This is the current behavior of Apache Sedona.
      • none: disable spatial join optimization.
      • nonequi: enable spatial join optimization only for non-equi joins. This will be the default mode.

      When sedona.join.optimizationmode is set to nonequi, Sedona will not apply spatial join optimization to the equi-join shown above.
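
      The intended semantics of the three modes can be summarized in a small model. This is a sketch of the proposal, not Sedona's actual implementation: only the mode strings all, none, and nonequi come from this issue, while JoinModeSketch, JoinCondition, parse, and shouldOptimize are hypothetical names. Rejecting unknown values with an exception is also an assumption.

```scala
object JoinModeSketch {
  sealed trait JoinOptimizationMode
  case object OptimizeAll extends JoinOptimizationMode
  case object OptimizeNone extends JoinOptimizationMode
  case object OptimizeNonEqui extends JoinOptimizationMode

  // Simplified view of a join condition: which kinds of predicates it contains.
  final case class JoinCondition(hasSpatialPredicate: Boolean, hasEquiCondition: Boolean)

  // Parse the configured value; an unset option falls back to "nonequi",
  // the default proposed in this issue.
  def parse(value: Option[String]): JoinOptimizationMode = value match {
    case Some("all")            => OptimizeAll
    case Some("none")           => OptimizeNone
    case Some("nonequi") | None => OptimizeNonEqui
    case Some(other) =>
      throw new IllegalArgumentException(s"Unknown optimization mode: $other")
  }

  // Decide whether the spatial join optimization applies to a join.
  def shouldOptimize(mode: JoinOptimizationMode, cond: JoinCondition): Boolean =
    mode match {
      case OptimizeAll     => cond.hasSpatialPredicate
      case OptimizeNone    => false
      case OptimizeNonEqui => cond.hasSpatialPredicate && !cond.hasEquiCondition
    }
}
```

      Under this model, the example join on id1 = id2 plus ST_Contains corresponds to JoinCondition(hasSpatialPredicate = true, hasEquiCondition = true). It is optimized as a spatial join only in all mode and is left to the regular equi-join planning in nonequi and none modes.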

            People

              Assignee: Unassigned
              Reporter: Kristin Cowalcijk (kontinuation)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m