Apache Sedona / SEDONA-19

Global indexing does not work with SQL joins

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.1

    Description

      According to the documentation, global indexing can be used with SQL joins and is enabled by default. However, the code path here:

      https://github.com/apache/incubator-sedona/blob/master/sql/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/TraitJoinQueryExec.scala#L125

      calls the JoinParams constructor here:

      https://github.com/apache/incubator-sedona/blob/master/core/src/main/java/org/apache/sedona/core/spatialOperator/JoinQuery.java#L434

      which always sets useIndex to false. As a result, SQL queries can never use an index, and the non-indexed join does not work well with large datasets. (That is a separate issue: it loads all of the non-window objects of each partition into memory at once and quickly runs out of memory.)

      The Python adapter also passes the arguments incorrectly here:

      https://github.com/apache/incubator-sedona/blob/master/python-adapter/src/main/scala/org.apache.sedona.python.wrapper/adapters/JoinParamsAdapter.scala#L29

      The signature of JoinParams needs to be updated; the simplest fix is probably to add a constructor that takes all four parameters.
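
      A minimal sketch of what such a four-parameter constructor could look like. This is not the actual Sedona source: only useIndex is named in this report. The other field names (considerBoundaryIntersection, indexType, joinBuildSide), the stand-in enums, and the defaults in the two-argument overload are assumptions made for illustration.

```java
// Hypothetical sketch; not the actual Sedona source.
class JoinParamsSketch {
    // Stand-in enums so the example compiles on its own (names assumed).
    enum IndexType { QUADTREE, RTREE }
    enum JoinBuildSide { LEFT, RIGHT }

    static final class JoinParams {
        final boolean useIndex;
        final boolean considerBoundaryIntersection; // assumed name
        final IndexType indexType;                  // assumed name
        final JoinBuildSide joinBuildSide;          // assumed name

        // Full constructor: the caller sets every field, so useIndex
        // is never silently forced to false.
        JoinParams(boolean useIndex,
                   boolean considerBoundaryIntersection,
                   IndexType indexType,
                   JoinBuildSide joinBuildSide) {
            this.useIndex = useIndex;
            this.considerBoundaryIntersection = considerBoundaryIntersection;
            this.indexType = indexType;
            this.joinBuildSide = joinBuildSide;
        }

        // Convenience overload that delegates to the full constructor.
        // The RTREE and RIGHT defaults are assumptions for this sketch.
        JoinParams(boolean useIndex, boolean considerBoundaryIntersection) {
            this(useIndex, considerBoundaryIntersection,
                 IndexType.RTREE, JoinBuildSide.RIGHT);
        }
    }

    public static void main(String[] args) {
        JoinParams params = new JoinParams(
                true, false, IndexType.QUADTREE, JoinBuildSide.LEFT);
        System.out.println("useIndex = " + params.useIndex);
    }
}
```

      With every overload delegating to the full constructor, a caller such as the SQL join path or the Python adapter would have to state useIndex explicitly instead of inheriting a hard-coded value.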

            People

              Assignee: Unassigned
              Reporter: Adam Binford (kimahriman)