Uploaded image for project: 'Apache Sedona'
  1. Apache Sedona
  2. SEDONA-560

Spatial join involving dataframe containing 0 partition throws exception

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.6.0
    • 1.6.1

    Description

      Sedona cannot handle dataframes containing 0 partitions properly when performing spatial join. For example:

      schema = StructType([
          StructField("id", IntegerType(), True)
      ])
      
      # Create empty RDD
      empty_rdd = spark.sparkContext.emptyRDD()
      
      # Create empty DataFrame
      empty_df = spark.createDataFrame(empty_rdd, schema)
      
      df_point = spark.range(0, 10).toDF("id").withColumn('geom', expr("ST_Point(id, id)"))
      df_poly = empty_df.withColumn("poly", expr("ST_Buffer(ST_Point(id, id), 2)")).drop("geom")
      
      spark.conf.set("sedona.join.autoBroadcastJoinThreshold", "-1")
      df_point.join(broadcast(df_poly), expr("ST_Intersects(poly, geom)")).count()
      

      failed with the following error message:

      Py4JJavaError: An error occurred while calling o107.showString.
      : java.util.NoSuchElementException: next on empty iterator
          at scala.collection.Iterator$$anon$2.next(Iterator.scala:41)
          at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
          at scala.collection.IterableLike.head(IterableLike.scala:109)
          at scala.collection.IterableLike.head$(IterableLike.scala:108)
          at scala.collection.AbstractIterable.head(Iterable.scala:56)
          at org.apache.spark.sql.sedona_sql.strategy.join.SpatialIndexExec.doExecuteBroadcast(SpatialIndexExec.scala:63)
      

      This does not only happen to broadcast join, range join also has problems:

      df_point.join(df_poly, expr("ST_Intersects(poly, geom)")).count()
      
      24/05/27 10:50:13.989 WARN RangeJoinExec: [SedonaSQL] Join dominant side partition number 8 is larger than 1/2 of the dominant side count 10
      24/05/27 10:50:13.989 WARN RangeJoinExec: [SedonaSQL] Try to use follower side partition number 0
      Number of partitions must be >= 0
      

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              kontinuation Kristin Cowalcijk
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m