Apache Sedona / SEDONA-560

Spatial join involving a DataFrame with 0 partitions throws an exception


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.1

    Description

      Sedona does not correctly handle DataFrames that have 0 partitions when performing a spatial join. For example:

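      from pyspark.sql.functions import broadcast, expr
      from pyspark.sql.types import IntegerType, StructField, StructType
      
      # Assumes `spark` is a SparkSession with the Sedona SQL functions registered
      schema = StructType([
          StructField("id", IntegerType(), True)
      ])
      
      # Create an empty RDD; it has 0 partitions
      empty_rdd = spark.sparkContext.emptyRDD()
      
      # Create an empty DataFrame backed by the 0-partition RDD
      empty_df = spark.createDataFrame(empty_rdd, schema)
      
      # Point side has 10 rows; polygon side has no rows and 0 partitions
      df_point = spark.range(0, 10).toDF("id").withColumn('geom', expr("ST_Point(id, id)"))
      df_poly = empty_df.withColumn("poly", expr("ST_Buffer(ST_Point(id, id), 2)")).drop("geom")
      
      # Disable Sedona's automatic broadcast join, then request a broadcast join explicitly
      spark.conf.set("sedona.join.autoBroadcastJoinThreshold", "-1")
      df_point.join(broadcast(df_poly), expr("ST_Intersects(poly, geom)")).count()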
      

      This fails with the following error:

      Py4JJavaError: An error occurred while calling o107.showString.
      : java.util.NoSuchElementException: next on empty iterator
          at scala.collection.Iterator$$anon$2.next(Iterator.scala:41)
          at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
          at scala.collection.IterableLike.head(IterableLike.scala:109)
          at scala.collection.IterableLike.head$(IterableLike.scala:108)
          at scala.collection.AbstractIterable.head(Iterable.scala:56)
          at org.apache.spark.sql.sedona_sql.strategy.join.SpatialIndexExec.doExecuteBroadcast(SpatialIndexExec.scala:63)
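
      The trace ends in a `head` call on an empty collection. As a rough Python analogue of that failure mode (not Sedona's actual implementation; `per_partition_indexes`, `first_or_none` and the `"EMPTY_INDEX"` marker are hypothetical names used only for illustration), taking the first element of a collection that has no entries raises, while a guarded lookup can fall back to an explicit empty value:

```python
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def first_or_none(items: Iterable[T]) -> Optional[T]:
    """Return the first element, or None when the iterable is empty."""
    return next(iter(items), None)


# Hypothetical stand-in: one entry per input partition.
# An input with 0 partitions would contribute no entries at all.
per_partition_indexes: List[str] = []

# Unguarded access: the Python counterpart of calling `head` on an empty collection.
try:
    next(iter(per_partition_indexes))
    unguarded_failed = False
except StopIteration:
    unguarded_failed = True

# Guarded access: fall back to an explicit "empty index" marker (an assumption).
found = first_or_none(per_partition_indexes)
index = found if found is not None else "EMPTY_INDEX"
```

      Whether the actual fix takes this shape is not stated in this report; the sketch only shows why an input with no partitions can surface as `NoSuchElementException`.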
      

      This is not limited to broadcast joins; the range join fails as well:

      df_point.join(df_poly, expr("ST_Intersects(poly, geom)")).count()
      
      24/05/27 10:50:13.989 WARN RangeJoinExec: [SedonaSQL] Join dominant side partition number 8 is larger than 1/2 of the dominant side count 10
      24/05/27 10:50:13.989 WARN RangeJoinExec: [SedonaSQL] Try to use follower side partition number 0
      Number of partitions must be >= 0
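
      Both failures involve an input with 0 partitions, so on affected versions one possible workaround is to give such a DataFrame at least one (empty) partition before joining. The helper below is a minimal sketch under that assumption and has not been verified against Sedona; `ensure_min_partitions` is a hypothetical name, while `df.rdd.getNumPartitions()` and `df.repartition(n)` are standard PySpark calls.

```python
def ensure_min_partitions(df, min_partitions: int = 1):
    """Return df unchanged unless it has fewer than min_partitions partitions.

    Hypothetical helper: relies only on df.rdd.getNumPartitions() and
    df.repartition(n), as provided by a PySpark DataFrame.
    """
    if df.rdd.getNumPartitions() < min_partitions:
        # repartition() produces exactly min_partitions partitions,
        # which may all be empty.
        return df.repartition(min_partitions)
    return df
```

      With the reproduction above, this would be applied as `df_poly = ensure_min_partitions(df_poly)` before calling `join`. Note that `repartition` adds a shuffle, so the helper only calls it when the partition count is too low.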
      

          People

            Assignee: Unassigned
            Reporter: Kristin Cowalcijk (kontinuation)
            Votes: 0
            Watchers: 1

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 20m