Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
1.6.0
Description
Sedona cannot handle dataframes containing 0 partitions properly when performing spatial join. For example:
schema = StructType([ StructField("id", IntegerType(), True) ]) # Create empty RDD empty_rdd = spark.sparkContext.emptyRDD() # Create empty DataFrame empty_df = spark.createDataFrame(empty_rdd, schema) df_point = spark.range(0, 10).toDF("id").withColumn('geom', expr("ST_Point(id, id)")) df_poly = empty_df.withColumn("poly", expr("ST_Buffer(ST_Point(id, id), 2)")).drop("geom") spark.conf.set("sedona.join.autoBroadcastJoinThreshold", "-1") df_point.join(broadcast(df_poly), expr("ST_Intersects(poly, geom)")).count()
failed with the following error message:
Py4JJavaError: An error occurred while calling o107.showString.
: java.util.NoSuchElementException: next on empty iterator
at scala.collection.Iterator$$anon$2.next(Iterator.scala:41)
at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
at scala.collection.IterableLike.head(IterableLike.scala:109)
at scala.collection.IterableLike.head$(IterableLike.scala:108)
at scala.collection.AbstractIterable.head(Iterable.scala:56)
at org.apache.spark.sql.sedona_sql.strategy.join.SpatialIndexExec.doExecuteBroadcast(SpatialIndexExec.scala:63)
This does not only happen to broadcast join, range join also has problems:
df_point.join(df_poly, expr("ST_Intersects(poly, geom)")).count() 24/05/27 10:50:13.989 WARN RangeJoinExec: [SedonaSQL] Join dominant side partition number 8 is larger than 1/2 of the dominant side count 10 24/05/27 10:50:13.989 WARN RangeJoinExec: [SedonaSQL] Try to use follower side partition number 0 Number of partitions must be >= 0
Attachments
Issue Links
- links to