[SEDONA-560] Spatial join involving dataframe containing 0 partition throws exception - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.6.0
Fix Version/s: 1.6.1
Labels:
- pull-request-available

Description

Sedona cannot handle dataframes containing 0 partitions properly when performing spatial join. For example:

schema = StructType([
    StructField("id", IntegerType(), True)
])

# Create empty RDD
empty_rdd = spark.sparkContext.emptyRDD()

# Create empty DataFrame
empty_df = spark.createDataFrame(empty_rdd, schema)

df_point = spark.range(0, 10).toDF("id").withColumn('geom', expr("ST_Point(id, id)"))
df_poly = empty_df.withColumn("poly", expr("ST_Buffer(ST_Point(id, id), 2)")).drop("geom")

spark.conf.set("sedona.join.autoBroadcastJoinThreshold", "-1")
df_point.join(broadcast(df_poly), expr("ST_Intersects(poly, geom)")).count()

failed with the following error message:

Py4JJavaError: An error occurred while calling o107.showString.
: java.util.NoSuchElementException: next on empty iterator
    at scala.collection.Iterator$$anon$2.next(Iterator.scala:41)
    at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
    at scala.collection.IterableLike.head(IterableLike.scala:109)
    at scala.collection.IterableLike.head$(IterableLike.scala:108)
    at scala.collection.AbstractIterable.head(Iterable.scala:56)
    at org.apache.spark.sql.sedona_sql.strategy.join.SpatialIndexExec.doExecuteBroadcast(SpatialIndexExec.scala:63)

This does not only happen to broadcast join, range join also has problems:

df_point.join(df_poly, expr("ST_Intersects(poly, geom)")).count()

24/05/27 10:50:13.989 WARN RangeJoinExec: [SedonaSQL] Join dominant side partition number 8 is larger than 1/2 of the dominant side count 10
24/05/27 10:50:13.989 WARN RangeJoinExec: [SedonaSQL] Try to use follower side partition number 0
Number of partitions must be >= 0

Attachments

Issue Links

links to

GitHub Pull Request #1430

Activity

People

Assignee:: Unassigned

Reporter:: Kristin Cowalcijk

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 27/May/24 02:51

Updated:: 02/Jun/24 05:37

Resolved:: 02/Jun/24 05:37

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

20m