Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 1.3.1
Description
The following distance join query fails when it is executed as a broadcast index join:
SELECT * FROM df1 JOIN df2 ON ST_Distance(df1.geom, df2.geom) < df2.dist
The exception raised by Sedona is as follows:
Couldn't find dist#8638 in [id#8583,geom#8589]
java.lang.IllegalStateException: Couldn't find dist#8638 in [id#8583,geom#8589]
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:528)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:73)
    at org.apache.spark.sql.sedona_sql.strategy.join.SpatialIndexExec.doExecuteBroadcast(SpatialIndexExec.scala:54)
If the distance expression references an attribute of the left-side relation instead, the same distance join runs without error as a broadcast index join:
SELECT * FROM df1 JOIN df2 ON ST_Distance(df1.geom, df2.geom) < df1.dist
The space-partitioned distance join does not have this problem.
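The trace shows the failure coming from BindReferences.bindReference, called in SpatialIndexExec.doExecuteBroadcast. One plausible reading is that the distance expression is bound against the output of a single relation, and that output does not contain dist. The following is a toy model of that lookup in plain Python, not Spark or Sedona code. The Attribute class and the bind_reference function are hypothetical names used for illustration only, and the attribute ids are copied from the exception message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """A column reference identified by name and expression id, e.g. dist#8638."""
    name: str
    expr_id: int

    def __str__(self) -> str:
        return f"{self.name}#{self.expr_id}"


def bind_reference(attr, input_attrs):
    """Return the ordinal of `attr` within `input_attrs`, or fail if it is absent.

    Hypothetical stand-in for an attribute-binding step; not the Spark API.
    """
    for ordinal, candidate in enumerate(input_attrs):
        if candidate.expr_id == attr.expr_id:
            return ordinal
    available = ",".join(str(a) for a in input_attrs)
    raise LookupError(f"Couldn't find {attr} in [{available}]")


# Output of one relation, as listed in the exception message (assumption:
# this is the only schema the distance expression is bound against).
relation_output = [Attribute("id", 8583), Attribute("geom", 8589)]

geom = Attribute("geom", 8589)
dist = Attribute("dist", 8638)

# geom is part of the output, so binding succeeds and yields its position.
print(bind_reference(geom, relation_output))

# dist is not part of the output, so binding fails like the reported error.
try:
    bind_reference(dist, relation_output)
except LookupError as err:
    print(err)
```

In this sketch the lookup only succeeds for attributes present in the schema it is given, which matches the observation that the query works when dist is taken from df1 and fails when it is taken from df2.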