Description
Summary
No physical plan is generated for BroadcastHint under some conditions, such as the cases in the test below.
Test Case
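// broadcast() is not imported by default in spark-shell
import org.apache.spark.sql.functions.broadcast

val df1 = Seq((1, "1"), (2, "2")).toDF("key", "value")
val parquetTempFile = "%s/SPARK-xxxx_%d.parquet".format(System.getProperty("java.io.tmpdir"), scala.util.Random.nextInt)
df1.write.parquet(parquetTempFile)
val pf1 = sqlContext.read.parquet(parquetTempFile)

// Case 1: join against a broadcast-hinted Parquet DataFrame
df1.join(broadcast(pf1)).count()
// Case 2: action directly on a broadcast-hinted Parquet DataFrame
broadcast(pf1).count()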
Result
The test case triggers an assertion failure in QueryPlanner.scala, as shown below:
scala> df1.join(broadcast(pf1)).count()
java.lang.AssertionError: assertion failed: No plan for BroadcastHint
+- Relation[key#6,value#7] ParquetRelation[hdfs://10.1.0.20:8020/tmp/SPARK-xxxx_1817830406.parquet]
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
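The "No plan for ..." message in the trace is raised when the planner asks its strategies for candidate physical plans and none of them returns one for the given node. The following is only a toy model of that mechanism, not Spark's actual code: the names Logical, Relation, Hint, Physical, and ToyPlanner are all made up for illustration, and Hint merely stands in for BroadcastHint.

```scala
// Toy model of a strategy-based planner. Every name here is hypothetical
// and exists only for illustration; none of it is Spark's real API.
sealed trait Logical
case class Relation(name: String) extends Logical
case class Hint(child: Logical) extends Logical // stands in for BroadcastHint

case class Physical(description: String)

object ToyPlanner {
  // A strategy returns zero or more candidate physical plans for a node.
  type Strategy = Logical => Seq[Physical]

  // Knows how to plan a Relation only; returns no candidates otherwise.
  val scanStrategy: Strategy = {
    case Relation(name) => Seq(Physical(s"Scan($name)"))
    case _              => Nil
  }

  // Plans a Hint by planning its child, so the hint node itself vanishes.
  val hintStrategy: Strategy = {
    case Hint(child) => scanStrategy(child)
    case _           => Nil
  }

  // Collects candidates from every strategy and asserts at least one exists.
  def plan(node: Logical, strategies: Seq[Strategy]): Iterator[Physical] = {
    val candidates = strategies.iterator.flatMap(s => s(node))
    assert(candidates.hasNext, s"No plan for $node")
    candidates
  }
}
```

In this sketch, calling ToyPlanner.plan(Hint(Relation("t")), Seq(ToyPlanner.scanStrategy)) fails the assertion because no strategy handles the Hint node, while adding ToyPlanner.hintStrategy to the list lets planning succeed. Whether the real fix takes this shape is not stated in this report.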