Description
Seq(Tuple2(1, 1), Tuple2(2, 2)).toDF("i", "j").write.format("parquet").partitionBy("i").save("/tmp/testFilter_partitioned")
val df1 = sqlContext.read.format("parquet").load("/tmp/testFilter_partitioned")
df1.selectExpr("hash(i)", "hash(j)").show
df1.filter("hash(j) = 1").explain

== Physical Plan ==
Scan ParquetRelation[file:/tmp/testFilter_partitioned][j#20,i#21]
It looks like the reason is that we do apply the Project and Filter correctly, but we then create an RDD for the result and manually wrap it in a PhysicalRDD. As a result, the Project and Filter on top of the original table scan disappear from the physical plan.
We do not generate wrong results, but the query plan is confusing.
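To illustrate the suspected mechanism, here is a minimal toy sketch in plain Scala. It does not use Spark, and every class in it (Scan, Filter, Project, PrecomputedRows, render) is a hypothetical stand-in, not Spark's actual plan node. It only shows how wrapping already-computed rows in a leaf node can hide the operators that produced them when the plan is printed.

```scala
// Toy model only: these classes are hypothetical stand-ins, not Spark's real plan nodes.
object PlanSketch {
  sealed trait Plan { def children: Seq[Plan] }

  case class Scan(relation: String, output: Seq[String]) extends Plan {
    def children: Seq[Plan] = Nil
  }
  case class Filter(condition: String, child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }
  case class Project(columns: Seq[String], child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }
  // A leaf that wraps rows which were already projected and filtered.
  // It keeps no reference to the operators that produced those rows.
  case class PrecomputedRows(label: String, output: Seq[String]) extends Plan {
    def children: Seq[Plan] = Nil
  }

  // Render a plan as an indented tree, one node per line.
  def render(plan: Plan, depth: Int = 0): String = {
    val line = plan match {
      case Scan(rel, out)              => s"Scan ${rel}[${out.mkString(",")}]"
      case Filter(cond, _)             => s"Filter (${cond})"
      case Project(cols, _)            => s"Project [${cols.mkString(",")}]"
      case PrecomputedRows(label, out) => s"Scan ${label}[${out.mkString(",")}]"
    }
    ((" " * depth + line) +: plan.children.map(render(_, depth + 1))).mkString("\n")
  }

  // The plan a reader would expect to see for df1.filter("hash(j) = 1").
  val logical: Plan =
    Project(Seq("j", "i"), Filter("hash(j) = 1", Scan("ParquetRelation", Seq("j", "i"))))

  // What is left once the filtered result is wrapped in a single leaf node.
  val reported: Plan = PrecomputedRows("ParquetRelation", Seq("j", "i"))

  def main(args: Array[String]): Unit = {
    println(render(logical))
    println("---")
    println(render(reported))
  }
}
```

In this sketch the second rendering contains only the scan line, even though the rows behind it were filtered, which matches the shape of the confusing output reported above.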