Seq(Tuple2(1, 1), Tuple2(2, 2)).toDF("i", "j").write.format("parquet").partitionBy("i").save("/tmp/testFilter_partitioned")
val df1 = sqlContext.read.format("parquet").load("/tmp/testFilter_partitioned")
df1.selectExpr("hash(i)", "hash(j)").show
df1.filter("hash(j) = 1").explain

== Physical Plan ==
Scan ParquetRelation[file:/tmp/testFilter_partitioned][j#20,i#21]
It looks like the reason is that we correctly apply the Project and Filter, then create an RDD for the result and manually wrap it in a PhysicalRDD. As a result, the Project and Filter on top of the original table scan disappear from the physical plan.
The query does not return wrong results, but the plan is confusing because the filter on hash(j) is not shown anywhere in the explain output.
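The effect described above can be illustrated with a small toy model. This is only a sketch: Plan, Scan, Filter, PrecomputedLeaf, and collapse are hypothetical names invented for illustration and are not Spark's classes or its actual code path. It shows how evaluating a filter eagerly and wrapping the rows in a leaf node keeps the results correct while removing the filter from the printed plan.

```scala
// Toy model only: these classes are hypothetical and are NOT Spark's own.
sealed trait Plan { def explain: String }

// A scan over stored rows (each row maps a column name to a value).
final case class Scan(name: String, rows: Seq[Map[String, Int]]) extends Plan {
  def explain: String = s"Scan $name"
}

// A filter that appears as its own node in the plan.
final case class Filter(desc: String, pred: Map[String, Int] => Boolean, child: Plan) extends Plan {
  def explain: String = s"Filter $desc\n " + child.explain
}

// A leaf wrapping rows that were already computed; it keeps only a label.
final case class PrecomputedLeaf(label: String, rows: Seq[Map[String, Int]]) extends Plan {
  def explain: String = label
}

object PlanDemo {
  def execute(plan: Plan): Seq[Map[String, Int]] = plan match {
    case Scan(_, rows)            => rows
    case Filter(_, pred, child)   => execute(child).filter(pred)
    case PrecomputedLeaf(_, rows) => rows
  }

  // Evaluates the filter eagerly and hides it inside a leaf labelled as the scan.
  def collapse(filter: Filter): Plan = filter.child match {
    case s: Scan => PrecomputedLeaf(s.explain, execute(filter))
    case _       => filter
  }

  def main(args: Array[String]): Unit = {
    val scan = Scan("t", Seq(Map("i" -> 1, "j" -> 1), Map("i" -> 2, "j" -> 2)))
    val filtered = Filter("j = 1", row => row("j") == 1, scan)
    val collapsed = collapse(filtered)
    println(filtered.explain)  // the filter is visible here
    println(collapsed.explain) // only the scan label remains
    assert(execute(collapsed) == execute(filtered)) // results are still the same
  }
}
```

In this model the collapsed plan prints only the scan label even though the filter was applied to the rows, which mirrors the confusing explain output reported here.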