Spark / SPARK-10334

Partitioned table scan's query plan does not show Filter and Project on top of the table scan


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.5.0
    • Component/s: SQL
    • Labels: None

    Description

      Seq(Tuple2(1, 1), Tuple2(2, 2)).toDF("i", "j").write.format("parquet").partitionBy("i").save("/tmp/testFilter_partitioned")
      val df1 = sqlContext.read.format("parquet").load("/tmp/testFilter_partitioned")
      df1.selectExpr("hash(i)", "hash(j)").show
      df1.filter("hash(j) = 1").explain
      == Physical Plan ==
      Scan ParquetRelation[file:/tmp/testFilter_partitioned][j#20,i#21]
      

      Looks like the reason is that we correctly apply the project and filter, then create an RDD for the result and manually wrap it in a PhysicalRDD. So, the Project and Filter on top of the original table scan disappear from the physical plan.

      See https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L138-L175

      We will not generate wrong results, but the query plan is confusing.
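      The effect described above can be illustrated with a small, hypothetical model (plain Scala, not actual Spark code; `planScan`, `PhysicalScan`, and `PhysicalFilter` are made-up names for illustration): when the strategy evaluates the filter and projection while building the scan's RDD and then wraps the result in a single leaf node, only the scan survives in the explained plan.

      ```scala
      // Hypothetical simplified model of the planner behavior (not Spark internals).
      sealed trait PhysicalPlan { def describe: String }

      case class PhysicalScan(relation: String, output: Seq[String]) extends PhysicalPlan {
        def describe: String = s"Scan $relation[${output.mkString(",")}]"
      }

      case class PhysicalFilter(condition: String, child: PhysicalPlan) extends PhysicalPlan {
        def describe: String = s"Filter $condition\n ${child.describe}"
      }

      // What the strategy effectively does in the linked code: the filter and
      // projection are pushed into the RDD computation, so only a scan node is
      // returned and no Filter/Project appears in the plan tree.
      def planScan(relation: String, projected: Seq[String], filter: Option[String]): PhysicalPlan =
        PhysicalScan(relation, projected)

      val plan = planScan("ParquetRelation[file:/tmp/testFilter_partitioned]", Seq("j", "i"), Some("hash(j) = 1"))
      // plan.describe == "Scan ParquetRelation[file:/tmp/testFilter_partitioned][j,i]"
      ```

      A less confusing plan would instead return something like `PhysicalFilter("hash(j) = 1", PhysicalScan(...))`, so that explain reflects the operators actually applied.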


          People

            Assignee: Yin Huai (yhuai)
            Reporter: Yin Huai (yhuai)
            Votes: 0
            Watchers: 4
