[SPARK-12167] Invoke the right sameResult function when plan is wrapped with SubQueries - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 1.5.2
Fix Version/s: None
Component/s: SQL
Labels: None

Description

I found this bug when using `cache table`:
```
spark-sql> create table src_p(key int, value int) stored as parquet;
OK
Time taken: 3.144 seconds
spark-sql> cache table src_p;
Time taken: 1.452 seconds
spark-sql> explain extended select count(*) from src_p;
```
I got the wrong physical plan, which scans the Parquet files directly instead of reading the cached table:
```
== Physical Plan ==
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=_c0#28L)
 TungstenExchange SinglePartition
  TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=currentCount#33L)
   Scan ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p][]
```
The correct physical plan, which reads from the in-memory cache, is:
```
== Physical Plan ==
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=_c0#47L)
 TungstenExchange SinglePartition
  TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=currentCount#62L)
   InMemoryColumnarTableScan (InMemoryRelation key#45,value#46, true, 10000, StorageLevel(true, true, false, true, 1), (Scan ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p]key#9,value#10), Some(src_p))
```

When an implementation class of `MultiInstanceRelation` (e.g. `LogicalRelation` or `LocalRelation`) is wrapped with SubQueries, the `sameResult` function in its own implementation is not invoked. So we need to eliminate the SubQueries first and then invoke the `sameResult` function of the underlying relation.
For example, when the plan is `Subquery(LogicalRelation(relation:ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p], expectedOutputAttributes:Some(ArrayBuffer(key#0, value#1))))`, we first eliminate the SubQueries, so that the `sameResult` function in `LogicalRelation` is invoked instead of the one in `LogicalPlan`.
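The dispatch problem described above can be illustrated with a small toy model. This is only a sketch and not Spark's actual code: `Plan`, `Relation`, `Subquery`, `stripSubqueries` and `sameResultIgnoringSubqueries` are hypothetical stand-ins, and the default comparison here (plain structural equality) is an assumption chosen to keep the example short.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark classes.
sealed trait Plan {
  // Assumed default: structural equality, which is sensitive to attribute IDs.
  def sameResult(other: Plan): Boolean = this == other
}

// A leaf relation whose attribute IDs differ per instance (like key#0 vs key#9).
final case class Relation(path: String, attrIds: Seq[Int]) extends Plan {
  // Relation-specific override: same path means same result, IDs are ignored.
  override def sameResult(other: Plan): Boolean = other match {
    case Relation(otherPath, _) => path == otherPath
    case _                      => false
  }
}

// An alias wrapper that does not override sameResult, so the default is used.
final case class Subquery(alias: String, child: Plan) extends Plan

object SameResultSketch {
  // Remove any number of top-level Subquery wrappers.
  def stripSubqueries(plan: Plan): Plan = plan match {
    case Subquery(_, child) => stripSubqueries(child)
    case other              => other
  }

  // Unwrap both sides first so the relation's own override is the one invoked.
  def sameResultIgnoringSubqueries(a: Plan, b: Plan): Boolean =
    stripSubqueries(a).sameResult(stripSubqueries(b))

  def main(args: Array[String]): Unit = {
    val cached = Subquery("src_p", Relation("/warehouse/src_p", Seq(9, 10)))
    val query  = Subquery("src_p", Relation("/warehouse/src_p", Seq(0, 1)))
    println(cached.sameResult(query))                    // wrapper's default comparison
    println(sameResultIgnoringSubqueries(cached, query)) // relation's override
  }
}
```

In this model the wrapped comparison fails because the attribute IDs differ, while the unwrapped comparison succeeds because `Relation.sameResult` only looks at the path. That mirrors the cache miss reported here, under the stated assumptions.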

Issue Links

duplicates

SPARK-11246 [1.5] Table cache for Parquet broken in 1.5

Resolved

links to

[Github] Pull Request #10169 (watermen)

People

Assignee: Unassigned
Reporter: Yadong Qi
Votes: 0
Watchers: 2

Dates

Created: 07/Dec/15 03:59
Updated: 08/Dec/15 04:45
Resolved: 07/Dec/15 07:22