Spark / SPARK-12167

Invoke the right sameResult function when the plan is wrapped with SubQueries

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.5.2
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None

      Description

      I found this bug when using `cache table`:
      ```
      spark-sql> create table src_p(key int, value int) stored as parquet;
      OK
      Time taken: 3.144 seconds
      spark-sql> cache table src_p;
      Time taken: 1.452 seconds
      spark-sql> explain extended select count(*) from src_p;
      ```
      I got the wrong physical plan, which scans the Parquet files directly instead of the cached table:
      ```
      == Physical Plan ==
      TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=_c0#28L)
      TungstenExchange SinglePartition
      TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=currentCount#33L)
      Scan ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p][]
      ```
      The right physical plan reads from the in-memory cache:
      ```
      == Physical Plan ==
      TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=_c0#47L)
      TungstenExchange SinglePartition
      TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=currentCount#62L)
      InMemoryColumnarTableScan (InMemoryRelation key#45,value#46, true, 10000, StorageLevel(true, true, false, true, 1), (Scan ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p]key#9,value#10), Some(src_p))
      ```

      When the implementation classes of `MultiInstanceRelation` (e.g. `LogicalRelation`, `LocalRelation`) are wrapped with SubQueries, the `sameResult` function in their own implementation is never invoked. So we need to eliminate the SubQueries first and then invoke the `sameResult` function of the underlying plan.
      For example, when the plan is `Subquery(LogicalRelation(relation:ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p], expectedOutputAttributes:Some(ArrayBuffer(key#0, value#1))))`, eliminating the SubQueries first means the `sameResult` function in `LogicalRelation` is invoked instead of the one in `LogicalPlan`.
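
      The idea can be illustrated with a small toy model. This is only a sketch and not Spark's actual code: `Plan`, `Relation`, `Subquery`, `eliminateSubqueries` and `sameResultAfterUnwrap` are hypothetical stand-ins whose names merely echo Catalyst, and the integer IDs imitate attribute IDs such as `key#0` and `key#9`.

      ```scala
      // Toy model only: these are NOT Spark's classes, just simplified stand-ins.
      object SameResultSketch {
        sealed trait Plan {
          // Default comparison: plain structural equality, so differing
          // attribute IDs make two otherwise identical plans look different.
          def sameResult(other: Plan): Boolean = this == other
        }

        // A leaf relation; each instance gets fresh output attribute IDs.
        final case class Relation(path: String, outputIds: Seq[Int]) extends Plan {
          // Relation-specific comparison: same underlying data, IDs ignored.
          override def sameResult(other: Plan): Boolean = other match {
            case Relation(p, _) => p == path
            case _              => false
          }
        }

        // An alias wrapper standing in for a Subquery node.
        final case class Subquery(alias: String, child: Plan) extends Plan

        // Strip all Subquery wrappers from the top of the plan.
        def eliminateSubqueries(plan: Plan): Plan = plan match {
          case Subquery(_, child) => eliminateSubqueries(child)
          case other              => other
        }

        // Unwrap both sides first, then dispatch to the unwrapped plan's own sameResult.
        def sameResultAfterUnwrap(a: Plan, b: Plan): Boolean =
          eliminateSubqueries(a).sameResult(eliminateSubqueries(b))

        def main(args: Array[String]): Unit = {
          val cached = Subquery("src_p", Relation("/user/hive/warehouse/src_p", Seq(0, 1)))
          val query  = Subquery("src_p", Relation("/user/hive/warehouse/src_p", Seq(9, 10)))
          println(cached.sameResult(query))             // false: wrapper uses the default comparison
          println(sameResultAfterUnwrap(cached, query)) // true: Relation's own comparison is used
        }
      }
      ```

      In this sketch the wrapped plans fail to match because the wrapper falls back to the default comparison, which is the analogue of the cache lookup missing the cached table; unwrapping first lets the relation's own comparison decide.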

              People

              • Assignee: Unassigned
              • Reporter: waterman Yadong Qi