Spark / SPARK-12998

Enable OrcRelation when connecting via spark thrift server

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      When a user connects through the Spark Thrift server to execute SQL, predicate pushdown (PPD) for ORC is not enabled. The query ends up with a MetastoreRelation, which does not support ORC PPD. The purpose of this JIRA is to convert MetastoreRelation to OrcRelation in HiveMetastoreCatalog, so that users benefit from PPD even when connecting through the Spark Thrift server.
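The proposed conversion can be pictured with a small toy model. This is only a sketch under stated assumptions, not Spark's HiveMetastoreCatalog code: the classes below are simplified stand-ins that borrow the names MetastoreRelation and OrcRelation, and `convert_if_orc`, its `convert_enabled` flag, and the `notes` table are hypothetical. Only the two SerDe class names are real Hive identifiers.

```python
from dataclasses import dataclass

# Real Hive SerDe class names; everything else in this block is a toy model.
ORC_SERDE = "org.apache.hadoop.hive.ql.io.orc.OrcSerde"
TEXT_SERDE = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"


@dataclass(frozen=True)
class MetastoreRelation:
    """Toy stand-in: a table resolved through the Hive metastore (no ORC PPD)."""
    database: str
    table: str
    serde: str
    location: str


@dataclass(frozen=True)
class OrcRelation:
    """Toy stand-in: a file-based ORC relation that can accept pushed filters."""
    input_paths: tuple
    pushed_filters: tuple = ()


def convert_if_orc(relation, convert_enabled=True):
    """Hypothetical rule: swap in an OrcRelation when the table is stored as ORC."""
    if convert_enabled and relation.serde == ORC_SERDE:
        return OrcRelation(input_paths=(relation.location,))
    # Non-ORC tables (or conversion switched off) keep the metastore relation.
    return relation


lineitem = MetastoreRelation(
    "tpch_1000", "lineitem", ORC_SERDE,
    "hdfs://nn:8020/apps/hive/warehouse/tpch_1000.db/lineitem",
)
# Hypothetical text-format table, used only to show the non-ORC path.
notes = MetastoreRelation("tpch_1000", "notes", TEXT_SERDE, "hdfs://nn:8020/tmp/notes")

print(convert_if_orc(lineitem))
print(convert_if_orc(notes))
```

In this model only the ORC-backed table is rewritten; the text table passes through unchanged, which mirrors the intent that the conversion applies only where ORC PPD is possible.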

      For example, for "explain select count(1) from tpch_flat_orc_1000.lineitem where l_shipdate = '1990-04-18'", the current plan is:
      
      +------------------------------------------------------------------------------------------------------------------+--+
      |                                                       plan                                                       |
      +------------------------------------------------------------------------------------------------------------------+--+
      | == Physical Plan ==                                                                                              |
      | TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#17L])                  |
      | +- Exchange SinglePartition, None                                                                                |
      |    +- WholeStageCodegen                                                                                          |
      |       :  +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#20L])  |
      |       :     +- Project                                                                                           |
      |       :        +- Filter (l_shipdate#11 = 1990-04-18)                                                            |
      |       :           +- INPUT                                                                                       |
      |       +- HiveTableScan [l_shipdate#11], MetastoreRelation tpch_1000, lineitem, None                     |
      +------------------------------------------------------------------------------------------------------------------+--+
      
      It would be better to use OrcRelation here so that ORC PPD applies, which reduces the runtime by a large margin:
       
      +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
      |                                                                                             plan                                                                                              |
      +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
      | == Physical Plan ==                                                                                                                                                                           |
      | TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#70L])                                                                                               |
      | +- Exchange SinglePartition, None                                                                                                                                                             |
      |    +- WholeStageCodegen                                                                                                                                                                       |
      |       :  +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#106L])                                                                              |
      |       :     +- Project                                                                                                                                                                        |
      |       :        +- Filter (_col10#64 = 1990-04-18)                                                                                                                                             |
      |       :           +- INPUT                                                                                                                                                                    |
      |       +- Scan OrcRelation[_col10#64] InputPaths: hdfs://nn:8020/apps/hive/warehouse/tpch_1000.db/lineitem, PushedFilters: [EqualTo(_col10,1990-04-18)]  |
      +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
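To illustrate why a pushed filter such as EqualTo(_col10,1990-04-18) can cut the work done by the scan, here is a minimal sketch of statistics-based skipping. It assumes a simplified model in which each stripe carries min/max values for the filtered column; the `Stripe` class, the `count_equal` function, and the sample dates are hypothetical and do not reproduce the ORC reader's actual API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stripe:
    """Toy stripe: min/max statistics for one column plus that column's values."""
    min_shipdate: str
    max_shipdate: str
    rows: List[str]


def count_equal(stripes, value, pushdown):
    """Count rows equal to `value`; return (matches, rows actually read)."""
    matched = 0
    rows_read = 0
    for stripe in stripes:
        # With pushdown, skip any stripe whose [min, max] range cannot contain value.
        if pushdown and not (stripe.min_shipdate <= value <= stripe.max_shipdate):
            continue
        rows_read += len(stripe.rows)
        matched += sum(1 for row in stripe.rows if row == value)
    return matched, rows_read


# ISO-8601 date strings compare correctly as plain strings.
stripes = [
    Stripe("1990-01-01", "1990-03-31", ["1990-01-15", "1990-02-20", "1990-03-31"]),
    Stripe("1990-04-01", "1990-06-30", ["1990-04-18", "1990-04-18", "1990-06-30"]),
    Stripe("1990-07-01", "1990-09-30", ["1990-07-04", "1990-08-09", "1990-09-30"]),
]

print(count_equal(stripes, "1990-04-18", pushdown=False))  # (2, 9)
print(count_equal(stripes, "1990-04-18", pushdown=True))   # (2, 3)
```

Both calls return the same count, but the pushdown variant reads only the one stripe whose range covers the date. The real savings depend on how the data is laid out relative to the filtered column.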
      
      

            People

              Assignee: Unassigned
              Reporter: Rajesh Balamohan (rajesh.balamohan)
              Votes: 1
              Watchers: 5
