  1. Hive
  2. HIVE-12899 Bring query optimization time down
  3. HIVE-13808

Use constant expressions to backtrack when we create ReduceSink

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.1.0
    • Component/s: Parser
    • Labels: None

    Description

      Follow-up of HIVE-13068.

      When we create an RS (ReduceSink) operator with constant expressions as keys/values, we create a SEL (Select) operator immediately after it that backtracks the expressions from the RS. Currently, we automatically create references for all the keys/values, even the constant ones.

      Before, we could rely on Hive ConstantPropagate to propagate the constants to the SEL. However, after HIVE-13068, Hive ConstantPropagate is no longer exercised. Thus, we can simply create constant expressions, rather than references, when we create the SEL operator.
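
      The idea can be sketched in a few lines of Java. This is a simplified model, not Hive code: the classes Expr, ConstantExpr and ColumnRef and both backtrack methods are hypothetical stand-ins invented for illustration, and the KEY.reducesinkkeyN naming is borrowed from the plan output. It only shows the difference between referencing every RS key and reusing a constant directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BacktrackSketch {

    /** Hypothetical stand-in for an expression descriptor (not a Hive class). */
    public interface Expr {
        String type();
        String render();
    }

    /** A literal whose value is known at plan time. */
    public static final class ConstantExpr implements Expr {
        private final String type;
        private final Object value;
        public ConstantExpr(String type, Object value) { this.type = type; this.value = value; }
        public String type() { return type; }
        public String render() { return value + " (type: " + type + ")"; }
    }

    /** A reference to a column produced by the parent operator. */
    public static final class ColumnRef implements Expr {
        private final String type;
        private final String column;
        public ColumnRef(String type, String column) { this.type = type; this.column = column; }
        public String type() { return type; }
        public String render() { return column + " (type: " + type + ")"; }
    }

    /** Behaviour described as current: every RS key becomes a reference in the SEL. */
    public static List<Expr> backtrackAsReferences(List<Expr> rsKeys) {
        List<Expr> sel = new ArrayList<>();
        for (int i = 0; i < rsKeys.size(); i++) {
            sel.add(new ColumnRef(rsKeys.get(i).type(), "KEY.reducesinkkey" + i));
        }
        return sel;
    }

    /** Proposed behaviour: constant keys are carried over as constants. */
    public static List<Expr> backtrackWithConstants(List<Expr> rsKeys) {
        List<Expr> sel = new ArrayList<>();
        for (int i = 0; i < rsKeys.size(); i++) {
            Expr key = rsKeys.get(i);
            if (key instanceof ConstantExpr) {
                sel.add(key); // no reference needed, the value is already known
            } else {
                sel.add(new ColumnRef(key.type(), "KEY.reducesinkkey" + i));
            }
        }
        return sel;
    }

    public static void main(String[] args) {
        // First two RS keys of a query that filters on "cdouble IS NULL".
        List<Expr> rsKeys = Arrays.asList(
                new ConstantExpr("double", null),
                new ColumnRef("string", "_col1"));
        for (Expr e : backtrackAsReferences(rsKeys)) {
            System.out.println("reference-only: " + e.render());
        }
        for (Expr e : backtrackWithConstants(rsKeys)) {
            System.out.println("with constants: " + e.render());
        }
    }
}
```

      In this model the constant key survives into the SEL as a constant, so no later constant-propagation pass is needed to recover it.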

      Example: ql/src/test/results/clientpositive/vector_coalesce.q.out

      EXPLAIN SELECT cdouble, cstring1, cint, cfloat, csmallint, coalesce(cdouble, cstring1, cint, cfloat, csmallint) as c
      FROM alltypesorc
      WHERE (cdouble IS NULL)
      ORDER BY cdouble, cstring1, cint, cfloat, csmallint, c
      LIMIT 10
      

      Plan:

      EXPLAIN SELECT cdouble, cstring1, cint, cfloat, csmallint, coalesce(cdouble, cstring1, cint, cfloat, csmallint) as c
      FROM alltypesorc
      WHERE (cdouble IS NULL)
      ORDER BY cdouble, cstring1, cint, cfloat, csmallint, c
      LIMIT 10
      POSTHOOK: type: QUERY
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Map Operator Tree:
                TableScan
                  alias: alltypesorc
                  Statistics: Num rows: 12288 Data size: 2641964 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: cdouble is null (type: boolean)
                    Statistics: Num rows: 6144 Data size: 1320982 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: cstring1 (type: string), cint (type: int), cfloat (type: float), csmallint (type: smallint), COALESCE(null,cstring1,cint,cfloat,csmallint) (type: string)
                      outputColumnNames: _col1, _col2, _col3, _col4, _col5
                      Statistics: Num rows: 6144 Data size: 1320982 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: null (type: double), _col1 (type: string), _col2 (type: int), _col3 (type: float), _col4 (type: smallint), _col5 (type: string)
                        sort order: ++++++
                        Statistics: Num rows: 6144 Data size: 1320982 Basic stats: COMPLETE Column stats: NONE
                        TopN Hash Memory Usage: 0.1
            Execution mode: vectorized
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: double), KEY.reducesinkkey1 (type: string), KEY.reducesinkkey2 (type: int), KEY.reducesinkkey3 (type: float), KEY.reducesinkkey4 (type: smallint), KEY.reducesinkkey5 (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                Statistics: Num rows: 6144 Data size: 1320982 Basic stats: COMPLETE Column stats: NONE
                Limit
                  Number of rows: 10
                  Statistics: Num rows: 10 Data size: 2150 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 10 Data size: 2150 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
        Stage: Stage-0
          Fetch Operator
            limit: 10
            Processor Tree:
              ListSink
      

      Attachments

        1. HIVE-13808.01.patch
          130 kB
          jcamachorodriguez
        2. HIVE-13808.patch
          13 kB
          jcamachorodriguez

            People

              Assignee: jcamacho Jesús Camacho Rodríguez
              Reporter: jcamacho Jesús Camacho Rodríguez