Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-21338

Remove order by and limit for aggregates

    XMLWordPrintableJSON

Details

    Description

      If a query is guaranteed to produce at most one row LIMIT and ORDER BY could be removed. This saves unnecessary vertex for LIMIT/ORDER BY.

      explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 order by cs limit 100
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
            DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
            Edges:
              Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
              Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
            DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
            Vertices:
              Map 1
                  Map Operator Tree:
                      TableScan
                        alias: store_sales
                        filterExpr: (ss_ext_sales_price > 100) (type: boolean)
                        Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
                        Filter Operator
                          predicate: (ss_ext_sales_price > 100) (type: boolean)
                          Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
                          Select Operator
                            Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
                            Group By Operator
                              aggregations: count()
                              mode: hash
                              outputColumnNames: _col0
                              Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                              Reduce Output Operator
                                sort order:
                                Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                                value expressions: _col0 (type: bigint)
                  Execution mode: vectorized
              Reducer 2
                  Execution mode: vectorized
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: count(VALUE._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: bigint)
                        sort order: +
                        Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                        TopN Hash Memory Usage: 0.1
              Reducer 3
                  Execution mode: vectorized
                  Reduce Operator Tree:
                    Select Operator
                      expressions: KEY.reducesinkkey0 (type: bigint)
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                      Limit
                        Number of rows: 100
                        Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                          table:
                              input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
        Stage: Stage-0
          Fetch Operator
            limit: 100
            Processor Tree:
              ListSink
      

      Attachments

        1. HIVE-21338.1.patch
          15 kB
          Vineet Garg
        2. HIVE-21338.2.patch
          912 kB
          Vineet Garg
        3. HIVE-21338.3.patch
          903 kB
          Vineet Garg
        4. HIVE-21338.4.patch
          902 kB
          Vineet Garg
        5. HIVE-21338.5.patch
          899 kB
          Vineet Garg
        6. HIVE-21338.6.patch
          1.08 MB
          Vineet Garg
        7. HIVE-21338.7.patch
          1.11 MB
          Vineet Garg
        8. HIVE-21338.8.patch
          1.11 MB
          Vineet Garg

        Issue Links

          Activity

            People

              vgarg Vineet Garg
              vgarg Vineet Garg
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 1h 50m
                  1h 50m