  1. Hive
  2. HIVE-5369 Annotate hive operator tree with statistics from metastore
  3. HIVE-7156

Group-By operator stat-annotation only uses distinct approx to generate rollups


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None

    Description

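      The stats annotation for a group-by applies the distinct-value estimate only to the reduce-side row count.

      The map-side group-by reports its input row count as its output row count instead of distinct * parallelism, while the reducer side gets the correct value. In the plan below, the map-side Group By Operator (mode: hash) shows 5999989709 rows, the same as the TableScan, while the reduce-side Group By Operator (mode: mergepartial) shows 1955.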

      hive> explain select distinct L_SHIPDATE from lineitem;
      
            Vertices:
              Map 1 
                  Map Operator Tree:
                      TableScan
                        alias: lineitem
                        Statistics: Num rows: 5999989709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
                        Select Operator
                          expressions: l_shipdate (type: string)
                          outputColumnNames: l_shipdate
                          Statistics: Num rows: 5999989709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
                          Group By Operator
                            keys: l_shipdate (type: string)
                            mode: hash
                            outputColumnNames: _col0
                            Statistics: Num rows: 5999989709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: string)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: string)
                              Statistics: Num rows: 5999989709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Reducer 2 
                  Reduce Operator Tree:
                    Group By Operator
                      keys: KEY._col0 (type: string)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
                      Select Operator
                        expressions: _col0 (type: string)
                        outputColumnNames: _col0
                        Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
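
The "distinct * parallelism" estimate from the description can be sketched as below. This is only an illustration of the arithmetic, not the logic of the attached patches. The function name, the parallelism of 1000 map tasks, and the cap at the input row count are assumptions made for the example. The input row count and the distinct count (taken from the reduce-side Num rows) come from the plan above.

```python
def estimate_map_side_groupby_rows(input_rows, distinct_values, parallelism):
    """Rough upper bound on rows emitted by a map-side hash group-by.

    Hypothetical helper for illustration; it is not part of Hive.
    """
    if input_rows < 0 or distinct_values < 0 or parallelism < 1:
        raise ValueError("row counts must be >= 0 and parallelism >= 1")
    # Each parallel task can emit at most one row per distinct key.
    per_task_bound = distinct_values * parallelism
    # Assumption: a group-by cannot emit more rows than it reads.
    return min(input_rows, per_task_bound)


input_rows = 5999989709   # TableScan "Num rows" in the plan
distinct_values = 1955    # reduce-side Group By "Num rows" in the plan
parallelism = 1000        # assumed number of map tasks, not from the issue

estimate = estimate_map_side_groupby_rows(input_rows, distinct_values, parallelism)
print(estimate)  # 1955000
```

Under these assumed numbers the map-side estimate would be about 1.96 million rows, several orders of magnitude below the 5999989709 rows currently annotated on the map-side Group By Operator and the Reduce Output Operator.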
      

      Attachments

        1. HIVE-7156.9.patch
          241 kB
          Prasanth Jayachandran
        2. HIVE-7156.8.patch
          238 kB
          Prasanth Jayachandran
        3. HIVE-7156.8.patch
          238 kB
          Prasanth Jayachandran
        4. HIVE-7156.7.patch
          236 kB
          Prasanth Jayachandran
        5. HIVE-7156.6.patch
          220 kB
          Prasanth Jayachandran
        6. HIVE-7156.5.patch
          217 kB
          Prasanth Jayachandran
        7. hive-debug.log.bz2
          23 kB
          Gopal Vijayaraghavan
        8. HIVE-7156.4.patch
          216 kB
          Prasanth Jayachandran
        9. HIVE-7156.3.patch
          202 kB
          Prasanth Jayachandran
        10. HIVE-7156.2.patch
          90 kB
          Prasanth Jayachandran
        11. HIVE-7156.1.patch
          89 kB
          Prasanth Jayachandran

            People

              Assignee: Prasanth Jayachandran (prasanth_j)
              Reporter: Gopal Vijayaraghavan (gopalv)
              Votes: 0
              Watchers: 6