[HIVE-7156] Group-By operator stat-annotation only uses distinct approx to generate rollups - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 0.14.0
Fix Version/s: 0.14.0
Component/s: None
Labels: None

Release Note:

Documented the removal of the config in:
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.stats.map.parallelism
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez

Description

The stats annotation for a group-by operator applies the distinct-value estimate only to the reduce-side row count.

On the map side, the group-by is annotated with its input row count as the rows output, instead of distinct * parallelism, while the reducer side gets the correct parallelism.

hive> explain select distinct L_SHIPDATE from lineitem;

      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: lineitem
                  Statistics: Num rows: 5999989709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: l_shipdate (type: string)
                    outputColumnNames: l_shipdate
                    Statistics: Num rows: 5999989709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
                    Group By Operator
                      keys: l_shipdate (type: string)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 5999989709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 5999989709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Reducer 2 
            Reduce Operator Tree:
              Group By Operator
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: _col0 (type: string)
                  outputColumnNames: _col0
                  Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
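
As a rough illustration of the estimate the description asks for, the sketch below computes distinct * parallelism for the map-side group-by, using the row counts from the plan above. The function name, the parallelism value of 1000, and the cap at the input row count are assumptions made for this example; they are not taken from the Hive source or from the attached patches.

```python
def estimate_map_side_groupby_rows(input_rows: int,
                                   distinct_values: int,
                                   parallelism: int) -> int:
    """Hypothetical map-side group-by row estimate.

    Each of `parallelism` map tasks can emit at most one row per distinct
    key, so the combined output is about distinct_values * parallelism.
    Capping at input_rows is an assumption of this sketch: a group-by
    cannot emit more rows than it reads.
    """
    if input_rows < 0 or distinct_values < 0 or parallelism < 1:
        raise ValueError("row counts must be >= 0 and parallelism >= 1")
    return min(input_rows, distinct_values * parallelism)


# Numbers from the EXPLAIN output: TableScan rows and reduce-side distinct count.
input_rows = 5_999_989_709
distinct_values = 1_955
assumed_parallelism = 1_000  # hypothetical number of map tasks

estimate = estimate_map_side_groupby_rows(
    input_rows, distinct_values, assumed_parallelism
)
print(estimate)  # 1955000, versus the 5999989709 shown for the map-side Group By
```

Under these assumptions the map-side Group By and Reduce Output Operator would be annotated with about 1.96 million rows rather than the full 6 billion input rows.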

Attachments

- HIVE-7156.1.patch, 09/Sep/14 04:20, 89 kB, Prasanth Jayachandran
- HIVE-7156.2.patch, 10/Sep/14 21:58, 90 kB, Prasanth Jayachandran
- HIVE-7156.3.patch, 11/Sep/14 21:43, 202 kB, Prasanth Jayachandran
- HIVE-7156.4.patch, 12/Sep/14 07:21, 216 kB, Prasanth Jayachandran
- HIVE-7156.5.patch, 23/Sep/14 18:13, 217 kB, Prasanth Jayachandran
- HIVE-7156.6.patch, 23/Sep/14 22:59, 220 kB, Prasanth Jayachandran
- HIVE-7156.7.patch, 24/Sep/14 01:24, 236 kB, Prasanth Jayachandran
- HIVE-7156.8.patch, 26/Sep/14 09:25, 238 kB, Prasanth Jayachandran
- HIVE-7156.8.patch, 26/Sep/14 09:22, 238 kB, Prasanth Jayachandran
- HIVE-7156.9.patch, 29/Sep/14 01:34, 241 kB, Prasanth Jayachandran
- hive-debug.log.bz2, 13/Sep/14 04:13, 23 kB, Gopal Vijayaraghavan

Issue Links

is related to
- HIVE-8354: HIVE-7156 introduced required dependency on tez (Closed)

relates to
- HIVE-7589: Some fixes and improvements to statistics annotation rules (Closed)

links to
- Review Board

People

Assignee: Prasanth Jayachandran

Reporter: Gopal Vijayaraghavan

Votes: 0

Watchers: 6

Dates

Created: 31/May/14 02:18

Updated: 13/Nov/14 19:42

Resolved: 29/Sep/14 05:53