[HIVE-5369] Annotate hive operator tree with statistics from metastore - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.13.0
Fix Version/s: 0.13.0
Component/s: Query Processor, Statistics
Labels:
- statistics

Description

Currently, the statistics gathered at the table/partition level and the column level are not used during the query planning stage, even though they can be used to optimize query plans. Basic statistics such as uncompressed data size can be used for better reducer estimation. Other statistics, such as the number of rows, the number of distinct values per column, and the average column length, can be used by the Cost Based Optimizer (CBO) to select better query plans.

As a first step in improving query planning, the statistics that are available in the metastore should be attached to the Hive operator tree. The operator tree should be walked and annotated with statistics information. The attached statistics will vary for each operator depending on the operation it performs. For example, a select operator changes the average row size but does not affect the number of rows. Similarly, a filter operator changes the number of rows but does not change the average row size. Similar rules can be applied to other operators as well.

Rules for different operators are added as comments in the code. For more detailed information, the reference book I am using is "Database Systems: The Complete Book" by Garcia-Molina et al.
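
To make the select and filter rules from the description concrete, here is a minimal sketch in Python. It is not the code from the patch: the `Stats` class, the `annotate_select` and `annotate_filter` functions, and the `selectivity` parameter are hypothetical names chosen for illustration, and the filter's selectivity is assumed to be supplied by the caller rather than derived from column statistics.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    """Hypothetical per-operator statistics annotation."""
    num_rows: int
    avg_row_size: float  # bytes per row

    @property
    def data_size(self) -> float:
        # Estimated uncompressed size of the operator's output.
        return self.num_rows * self.avg_row_size


def annotate_select(input_stats: Stats, projected_col_sizes: list) -> Stats:
    """Select rule: the row count is unchanged; the average row size becomes
    the sum of the average lengths of the projected columns."""
    return Stats(input_stats.num_rows, float(sum(projected_col_sizes)))


def annotate_filter(input_stats: Stats, selectivity: float) -> Stats:
    """Filter rule: the row count is scaled by the (assumed) selectivity;
    the average row size is unchanged."""
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be between 0 and 1")
    rows = int(math.ceil(input_stats.num_rows * selectivity))
    return Stats(rows, input_stats.avg_row_size)


# Walk a tiny scan -> filter -> select chain, annotating each step.
scan = Stats(num_rows=1000, avg_row_size=100.0)
filtered = annotate_filter(scan, selectivity=0.5)
selected = annotate_select(filtered, projected_col_sizes=[4, 8])
```

In this sketch the filter halves the row count while leaving the row size at 100 bytes, and the select then shrinks the row size to 12 bytes while leaving the row count alone, which mirrors the two rules stated in the description.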

Attachments

HIVE-5369.1.txt	11/Oct/13 09:28	750 kB	Prasanth Jayachandran
HIVE-5369.10.patch	18/Nov/13 17:31	1.29 MB	Prasanth Jayachandran
HIVE-5369.2.patch.txt	05/Nov/13 01:31	725 kB	Prasanth Jayachandran
HIVE-5369.2.WIP.txt	10/Oct/13 08:02	874 kB	Prasanth Jayachandran
HIVE-5369.3.patch.txt	05/Nov/13 21:49	718 kB	Prasanth Jayachandran
HIVE-5369.4.patch.txt	12/Nov/13 23:43	796 kB	Prasanth Jayachandran
HIVE-5369.5.patch.txt	13/Nov/13 04:04	800 kB	Prasanth Jayachandran
HIVE-5369.6.patch.txt	14/Nov/13 03:40	803 kB	Prasanth Jayachandran
HIVE-5369.7.patch.txt	14/Nov/13 21:18	1.23 MB	Prasanth Jayachandran
HIVE-5369.8.patch.txt	14/Nov/13 23:28	1.27 MB	Prasanth Jayachandran
HIVE-5369.9.patch	18/Nov/13 05:54	1.29 MB	Gunther Hagleitner
HIVE-5369.9.patch.txt	15/Nov/13 07:08	1.29 MB	Prasanth Jayachandran
HIVE-5369.refactor.WIP.txt	04/Nov/13 18:16	700 kB	Prasanth Jayachandran
HIVE-5369.WIP.txt	08/Oct/13 20:50	146 kB	Prasanth Jayachandran

Issue Links

is blocked by	HIVE-5325 Implement statistics providing ORC writer and reader interfaces	Resolved
is related to	HIVE-6300 Add documentation for stats configs to hive-default.xml.template	Resolved

links to

Review Board Link

Sub-Tasks

1.	Improve the stats of operators based on heuristics in the absence of any column statistics	Resolved	Prasanth Jayachandran
2.	Make fetching of column statistics configurable	Resolved	Prasanth Jayachandran
3.	Better heuristics for worst case statistics estimates for join, limit and filter operator	Resolved	Prasanth Jayachandran
4.	Fix statistics annotation related test failures in hadoop2	Resolved	Prasanth Jayachandran
5.	In statistics annotation add flag to say if statistics is estimated or accurate	Open	Prasanth Jayachandran
6.	Support column statistics for expressions in GBY attributes, JOIN condition etc. when annotating operator tree with statistics	Open	Prasanth Jayachandran
7.	Add statistics rule for Union operator	Open	Prasanth Jayachandran
8.	Support for operators like PTF, Script, Extract etc. in statistics annotation.	Open	Prasanth Jayachandran
9.	Update statistics rules for different types of joins	Open	Prasanth Jayachandran
10.	Add documentation for stats configs to hive-default.xml.template	Resolved	Prasanth Jayachandran
11.	Add protection against divide by zero in stats annotation	Resolved	Prasanth Jayachandran
12.	Update column stats based on filter expression in stats annotation	Open	Prasanth Jayachandran
13.	Stats annotation fails to evaluate constant expressions in filter operator	Closed	Prasanth Jayachandran
14.	Make use of number of nulls column statistics in filter rule	Closed	Prasanth Jayachandran
15.	Make use of decimal column statistics in statistics annotation	Closed	Prasanth Jayachandran
16.	Some fixes and improvements to statistics annotation rules	Closed	Prasanth Jayachandran
17.	JOIN operator should update the column stats when number of rows changes	Closed	Prasanth Jayachandran
18.	Join stats annotation rule is not updating columns statistics correctly	Closed	Prasanth Jayachandran
19.	Ease-out denominator for multi-attribute join case in statistics annotation	Closed	Prasanth Jayachandran
20.	Missing null check cause NPE when updating join column stats in statistics annotation	Closed	Prasanth Jayachandran
21.	Column statistics from expression does not handle fields within complex types	Closed	Prasanth Jayachandran
22.	With fetch column stats disabled number of elements in grouping set is not taken into account	Closed	Prasanth Jayachandran
23.	Incorrect calculation of number of rows in JoinStatsRule.process results in overflow	Closed	Prasanth Jayachandran
24.	StatsRulesProcFactory should gracefully handle overflows	Closed	Prasanth Jayachandran
25.	Group-By operator stat-annotation only uses distinct approx to generate rollups	Closed	Prasanth Jayachandran
26.	Select Operator does not rename column stats properly in case of select star	Closed	Prasanth Jayachandran
27.	With dynamic partition enabled fact table selectivity is not taken into account when generating the physical plan (Use CBO cardinality using physical plan generation)	Closed	Prasanth Jayachandran
28.	NPE in PK-FK inference when one side of join is complex tree	Closed	Prasanth Jayachandran
29.	Support LateralViewJoinOperator and LateralViewForwardOperator in stats annotation	Closed	Prasanth Jayachandran

People

Assignee: Prasanth Jayachandran

Reporter: Prasanth Jayachandran

Votes: 0

Watchers: 7

Dates

Created: 26/Sep/13 08:38

Updated: 26/Jan/14 00:48

Resolved: 18/Nov/13 19:30