[HIVE-7654] A method to extrapolate columnStats for partitions of a table - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.14.0
Component/s: Statistics
Labels:
- TODOC14

Description

A partitioned table can have many partitions. For example:

create table if not exists loc_orc (
state string,
locid int,
zip bigint
) partitioned by(year string) stored as orc;

Assume the table has four partitions: partition(year='2000'), partition(year='2001'), partition(year='2002'), and partition(year='2003').

The following command computes statistics for the columns state and locid of partition(year='2001'):

analyze table loc_orc partition(year='2001') compute statistics for columns state,locid;

We need the "aggregated" column statistics for the whole table loc_orc. However, column statistics may be missing for some partitions, e.g., partition(year='2002'), and for some columns within a partition, e.g., zip (bigint) for partition(year='2001').

We propose a method to extrapolate the missing column statistics for those partitions.
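
As a rough illustration of the idea only, the sketch below estimates a table-level null count for one column when only some partitions have been analyzed. It is not the method from the attached patches, which this description does not spell out. The function name, the dictionary layout, the sample numbers, and the uniform-scaling rule are all assumptions made for illustration.

```python
# Hypothetical sketch: NOT the method implemented in the HIVE-7654 patches.
# Assumption: partitions without stats resemble the analyzed ones on average,
# so an additive statistic (here, the null count) can be scaled up uniformly.

def extrapolate_num_nulls(stats_by_partition, all_partitions):
    """Estimate the table-level null count for a single column.

    stats_by_partition maps a partition name to the null count recorded
    for that partition; partitions that were never analyzed are absent.
    all_partitions lists every partition of the table.
    """
    known = [stats_by_partition[p] for p in all_partitions
             if p in stats_by_partition]
    if not known:
        return None  # no analyzed partition to extrapolate from
    # Scale the observed total by (all partitions / analyzed partitions).
    return sum(known) * len(all_partitions) / len(known)


partitions = ["2000", "2001", "2002", "2003"]
known_nulls = {"2000": 10, "2001": 30}  # made-up numbers for illustration
estimate = extrapolate_num_nulls(known_nulls, partitions)
```

Uniform scaling only makes sense for additive statistics such as counts. Statistics like min, max, or the number of distinct values would need a different rule.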

Attachments

- HIVE-7654.9.patch, 22/Aug/14 17:43, 186 kB, Pengcheng Xiong
- HIVE-7654.8.patch, 21/Aug/14 18:12, 176 kB, Pengcheng Xiong
- HIVE-7654.7.patch, 21/Aug/14 17:35, 176 kB, Pengcheng Xiong
- HIVE-7654.6.patch, 20/Aug/14 20:55, 176 kB, Pengcheng Xiong
- HIVE-7654.4.patch, 20/Aug/14 06:52, 172 kB, Pengcheng Xiong
- HIVE-7654.1.patch, 13/Aug/14 20:52, 120 kB, Pengcheng Xiong
- HIVE-7654.0.patch, 07/Aug/14 22:08, 18 kB, Pengcheng Xiong
- Extrapolate the Column Status.docx, 07/Aug/14 21:53, 131 kB, Pengcheng Xiong

People

Assignee: Pengcheng Xiong

Reporter: Pengcheng Xiong

Votes: 0

Watchers: 5

Dates

Created: 07/Aug/14 21:52

Updated: 13/Nov/14 19:42

Resolved: 23/Aug/14 00:54