[HIVE-4136] hive should optimize the scenario when the input and output are bucketed/sorted on the same keys - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Query Processor
Labels: None

Description

Consider a common scenario like:

create table T1 (...) clustered by (key) sorted by (key) into 2 buckets;
create table T2 (...) clustered by (key) sorted by (key) into 2 buckets;

SET hive.enforce.sorting=true;
SET hive.enforce.bucketing=true;

insert overwrite table T2
select * from T1;

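The above query launches a reduce stage only to ensure that T2 is bucketed and sorted. That stage is not needed. T1 is already bucketed and sorted on the same key into the same number of buckets, so a map-only job that writes each bucket of T1 directly to the corresponding bucket of T2 would produce the same result.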
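The following Python sketch is not Hive code. It is a simplified simulation of why the reduce stage is redundant, and it makes several assumptions. Rows are modeled as (key, value) pairs with non-negative integer keys. `bucket_of` is a hypothetical stand-in for Hive's bucketing hash. The two write paths are modeled as in-memory lists. The sketch compares a shuffle-and-sort write of T2 with a bucket-by-bucket copy of T1 and checks that both produce identical, correctly bucketed and sorted output.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # hypothetical row shape: (bucketing/sort key, payload)


def bucket_of(key: int, num_buckets: int) -> int:
    # Hypothetical bucketing function. Hive hashes the key first, but the
    # argument only requires a deterministic function of the key.
    return key % num_buckets


def write_bucketed_sorted(rows: List[Row], num_buckets: int) -> List[List[Row]]:
    """Simulate the reduce-side path: shuffle rows into buckets, then sort each."""
    buckets: List[List[Row]] = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[0], num_buckets)].append(row)
    # sorted() is stable, so rows with equal keys keep their input order.
    return [sorted(b, key=lambda r: r[0]) for b in buckets]


def copy_buckets(source: List[List[Row]]) -> List[List[Row]]:
    """Simulate the map-only path: bucket i of T1 becomes bucket i of T2 as-is."""
    return [list(b) for b in source]


def is_bucketed_sorted(buckets: List[List[Row]]) -> bool:
    """Check that every row sits in its correct bucket and each bucket is sorted."""
    n = len(buckets)
    for i, b in enumerate(buckets):
        keys = [k for k, _ in b]
        if any(bucket_of(k, n) != i for k in keys) or keys != sorted(keys):
            return False
    return True


if __name__ == "__main__":
    rows = [(5, "e"), (2, "b"), (8, "h"), (1, "a"), (4, "d"), (3, "c")]
    t1 = write_bucketed_sorted(rows, 2)  # T1: 2 buckets, sorted by key
    # Reduce path: read all of T1 and re-shuffle/sort it into T2.
    t2_reduce = write_bucketed_sorted([r for b in t1 for r in b], 2)
    # Map-only path: copy each bucket of T1 to the same bucket of T2.
    t2_copy = copy_buckets(t1)
    print(t2_copy == t2_reduce, is_bucketed_sorted(t2_copy))
```

The equivalence in this sketch depends on T1 and T2 sharing the bucketing column, the bucket count, and the sort order. If any of these differ, the copy path is no longer valid and a reduce stage is still required.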

People

Assignee: Unassigned

Reporter: Namit Jain

Votes: 0

Watchers: 1

Dates

Created: 07/Mar/13 16:45

Updated: 07/Mar/13 16:45