[HIVE-7956] When inserting into a bucketed table, all data goes to a single bucket [Spark Branch]

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Component/s: Spark
Labels: None

Description

I created a bucketed table:

create table testBucket(x int,y string) clustered by(x) into 10 buckets;

Then I ran the following query:

set hive.enforce.bucketing = true;
insert overwrite table testBucket select intCol,stringCol from src;

Here, src is a simple text-file-based table containing 40,000,000 records (not bucketed). The query launches 10 reduce tasks, but all the data goes to only one of them.
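
For comparison, here is a minimal Python sketch of how rows would be expected to spread across the 10 buckets. It assumes the bucket is computed as `(hash & Integer.MAX_VALUE) % numBuckets` and that an int column hashes to its own value; neither rule is stated in this issue, so treat both as assumptions. The helper names `bucket_for` and `bucket_histogram` and the sample values are hypothetical.

```python
from collections import Counter

INT_MAX = 0x7FFFFFFF  # value of Java's Integer.MAX_VALUE


def bucket_for(x: int, num_buckets: int = 10) -> int:
    """Assumed rule: (hash & Integer.MAX_VALUE) % numBuckets, with hash(int) == int."""
    return (x & INT_MAX) % num_buckets


def bucket_histogram(values, num_buckets: int = 10) -> Counter:
    """Count how many of the given clustering-column values land in each bucket."""
    return Counter(bucket_for(v, num_buckets) for v in values)


# Hypothetical stand-in for the intCol values of src.
hist = bucket_histogram(range(1000))
for bucket in sorted(hist):
    print(bucket, hist[bucket])
```

Under these assumptions, reasonably varied values of the clustering column should populate all 10 buckets, which is what makes a single non-empty reducer look like a partitioning problem rather than a property of the data.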

Issue Links

depends upon

HIVE-8017 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

Resolved

is part of

HIVE-7292 Hive on Spark

Resolved

People

Assignee: Rui Li

Reporter: Rui Li

Votes: 0

Watchers: 3

Dates

Created: 03/Sep/14 08:36

Updated: 15/Sep/14 06:25

Resolved: 15/Sep/14 06:25