Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
I created a bucketed table:
create table testBucket(x int,y string) clustered by(x) into 10 buckets;
Then I ran a query like:
set hive.enforce.bucketing = true;
insert overwrite table testBucket select intCol,stringCol from src;
Here src is a plain text-file table (not bucketed) containing 40,000,000 records. The query launches 10 reduce tasks, but all of the data goes to only one of them.
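To rule out skew in the source data as the cause, one could count how many rows are expected to land in each bucket. This is only a rough sketch: it assumes intCol holds non-negative integers, in which case pmod(hash(intCol), 10) should approximate the bucket a row is assigned to in a 10-bucket table. It is a diagnostic query, not part of the original reproduction steps.

```sql
-- Diagnostic sketch (assumption: intCol is non-negative, so pmod(hash(intCol), 10)
-- approximates the bucket assignment for a table clustered into 10 buckets).
select pmod(hash(intCol), 10) as expected_bucket,
       count(*)               as row_count
from src
group by pmod(hash(intCol), 10);
```

If this returns roughly even counts across ten expected_bucket values, the source data is not skewed and the single busy reducer points at the insert itself.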