Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
0.7.0
-
None
-
None
Description
I am noticing two issues with the sampler used in skewed join:
1. It does not allocate multiple reducers to the key with the highest frequency.
2. It seems to be allocating the same number of reducers to every key (8 in this case).
Query:
a = load 'studenttab10k' using PigStorage() as (name, age, gpa);
b = load 'votertab10k' as (name, age, registration, contributions);
e = join a by name right, b by name using "skewed" parallel 8;
store e into 'SkewedJoin_9.out';