[PIG-1264] Skewed join sampler misses out the key with the highest frequency

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.7.0
Fix Version/s: 0.7.0
Component/s: None
Labels: None

Description

I am seeing two issues with the sampler used in skewed join:
1. It does not allocate multiple reducers to the key with the highest frequency.
2. It appears to allocate the same number of reducers to every key (8 in this case, the same as the parallel value in the query).

Query:

a = load 'studenttab10k' using PigStorage() as (name, age, gpa);
b = load 'votertab10k' as (name, age, registration, contributions);
e = join a by name right, b by name using "skewed" parallel 8;
store e into 'SkewedJoin_9.out';
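
To check which join key is actually the most frequent in each input, a diagnostic along the following lines may help. This is only a sketch and is not part of the original report: the aliases a_grp, a_cnt, a_ord, a_top (and their b_ counterparts) and the limit of 10 are hypothetical choices made for illustration, and the input paths are the ones used in the query above.

```pig
-- Hypothetical diagnostic, not from the original report:
-- count rows per join key in each input and show the most frequent keys.
a = load 'studenttab10k' using PigStorage() as (name, age, gpa);
b = load 'votertab10k' as (name, age, registration, contributions);

-- Frequency of each join key in the first input.
a_grp = group a by name;
a_cnt = foreach a_grp generate group as name, COUNT(a) as freq;
a_ord = order a_cnt by freq desc;
a_top = limit a_ord 10;   -- 10 is an arbitrary cutoff
dump a_top;

-- Same count for the second input.
b_grp = group b by name;
b_cnt = foreach b_grp generate group as name, COUNT(b) as freq;
b_ord = order b_cnt by freq desc;
b_top = limit b_ord 10;
dump b_top;
```

If one key clearly dominates these counts, that is the key one would expect the sampler to spread across several reducers, so its absence from the sampler output would be consistent with issue 1.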

People

Assignee: Richard Ding

Reporter: Sriranjan Manjunath

Votes: 0

Watchers: 0

Dates

Created: 26/Feb/10 20:03

Updated: 14/May/10 06:47

Resolved: 06/Mar/10 00:05