[PIG-5342] Add setting to turn off bloom join combiner - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.18.0
Component/s: None
Labels: None
Hadoop Flags: Reviewed

Description

1) Add a new setting, pig.bloomjoin.nocombiner, to turn off the combiner for bloom join. When the keys are all unique, the combiner is unnecessary overhead.
2) In the previous implementation, the keys were the bloom filter index and the values were the join keys. Combining involved doing a distinct on the bag of values, which has memory issues for more than 10 million records. The key and value need to be flipped so that the distinct combiner can be used, which scales to billions of records.
3) Mention in the documentation that bloom join is also ideal for a right outer join with the smaller dataset on the right. Replicate join only supports left outer join.
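
As a rough illustration of items 1 and 3, the script below sketches how the proposed setting might be combined with a right outer bloom join. The relation names, field names, and paths are hypothetical, and it assumes a Pig release and execution engine that support bloom join and include this change.

```pig
-- Hypothetical example: relation names, schemas, and paths are made up.
-- Assumes the join keys are mostly unique, so the combiner adds only overhead.
SET pig.bloomjoin.nocombiner true;

-- Larger dataset on the left, smaller dataset on the right.
events = LOAD 'input/events' AS (user_id:long, action:chararray);
users  = LOAD 'input/users'  AS (user_id:long, name:chararray);

-- Right outer join using the bloom strategy; a replicated join
-- would not allow the outer side to be on the right.
joined = JOIN events BY user_id RIGHT OUTER, users BY user_id USING 'bloom';

STORE joined INTO 'output/joined';
```

Leaving the setting unset keeps the combiner on, which is likely the better choice when join keys repeat heavily.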

Attachments

PIG-5342-1.patch, 13/Jun/18 19:15, 26 kB, Satish Saley
PIG-5342-2.patch, 15/Jun/18 20:18, 46 kB, Satish Saley
PIG-5342-3.patch, 28/Jun/18 16:37, 48 kB, Satish Saley
PIG-5342-4.patch, 06/Jul/18 16:28, 41 kB, Satish Saley
PIG-5342-5.patch, 06/Jul/18 23:14, 41 kB, Satish Saley
PIG-5342-6.patch, 01/Oct/18 21:31, 43 kB, Satish Saley
PIG-5342-7.patch, 03/Oct/18 16:40, 43 kB, Satish Saley
PIG-5342-8.patch, 03/Oct/18 22:12, 43 kB, Satish Saley

People

Assignee: Satish Saley

Reporter: Satish Saley

Votes: 0

Watchers: 2

Dates

Created: 13/Jun/18 19:08

Updated: 04/Oct/18 22:18

Resolved: 03/Oct/18 22:40