Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.2.0
    • Component/s: None
    • Labels: None
    • Patch Info: Patch Available

      Description

      Fragment Replicate Join (FRJ) is useful when we want a join between a huge table and a very small table (small enough to fit in memory) and the join doesn't expand the data by much. The idea is to distribute the processing of the huge file by fragmenting it and replicating the small file to all machines receiving a fragment of the huge file. Because the entire small file is available, the join becomes a trivial task without needing any break in the pipeline. Exhaustive tests have been done to determine the improvement we get out of FRJ. Here are the details: http://wiki.apache.org/pig/PigFRJoin
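The strategy described above can be sketched in a few lines. This is an illustrative Python sketch of the general fragment-replicate join idea, not Pig's actual Java implementation; the function name, tables, and column indices are made up for the example.

```python
# Fragment-replicate join sketch: the small (replicated) table is loaded
# whole into a hash map keyed by the join key, and each fragment of the
# large table is streamed against it -- no shuffle or pipeline break.

def replicated_join(fragment, small_table, frag_key, small_key):
    """Join one fragment of the large input against the fully
    replicated small input (hypothetical helper, for illustration)."""
    # Build the hash map once from the replicated table.
    lookup = {}
    for row in small_table:
        lookup.setdefault(row[small_key], []).append(row)
    # Stream the fragment; each row probes the map directly.
    for row in fragment:
        for match in lookup.get(row[frag_key], []):
            yield row + match

big = [(1, "a"), (2, "b"), (3, "c")]
small = [(2, "x"), (3, "y")]
print(list(replicated_join(big, small, 0, 0)))
# → [(2, 'b', 2, 'x'), (3, 'c', 3, 'y')]
```

Because the whole small table is resident in memory on every machine, the join of each fragment completes in a single map pass.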

      The patch makes changes to parts of the code where new operators are introduced. Currently, when a new operator is introduced, its alias is not set. For schema computation I have modified this behaviour to set the alias of the new operator to that of its predecessor. The logical side of the patch mimics the cogroup behavior as join syntax closely resembles that of cogroup. Currently, this patch doesn't have support for joins other than inner joins. The rest of the code has been documented.

      1. PIG-554-v4.patch
        79 kB
        Pradeep Kamath
      2. PIG-554-v3.patch
        58 kB
        Pradeep Kamath
      3. frjofflat1.patch
        75 kB
        Shravan Matthur Narayanamurthy
      4. frjofflat.patch
        60 kB
        Shravan Matthur Narayanamurthy

        Activity

        Shravan Matthur Narayanamurthy added a comment -

        Cool! Thanks everyone for reviewing and updating the patch

        Alan Gates added a comment -

        Patch v4 checked in. Thanks Shravan for all your work on this. Initial tests show speed ups in the 2-4x range. This is huge.

        Pradeep Kamath added a comment -

        Changes in new patch (attached):
        1) The HashMap now has (tuple, List<Tuple>) to address the concern that Bag would be worse spacewise than a List<Tuple>. BagFactory now has a method newDefaultBag(List<Tuple>) which will create a DefaultDataBag out of the List<Tuple> by taking ownership of the list and without copying the elements. This way in POFRJoin.getNext() we can create a bag out of the List<Tuple> without much overhead.
        2) Added back the unit test - TestFRJoin - the change in this file is to use Util.createInputFile() to create the input file for the tests on the minicluster DFS rather than the local file system. It uses Util.deleteFile() to delete the file after each test run.
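The ownership-transfer idea in (1) can be illustrated with a small sketch (Python for brevity, not Pig's actual BagFactory/DefaultDataBag code; the class name mirrors Pig's but the implementation here is hypothetical): the bag keeps a reference to the caller's list instead of copying its elements, so building a bag from the hash map's List<Tuple> in getNext() is constant-time.

```python
# Sketch of newDefaultBag(List<Tuple>) taking ownership of the list:
# the bag stores the list object itself, with no element copy.

class DefaultBag:
    def __init__(self, tuples):
        # Take ownership: keep the caller's list, O(1), no copying.
        self._tuples = tuples

    def __iter__(self):
        return iter(self._tuples)

    def __len__(self):
        return len(self._tuples)

matches = [(1, "a"), (1, "b")]   # value list from the join hash map
bag = DefaultBag(matches)        # wraps the list, does not copy it
assert bag._tuples is matches    # same object -- ownership transferred
```

The trade-off is that the caller must not keep mutating the list afterwards, which is exactly why the factory method is described as "taking ownership".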

        Shravan Matthur Narayanamurthy added a comment -

        (1) is a good catch! Really hadn't thought about this.

        (2) Hashtable to HashMap is fine, but should we be storing DataBag instead of List? I thought DataBag took more space than List, which would decrease the number of tuples we can handle.

        I think you forgot to include TestFRJoin in the patch. Rest looks good to me.

        Pradeep Kamath added a comment -

        Changes in new patch submitted (PIG-554-v3.patch):
        1) The code was not handling the case where the join key was "*" as illustrated in the script below:

        a = load ... as (a:chararray, b:chararray);
        b = load ... as (a:chararray, b);
        c = join a by *, b by * using "replicated";
        dump c;
        

        In the above script the join column is a tuple whose second column in the second input needs to be cast so that the key types for both inputs match. For this, the ProjectStarTranslator should have an implementation of visit(LOFRJoin) so that the Project is translated to multiple Project operations. After this translation, the type checking code will correctly decipher the join key to be a tuple and insert the necessary cast.
        2) In POFRJoin, HashMap is used instead of HashTable to avoid any performance loss due to synchronization code in HashTable (HashMap is not synchronized). Also this HashMap has (tuple, DataBag) as Entries instead of the earlier (tuple, List<Tuple>) to avoid constructing bags out of the List in getNext()
        3) Changed a couple of System.out.println() statements to log.debug()
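Item 1 above can be illustrated with a small, hypothetical sketch (Python, not Pig's actual ProjectStarTranslator): expanding `*` into one projection per column yields a tuple key, and the cast inserted by the type checker makes the untyped (bytearray) column comparable with its chararray counterpart.

```python
# Hypothetical sketch of expanding a "*" join key into per-column
# projections, applying type-checker-inserted casts so both inputs
# produce comparable tuple keys.

def to_chararray(v):
    # Hypothetical cast: bytearray -> chararray.
    return v.decode() if isinstance(v, (bytes, bytearray)) else v

def identity(v):
    return v

def expand_star_key(row, casts):
    """Project every column, casting each so the tuple keys from
    the two inputs compare equal when the data matches."""
    return tuple(c(v) for v, c in zip(row, casts))

# a: (a:chararray, b:chararray) -- no casts needed.
key_a = expand_star_key(("x", "y"), [identity, identity])
# b: (a:chararray, b) -- second column is bytearray, cast inserted.
key_b = expand_star_key(("x", b"y"), [identity, to_chararray])
assert key_a == key_b == ("x", "y")
```

Without the cast, the second input's key would contain a bytearray and never match the first input's chararray key, which is the bug the patch fixes.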

        Shravan Matthur Narayanamurthy added a comment -

        1) Consider the following script:
        A = load 'file1';
        B = load 'file2';
        C = filter A by $0>10;
        D = filter B by $0<10;
        E = join C by $0, D by $0 using replicated;

        We need to materialize the result of D before we can use it as the replicated input. Also, the DistributedCache has not been used because it doesn't support directories, if I recall correctly (we would have to handle many complications manually), and the load specification in Pig can contain regexps too. Also, since the replicated file is small, it doesn't make much of a difference.

        2) Instead of writing all the code to handle the various combinations of the group item specification, I chose to use LR, which already does it. I think I store only the plain tuple (extracted from the LR output), and not the LR output, in the hashtables, so it doesn't add any memory overhead. The LR is used only to separate out key & value, and these are stored as a mapping from key to value (plain tuples).
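The key/value split described in (2) can be sketched as follows (illustrative Python, not Pig's POLocalRearrange; the helper name is hypothetical): LR only pulls the join key out of each tuple, and the hash table stores plain value tuples keyed by that key, so no LR wrapper object occupies memory.

```python
# LocalRearrange-style split: project the key columns out of a tuple,
# then store only the plain tuple in the hash table under that key.

def local_rearrange(tup, key_cols):
    """Split a tuple into (key, value): the key is the projected
    join columns, the value is the original (plain) tuple."""
    key = tuple(tup[i] for i in key_cols)
    return key, tup

table = {}
for t in [(1, "a"), (2, "b"), (1, "c")]:
    key, value = local_rearrange(t, [0])
    table.setdefault(key, []).append(value)  # plain tuples only

assert table[(1,)] == [(1, "a"), (1, "c")]
```

This is why reusing LR adds no per-tuple memory cost: it acts only as the key extractor, and what lands in the table is the unwrapped tuple.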

        Alan Gates added a comment -

        A couple of questions:

        1) I'm still not clear on why the additional maps are needed to load the replicated inputs into files. Those inputs are already in files. Are you somehow transforming them? Isn't this exactly where we should be using the DistributedCache? Rather than having map jobs that transform them I think the best thing would be to have the MRCompiler set a flag for the JobControlCompiler to load those files into the DC for this job.

        2) You are using POLocalRearrange both in setting up the hash table and in reading the fragmented table before the join. What benefit is being derived from this? LR adds a lot of extra weight to the tuple that I don't think is needed. I suspect we could fit more tuples into memory if we loaded them directly rather than using LR.

        Shravan Matthur Narayanamurthy added a comment -

        The latest one has the fixes mentioned above. Please take a look.

        Shravan Matthur Narayanamurthy added a comment -

        (1) Have fixed this in my local branch.
        (2) You are right. I missed that one, but it's a minor fix. Have fixed it in my branch.
        (3) I was copying some code from LOCogroup and copied the comment inadvertently. There is no such restriction.
        (4) Fixed in local branch.
        (5) We do support any number of replicated tables. Have added a whole bunch of test cases to test joins of 3 tables, joins with and without schema & also to test schema computation of the frjoin. Please take a look.
        (6) Yes. As I had mentioned in one of the meetings, if the FRJoin has n inputs (1 fragmented & n-1 replicated), then there will be n-1 map jobs that materialize the n-1 replicated inputs to files so that they can then be read to construct the hash map.

        I am not submitting the patch yet because I see GC overhead limit reached exceptions even with 100MB replicated file when the vm is initialized with 1G heap space. I am still trying to figure out what is causing them. I noticed them while I was trying to figure out the limit for the size of the replicated file.

        Olga Natkovich added a comment -

        I ran tests and they all passed.

        Here are some comments on the patch:

        (1) New files should include apache header
        (2) LOFRJoin.getSchema(): I don't think nonDuplicates computation would work for more than two tables with the same column
        (3) LOFRJoin.getTupleJoinColSchema(): has a comment saying: "This doesn't work with join by complex type". Does this mean that FRJ does not work with columns of type Tuple? According to Alan, tuple columns are supported in the case of regular join. I think it is ok if the initial patch does not support it, but we should probably have a separate JIRA to track this issue.
        (4) In the grammar, you made "replicated" a token. I thought we would make it a string so as not to bloat the keyword space.
        (5) I see that the implementation seems to allow more than 2 tables, but the test cases only cover 2 tables. I am fine if we initially only support 2 tables - I just wanted to clarify the intent here.
        (6) Also, I ran explain on the following query, and the result seems to have a separate map step that I was not sure about:

        A = load '/user/pig/tests/data/singlefile/student_data' as (name, age, gpa);
        B = load '/user/pig/tests/data/singlefile/student_data' as (name, age, gpa);
        C = JOIN A by name, age, B by name, age USING replicated;
        explain C;

        --------------------------------------------------

        Map Reduce Plan

        --------------------------------------------------
        MapReduce node olgan-Wed Dec 03 14:21:35 PST 2008-57
        Map Plan
        Store(/tmp/temp921697735/tmp-320517577:org.apache.pig.builtin.BinStorage) - olgan-Wed Dec 03 14:21:35 PST 2008-58

        --Load(/user/pig/tests/data/singlefile/studenttab10k:org.apache.pig.builtin.PigStorage) - olgan-Wed Dec 03 14:21:35 PST 2008-44-------
        Global sort: false
        ----------------
        MapReduce node olgan-Wed Dec 03 14:21:35 PST 2008-56
        Map Plan
        Store(fakefile:org.apache.pig.builtin.PigStorage) - olgan-Wed Dec 03 14:21:35 PST 2008-55
        ---FRJoin[tuple] - olgan-Wed Dec 03 14:21:35 PST 2008-49
        Project[bytearray][0] - olgan-Wed Dec 03 14:21:35 PST 2008-45
        Project[bytearray][1] - olgan-Wed Dec 03 14:21:35 PST 2008-46
        Project[bytearray][0] - olgan-Wed Dec 03 14:21:35 PST 2008-47
        Project[bytearray][1] - olgan-Wed Dec 03 14:21:35 PST 2008-48
        --Load(/user/pig/tests/data/singlefile/studenttab10k:org.apache.pig.builtin.PigStorage) - olgan-Wed Dec 03 14:21:35 PST 2008-43-------
        Global sort: false

          People

          • Assignee: Shravan Matthur Narayanamurthy
          • Reporter: Shravan Matthur Narayanamurthy
          • Votes: 0
          • Watchers: 3
