[PIG-4771] Implement FR Join for spark engine - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: spark-branch
Component/s: spark
Labels: None

Description

The current code base (fd31fda) substitutes a regular join for the FR (fragment replicate) join. We need to implement a real FR join for the Spark engine.

Some info collected from https://pig.apache.org/docs/r0.11.0/perf.html#replicated-joins:
Replicated Joins
Fragment replicate join is a special type of join that works well if one or more relations are small enough to fit into main memory. In such cases, Pig can perform a very efficient join because all of the Hadoop work is done on the map side. In this type of join, the large relation is followed by one or more small relations. The small relations must be small enough to fit into main memory; if they don't, the process fails and an error is generated.

Usage
Perform a replicated join with the USING clause (see JOIN (inner) and JOIN (outer)). In this example, a large relation is joined with two smaller relations. Note that the large relation comes first, followed by the smaller relations, and that all small relations together must fit into main memory; otherwise an error is generated.

big = LOAD 'big_data' AS (b1,b2,b3);

tiny = LOAD 'tiny_data' AS (t1,t2,t3);

mini = LOAD 'mini_data' AS (m1,m2,m3);

C = JOIN big BY b1, tiny BY t1, mini BY m1 USING 'replicated';
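The map-side mechanism described above can be sketched in plain Python. This is only an illustration of the general idea (hash the small relations in memory, stream the large one past them), not Pig's or the Spark engine's actual implementation. The names `build_index` and `replicated_join` are hypothetical, and skipping rows with a null join key is an assumption made to resemble inner-join semantics.

```python
from collections import defaultdict
from itertools import product


def build_index(rows, key_pos):
    """Hash a small relation by its join key so it can be held in memory."""
    index = defaultdict(list)
    for row in rows:
        index[row[key_pos]].append(row)
    return index


def replicated_join(big_rows, big_key_pos, small_relations):
    """Inner-join a streamed large relation against in-memory small relations.

    small_relations is a list of (rows, key_pos) pairs. Every small relation
    is fully loaded into a hash index, which is why all of them together
    must fit into main memory.
    """
    indexes = [build_index(rows, pos) for rows, pos in small_relations]
    for big_row in big_rows:
        key = big_row[big_key_pos]
        if key is None:
            # Assumption: rows with a null join key never match (inner join).
            continue
        # .get() avoids inserting empty entries into the defaultdict.
        matches = [index.get(key, []) for index in indexes]
        # product() yields nothing when any small relation has no match.
        for combo in product(*matches):
            joined = tuple(big_row)
            for small_row in combo:
                joined += tuple(small_row)
            yield joined
```

With this sketch, the query above would correspond roughly to `replicated_join(big, 0, [(tiny, 0), (mini, 0)])`, where each relation is an iterable of tuples and the join key is the first field. Only `big` is streamed; `tiny` and `mini` are materialized up front.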

Conditions
Fragment replicate joins are experimental; we don't have a strong sense of how small the small relation must be to fit into memory. In our tests with a simple query that involves just a JOIN, a relation of up to 100 M can be used if the process overall gets 1 GB of memory.

Attachments


PIG-4771_2.patch | 31/Mar/16 02:37 | 16 kB | liyunzhang
PIG-4771_3.patch | 17/May/16 04:12 | 23 kB | liyunzhang
PIG-4771.patch | 23/Feb/16 03:50 | 12 kB | liyunzhang

Issue Links

is related to: PIG-4891 Implement FR join by broadcasting small rdd not making more copys of data (Closed)

links to

Review Board

People

Assignee: liyunzhang

Reporter: liyunzhang

Votes: 0

Watchers: 5

Dates

Created: 05/Jan/16 04:27

Updated: 21/Jun/17 09:18

Resolved: 17/May/16 04:36