[PIG-4601] Implement Merge CoGroup for Spark engine - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: spark-branch
Fix Version/s: spark-branch
Component/s: spark
Labels: None

Description

A regular cogroup operation requires a full map-reduce job. The goal of merge cogroup is to implement cogroup in a single map-only stage. For this to work, the input data must already be sorted.

This improves performance when A (a big dataset) is merge-cogrouped with B (a small dataset): we first generate an index file for A, then load A according to the index file, together with B, into memory to do the cogroup. The gain comes from avoiding the reduce phase that a regular cogroup incurs.
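The single-pass idea can be illustrated with a minimal Python sketch. It is not the Pig or Spark implementation and does not model the index file. It only assumes two inputs already sorted by key and shows why no shuffle or reduce step is needed. The function name `merge_cogroup` and the tuple layout (key in the first field) are hypothetical, chosen for illustration.

```python
from itertools import groupby
from operator import itemgetter


def merge_cogroup(a_rows, b_rows, key=itemgetter(0)):
    """Yield (key, a_bag, b_bag) for two inputs that are already sorted by key.

    Hypothetical helper for illustration only: each input is read once,
    front to back, so no shuffle or reduce step is required.
    """
    a_groups = groupby(a_rows, key)
    b_groups = groupby(b_rows, key)
    a_cur = next(a_groups, None)
    b_cur = next(b_groups, None)
    while a_cur is not None or b_cur is not None:
        if b_cur is None or (a_cur is not None and a_cur[0] < b_cur[0]):
            # Key present only in A.
            yield a_cur[0], list(a_cur[1]), []
            a_cur = next(a_groups, None)
        elif a_cur is None or b_cur[0] < a_cur[0]:
            # Key present only in B.
            yield b_cur[0], [], list(b_cur[1])
            b_cur = next(b_groups, None)
        else:
            # Same key on both sides: emit both bags together.
            # Bags are materialized before advancing either groupby.
            yield a_cur[0], list(a_cur[1]), list(b_cur[1])
            a_cur = next(a_groups, None)
            b_cur = next(b_groups, None)


A = [(1, "a"), (1, "b"), (3, "c")]  # sorted by the first field
B = [(1, "x"), (2, "y")]            # sorted by the first field
result = list(merge_cogroup(A, B))
```

If either input is not sorted, rows with the same key end up in separate groups, which is why the sortedness guarantee matters.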

How to use

C = cogroup A by c1, B by c1 using 'merge';

Here A and B are both sorted.

Attachments

PIG-4601_1.patch  14/Jan/16 10:26  26 kB  liyunzhang
PIG-4601_2.patch  18/Jan/16 07:10  26 kB  liyunzhang
PIG-4601_3.patch  17/Feb/16 02:24  23 kB  liyunzhang
PIG-4601_4.patch  18/Feb/16 07:04  26 kB  liyunzhang

Issue Links

links to

review board

People

Assignee: liyunzhang

Reporter: Mohit Sabharwal

Votes: 0

Watchers: 4

Dates

Created: 12/Jun/15 23:29

Updated: 21/Jun/17 09:18

Resolved: 19/Feb/16 12:35