[PIG-920] optimizing diamond queries - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.6.0
Component/s: None
Labels: None
Hadoop Flags: Reviewed

Description

The following query

A = load 'foo';
B = filter A by $0 > 1;
C = filter A by $1 == 'foo';
D = COGROUP C by $0, B by $0;
......

is not executed efficiently. Currently, Pig runs a map-only job that essentially reads and writes the same data before doing any of the query processing.

An equivalent query in which the data is loaded twice actually executes more efficiently.
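
For comparison, the load-twice form might look roughly like the sketch below. The aliases A1 and A2 are illustrative and do not appear in the original report; everything else mirrors the query above, and the elided remainder of the script is left out.

```pig
-- Sketch only: A1 and A2 are hypothetical aliases for two separate loads of the same input.
A1 = load 'foo';
A2 = load 'foo';
-- Each filter now reads from its own load instead of sharing a single alias.
B = filter A1 by $0 > 1;
C = filter A2 by $1 == 'foo';
D = COGROUP C by $0, B by $0;
```

Because each branch has its own load, the plan no longer contains a single input split into two filters, which is the shape that triggers the extra map-only job described here.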

This is not an uncommon query and we should fix this issue.

Attachments

- PIG-920.patch, 27/Oct/09 21:47, 14 kB, Richard Ding
- PIG-920.patch, 30/Oct/09 20:01, 14 kB, Richard Ding

People

Assignee: Richard Ding

Reporter: Olga Natkovich

Votes: 0

Watchers: 0

Dates

Created: 13/Aug/09 17:12

Updated: 24/Mar/10 22:15

Resolved: 30/Oct/09 20:54