[PIG-951] Reset parallelism to 1 for indexing job in MergeJoin - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.6.0
Component/s: impl
Labels: None

Description

After one tuple is sampled from every block, a single reducer sorts the index entries in the reduce phase to produce the sorted index used by the actual join job. The parallelism of the index job should therefore be explicitly set to 1. Currently, it is not.

At present this is a non-issue, since we don't allow any blocking operators in the pipeline before merge-join. Later, when we do allow blocking operators, the parallelism of the indexing job will be that of the preceding blocking operator. Even then, the job will complete successfully: all tuples go to a single reducer because we group on only one key, "all". However, it will waste cluster resources by starting extra reducers that receive no data and do nothing.
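
To illustrate the wasted reducers, here is a minimal simulation. It is not Pig or Hadoop code: `partition` and `reducer_load` are hypothetical helpers, and the byte-sum hash is a deterministic stand-in chosen for the example, not the partitioner the indexing job actually uses. It only shows that when every record carries the same key, any key-based partitioner sends all records to one reducer regardless of the configured parallelism.

```python
def partition(key: str, num_reducers: int) -> int:
    """Hypothetical stand-in for a hash partitioner: same key, same reducer."""
    return sum(key.encode("utf-8")) % num_reducers


def reducer_load(keys, num_reducers):
    """Count how many records each reducer would receive."""
    load = [0] * num_reducers
    for key in keys:
        load[partition(key, num_reducers)] += 1
    return load


# Every index entry is grouped on the single key "all".
index_entries = ["all"] * 1000

# Parallelism inherited from a preceding blocking operator (assumed to be 10 here).
inherited = reducer_load(index_entries, 10)
idle_reducers = sum(1 for n in inherited if n == 0)

# Parallelism explicitly reset to 1, as this issue proposes.
explicit = reducer_load(index_entries, 1)

print("reducers started:", len(inherited), "idle:", idle_reducers)
print("with parallelism 1:", explicit)
```

With the inherited parallelism, every reducer but one starts and finishes without receiving any data; with parallelism 1, the only reducer started is the one that does the work.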

Attachments

pig-951.patch (3 kB), 10/Sep/09 03:46, Ashutosh Chauhan

People

Assignee: Ashutosh Chauhan

Reporter: Ashutosh Chauhan

Votes: 0

Watchers: 0

Dates

Created: 10/Sep/09 03:43

Updated: 24/Mar/10 22:15

Resolved: 18/Sep/09 17:42