[TEZ-2198] Fix sorter spill counts - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.8.0-alpha, 0.7.1
Component/s: None
Labels: None
Target Version/s: 0.7.1
Hadoop Flags: Reviewed

Description

Prior to pipelined shuffle, Tez merged all spilled data into a single file, which produced one index file and one output file. In that context, TaskCounter.ADDITIONAL_SPILL_COUNT referred to the number of additional spills, and no counter was needed to track the number of merges.

With pipelined shuffle, there is no final merge, so ADDITIONAL_SPILL_COUNT is misleading: these spills are themselves output files that the consumers read directly.

It would be good to have the following counters:

ADDITIONAL_SPILL_COUNT: the number of spills the task needs in order to generate the final merged output.
TOTAL_SPILLS: the total number of shuffle directories (index + output files) that exist at the end of processing.

For example, assume the sorter generated 5 spills in an attempt.

Without pipelining:
===================
ADDITIONAL_SPILL_COUNT = 5 <-- additional spills involved in sorting
TOTAL_SPILLS = 1 <-- final merged output

With pipelining:
================
ADDITIONAL_SPILL_COUNT = 0 <-- no additional spills involved in sorting
TOTAL_SPILLS = 5 <-- all spills are final output
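
The two cases in the example above can be written as a small Java sketch. This is only an illustration of the proposed counter semantics, not code from any of the attached patches: the class SpillCounterSketch, the nested Counts holder, and the compute method are hypothetical names, and the sketch assumes the non-pipelined case always ends in exactly one merged output, as in the example.

```java
// Hypothetical illustration of the proposed counter semantics.
// None of these names come from the Tez code base.
final class SpillCounterSketch {

    /** Holds the two counter values described in this issue. */
    static final class Counts {
        final long additionalSpillCount; // spills needed to build the final merged output
        final long totalSpills;          // shuffle outputs (index + output files) left at the end

        Counts(long additionalSpillCount, long totalSpills) {
            this.additionalSpillCount = additionalSpillCount;
            this.totalSpills = totalSpills;
        }
    }

    /**
     * Maps the number of spills produced by the sorter to the two counters.
     * Assumption: without pipelining, all spills are merged into one final output.
     */
    static Counts compute(int sorterSpills, boolean pipelinedShuffle) {
        if (sorterSpills < 0) {
            throw new IllegalArgumentException("sorterSpills must be non-negative");
        }
        if (pipelinedShuffle) {
            // No final merge: every spill is itself a final output.
            return new Counts(0, sorterSpills);
        }
        // Spills are intermediate; a single merged output remains.
        return new Counts(sorterSpills, 1);
    }
}
```

Calling SpillCounterSketch.compute(5, false) gives the "without pipelining" values from the example, and SpillCounterSketch.compute(5, true) gives the "with pipelining" values.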

Attachments

TEZ-2198.branch-0.7.patch | 02/Sep/15 05:24 | 32 kB | Rajesh Balamohan
TEZ-2198.6.patch | 28/May/15 01:50 | 32 kB | Rajesh Balamohan
TEZ-2198.5.patch | 27/May/15 23:07 | 32 kB | Rajesh Balamohan
TEZ-2198.4.patch | 01/May/15 03:24 | 32 kB | Rajesh Balamohan
TEZ-2198.3.patch | 30/Apr/15 23:30 | 31 kB | Rajesh Balamohan
no_additional_spills_eg_pipelined_shuffle.png | 06/Apr/15 06:24 | 61 kB | Rajesh Balamohan
with_additional_spills.png | 06/Apr/15 06:24 | 64 kB | Rajesh Balamohan
TEZ-2198.2.patch | 16/Mar/15 23:18 | 30 kB | Rajesh Balamohan
TEZ-2198.1.patch | 16/Mar/15 04:28 | 22 kB | Rajesh Balamohan

People

Assignee: Rajesh Balamohan

Reporter: Rajesh Balamohan

Votes: 0

Watchers: 4

Dates

Created: 13/Mar/15 07:57

Updated: 02/Sep/15 07:20

Resolved: 28/May/15 23:46