[MAPREDUCE-1783] Task Initialization should be delayed till when a job can be run - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.20.1
Fix Version/s: 0.22.0, 0.23.0
Component/s: contrib/fair-share
Labels: None
Hadoop Flags: Reviewed

Description

FairScheduler uses PoolManager to limit the number of jobs that can run at a given time. However, submitted jobs are initialized immediately by EagerTaskInitializationListener, which calls JobInProgress.initTasks. This reads the job split file into memory, even though the split information is not needed until the number of running jobs drops below the specified maximum. If the split information is large, this puts unnecessary memory pressure on the JobTracker.
To ease memory pressure, FairScheduler can use another implementation of JobInProgressListener that is aware of PoolManager limits and can delay task initialization until the number of running jobs is below the maximum.
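
The proposal can be illustrated with a minimal, self-contained sketch. It does not use the Hadoop classes and is not the code from the attached patches: `Job`, `LimitAwareInitializer`, `jobCompleted`, and the single global `maxRunning` limit are hypothetical stand-ins, whereas the real PoolManager limits would presumably be tracked per pool. The sketch only shows the idea of deferring `initTasks` until a running slot is free.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class DeferredInitSketch {
    /** Hypothetical stand-in for JobInProgress; only records whether it was initialized. */
    static final class Job {
        final String id;
        boolean initialized = false;
        Job(String id) { this.id = id; }
        // In the scenario described above, this is the step that reads the split file.
        void initTasks() { initialized = true; }
    }

    /** Hypothetical listener that initializes a job only when it is allowed to run. */
    static final class LimitAwareInitializer {
        private final int maxRunning;                       // assumed single global limit
        private final Queue<Job> waiting = new ArrayDeque<>();
        private final List<Job> running = new ArrayList<>();

        LimitAwareInitializer(int maxRunning) { this.maxRunning = maxRunning; }

        /** Called on submission: queue the job, then admit it only if a slot is free. */
        void jobAdded(Job job) {
            waiting.add(job);
            admit();
        }

        /** Called when a job finishes or is removed: free its slot and admit the next job. */
        void jobCompleted(Job job) {
            if (!running.remove(job)) {
                waiting.remove(job);                        // job never started; just drop it
            }
            admit();
        }

        /** Initialize waiting jobs in FIFO order while running slots remain. */
        private void admit() {
            while (running.size() < maxRunning && !waiting.isEmpty()) {
                Job next = waiting.poll();
                next.initTasks();                           // deferred until this point
                running.add(next);
            }
        }

        int runningCount() { return running.size(); }
        int waitingCount() { return waiting.size(); }
    }

    public static void main(String[] args) {
        LimitAwareInitializer listener = new LimitAwareInitializer(1);
        Job a = new Job("job_a");
        Job b = new Job("job_b");
        listener.jobAdded(a);
        listener.jobAdded(b);
        System.out.println(a.initialized + " " + b.initialized); // prints "true false"
        listener.jobCompleted(a);
        System.out.println(b.initialized);                       // prints "true"
    }
}
```

With a limit of one running job, the second job stays uninitialized, so its splits are not loaded until the first job completes.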

Attachments

- 0001-Pool-aware-job-initialization.patch (30 kB), 18/May/10 20:24, Ramkumar Vadali
- 0001-Pool-aware-job-initialization.patch.1 (30 kB), 20/May/10 22:39, Ramkumar Vadali
- submit-mapreduce-1783.patch (29 kB), 21/May/10 16:11, Ramkumar Vadali
- MAPREDUCE-1783.patch (13 kB), 19/Nov/10 19:11, Ramkumar Vadali

People

Assignee: Ramkumar Vadali

Reporter: Ramkumar Vadali

Votes: 0

Watchers: 6

Dates

Created: 11/May/10 22:50

Updated: 15/Nov/11 00:48

Resolved: 01/Dec/10 00:41