[HADOOP-4664] Parallelize job initialization - ASF JIRA


Details

Type: Improvement
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.20.0
Component/s: None
Labels: None

Hadoop Flags: Reviewed

Description

The job init thread currently initializes one job at a time. However, this is a lengthy and partly IO-bound process, because all of the job's block locations must be resolved through the namenode and a map of them must be built; it can take tens of seconds per job. As a result, when many small jobs are queued, the cluster sometimes initializes them too slowly to stay fully utilized. It would be better to have a pool of threads that initializes multiple jobs in parallel. One thing to be careful of, however, is not causing deadlocks or holding locks for too long in these threads.
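A minimal sketch of the proposed thread-pool approach, assuming a hypothetical `Job` class whose `init()` stands in for the slow block-location lookup (simulated here with `Thread.sleep`). `ParallelJobInit`, `Job`, and `initializeAll` are illustrative names only, not the classes or methods in the attached patches. The sketch avoids the locking concern the description raises by having each worker do its slow work without holding any shared lock. Results are published through a thread-safe queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelJobInit {
    // Hypothetical stand-in for a queued job (not Hadoop's JobInProgress).
    static final class Job {
        final String id;
        volatile boolean initialized = false;

        Job(String id) { this.id = id; }

        // Simulates the slow, partly IO-bound step (e.g. resolving block
        // locations through the namenode). No shared lock is held here.
        void init() throws InterruptedException {
            Thread.sleep(50);
            initialized = true;
        }
    }

    // Initializes all jobs on a fixed pool of poolSize threads and returns
    // the ids of the jobs that finished initializing.
    static List<String> initializeAll(List<Job> jobs, int poolSize) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        ConcurrentLinkedQueue<String> done = new ConcurrentLinkedQueue<>();
        for (Job job : jobs) {
            pool.execute(() -> {
                try {
                    job.init();
                    done.add(job.id); // thread-safe publication, no global lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown(); // accept no new tasks; queued ones still run
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            pool.shutdownNow(); // give up on stragglers
        }
        return new ArrayList<>(done);
    }

    public static void main(String[] args) throws InterruptedException {
        List<Job> jobs = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            jobs.add(new Job("job_" + i));
        }
        long start = System.nanoTime();
        List<String> done = initializeAll(jobs, 4);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done.size() + " jobs initialized in " + ms + " ms");
    }
}
```

With four threads and eight simulated 50 ms initializations, wall-clock time should be roughly two rounds rather than eight sequential ones. In a real JobTracker, any state shared with the scheduler would still need a consistent lock order to avoid the deadlocks the description warns about.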

Attachments


parallel-job-init-v1.patch
16/Nov/08 19:21
5 kB
Matei Alexandru Zaharia
hadoop-4664-v1.patch
03/Mar/09 14:10
13 kB
Jothi Padmanabhan
hadoop-4664-v2.patch
05/Mar/09 07:22
14 kB
Jothi Padmanabhan
hadoop-4664-v3.patch
12/Mar/09 12:58
14 kB
Jothi Padmanabhan
hadoop-4664-v4.patch
12/Mar/09 15:28
14 kB
Jothi Padmanabhan

Issue Links

incorporates: HADOOP-5286 DFS client blocked for a long time reading blocks of a file on the JobTracker (Closed)


People

Assignee: Jothi Padmanabhan

Reporter: Matei Alexandru Zaharia

Votes: 0

Watchers: 7

Dates

Created: 15/Nov/08 04:45

Updated: 08/Jul/09 16:53

Resolved: 12/Mar/09 16:59