|
Just as an FYI, we're doing something similar in the Capacity Scheduler (
In some initial testing of this patch on a jobtracker with a lot of old history files, I found that the lock in JobHistory on getJobHistoryFileName and recoverJobHistoryFile was causing most of the threads to block while one thread listed the directory, leading to no improvement. However, Amar Kamat explained that
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12394016/parallel-job-init-v1.patch against trunk revision 718863. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3605/testReport/ This message is automatically generated. Generally looks good. A few comments:
As per discussion in
No, go ahead and take it forward. Please note though that in my tests it didn't help much without something like
Matei, while the performance improvement due to parallelization is definitely something we would like (and that would need
Attaching a patch that uses ThreadPoolExecutor for thread management. It also contains a Test case to test parallel initialization.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12401313/hadoop-4664-v1.patch against trunk revision 749919. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/44/testReport/ This message is automatically generated. Failed test, TestJobHistory, is unrelated.
Few comments :
Hey can you plz check if there are other such apis. In future we might want to associate a timer with each thread. We really dont want 3 out of 4 threads to be blocked for 1hr on dfs operations. But for now I think its a premature step. True. There are dfs calls in JobHistory*. However, I do not foresee a deadlock, these calls just might make it sequential in what otherwise could have been achieved in parallel, no? What this patch does is not insulate from slow data nodes, but just mitigates the effect.
On the different thread-safely items:
Attaching a patch that changes JobHistory.openJobs to a concurrenthashmap
When used with a smaller number of threads, there is not much of a difference between a ThreadPoolExecutor and explicitly managing threads . However, if and when we scale to using a larger number of threads, having a ThreadPoolExecutor manage is definitely a better option than managing explicitly. Resubmit patch to hudson, trunk test was broken by
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12401498/hadoop-4664-v2.patch against trunk revision 751463. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/31/testReport/ This message is automatically generated. Some comments:
Set the daemon attribute for the init threads. The termination of the main init thread should be fixed. The "while(true)" should be checking for the interrupt status. It would be better to use the static method from the Executors factory - Executors.newFixedThreadPool(int) instead of constructing a new thread pool using the explicit constructor. Don't have to catch Exception in JobInitManager.run Patch incorporating review comments.
Test Patch result: [exec] +1 overall. Based on an offline discussion with Devaraj, made some minor changes to the patch.
Test-Patch results: [exec] +1 overall. I just committed this to the 0.20 branch and the trunk. Thanks Matei and Jothi!
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12402051/hadoop-4664-v4.patch against trunk revision 752927. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/77/testReport/ This message is automatically generated. Integrated in Hadoop-trunk #778 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/778/
. Introduces multiple job initialization threads, where the number of threads are configurable via mapred.jobinit.threads. Contributed by (Matei Zaharia and Jothi Padmanabhan. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This patch also makes the CachedDNSToSwitchMap use a ConcurrentHashMap instead of a TreeMap for its rack resolving cache to avoid errors caused by multiple writes. (Cache-hit reads require no locks with ConcurrentHashMap.) Apart from the possibility of multiple writes to the resolution cache, I think I saw no other potentially conflict-inducing operations in initTasks, but I'd really welcome a second pair of eyes to look at it.
The number of job init threads is configurable as mapred.jobinit.threads. I set it to 4 by default, but let me know if there are any objections.