+1, I've seen this happen too. The hod log-harvester was eating up CPU on each of the task-trackers...
Does this patch address the issue of excessive logging (resulting in huge log files)?
Sigh, I forgot about that point. Will upload one shortly.
Here is another stab at this. The patch logs only when updates happen or a specific time limit expires (1 minute). I hope I have covered all possible logging that happens frequently in the shuffle.
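To make the rule above concrete, here is a minimal sketch of "log only on an update or after a time limit". It is not the code from the patch: the class name `ThrottledShuffleLog`, the method `maybeLog`, and the message text are all hypothetical, and log lines are collected in a list rather than sent to a real logger. The caller passes in a timestamp it has already read, so one clock reading per loop iteration can serve both scheduling and logging.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: logs only when the copied-output
// count changes or when the time limit since the last log line has expired.
class ThrottledShuffleLog {
    // Assumed interval, taken from the "1 minute" mentioned in the comment.
    static final long LOG_INTERVAL_MS = 60_000L;

    private long lastLogTime;
    private int lastLoggedCount = -1;   // -1 means nothing logged yet
    final List<String> lines = new ArrayList<>();

    ThrottledShuffleLog(long startTime) {
        this.lastLogTime = startTime;
    }

    /** Returns true if a line was logged for this call. */
    boolean maybeLog(long now, int copiedOutputs) {
        boolean updated = copiedOutputs != lastLoggedCount;
        boolean expired = now - lastLogTime >= LOG_INTERVAL_MS;
        if (!updated && !expired) {
            return false;               // nothing new and limit not reached
        }
        lines.add("copied " + copiedOutputs + " map outputs");
        lastLoggedCount = copiedOutputs;
        lastLogTime = now;
        return true;
    }
}
```

In a fetch loop this would be called once per iteration with a timestamp read at the top of the loop, for example `log.maybeLog(currentTime, numCopied)`, where both variable names are placeholders.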
Does this patch apply to 0.17?
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12381688/3332.patch against trunk revision 654315.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2430/testReport/
This message is automatically generated.
No, this patch doesn't apply to 0.17. I will put up a version for 0.17.
This patch is for the 0.17 branch
The patch looks good.
It should be checked into 0.17 too. I'm a little concerned about adding a 'gettimeofday'-ish call in the inner loop, especially when it is just for logging...
Arun, actually there was a call to currentTimeMillis already. I moved it up so that it can be reused for logging also. From the patch, this is the place where currentTime is removed from...
@@ -1098,7 +1123,7 @@ Iterator<MapOutputLocation> locIt = knownOutputs.iterator();
MapOutputLocation loc = locIt.next();
I might be missing something, but the patch moves it down into the while loop, from outside it...
Can we do something simple, like logging every 1% of shuffle progress? That is, log status once we copy 1% of the map outputs...
Hey Arun, if you look at the fetchOutputs method, the whole body sits inside a big while loop: "while (!neededOutputs.isEmpty() && mergeThrowable == null) {". The call to System.currentTimeMillis was already inside that loop (to be precise, inside "synchronized (scheduledCopies)"). I moved it outside the synchronized block, to where I think it belongs in the loop.
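For comparison, the 1% idea raised in this exchange could look roughly like the sketch below. This is only an illustration of that suggestion, not what was committed: `PercentProgressLog`, `onCopied`, and the message format are hypothetical names, and the method returns the status line instead of writing it to a real logger.

```java
// Hypothetical helper, not part of Hadoop: produces a status line only when
// shuffle progress crosses a new whole percent of the total map outputs.
class PercentProgressLog {
    private final int totalOutputs;
    private int lastLoggedPercent = 0;

    PercentProgressLog(int totalOutputs) {
        if (totalOutputs <= 0) {
            throw new IllegalArgumentException("totalOutputs must be positive");
        }
        this.totalOutputs = totalOutputs;
    }

    /** Returns a status line when a new whole percent is reached, else null. */
    String onCopied(int copiedOutputs) {
        // Use long arithmetic so 100 * copiedOutputs cannot overflow an int.
        int percent = (int) (100L * copiedOutputs / totalOutputs);
        if (percent <= lastLoggedPercent) {
            return null;
        }
        lastLoggedPercent = percent;
        return "shuffle " + percent + "% complete ("
            + copiedOutputs + "/" + totalOutputs + ")";
    }
}
```

This needs no clock reading at all and caps the output at 100 lines per reducer, but unlike the time-based approach it says nothing while a shuffle is stalled.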
+1
Ok, I missed that one - sorry! This was caused by
I just committed this to the trunk.
Integrated in Hadoop-trunk #492 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/492/)
For jobs of any reasonable size, where the shuffle may take some time, the userlog/syslog file of each reducer task may
grow unreasonably large (0.5GB, say). This places a heavy burden on hod when it harvests the log files while deallocating
a cluster. Also, if those log files are archived on a DFS (as hod does now), the space requirements on the DFS
will be quite significant.