|
I'm posting the results of lsof while running Abdul's code.
After I upped the max number of fds to 16K, the job ran to completion. It looks like something (garbage collection?) cleans up fds periodically; the Most of those open files are disk, and in your link, Raghu, you're talking about network IO, unless I misunderstood. Does it still look to you like these are related?
> Does it still look to you like these are related?
I don't think they are related. This seems like a different issue. Note that 5007 does not matter much since that is for all the processes together. What really matters is the max fds for a single process. In the log you attached it might be in 2-3k range. This seems more related to internals of mapred. please change the 'component' to mapred so that it gets better attention. Also provide approximate details of how big the output is, number of mappers, number of reducers etc. Here are few details about the input and output:
Input is a single gzip file of size: 2.3 G Bytes which deflates to 6.27 G Bytes of plain text. Job was run on 17 node Hadoop cluster with Max mapper capacity of 28 and reducer capacity of 39 HDFS bytes written 0 (at Map Stage) 598,425 (at Reduce Stage) 598,425 (Total) The job completes in 4 hours and 10 minutes. I took one look at the mergeParts code and didn't see any obvious issue with that. What's your setting for io.sort.factor and io.sort.mb? Also, could you please run the same job again with the exact (default?) setting for the Max FDs that resulted in this problem, and this time could you do "lsof -p <pid-of-map-task-process>". Please post the results here. Thanks!
io.sort.factor = 10
io.sort.mb = 100 Yuri attached a file with this JIRA named openfds.txt It has the information you asked for. All the FD parameters were on default. This file only shows hadoop open FDs. from openfds.txt, you probably mean PID 25731. It has 2283 fds open.. much below the 4k limit. But it might be an initial symptom. Devaraj, do the files opened by 25731 look appropriate?
I believe ulimit -n is per user, not per process. Also, as I was taking samples of lsof every 15s; the number could have been larger in-between. I don't know how to run lsof at the precise moment the job dies of fds starvation. > I believe ulimit -n is per user, not per process.
That is news to me. Not the case in linux machines I used. Any documentation that says so? Apologies, it looks like it's per process (and it's children, I guess?).
Looks like there is indeed a problem. In mergeParts all the spill files are opened at once for merge (although merge actually doesn't require more than io.sort.factor number of files to be opened at once). This should be fixed.
I wonder if something like this attachment might fix the problem.. Not that I know what I'm doing, really. I'll do the testing, and post an update later; in the meantime I'd like Hudson to run unit tests on this.
I ran this patch locally after I reduced ulimit -n to 4096. The job that was consistently failing before under this setup now runs to completion. I don't why Hudson didn't validate it...
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12393895/HADOOP-4614.patch against trunk revision 714107. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3595/testReport/ This message is automatically generated. The attached patch won't work since the Segment that is created is not positioned at the right place for any partition# > 0. Note that in the current code there is a in.seek(segmentOffset); This seek is required at some point (maybe you can define a new Segment constructor that takes position as an argument and use that to seek to the correct offset). Ditto for segmentLength.
added another ctor as suggested by Devaraj.
BTW, we need a test for multiple partitions. I'll not volunteer to write it, though.
Thanks.
I'm concerned that all unit tests passed while the first version of the patch should not have worked for the case where number of partitions > 1. Please comment: I'd like to understand this before I create a jira on the lack of tests. Thanks! I doubt whether there exists a testcase that would spill more than once in the map task (note that the code in question would be exercised only if the number of first level spills is greater than 1). Even if some testcase did, it may not be checking the result versus what is expected.
If I understand what you're saying, in that code numSpills is most commonly 1 and the code in question will run at all because of a prior check... How would a test case force multiple spills? Can it for instance set io.sort.mb, io.sort.spill.percent, and io.sort.record.percent to something really small? Will this alone do the trick? The code in question will get executed only when the number of spills is greater than 1. A testcase can force multiple spills by having a very low number for io.sort.mb (something like a few KB and where the map method generates many records)...
Created a jira https://issues.apache.org/jira/browse/HADOOP-4688
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12394101/HADOOP-4614.patch against trunk revision 719324. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3612/testReport/ This message is automatically generated. Can a committer please take a look at this? I'd like to close a loop on this. Thanks.
There has been some more discussion since Davaraj's +1. Let me know if it is still the case, I can commit it. Also fill in short description in "release note" for Change log.
Another question: should this go into 0.18 or just 0.19 and trunk is enough?
If you're talking about the discussion that took place here, I don't think it invalidated the patch in any way.
You want me to update the patch to add this description, right?
I realize you're probably not asking me, but my humble opinion is: this is should go in as soon as possible, because this kind of bugs force people to set ulimit -n to hundreds of thousands (as some mentioned on the mailing list). Just a few nits:
The fix looks good, though. Good catch on the unit test, too.
If this needs to go to 18, we need a different patch as the current patch will not apply directly to the 18 branch. However, the changes might be minimal.
attached a slightly modified version for 0.18 branch (untested)
Yuri,
Please attach the trunk patch again. Hudson (system that runs patch tests) picks up the latest patch and the 0.18 patch will fail for trunk. Regd commiting to 0.18, I am not so sure : I will leave that MapRed folks. Usually only the regressions from 0.17 and new bugs in 0.18 will qualify for 0.18. As I understand, this bug might have existed for a long time. That said, the patch for 0.18 is useful for users who want to apply it themselves. I think it's a regression from 0.17; SequenceFile.SegmentDescriptor is opened lazily.
+1 on the current patch. Unless anyone objects, I'll commit this to the 18 and 19 branches as well. Committing
I think in the process of merging opening all files crept back in! I'm attaching yet another version which should fix it.
The diff of patches is as follows:
sigh Thanks for catching that. I'm surprised findbugs didn't complain...
[exec] -1 overall.
[exec] +1 @author. The patch does not contain any @author tags.
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no tests are needed for this patch.
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
All unit tests passed on my machine.
I just committed this. Thanks, Yuri -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12394702/HADOOP-4614.patch against trunk revision 720632. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3647/testReport/ This message is automatically generated. Integrated in Hadoop-trunk #671 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/671/
. Lazily open segments when merging map spills to avoid using too many file descriptors. Contributed by Yuri Pradkin. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Also make sure you are not hitting
HADOOP-4346(may be not, /proc/pid/fd would make it more clear).