Issue Details (XML | Word | Printable)

Key: HADOOP-4614
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Blocker Blocker
Assignee: Yuri Pradkin
Reporter: Abdul Qadeer
Votes: 0
Watchers: 7
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

"Too many open files" error while processing a large gzip file

Created: 07/Nov/08 08:56 AM   Updated: 08/Jul/09 04:53 PM
Return to search
Component/s: None
Affects Version/s: 0.18.2
Fix Version/s: 0.18.3

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works 4614-trunk.patch 2008-11-25 10:43 PM Chris Douglas 4 kB
Text File Licensed for inclusion in ASF works HADOOP-4614-branch0.18.patch 2008-11-24 04:40 PM Yuri Pradkin 3 kB
Text File Licensed for inclusion in ASF works HADOOP-4614.patch 2008-11-25 11:33 PM Yuri Pradkin 4 kB
Text File Licensed for inclusion in ASF works HADOOP-4614.patch 2008-11-24 09:32 PM Yuri Pradkin 3 kB
Text File Licensed for inclusion in ASF works HADOOP-4614.patch 2008-11-24 04:13 PM Yuri Pradkin 3 kB
Text File Licensed for inclusion in ASF works HADOOP-4614.patch 2008-11-17 09:16 PM Yuri Pradkin 3 kB
Text File Licensed for inclusion in ASF works HADOOP-4614.patch 2008-11-13 10:02 PM Yuri Pradkin 1.0 kB
Text File openfds.txt 2008-11-08 12:33 AM Yuri Pradkin 575 kB

Hadoop Flags: Reviewed
Resolution Date: 26/Nov/08 01:42 AM


 Description  « Hide
I am running a simple word count program on a gzip compressed data of size 4 GB (Uncompressed size is about 7 GB). I have setup of 17 nodes in my Hadoop cluster. After some time, I get the following exception:

java.io.FileNotFoundException: /usr/local/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200811041109_0003/attempt_200811041109_0003_m_000000_0/output/spill4055.out.index
(Too many open files)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.(FileInputStream.java:137)
at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.(RawLocalFileSystem.java:62)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.(RawLocalFileSystem.java:98)
at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:168)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
at org.apache.hadoop.mapred.IndexRecord.readIndexFile(IndexRecord.java:47)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.getIndexInformation(MapTask.java:1339)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1237)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:857)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
at org.apache.hadoop.mapred.Child.main(Child.java:155)

From a user's perspective I know that Hadoop will use only one mapper for a gzipped file. The above exception suggests that probably Hadoop puts the intermediate data into many files. But the question is that "exactly how many open files are required in the worst case for any data size and cluster size?" Currently it looks as if Hadoop needs more number of open files as the size of input or the cluster size (in terms of nodes, mapper, reducers) increases. This is not plausible as far as scalability is concerned. A user needs to write some number in the /etc/security/limits.conf file that how many open files are allowed by hadoop node. The question is what that "magical number" should be?

So probably the best solution to this problem is to change Hadoop such a way that it can work with some moderate number of allowed open files (e.g. 4 K) or any other number should be suggested as an upper limit such that a user is sure that for any data size and cluster size, hadoop will not run into this "too many open files" issue.



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Raghu Angadi added a comment - 07/Nov/08 11:50 PM
It would help if you have a list of all the file that were open around this time (with lsof or better, under /proc/pid/fd).

Also make sure you are not hitting HADOOP-4346 (may be not, /proc/pid/fd would make it more clear).


Yuri Pradkin added a comment - 08/Nov/08 12:33 AM
I'm posting the results of lsof while running Abdul's code.

After I upped the max number of fds to 16K, the job ran to completion.
I was monitoring the number of open files/processes every 15s (by simply
running ps and lsof | wc -l) and saw this:
#processes open_files
...
13 646
13 648
12 2535
13 4860
12 4346
12 3842
12 3324
12 2823
12 2316
12 1852
12 1387
12 936
12 643
12 643
12 643
12 643
12 643
12 643
13 642
12 642
12 4775
12 2738
12 917
12 643
12 642
12 4992
12 4453
12 3943
12 3299
12 2855
12 2437
...

It looks like something (garbage collection?) cleans up fds periodically; the
max I saw was 5007 (but again, there may have been more in between the 15s
sampling interval).


Yuri Pradkin added a comment - 08/Nov/08 01:14 AM
Most of those open files are disk, and in your link, Raghu, you're talking about network IO, unless I misunderstood. Does it still look to you like these are related?

Raghu Angadi added a comment - 08/Nov/08 04:49 AM
> Does it still look to you like these are related?
I don't think they are related. This seems like a different issue.

Note that 5007 does not matter much since that is for all the processes together. What really matters is the max fds for a single process. In the log you attached it might be in 2-3k range.

This seems more related to internals of mapred. please change the 'component' to mapred so that it gets better attention. Also provide approximate details of how big the output is, number of mappers, number of reducers etc.


Abdul Qadeer added a comment - 12/Nov/08 02:11 PM
Here are few details about the input and output:

Input is a single gzip file of size: 2.3 G Bytes which deflates to 6.27 G Bytes of plain text.

Job was run on 17 node Hadoop cluster with Max mapper capacity of 28 and reducer capacity of 39

HDFS bytes written 0 (at Map Stage) 598,425 (at Reduce Stage) 598,425 (Total)
Local bytes read 45,190,007,737 11,044,381,978 56,234,389,715
Local bytes written 53,421,433,873 11,044,381,978 64,465,815,851

The job completes in 4 hours and 10 minutes.


Devaraj Das added a comment - 12/Nov/08 04:56 PM
I took one look at the mergeParts code and didn't see any obvious issue with that. What's your setting for io.sort.factor and io.sort.mb? Also, could you please run the same job again with the exact (default?) setting for the Max FDs that resulted in this problem, and this time could you do "lsof -p <pid-of-map-task-process>". Please post the results here. Thanks!

Abdul Qadeer added a comment - 12/Nov/08 07:17 PM
io.sort.factor = 10
io.sort.mb = 100

Yuri attached a file with this JIRA named openfds.txt It has the information you asked for. All the FD parameters were on default. This file only shows hadoop open FDs.


Raghu Angadi added a comment - 12/Nov/08 07:24 PM

from openfds.txt, you probably mean PID 25731. It has 2283 fds open.. much below the 4k limit. But it might be an initial symptom.

Devaraj, do the files opened by 25731 look appropriate?


Yuri Pradkin added a comment - 12/Nov/08 07:47 PM

It has 2283 fds open.. much below the 4k limit. But it might be an initial symptom.

I believe ulimit -n is per user, not per process.

Also, as I was taking samples of lsof every 15s; the number could have been larger in-between. I don't know how to run lsof at the precise moment the job dies of fds starvation.


Raghu Angadi added a comment - 12/Nov/08 08:02 PM
> I believe ulimit -n is per user, not per process.

That is news to me. Not the case in linux machines I used. Any documentation that says so?


Yuri Pradkin added a comment - 12/Nov/08 08:23 PM
Apologies, it looks like it's per process (and it's children, I guess?).

Devaraj Das added a comment - 13/Nov/08 04:31 AM
Looks like there is indeed a problem. In mergeParts all the spill files are opened at once for merge (although merge actually doesn't require more than io.sort.factor number of files to be opened at once). This should be fixed.

Yuri Pradkin added a comment - 13/Nov/08 10:02 PM
I wonder if something like this attachment might fix the problem.. Not that I know what I'm doing, really. I'll do the testing, and post an update later; in the meantime I'd like Hudson to run unit tests on this.

Yuri Pradkin added a comment - 14/Nov/08 08:00 PM
I ran this patch locally after I reduced ulimit -n to 4096. The job that was consistently failing before under this setup now runs to completion. I don't why Hudson didn't validate it...

Hadoop QA added a comment - 17/Nov/08 09:42 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12393895/HADOOP-4614.patch
against trunk revision 714107.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3595/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3595/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3595/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3595/console

This message is automatically generated.


Devaraj Das added a comment - 17/Nov/08 07:24 PM
The attached patch won't work since the Segment that is created is not positioned at the right place for any partition# > 0. Note that in the current code there is a in.seek(segmentOffset); This seek is required at some point (maybe you can define a new Segment constructor that takes position as an argument and use that to seek to the correct offset). Ditto for segmentLength.

Yuri Pradkin added a comment - 17/Nov/08 09:16 PM
added another ctor as suggested by Devaraj.

Yuri Pradkin added a comment - 17/Nov/08 09:18 PM
BTW, we need a test for multiple partitions. I'll not volunteer to write it, though.

Devaraj Das added a comment - 18/Nov/08 11:10 AM
Patch looks fine to me.

Yuri Pradkin added a comment - 18/Nov/08 07:03 PM
Thanks.

I'm concerned that all unit tests passed while the first version of the patch should not have worked for the case where number of partitions > 1.
Does it mean there are no unit tests that would exercise this condition? (isn't the number of partitions the same as the number of reducers? - I'm pretty sure there are tests that have multiple reducers coded in) . Or is it that seeking to segment start and finishing at segment end simply an optimization and the first version wasn't buggy, just sub-optimal?

Please comment: I'd like to understand this before I create a jira on the lack of tests. Thanks!


Devaraj Das added a comment - 18/Nov/08 07:57 PM
I doubt whether there exists a testcase that would spill more than once in the map task (note that the code in question would be exercised only if the number of first level spills is greater than 1). Even if some testcase did, it may not be checking the result versus what is expected.

Yuri Pradkin added a comment - 19/Nov/08 05:13 PM

I doubt whether there exists a testcase that would spill more than once in the map task (note that the code in question would be exercised only if the number of first level spills is greater than 1.

If I understand what you're saying, in that code numSpills is most commonly 1 and the code in question will run at all because of a prior check...

How would a test case force multiple spills? Can it for instance set io.sort.mb, io.sort.spill.percent, and io.sort.record.percent to something really small? Will this alone do the trick?


Devaraj Das added a comment - 19/Nov/08 05:50 PM
The code in question will get executed only when the number of spills is greater than 1. A testcase can force multiple spills by having a very low number for io.sort.mb (something like a few KB and where the map method generates many records)...

Yuri Pradkin added a comment - 19/Nov/08 07:05 PM
Created a jira https://issues.apache.org/jira/browse/HADOOP-4688 ("multiple spills/reducers test case for mergeParts() needed").

Hadoop QA added a comment - 20/Nov/08 10:47 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12394101/HADOOP-4614.patch
against trunk revision 719324.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3612/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3612/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3612/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3612/console

This message is automatically generated.


Yuri Pradkin added a comment - 21/Nov/08 11:42 PM
Can a committer please take a look at this? I'd like to close a loop on this. Thanks.

Raghu Angadi added a comment - 21/Nov/08 11:47 PM
There has been some more discussion since Davaraj's +1. Let me know if it is still the case, I can commit it. Also fill in short description in "release note" for Change log.

Another question: should this go into 0.18 or just 0.19 and trunk is enough?


Yuri Pradkin added a comment - 22/Nov/08 12:12 AM

There has been some more discussion since Davaraj's +1.

If you're talking about the discussion that took place here, I don't think it invalidated the patch in any way.

Also fill in short description in "release note" for Change log.

You want me to update the patch to add this description, right?

Another question: should this go into 0.18 or just 0.19 and trunk is enough?

I realize you're probably not asking me, but my humble opinion is: this is should go in as soon as possible, because this kind of bugs force people to set ulimit -n to hundreds of thousands (as some mentioned on the mailing list).


Chris Douglas added a comment - 22/Nov/08 01:17 AM
Just a few nits:
  • To avoid maintenance hiccups, it would be helpful if the old constructor called the new one:
    this(conf, fs, file, 0, fs.getFileStatus(file).getLen(), codec, preserve);
  • The position typically precedes the length in argument lists; it would be easier to read and less likely to confuse if segmentOffset preceded segmentLength in the Segment cstr
  • The new comment describes the fix, not the code. It should be omitted
  • The indentation for new code in init uses four spaces instead of two

Chris Douglas added a comment - 22/Nov/08 01:18 AM
The fix looks good, though. Good catch on the unit test, too.

Jothi Padmanabhan added a comment - 22/Nov/08 02:48 AM
If this needs to go to 18, we need a different patch as the current patch will not apply directly to the 18 branch. However, the changes might be minimal.

Yuri Pradkin added a comment - 24/Nov/08 04:13 PM
Fixed Chris's nits, thanks.

Yuri Pradkin added a comment - 24/Nov/08 04:40 PM
attached a slightly modified version for 0.18 branch (untested)

Raghu Angadi added a comment - 24/Nov/08 09:11 PM
Yuri,

Please attach the trunk patch again. Hudson (system that runs patch tests) picks up the latest patch and the 0.18 patch will fail for trunk.

Regd commiting to 0.18, I am not so sure : I will leave that MapRed folks. Usually only the regressions from 0.17 and new bugs in 0.18 will qualify for 0.18. As I understand, this bug might have existed for a long time. That said, the patch for 0.18 is useful for users who want to apply it themselves.


Yuri Pradkin added a comment - 24/Nov/08 09:32 PM
re-uploaded

Chris Douglas added a comment - 25/Nov/08 01:50 AM
I think it's a regression from 0.17; SequenceFile.SegmentDescriptor is opened lazily.

+1 on the current patch. Unless anyone objects, I'll commit this to the 18 and 19 branches as well.


Chris Douglas added a comment - 25/Nov/08 10:43 PM
Committing HADOOP-2774 causes trunk to conflict with the current patch. Attaching a merged version, testing it now

Yuri Pradkin added a comment - 25/Nov/08 11:33 PM
I think in the process of merging opening all files crept back in! I'm attaching yet another version which should fix it.
The diff of patches is as follows:

> @@ -1236,27 +1236,23 @@
17,19c17,19
< FSDataInputStream in = rfs.open(filename[i]);
< in.seek(segmentOffset);
<

> - FSDataInputStream in = rfs.open(filename[i]);
> - in.seek(segmentOffset);
> -


Chris Douglas added a comment - 25/Nov/08 11:42 PM
sigh Thanks for catching that. I'm surprised findbugs didn't complain...
     [exec] -1 overall.

     [exec]     +1 @author.  The patch does not contain any @author tags.

     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.

     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.

     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.

     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

Chris Douglas added a comment - 26/Nov/08 01:42 AM
All unit tests passed on my machine.

I just committed this. Thanks, Yuri


Hadoop QA added a comment - 26/Nov/08 04:21 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12394702/HADOOP-4614.patch
against trunk revision 720632.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3647/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3647/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3647/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3647/console

This message is automatically generated.


Hudson added a comment - 26/Nov/08 04:41 PM
Integrated in Hadoop-trunk #671 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/671/)
. Lazily open segments when merging map spills to avoid using
too many file descriptors. Contributed by Yuri Pradkin.