Hadoop Common / HADOOP-4614

"Too many open files" error while processing a large gzip file

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.18.2
    • Fix Version/s: 0.18.3
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      I am running a simple word count program on gzip-compressed data of size 4 GB (the uncompressed size is about 7 GB). I have a setup of 17 nodes in my Hadoop cluster. After some time, I get the following exception:

      java.io.FileNotFoundException: /usr/local/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200811041109_0003/attempt_200811041109_0003_m_000000_0/output/spill4055.out.index (Too many open files)
      at java.io.FileInputStream.open(Native Method)
      at java.io.FileInputStream.<init>(FileInputStream.java:137)
      at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:62)
      at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:98)
      at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:168)
      at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
      at org.apache.hadoop.mapred.IndexRecord.readIndexFile(IndexRecord.java:47)
      at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.getIndexInformation(MapTask.java:1339)
      at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1237)
      at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:857)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
      at org.apache.hadoop.mapred.Child.main(Child.java:155)

      From a user's perspective, I know that Hadoop will use only one mapper for a gzipped file. The above exception suggests that Hadoop probably spreads the intermediate data across many files. But the question is: exactly how many open files are required in the worst case, for any data size and cluster size? Currently it looks as if Hadoop needs more open files as the input size or the cluster size (in terms of nodes, mappers, reducers) grows. This is not tenable as far as scalability is concerned. A user has to write some number into /etc/security/limits.conf saying how many open files a Hadoop node is allowed, and the question is what that "magical number" should be.

      So probably the best solution to this problem is to change Hadoop in such a way that it can work within some moderate number of allowed open files (e.g. 4K), or else to state an upper limit such that a user can be sure that, for any data size and cluster size, Hadoop will not run into this "too many open files" issue.

      1. openfds.txt
        575 kB
        Yuri Pradkin
      2. HADOOP-4614-branch0.18.patch
        3 kB
        Yuri Pradkin
      3. HADOOP-4614.patch
        1.0 kB
        Yuri Pradkin
      4. HADOOP-4614.patch
        3 kB
        Yuri Pradkin
      5. HADOOP-4614.patch
        3 kB
        Yuri Pradkin
      6. HADOOP-4614.patch
        3 kB
        Yuri Pradkin
      7. HADOOP-4614.patch
        4 kB
        Yuri Pradkin
      8. 4614-trunk.patch
        4 kB
        Chris Douglas

        Activity

        Abdul Qadeer created issue -
        Raghu Angadi added a comment -

        It would help if you have a list of all the files that were open around this time (with lsof or, better, under /proc/pid/fd).

        Also make sure you are not hitting HADOOP-4346 (maybe not; /proc/pid/fd would make it clearer).

        Yuri Pradkin added a comment -

        I'm posting the results of lsof while running Abdul's code.

        After I upped the max number of fds to 16K, the job ran to completion.
        I was monitoring the number of open files/processes every 15s (by simply
        running ps and lsof | wc -l) and saw this:
        #processes open_files
        ...
        13 646
        13 648
        12 2535
        13 4860
        12 4346
        12 3842
        12 3324
        12 2823
        12 2316
        12 1852
        12 1387
        12 936
        12 643
        12 643
        12 643
        12 643
        12 643
        12 643
        13 642
        12 642
        12 4775
        12 2738
        12 917
        12 643
        12 642
        12 4992
        12 4453
        12 3943
        12 3299
        12 2855
        12 2437
        ...

        It looks like something (garbage collection?) cleans up fds periodically; the
        max I saw was 5007 (but again, there may have been more in between the 15s
        sampling interval).

        Yuri Pradkin made changes -
        Field Original Value New Value
        Attachment openfds.txt [ 12393542 ]
        Yuri Pradkin added a comment -

        Most of those open files are disk files, and in your link, Raghu, you're talking about network IO, unless I misunderstood. Does it still look to you like these are related?

        Raghu Angadi added a comment -

        > Does it still look to you like these are related?
        I don't think they are related. This seems like a different issue.

        Note that 5007 does not matter much, since that is for all the processes together. What really matters is the max fds for a single process. In the log you attached it might be in the 2-3K range.

        This seems more related to the internals of mapred. Please change the 'component' to mapred so that it gets better attention. Also provide approximate details of how big the output is, the number of mappers, the number of reducers, etc.
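
        (For anyone reproducing this: a process can also sample its own descriptor count from inside the JVM by listing /proc/self/fd. The snippet below is only an illustration of that idea and is not part of Hadoop; it assumes a Linux host where /proc is available.)

          // Illustrative only, not Hadoop code: count this JVM's open file descriptors on Linux.
          import java.io.File;

          public class FdCount {
            public static void main(String[] args) {
              String[] fds = new File("/proc/self/fd").list();  // one entry per open descriptor
              System.out.println("open fds in this JVM: " + (fds == null ? 0 : fds.length));
            }
          }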

        Abdul Qadeer made changes -
        Component/s mapred [ 12310690 ]
        Component/s io [ 12310687 ]
        Abdul Qadeer added a comment -

        Here are a few details about the input and output:

        Input is a single gzip file of size 2.3 GB, which decompresses to 6.27 GB of plain text.

        The job was run on a 17-node Hadoop cluster with a max mapper capacity of 28 and a reducer capacity of 39.

        Counter               Map stage        Reduce stage     Total
        HDFS bytes written    0                598,425          598,425
        Local bytes read      45,190,007,737   11,044,381,978   56,234,389,715
        Local bytes written   53,421,433,873   11,044,381,978   64,465,815,851

        The job completes in 4 hours and 10 minutes.

        Devaraj Das added a comment -

        I took a look at the mergeParts code and didn't see any obvious issue with it. What are your settings for io.sort.factor and io.sort.mb? Also, could you please run the same job again with the exact (default?) setting for the max FDs that resulted in this problem, and this time could you do "lsof -p <pid-of-map-task-process>"? Please post the results here. Thanks!

        Abdul Qadeer added a comment -

        io.sort.factor = 10
        io.sort.mb = 100

        Yuri attached a file to this JIRA named openfds.txt. It has the information you asked for. All the FD parameters were at their defaults. This file only shows Hadoop's open FDs.

        Raghu Angadi added a comment -

        From openfds.txt, you probably mean PID 25731. It has 2283 fds open, which is well below the 4K limit. But it might be an initial symptom.

        Devaraj, do the files opened by 25731 look appropriate?

        Yuri Pradkin added a comment -

        > It has 2283 fds open, much below the 4K limit. But it might be an initial symptom.

        I believe ulimit -n is per user, not per process.

        Also, I was taking samples of lsof every 15s; the number could have been larger in between. I don't know how to run lsof at the precise moment the job dies of fd starvation.

        Raghu Angadi added a comment -

        > I believe ulimit -n is per user, not per process.

        That is news to me. It's not the case on the Linux machines I have used. Any documentation that says so?

        Yuri Pradkin added a comment -

        Apologies, it looks like it's per process (and its children, I guess?).

        Devaraj Das added a comment -

        Looks like there is indeed a problem. In mergeParts, all the spill files are opened at once for the merge (although the merge actually doesn't require more than io.sort.factor files to be open at once). This should be fixed.
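
        (For a sense of scale: the stack trace in the description already references spill4055.out.index, i.e. on the order of four thousand spill files for this single map task. With io.sort.factor = 10, as reported above, a factor-wide merge needs only about a dozen spill files open at any one time, whereas opening every spill eagerly needs thousands of descriptors at once.)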

        Yuri Pradkin added a comment -

        I wonder if something like the attached patch might fix the problem. Not that I know what I'm doing, really. I'll do the testing and post an update later; in the meantime I'd like Hudson to run the unit tests on this.

        Yuri Pradkin made changes -
        Attachment HADOOP-4614.patch [ 12393895 ]
        Yuri Pradkin made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Yuri Pradkin added a comment -

        I ran this patch locally after I reduced ulimit -n to 4096. The job that was consistently failing before under this setup now runs to completion. I don't know why Hudson didn't validate it...

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12393895/HADOOP-4614.patch
        against trunk revision 714107.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3595/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3595/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3595/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3595/console

        This message is automatically generated.

        Devaraj Das added a comment -

        The attached patch won't work, since the Segment that is created is not positioned at the right place for any partition # > 0. Note that in the current code there is an in.seek(segmentOffset); this seek is required at some point (maybe you can define a new Segment constructor that takes the position as an argument and use that to seek to the correct offset). Ditto for segmentLength.
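
        (A minimal sketch of the kind of lazy-open segment being suggested here, with simplified, hypothetical names rather than the actual Segment code: the segment remembers the spill file plus the partition's offset and length, and only opens and seeks the stream when init() is called, so at most io.sort.factor streams need to be open during the merge.)

          // Illustrative sketch only (hypothetical class name, not the actual Hadoop code):
          // keep (file, offset, length) and open/seek lazily in init().
          import java.io.IOException;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FSDataInputStream;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          class SpillSegment {
            private final Configuration conf;
            private final FileSystem fs;
            private final Path file;
            private final long segmentOffset;
            private final long segmentLength;
            private FSDataInputStream in;    // null until init() opens it

            SpillSegment(Configuration conf, FileSystem fs, Path file,
                         long segmentOffset, long segmentLength) {
              this.conf = conf;
              this.fs = fs;
              this.file = file;
              this.segmentOffset = segmentOffset;
              this.segmentLength = segmentLength;
            }

            // Whole-file convenience constructor that delegates to the full one.
            SpillSegment(Configuration conf, FileSystem fs, Path file) throws IOException {
              this(conf, fs, file, 0, fs.getFileStatus(file).getLen());
            }

            void init() throws IOException {
              in = fs.open(file, conf.getInt("io.file.buffer.size", 4096));
              in.seek(segmentOffset);        // position at this partition's data
            }

            long getLength() { return segmentLength; }

            void close() throws IOException {
              if (in != null) { in.close(); in = null; }
            }
          }

        (With something along these lines, mergeParts can build one segment per spill and partition without touching the filesystem, and the merger opens at most io.sort.factor of them at a time.)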

        Devaraj Das made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Yuri Pradkin made changes -
        Assignee Yuri Pradkin [ yurip ]
        Yuri Pradkin made changes -
        Status Open [ 1 ] In Progress [ 3 ]
        Yuri Pradkin added a comment -

        Added another ctor, as suggested by Devaraj.

        Yuri Pradkin made changes -
        Attachment HADOOP-4614.patch [ 12394101 ]
        Yuri Pradkin added a comment -

        BTW, we need a test for multiple partitions. I'll not volunteer to write it, though.

        Yuri Pradkin made changes -
        Status In Progress [ 3 ] Patch Available [ 10002 ]
        Devaraj Das added a comment -

        Patch looks fine to me.

        Yuri Pradkin added a comment -

        Thanks.

        I'm concerned that all the unit tests passed while the first version of the patch should not have worked for the case where the number of partitions is > 1.
        Does that mean there are no unit tests that exercise this condition? (Isn't the number of partitions the same as the number of reducers? I'm pretty sure there are tests that have multiple reducers coded in.) Or is seeking to the segment start and finishing at the segment end simply an optimization, so the first version wasn't buggy, just sub-optimal?

        Please comment: I'd like to understand this before I create a jira on the lack of tests. Thanks!

        Devaraj Das added a comment -

        I doubt whether there exists a testcase that would spill more than once in the map task (note that the code in question would be exercised only if the number of first level spills is greater than 1). Even if some testcase did, it may not be checking the result versus what is expected.

        Yuri Pradkin added a comment -

        > I doubt whether there exists a testcase that would spill more than once in the map task (note that the code in question would be exercised only if the number of first level spills is greater than 1).

        If I understand what you're saying, in that code numSpills is most commonly 1, so the code in question won't run at all because of a prior check...

        How would a test case force multiple spills? Can it for instance set io.sort.mb, io.sort.spill.percent, and io.sort.record.percent to something really small? Will this alone do the trick?

        Devaraj Das added a comment -

        The code in question will get executed only when the number of spills is greater than 1. A testcase can force multiple spills by using a very low value for io.sort.mb (something like a few KB) together with a map method that generates many records...
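
        (A rough sketch of what such a test setup might look like, using the classic mapred API; the class name and property values are illustrative, not the ones used by any eventual test for HADOOP-4688.)

          // Illustrative only: shrink the sort buffer so a map that emits many small
          // records spills repeatedly, and use more than one reducer so the
          // multi-partition path in mergeParts() is exercised.
          import org.apache.hadoop.mapred.JobConf;

          public class MultiSpillJobSetup {
            public static JobConf configure(JobConf conf) {
              conf.setInt("io.sort.mb", 1);                // tiny in-memory sort buffer
              conf.set("io.sort.record.percent", "0.05");  // spill early on record count too
              conf.setNumReduceTasks(2);                   // more than one partition
              return conf;
            }
          }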

        Yuri Pradkin added a comment -

        Created a JIRA: https://issues.apache.org/jira/browse/HADOOP-4688 ("multiple spills/reducers test case for mergeParts() needed").

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12394101/HADOOP-4614.patch
        against trunk revision 719324.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3612/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3612/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3612/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3612/console

        This message is automatically generated.

        Yuri Pradkin added a comment -

        Can a committer please take a look at this? I'd like to close the loop on this. Thanks.

        Raghu Angadi added a comment -

        There has been some more discussion since Devaraj's +1. Let me know if it still stands, and I can commit it. Also, fill in a short description in the "release note" field for the change log.

        Another question: should this go into 0.18, or are 0.19 and trunk enough?

        Yuri Pradkin added a comment -

        > There has been some more discussion since Devaraj's +1.

        If you're talking about the discussion that took place here, I don't think it invalidated the patch in any way.

        > Also, fill in a short description in the "release note" field for the change log.

        You want me to update the patch to add this description, right?

        > Another question: should this go into 0.18, or are 0.19 and trunk enough?

        I realize you're probably not asking me, but my humble opinion is: this should go in as soon as possible, because this kind of bug forces people to set ulimit -n to hundreds of thousands (as some have mentioned on the mailing list).

        Chris Douglas added a comment -

        Just a few nits:

        • To avoid maintenance hiccups, it would be helpful if the old constructor called the new one:
          this(conf, fs, file, 0, fs.getFileStatus(file).getLen(), codec, preserve);
          
        • The position typically precedes the length in argument lists; it would be easier to read and less likely to confuse if segmentOffset preceded segmentLength in the Segment constructor.
        • The new comment describes the fix, not the code. It should be omitted.
        • The indentation for the new code in init uses four spaces instead of two.
        Chris Douglas added a comment -

        The fix looks good, though. Good catch on the unit test, too.

        Jothi Padmanabhan added a comment -

        If this needs to go into 0.18, we need a different patch, as the current patch will not apply directly to the 0.18 branch. However, the changes might be minimal.

        Yuri Pradkin made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Yuri Pradkin made changes -
        Status Open [ 1 ] In Progress [ 3 ]
        Yuri Pradkin added a comment -

        Fixed Chris's nits, thanks.

        Yuri Pradkin made changes -
        Attachment HADOOP-4614.patch [ 12394567 ]
        Yuri Pradkin made changes -
        Status In Progress [ 3 ] Patch Available [ 10002 ]
        Yuri Pradkin added a comment -

        Attached a slightly modified version for the 0.18 branch (untested).

        Yuri Pradkin made changes -
        Attachment HADOOP-4614-branch0.18.patch [ 12394576 ]
        Raghu Angadi added a comment -

        Yuri,

        Please attach the trunk patch again. Hudson (the system that runs the patch tests) picks up the latest patch, and the 0.18 patch will fail against trunk.

        Regarding committing to 0.18, I am not so sure: I will leave that to the MapRed folks. Usually only regressions from 0.17 and new bugs in 0.18 qualify for 0.18. As I understand it, this bug might have existed for a long time. That said, the patch for 0.18 is useful for users who want to apply it themselves.

        Yuri Pradkin added a comment -

        re-uploaded

        Yuri Pradkin made changes -
        Attachment HADOOP-4614.patch [ 12394595 ]
        Chris Douglas added a comment -

        I think it's a regression from 0.17; there, SequenceFile.SegmentDescriptor is opened lazily.

        +1 on the current patch. Unless anyone objects, I'll commit this to the 0.18 and 0.19 branches as well.

        Chris Douglas made changes -
        Priority Major [ 3 ] Blocker [ 1 ]
        Chris Douglas added a comment -

        Committing HADOOP-2774 causes trunk to conflict with the current patch. Attaching a merged version; testing it now.

        Chris Douglas made changes -
        Attachment 4614-trunk.patch [ 12394699 ]
        Yuri Pradkin added a comment -

        I think that in the process of merging, opening all the files crept back in! I'm attaching yet another version, which should fix it.
        The diff of patches is as follows:

        > @@ -1236,27 +1236,23 @@
        17,19c17,19
        < FSDataInputStream in = rfs.open(filename[i]);
        < in.seek(segmentOffset);
        <
        ---
        > - FSDataInputStream in = rfs.open(filename[i]);
        > - in.seek(segmentOffset);
        > -

        Yuri Pradkin made changes -
        Attachment HADOOP-4614.patch [ 12394702 ]
        Chris Douglas added a comment -

        sigh Thanks for catching that. I'm surprised findbugs didn't complain...

             [exec] -1 overall.
        
             [exec]     +1 @author.  The patch does not contain any @author tags.
        
             [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
             [exec]                         Please justify why no tests are needed for this patch.
        
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
        
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
        
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
        
             [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
        
        Chris Douglas added a comment -

        All unit tests passed on my machine.

        I just committed this. Thanks, Yuri

        Chris Douglas made changes -
        Resolution Fixed [ 1 ]
        Hadoop Flags [Reviewed]
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12394702/HADOOP-4614.patch
        against trunk revision 720632.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3647/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3647/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3647/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3647/console

        This message is automatically generated.

        Hudson added a comment -

        Integrated in Hadoop-trunk #671 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/671/)
        HADOOP-4614. Lazily open segments when merging map spills to avoid using
        too many file descriptors. Contributed by Yuri Pradkin.

        Nigel Daley made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Owen O'Malley made changes -
        Component/s mapred [ 12310690 ]

          People

          • Assignee: Yuri Pradkin
          • Reporter: Abdul Qadeer
          • Votes: 0
          • Watchers: 7
