Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.17.1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Map-Reduce jobs that worked with 0.16.* are failing with an OOM with the following trace:

      java.lang.OutOfMemoryError: Java heap space
              at java.util.Arrays.copyOf(Arrays.java:2786)
              at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
              at java.io.DataOutputStream.write(DataOutputStream.java:90)
              at org.apache.hadoop.io.SequenceFile$UncompressedBytes.writeUncompressedBytes(SequenceFile.java:617)
              at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.readNextValue(ReduceTask.java:289)
              at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:232)
              at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:311)
              at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:67)
              at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
              at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
      
      Attachments

      1. 3550-0.patch (1 kB) - Chris Douglas
      2. 3550-0v17.patch (1 kB) - Chris Douglas

        Issue Links

          Activity

          Arun C Murthy created issue -
          Nigel Daley added a comment -

          Putting this on the 0.18 radar as well.

          Nigel Daley made changes -
          Fix Version/s 0.18.0 [ 12312972 ]
          Arun C Murthy added a comment -

          Interestingly, when I ran this with -XX:+HeapDumpOnOutOfMemoryError I saw 2 very large byte buffers, each greater than 100MB, amounting to a total of ~250MB. I'm currently testing it with hprof and heap=sites...
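
          As a rough illustration of how such diagnostics might be attached to the child task JVMs (a minimal sketch assuming the 0.17-era mapred.child.java.opts property; the driver class, heap size, and hprof option values are placeholders, not the settings used above):

            import org.apache.hadoop.mapred.JobConf;

            // Illustrative only: attach a heap dump and hprof allocation-site
            // profiling to the child JVMs of a job.
            public class DiagnosticsJobConf {
              public static JobConf configure(Class<?> driverClass) {
                JobConf conf = new JobConf(driverClass);
                conf.set("mapred.child.java.opts",
                    "-Xmx512m -XX:+HeapDumpOnOutOfMemoryError"
                    + " -Xrunhprof:heap=sites,depth=12,thread=y,verbose=n");
                return conf;
              }
            }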

          Arun C Murthy added a comment -

          Ok, hprof squeaked more:

          SITES BEGIN (ordered by live bytes) Fri Jun 13 07:51:54 2008
                    percent          live          alloc'ed  stack class
           rank   self  accum     bytes objs     bytes  objs trace name
              1 49.28% 49.28% 127078480 25039 475983328 6696233 376386 byte[]
              2 44.80% 94.08% 115532056    1 171655272     2 377442 byte[]
              3  3.30% 97.38%   8511552 25061 1513552256 4881857 376504 byte[]
              4  0.71% 98.09%   1835232   14   1835232    14 372662 byte[]
              5  0.71% 98.80%   1835232   14   1835232    14 372912 byte[]
              6  0.23% 99.03%    599184    9 174961728  2628 376627 byte[]
              7  0.16% 99.19%    400960 25060  78111920 4881857 376505 byte[]
              8  0.05% 99.24%    131088    1    131088     1 334784 byte[]
              9  0.05% 99.29%    131088    1    131088     1 375533 byte[]
             10  0.05% 99.34%    131088    1    131088     1 376424 byte[]
             11  0.05% 99.39%    131088    1    262176     2 377077 byte[]
             12  0.03% 99.42%     65552    1     65552     1 309878 byte[]
          SITES END
          

          Here are the traces for the top two:

          TRACE 376386: (thread=200001)
            org.apache.hadoop.io.SequenceFile$UncompressedBytes.reset(SequenceFile.java:604)
            org.apache.hadoop.io.SequenceFile$UncompressedBytes.access$900(SequenceFile.java:594)
            org.apache.hadoop.io.SequenceFile$Reader.nextRawValue(SequenceFile.java:2022)
            org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawValue(SequenceFile.java:3014)
            org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2758)
            org.apache.hadoop.mapred.ReduceTask$ValuesIterator.readNextKey(ReduceTask.java:272)
            org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:233)
            org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:311)
            org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:67)
            org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
            org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
          

          and

          TRACE 377442: (thread=200001)
            java.util.Arrays.copyOf(Arrays.java:2786)
            java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
            java.io.DataOutputStream.write(DataOutputStream.java:90)
            org.apache.hadoop.io.SequenceFile$UncompressedBytes.writeUncompressedBytes(SequenceFile.java:617)
            org.apache.hadoop.mapred.ReduceTask$ValuesIterator.readNextValue(ReduceTask.java:289)
            org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:232)
            org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:311)
            org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:67)
            org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
            org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
          
          Chris Douglas made changes -
          Link This issue depends on HADOOP-3475 [ HADOOP-3475 ]
          Chris Douglas added a comment -

          Linking this to HADOOP-3475, since part of the fix for this involves correctly sizing the kvindices buffer. getVBytesForOffset relies on it being exactly 3x the size of the kvoffset buffer to be correct.
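
          To make the 3x relationship concrete, here is a rough sketch of the sizing assumption, using the record-accounting names described in HADOOP-2919 (the actual allocation code in MapTask may differ in detail):

            // Rough sketch of the sizing assumption, not the actual MapTask code.
            public class AccountingBuffers {
              public static void main(String[] args) {
                int recordCapacity = 1 << 20;                  // hypothetical record capacity
                int[] kvoffsets = new int[recordCapacity];     // one offset per record
                int[] kvindices = new int[3 * recordCapacity]; // partition, keystart, valstart
                // getVBytesForOffset assumes kvindices.length == 3 * kvoffsets.length;
                // if kvindices were sized independently (and ended up larger), walking
                // to "the next record's keystart" could read past the initialized entries.
                System.out.println(kvindices.length == 3 * kvoffsets.length);
              }
            }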

          Chris Douglas added a comment -

          Worked with Arun to find a number of other corner cases, such as zero-length values, that might cause the framework to produce incorrect results.

          Chris Douglas made changes -
          Attachment 3550-0v17.patch [ 12384163 ]
          Chris Douglas made changes -
          Attachment 3550-0.patch [ 12384164 ]
          Chris Douglas made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Arun C Murthy made changes -
          Assignee Arun C Murthy [ acmurthy ] Chris Douglas [ chris.douglas ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12384164/3550-0.patch
          against trunk revision 668867.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2676/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2676/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2676/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2676/console

          This message is automatically generated.

          Arun C Murthy added a comment -

          Chris and I are working on test cases for this...

          Arun C Murthy made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Devaraj Das made changes -
          Component/s mapred [ 12310690 ]
          Chris Douglas added a comment - edited

          The core issue is that the data structure that stores serialized keys and values from the map can incorrectly calculate the length of a value under certain conditions. The OOM exception is from deserialized values exceeding the memory available to a merge.

          The error was introduced by HADOOP-2919, which replaced the data structures responsible for collection and sort. The size of one of the buffers, assumed to be exactly 3x the size of a complementary buffer, can be larger than assumed in some cases. If map output is spilled to disk while the index buffer is wrapped, it will read indices from the uninitialized portion of the buffer for the last item in the spill, and incorrectly conclude that the last value extends to the end of the buffer.

          An example might be helpful. Per the description in HADOOP-2919, each key and value is serialized into a circular buffer. The partition, start, and end indices for each record are stored in an array. For each value, its end index is either the start index of the following key or a marker reserved for the last record in the spill.

          Consider:

          | P | 35 | 41 | ... | P |  5 | 20 | P | 25 | 30 | 0 | 0 | 0 |
            0    1    2    ...  i                  ...                i+9
          

          A record at offset i belongs to partition P, has a key beginning at position 5 w/ length 15 (20 - 5), and a value beginning at position 20 with length 5 (25 - 20). Similarly, the record following it belongs to partition P, with a key beginning at position 25 w/ length 5, and a value beginning at position 30. What follows this record (from position i+6) is uninitialized data; the record that follows it has its metadata stored at the start of the buffer, but the "next" position, computed mod the length of the buffer, lands in the uninitialized section, which suggests that the next key starts at the front of the buffer. As such, it will determine that this value must extend all the way to the end of the buffer, which is almost certainly not true. The correct "next" offset is at the front of the buffer (35).
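
          A minimal sketch of the length computation described above, with made-up numbers matching the example (illustrative logic only, not the code in MapTask):

            // Illustrative only: how a zeroed "next keystart" inflates the last value's length.
            public class ValueLengthExample {
              static int valueLength(int valstart, int nextKeystart, int bufend) {
                // If the next key appears to start at the front of the buffer (offset 0),
                // the value is assumed to run to the end of the serialization buffer.
                return nextKeystart > valstart ? nextKeystart - valstart
                                               : bufend - valstart;
              }

              public static void main(String[] args) {
                int bufend = 1000;                                // hypothetical end of the data buffer
                System.out.println(valueLength(30, 35, bufend));  // correct next offset: length 5
                System.out.println(valueLength(30, 0, bufend));   // uninitialized zero: length 970
              }
            }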

          This is the most common case. It occurs when both of the following are true:

          1. A map emits no fewer than io.sort.mb * io.sort.record.percent / 4 records (default 1310720 records, 1677721 with a 128 io.sort.mb)
          2. io.sort.mb * io.sort.record.percent (mod 16) != 0, 7, 10, or 13 (note that with the default settings of 100MB and 0.05, respectively, this isn't the case)

          This is corrected by HADOOP-3475.
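
          As a rough illustration of the arithmetic in the two conditions above, using the default io.sort.mb of 100 and io.sort.record.percent of 0.05 (the 16-byte-per-record accounting is implied by the mod-16 check; this sketch is not taken from the Hadoop source):

            public class SpillSusceptibility {
              public static void main(String[] args) {
                double ioSortMb = 100;             // io.sort.mb
                double recordPercent = 0.05;       // io.sort.record.percent
                long accountingBytes = (long) (ioSortMb * 1024 * 1024 * recordPercent);

                long recordThreshold = accountingBytes / 4;  // condition 1: 1310720 records with the defaults
                long remainder = accountingBytes % 16;       // condition 2
                boolean mod16Mismatch = remainder != 0 && remainder != 7
                                     && remainder != 10 && remainder != 13;

                System.out.println("records needed to trigger the case: " + recordThreshold);
                System.out.println("condition 2 holds (mod-16 mismatch): " + mod16Mismatch); // false for the defaults
              }
            }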

          When the preceding is true, silent data corruption can occur when all of the following are true:

          1. The heap size is sufficient to prevent the merge from failing
          2. The Writable consumed takes its length not from the stream, but from the intermediate file format. This could be true of Writables that read an entire stream without a length field (none of the Writables packaged in Hadoop work this way, I think)
          3. io.sort.mb * io.sort.record.percent (mod 16) == 2, 3, 5, 6, 8, 9, 12, or 15 (other values will cause errors in the spill)

          Most of the time, it will emit the value followed by the contents of the buffer, the remainder of which most if not all Writables, satisfying the DataOutput contract, will ignore. Anything dealing with the raw bytes, particularly comparators, might be confused, but deserialized objects should work. Preliminarily, few successful jobs should be suspected.
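
          To illustrate the distinction in item 2 of the list above, here is a hypothetical length-prefixed Writable (not a class shipped with Hadoop): it reads exactly as many bytes as its own length field says, so trailing garbage appended to the record is never consumed. A Writable that instead read the whole supplied record would silently absorb the extra bytes.

            import java.io.DataInput;
            import java.io.DataOutput;
            import java.io.IOException;
            import org.apache.hadoop.io.Writable;

            // Hypothetical example for illustration only.
            public class LengthPrefixedBytes implements Writable {
              private byte[] data = new byte[0];

              public void write(DataOutput out) throws IOException {
                out.writeInt(data.length);
                out.write(data);
              }

              public void readFields(DataInput in) throws IOException {
                // The length comes from the stream itself, not from the record
                // boundary in the intermediate file, so extra trailing bytes are ignored.
                data = new byte[in.readInt()];
                in.readFully(data);
              }
            }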

          The second case addressed in this patch is zero-length values (NullWritable is a good, and to my mind the only, example). Note that this doesn't include empty Text, BytesWritable, or other data that includes a length field requiring at least one byte. The size for these is almost never correct, but this case should fail noisily.

          The last case is when a serialized key fills the last bytes of the buffer. In this case, an ArrayIndexOutOfBoundsException should be thrown during the sort, as the comparator assumes the value starts right after the key. An equivalent check for valstart == 0 would work and arguably be more efficient, but this is easier to understand.

          Chris Douglas added a comment -

          I just committed this.

          Chris Douglas made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          Integrated in Hadoop-trunk #523 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/523/)
          Nigel Daley made changes -
          Fix Version/s 0.18.0 [ 12312972 ]
          Owen O'Malley made changes -
          Component/s mapred [ 12310690 ]
          Gavin made changes -
          Link This issue depends on HADOOP-3475 [ HADOOP-3475 ]
          Gavin made changes -
          Link This issue depends upon HADOOP-3475 [ HADOOP-3475 ]
          Transition                   Time In Source Status   Execution Times   Last Executer    Last Execution Date
          Open -> Patch Available      4d 20h 40m               1                 Chris Douglas    18/Jun/08 00:35
          Patch Available -> Open      4h 35m                   1                 Arun C Murthy    18/Jun/08 05:10
          Open -> Resolved             18h 34m                  1                 Chris Douglas    18/Jun/08 23:45

            People

            • Assignee:
              Chris Douglas
            • Reporter:
              Arun C Murthy
            • Votes:
              0
            • Watchers:
              1

              Dates

              • Created:
              • Updated:
              • Resolved:
