Hadoop Common / HADOOP-14376

Memory leak when reading a compressed file using the native library

    Details

    • Hadoop Flags:
      Reviewed

      Description

      Opening and closing a large number of bzip2-compressed input streams eventually causes the process to be killed with an out-of-memory error when the native bzip2 library is used.

      Our initial analysis suggests that the cause is DecompressorStream overriding the close() method and thereby skipping a line from its parent's close(): CodecPool.returnDecompressor(trackedDecompressor). When the decompressor is a Bzip2Decompressor, its native end() method is never called, and the natively allocated memory is never freed.

      If this analysis is correct, the simplest way to fix this bug would be to replace in.close() with super.close() in DecompressorStream.
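The leak mechanism described above can be sketched without Hadoop itself. This is a minimal illustration, not the actual Hadoop source: the class names NativeDecompressor, TrackingStream, LeakyStream, and FixedStream are hypothetical stand-ins for Bzip2Decompressor, CompressionInputStream, and DecompressorStream, and a counter stands in for native memory.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class CloseOverrideLeak {
    // Counts simulated native allocations that have not been freed.
    static final AtomicInteger liveNativeAllocations = new AtomicInteger();

    // Stand-in for Bzip2Decompressor: holds "native" memory until end() is called.
    static class NativeDecompressor {
        NativeDecompressor() { liveNativeAllocations.incrementAndGet(); }
        void end() { liveNativeAllocations.decrementAndGet(); }
    }

    // Stand-in for CompressionInputStream: its close() performs the cleanup
    // (analogous to CodecPool.returnDecompressor(trackedDecompressor)).
    static class TrackingStream {
        final NativeDecompressor decompressor = new NativeDecompressor();
        public void close() throws IOException {
            decompressor.end();
        }
    }

    // Stand-in for the buggy DecompressorStream: overrides close() without
    // calling super.close(), so the parent's cleanup is skipped.
    static class LeakyStream extends TrackingStream {
        @Override
        public void close() throws IOException {
            // in.close() only -- decompressor.end() is never reached
        }
    }

    // The proposed fix: delegate to super.close() so the parent's cleanup runs.
    static class FixedStream extends TrackingStream {
        @Override
        public void close() throws IOException {
            super.close();
        }
    }

    public static void main(String[] args) throws IOException {
        for (int i = 0; i < 1000; i++) new LeakyStream().close();
        System.out.println("after leaky closes: " + liveNativeAllocations.get());

        liveNativeAllocations.set(0);
        for (int i = 0; i < 1000; i++) new FixedStream().close();
        System.out.println("after fixed closes: " + liveNativeAllocations.get());
    }
}
```

Running the sketch prints 1000 leaked allocations for the leaky variant and 0 for the fixed one, mirroring why each opened-and-closed bzip2 stream leaks one native decompressor until the process runs out of memory.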

        Attachments

        1. Bzip2MemoryTester.java
          0.8 kB
          Eli Acherkan
        2. log4j.properties
          0.3 kB
          Eli Acherkan
        3. HADOOP-14376.001.patch
          10 kB
          Eli Acherkan
        4. HADOOP-14376.002.patch
          12 kB
          Eli Acherkan
        5. HADOOP-14376.003.patch
          12 kB
          Eli Acherkan
        6. HADOOP-14376.004.patch
          12 kB
          Eli Acherkan


            People

            • Assignee:
              Eli Acherkan (eliac)
            • Reporter:
              Eli Acherkan (eliac)
            • Votes:
              0
            • Watchers:
              8
