  1. Hadoop Common
  2. HADOOP-14376

Memory leak when reading a compressed file using the native library

Details

    • Reviewed

    Description

      Opening and closing a large number of bzip2-compressed input streams causes the process to be killed with OutOfMemory when the native bzip2 library is in use.

      Our initial analysis suggests that this can be caused by DecompressorStream overriding close() without calling the parent implementation, so the parent's call to CodecPool.returnDecompressor(trackedDecompressor) is skipped. When the decompressor object is a Bzip2Decompressor, its native end() method is then never called, and the memory it allocated is never freed.

      If this analysis is correct, the simplest way to fix this bug would be to replace in.close() with super.close() in DecompressorStream.
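
      The suspected pattern can be illustrated with a small stand-alone sketch that does not use Hadoop at all. Every class below (ToyDecompressor, ToyPool, ParentStream, LeakyStream, FixedStream) is a hypothetical stand-in invented for illustration, and the toy pool simply calls end() on whatever is returned to it, which is a simplification and not a claim about how CodecPool behaves. The sketch only shows how an overriding close() that calls in.close() directly bypasses cleanup done in the parent's close(), and how delegating to super.close() restores it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class CloseLeakSketch {
    /** Hypothetical stand-in for a decompressor that holds native memory. */
    static class ToyDecompressor {
        boolean ended = false;
        void end() { ended = true; } // models freeing the native allocation
    }

    /** Hypothetical stand-in for a pool; assumption: returning releases the resource. */
    static class ToyPool {
        static void returnDecompressor(ToyDecompressor d) { d.end(); }
    }

    /** Parent stream whose close() both closes the input and returns the decompressor. */
    static class ParentStream extends InputStream {
        protected final InputStream in;
        protected ToyDecompressor tracked;

        ParentStream(InputStream in, ToyDecompressor tracked) {
            this.in = in;
            this.tracked = tracked;
        }

        @Override public int read() throws IOException { return in.read(); }

        @Override public void close() throws IOException {
            try {
                in.close();
            } finally {
                if (tracked != null) {
                    ToyPool.returnDecompressor(tracked);
                    tracked = null;
                }
            }
        }
    }

    /** Overrides close() and closes only the wrapped input: the parent's cleanup is skipped. */
    static class LeakyStream extends ParentStream {
        LeakyStream(InputStream in, ToyDecompressor d) { super(in, d); }
        @Override public void close() throws IOException { in.close(); }
    }

    /** Overrides close() but delegates to the parent, so the cleanup still runs. */
    static class FixedStream extends ParentStream {
        FixedStream(InputStream in, ToyDecompressor d) { super(in, d); }
        @Override public void close() throws IOException { super.close(); }
    }

    /** Closes one stream of the chosen kind and reports whether end() was reached. */
    static boolean endedAfterClose(boolean fixed) {
        ToyDecompressor d = new ToyDecompressor();
        InputStream raw = new ByteArrayInputStream(new byte[] {1, 2, 3});
        try (InputStream s = fixed ? new FixedStream(raw, d) : new LeakyStream(raw, d)) {
            s.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return d.ended;
    }

    public static void main(String[] args) {
        System.out.println("leaky released: " + endedAfterClose(false)); // false
        System.out.println("fixed released: " + endedAfterClose(true));  // true
    }
}
```

      In this toy model each LeakyStream that is closed leaves its ToyDecompressor un-ended, which is the per-stream leak the description hypothesizes; whether the same one-line change is sufficient in DecompressorStream depends on the real parent class and pool, which the sketch does not model.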

      Attachments

        1. Bzip2MemoryTester.java
          0.8 kB
          Eli Acherkan
        2. log4j.properties
          0.3 kB
          Eli Acherkan
        3. HADOOP-14376.001.patch
          10 kB
          Eli Acherkan
        4. HADOOP-14376.002.patch
          12 kB
          Eli Acherkan
        5. HADOOP-14376.003.patch
          12 kB
          Eli Acherkan
        6. HADOOP-14376.004.patch
          12 kB
          Eli Acherkan

          People

            eliac Eli Acherkan