  1. Commons Compress
  2. COMPRESS-539

TarArchiveInputStream allocates a lot of memory when iterating through an archive

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.20
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

       I iterated through the Linux source tar and noticed that some unneeded allocations happen even though no data is extracted.

      Reproducing code

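      import java.io.File;
      import java.nio.file.Files;
      import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
      import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;

      File tarFile = new File("linux-5.7.1.tar");
      try (TarArchiveInputStream in = new TarArchiveInputStream(Files.newInputStream(tarFile.toPath()))) {
          TarArchiveEntry entry;
          while ((entry = in.getNextTarEntry()) != null) {
              // Intentionally empty: only iterate over the entries, no entry data is read.
          }
      }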
      

      The measurement was done on Java 11.0.7 with the Java Flight Recorder. Options used: -XX:StartFlightRecording=settings=profile,filename=allocations.jfr

      Baseline with the current master implementation:
      Estimated TLAB allocation: 293MiB

      1. IOUtils.skip -> input.skip(numToSkip)
      In my test scenario this delegates to the InputStream.skip implementation, which allocates a new byte[] on every invocation. Simply commenting out the while loop that calls the skip method drops the estimated TLAB allocation to 164MiB (-129MiB).

      Commenting out the skip call is probably not the best solution, but it was a quick way for me to see how much memory can be saved. Also, no unit tests were failing for me.
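
      As an illustration of point 1, skipping can be done by reading into one shared scratch buffer instead of calling InputStream.skip. This is only a minimal sketch and not the attached patch: the class name BufferedSkip and the 4096-byte buffer size are assumptions made for this example.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not part of Commons Compress): skips without per-call allocation. */
public final class BufferedSkip {
    // Allocated once and shared by all calls. The bytes written here are never
    // read back, so concurrent callers overwriting each other is harmless.
    // The size of 4096 is an arbitrary choice for this sketch.
    private static final byte[] SKIP_BUFFER = new byte[4096];

    private BufferedSkip() {
    }

    /**
     * Skips up to numToSkip bytes by reading them into the shared buffer.
     *
     * @return the number of bytes actually skipped, which is smaller than
     *         numToSkip only if the end of the stream was reached
     */
    public static long skip(InputStream input, long numToSkip) throws IOException {
        long remaining = numToSkip;
        while (remaining > 0) {
            int chunk = (int) Math.min(remaining, SKIP_BUFFER.length);
            int read = input.read(SKIP_BUFFER, 0, chunk);
            if (read < 0) {
                break; // end of stream
            }
            remaining -= read;
        }
        return numToSkip - remaining;
    }
}
```

      The trade-off is that every skipped byte is actually read, so a stream with a cheap native skip (for example a seekable file) may be slower with this approach.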

      2. TarArchiveInputStream.readRecord
      Every time a record is read, a new byte[] is created. Since the record size does not change, the byte[] can be created once when the stream is instantiated and then reused. This optimization is already present in the TarArchiveOutputStream. Reusing the buffer reduces the estimated TLAB allocation further to 128MiB (-36MiB).
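
      The buffer reuse described in point 2 could look roughly like the following. This is a sketch under assumptions, not the attached Reuse_recordBuffer.patch and not the actual TarArchiveInputStream code: the class RecordReader and its end-of-stream handling are hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical fixed-size record reader that reuses a single buffer. */
public final class RecordReader {
    private final InputStream in;
    // Allocated once in the constructor and reused for every record.
    private final byte[] recordBuffer;

    public RecordReader(InputStream in, int recordSize) {
        this.in = in;
        this.recordBuffer = new byte[recordSize];
    }

    /**
     * Reads the next full record into the shared buffer.
     *
     * @return the shared buffer, or null if the stream ended before any byte
     *         of a new record was read. The returned array is overwritten by
     *         the next call, so callers must copy anything they want to keep.
     */
    public byte[] readRecord() throws IOException {
        int filled = 0;
        while (filled < recordBuffer.length) {
            int n = in.read(recordBuffer, filled, recordBuffer.length - filled);
            if (n < 0) {
                if (filled == 0) {
                    return null; // clean end of stream
                }
                throw new EOFException("Truncated record: got " + filled + " bytes");
            }
            filled += n;
        }
        return recordBuffer;
    }
}
```

      The cost of this design is aliasing: because every call returns the same array, a caller that keeps a reference to an earlier record will see it change after the next read.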

      I attached the patches I used so the results can be verified.

      Attachments

        1. Reuse_recordBuffer.patch
          2 kB
          Robin Schimpf
        2. image-2020-07-05-22-32-31-511.png
          14 kB
          Robin Schimpf
        3. image-2020-07-05-22-32-15-131.png
          18 kB
          Robin Schimpf
        4. image-2020-07-05-22-11-25-526.png
          18 kB
          Robin Schimpf
        5. image-2020-07-05-22-10-07-402.png
          15 kB
          Robin Schimpf
        6. image-2020-06-21-10-59-10-825.png
          13 kB
          Robin Schimpf
        7. image-2020-06-21-10-58-43-255.png
          13 kB
          Robin Schimpf
        8. image-2020-06-21-10-58-07-917.png
          13 kB
          Robin Schimpf
        9. Don't_call_InputStream#skip.patch
          0.9 kB
          Robin Schimpf

          People

            peterlee Peter Lee
            rschimpf Robin Schimpf