Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: 1.20
Description
I iterated through the Linux kernel source tarball and noticed that some unnecessary allocations happen even though no entry data is extracted.
Reproducing code
File tarFile = new File("linux-5.7.1.tar");
try (TarArchiveInputStream in = new TarArchiveInputStream(Files.newInputStream(tarFile.toPath()))) {
    TarArchiveEntry entry;
    while ((entry = in.getNextTarEntry()) != null) {
        // intentionally empty: only the entry headers are iterated, no entry data is read
    }
}
The measurement was done on Java 11.0.7 with the Java Flight Recorder. Options used: -XX:StartFlightRecording=settings=profile,filename=allocations.jfr
Baseline with the current master implementation:
Estimated TLAB allocation: 293MiB
1. IOUtils.skip -> input.skip(numToSkip)
In my test scenario this delegates to the default InputStream.skip implementation, which allocates a new byte[] on every invocation. Simply commenting out the while loop that calls skip drops the estimated TLAB allocation to 164MiB (-129MiB).
Commenting out the skip call is probably not the best solution, but it was a quick way for me to see how much memory can be saved. No unit tests were failing for me either.
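A rough sketch of a less drastic alternative to dropping the skip call: always discard bytes by reading them into one shared scratch buffer, so no array is allocated per call. The class BufferedSkip, its buffer size and its method are hypothetical and not part of Commons Compress or Commons IO. The trade-off is that reading instead of calling input.skip gives up the cheap seek some streams provide through their own skip override (for example FileInputStream), so this only pays off for streams that fall back to the default InputStream.skip.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper illustrating allocation-free skipping; not a real library API. */
final class BufferedSkip {
    private static final int SKIP_BUF_SIZE = 4096;
    // Shared scratch buffer: skipped bytes are never looked at, so concurrent
    // callers overwriting each other's data is harmless.
    private static final byte[] SKIP_BUF = new byte[SKIP_BUF_SIZE];

    private BufferedSkip() {
    }

    /** Skips up to numToSkip bytes by reading into SKIP_BUF; returns the number actually skipped. */
    static long skip(InputStream in, long numToSkip) throws IOException {
        long remaining = numToSkip;
        while (remaining > 0) {
            int n = in.read(SKIP_BUF, 0, (int) Math.min(remaining, SKIP_BUF_SIZE));
            if (n < 0) { // end of stream reached before numToSkip bytes were skipped
                break;
            }
            remaining -= n;
        }
        return numToSkip - remaining;
    }
}
```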
2. TarArchiveInputStream.readRecord
Every record read allocates a new byte[]. Since the record size does not change, a single byte[] can be allocated when the TarArchiveInputStream is created and reused for every read. TarArchiveOutputStream already does this. Reusing the buffer reduces the estimated TLAB allocations further to 128MiB (-36MiB).
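The buffer-reuse idea can be sketched as a small reader that allocates its record buffer once in the constructor and fills it on every call. ReusingRecordReader is a hypothetical class for illustration only, not the actual TarArchiveInputStream code; the usage below assumes the 512-byte tar record size.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical illustration of reusing one record buffer; not the real TarArchiveInputStream. */
final class ReusingRecordReader {
    private final InputStream in;
    private final byte[] recordBuffer; // allocated once, reused by every readRecord() call

    ReusingRecordReader(InputStream in, int recordSize) {
        this.in = in;
        this.recordBuffer = new byte[recordSize];
    }

    /**
     * Reads one full record into the shared buffer.
     * Returns the buffer, or null if the stream ended before a full record was read.
     */
    byte[] readRecord() throws IOException {
        int total = 0;
        while (total < recordBuffer.length) {
            int n = in.read(recordBuffer, total, recordBuffer.length - total);
            if (n < 0) { // end of stream
                break;
            }
            total += n;
        }
        return total == recordBuffer.length ? recordBuffer : null;
    }
}
```

Because the buffer is shared, a returned record is only valid until the next readRecord() call; a caller that needs to keep a record must copy it first.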
I attached the patches I used so the results can be verified.