Commons Compress / COMPRESS-679

Regression on parallel processing of 7zip files


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.26.0, 1.26.1
    • Fix Version/s: 1.26.2
    • Component/s: None
    • Labels: None

    Description

      I've run into a bug that occurs when reading a 7z file from several threads simultaneously, with each thread opening its own SevenZFile instance. The following code illustrates the problem; file.7z is attached to this issue.

       

      import java.io.InputStream;
      import java.nio.file.Paths;
      import java.util.stream.IntStream;
      import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
      import org.apache.commons.compress.archivers.sevenz.SevenZFile;
      public class TestZip {
          public static void main(final String[] args) {
              final Runnable runnable = () -> {
                  try {
                      try (final SevenZFile sevenZFile = SevenZFile.builder().setPath(Paths.get("file.7z")).get()) {
                          SevenZArchiveEntry sevenZArchiveEntry;
                          while ((sevenZArchiveEntry = sevenZFile.getNextEntry()) != null) {
                        // To reproduce, the entry must not be the first one in the 7z archive,
                        // so that getInputStream() has to skip the preceding entries' data
                        if ("file4.txt".equals(sevenZArchiveEntry.getName())) {
                            final InputStream inputStream = sevenZFile.getInputStream(sevenZArchiveEntry);
                            // process the entry content...
                                  break;
                              }
                          }
                      }
                  } catch (final Exception e) { // java.io.IOException: Checksum verification failed
                      e.printStackTrace();
                  }
              };
              IntStream.range(0, 30).forEach(i -> new Thread(runnable).start());
          }
      }
      

      Below is the output I receive on version 1.26: 

       

      java.io.IOException: Checksum verification failed
        at org.apache.commons.compress.utils.ChecksumVerifyingInputStream.verify(ChecksumVerifyingInputStream.java:98)
        at org.apache.commons.compress.utils.ChecksumVerifyingInputStream.read(ChecksumVerifyingInputStream.java:92)
        at org.apache.commons.io.IOUtils.skip(IOUtils.java:2422)
        at org.apache.commons.io.IOUtils.skip(IOUtils.java:2380)
        at org.apache.commons.compress.archivers.sevenz.SevenZFile.getCurrentStream(SevenZFile.java:912)
        at org.apache.commons.compress.archivers.sevenz.SevenZFile.getInputStream(SevenZFile.java:988)
        at com.infotel.arcsys.nativ.archiving.zip.TestZip.lambda$main$0(TestZip.java:21)
        at java.base/java.lang.Thread.run(Thread.java:833)
       
      

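      The issue appears to have been introduced between versions 1.25 and 1.26 of Apache Commons Compress. In the SevenZFile class, the private method getCurrentStream switched from the Commons Compress IOUtils.skip(InputStream, long) to the Commons IO method with the same signature, org.apache.commons.io.IOUtils.skip(InputStream, long), which behaves differently. In 1.26, that method reads the skipped bytes into a shared, unsynchronized buffer (SCRATCH_BYTE_BUFFER_WO) that is intended to be write-only: data written into it is discarded and never read back, so sharing it between threads is supposed to be harmless. Here, however, the stream being skipped is a ChecksumVerifyingInputStream (see the stack trace above), which reads those bytes back out of the buffer to update its checksum. When several threads skip at the same time, one thread can overwrite the buffer before another has checksummed its contents, so the computed checksum no longer matches and verification fails.

      The problem seems to be resolved by passing a Supplier that provides a new buffer for each call: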

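      // In SevenZFile.getCurrentStream(): use a fresh buffer for each skip instead of the shared scratch buffer
      try (InputStream stream = deferredBlockStreams.remove(0)) {
          org.apache.commons.io.IOUtils.skip(stream, Long.MAX_VALUE, () -> new byte[org.apache.commons.io.IOUtils.DEFAULT_BUFFER_SIZE]);
      }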
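      To make the race described above concrete, here is a minimal, self-contained sketch that uses only the JDK. It does not call Commons IO or Commons Compress: SHARED_BUFFER is a hypothetical stand-in for the shared scratch buffer, and skip(...) is a simplified, hypothetical model of a "skip by reading into a supplied buffer" helper. Each task skips a 1 MiB array through a java.util.zip.CheckedInputStream, which, like ChecksumVerifyingInputStream in the stack trace, updates its checksum from the bytes it has just read into the buffer, and then compares the resulting CRC32 with the expected value.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

class SharedBufferSkipDemo {

    // Hypothetical stand-in for the shared scratch buffer: a single array
    // used by every thread, with no synchronization.
    private static final byte[] SHARED_BUFFER = new byte[8192];

    // Simplified, hypothetical "skip by reading" helper: it discards data by
    // reading it into whatever buffer the supplier returns.
    static long skip(InputStream in, long toSkip, Supplier<byte[]> bufferSupplier) throws IOException {
        byte[] buffer = bufferSupplier.get();
        long remaining = toSkip;
        while (remaining > 0) {
            int n = in.read(buffer, 0, (int) Math.min(remaining, buffer.length));
            if (n < 0) {
                break; // end of stream
            }
            remaining -= n;
        }
        return toSkip - remaining;
    }

    // Skips all of `data` through a CheckedInputStream, which updates its CRC32
    // from the bytes it has just read into the buffer, then checks the result.
    static boolean checksumMatches(byte[] data, long expectedCrc, Supplier<byte[]> bufferSupplier)
            throws IOException {
        CheckedInputStream in = new CheckedInputStream(new ByteArrayInputStream(data), new CRC32());
        skip(in, Long.MAX_VALUE, bufferSupplier);
        return in.getChecksum().getValue() == expectedCrc;
    }

    // Runs `tasks` skips on 8 threads and returns how many saw a wrong checksum.
    static int countMismatches(boolean shareBuffer, int tasks) throws Exception {
        byte[] data = new byte[1 << 20]; // 1 MiB of pseudo-random content
        new Random(42).nextBytes(data);
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        long expected = crc.getValue();

        // Shared buffer (racy) versus a fresh buffer per call (safe).
        Supplier<byte[]> supplier = shareBuffer ? () -> SHARED_BUFFER : () -> new byte[8192];
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                results.add(pool.submit(() -> checksumMatches(data, expected, supplier)));
            }
            int mismatches = 0;
            for (Future<Boolean> result : results) {
                if (!result.get()) {
                    mismatches++;
                }
            }
            return mismatches;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared buffer mismatches:   " + countMismatches(true, 30));
        System.out.println("per-call buffer mismatches: " + countMismatches(false, 30));
    }
}
```

      With a fresh buffer per call, every checksum matches, because no other thread can write into the bytes a CheckedInputStream is about to checksum. With the shared buffer, a multicore machine will usually report some mismatches, although a data race is never guaranteed to show up on any particular run. The trade-off of the supplier-based fix above is one buffer allocation per skip in exchange for thread safety.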

      Attachments

        1. file.7z
          0.2 kB
          Mikaël MECHOULAM


          People

            ggregory Gary D. Gregory
            mikael_mechoulam Mikaël MECHOULAM
            Votes: 1
            Watchers: 3
