  1. Commons Compress
  2. COMPRESS-666

Multithreaded access to Tar archive throws java.util.zip.ZipException: Corrupt GZIP trailer

Details

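    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.26.0
    • Fix Version/s: 1.26.1
    • Component/s: None
    • Labels: None
    • Environment: Commons Compress 1.26.0 is needed to reproduce the failure. Any tar.gz (tgz) archive will do.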

    Description

      Something in the changes between 1.25.0 and master (https://github.com/apache/commons-compress/compare/rel/commons-compress-1.25.0...master) seems to make
      iterating through the tar entries of multiple TarArchiveInputStreams in parallel throw "Corrupted TAR archive":
       

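      import java.io.InputStream;
      import java.util.List;
      import java.util.concurrent.CompletableFuture;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.stream.IntStream;
      import java.util.zip.GZIPInputStream;

      import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
      import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
      import org.junit.jupiter.api.Test; // assuming JUnit 5
      import com.google.common.util.concurrent.Futures; // assuming Guava's Futures

      @Test
      void bla() {
          ExecutorService executorService = Executors.newFixedThreadPool(10);
          // 200 tasks on 10 threads; each task opens its own streams over the same resource.
          List<CompletableFuture<Void>> tasks = IntStream.range(0, 200)
                  .mapToObj(_idx -> CompletableFuture.runAsync(
                          () -> {
                              try (InputStream inputStream = this.getClass()
                                              .getResourceAsStream(
                                                      "/<your favourite tar tgz>");
                                      TarArchiveInputStream tarInputStream =
                                              new TarArchiveInputStream(new GZIPInputStream(inputStream))) {
                                  TarArchiveEntry tarEntry;
                                  // Walk every entry; the failure surfaces in getNextTarEntry().
                                  while ((tarEntry = tarInputStream.getNextTarEntry()) != null) {
                                      System.out.println("Reading entry %s with size %d"
                                              .formatted(tarEntry.getName(), tarEntry.getSize()));
                                  }
                              } catch (Exception ex) {
                                  throw new RuntimeException(ex);
                              }
                          },
                          executorService))
                  .toList();
          // Wait for all tasks; any task failure is rethrown here.
          Futures.getUnchecked(CompletableFuture.allOf(tasks.toArray(new CompletableFuture<?>[0])));
      }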

      Although TarArchiveInputStream is marked as not thread safe, I am not sharing any objects between threads here. Each task creates its own stream objects, presumably all with their own position tracking info.
       
      The stacktrace here looks like:

      Caused by: java.io.IOException: Corrupted TAR archive.
          at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1480)
          at org.apache.commons.compress.archivers.tar.TarArchiveEntry.<init>(TarArchiveEntry.java:534)
          at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:431)
          at
      Caused by: java.lang.IllegalArgumentException: Invalid byte 100 at offset 0 in 'dddddddddddd' len=12
          at org.apache.commons.compress.archivers.tar.TarUtils.parseOctal(TarUtils.java:516)
          at org.apache.commons.compress.archivers.tar.TarUtils.parseOctalOrBinary(TarUtils.java:540)
          at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeaderUnwrapped(TarArchiveEntry.java:1496)
          at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1478)
          ... 7 more
       

      That code shows that the parsed header is occasionally wrong (the tar entry name contains garbage bytes), which makes me think that `getNextTarEntry()` may be at fault.
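
      To illustrate why the inner exception points at a misread header: numeric tar header fields are ASCII octal digits, so a field consisting of 'd' characters (byte 100) cannot be a valid size or mode. The sketch below is a simplified stand-in for that kind of check, not the actual TarUtils.parseOctal code; the class name OctalFieldDemo is made up for this example.

```java
import java.nio.charset.StandardCharsets;

class OctalFieldDemo {
    // Simplified octal field parser (hypothetical, NOT the library implementation).
    // Accepts leading spaces and trailing NULs/spaces, rejects any non-octal byte.
    static long parseOctal(byte[] buffer, int offset, int length) {
        int start = offset;
        int end = offset + length;
        while (start < end && buffer[start] == ' ') {
            start++;
        }
        while (end > start && (buffer[end - 1] == 0 || buffer[end - 1] == ' ')) {
            end--;
        }
        long result = 0;
        for (int i = start; i < end; i++) {
            byte b = buffer[i];
            if (b < '0' || b > '7') {
                throw new IllegalArgumentException(
                        "Invalid byte " + b + " at offset " + (i - offset));
            }
            result = (result << 3) + (b - '0');
        }
        return result;
    }

    public static void main(String[] args) {
        // A well-formed 12-byte size field: 11 octal digits plus a NUL terminator.
        byte[] good = "00000001750\0".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseOctal(good, 0, good.length)); // octal 1750 = 1000

        // Bytes like the ones in the stack trace above are rejected.
        byte[] bad = "dddddddddddd".getBytes(StandardCharsets.US_ASCII);
        try {
            parseOctal(bad, 0, bad.length);
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

      Seeing non-octal bytes in a numeric field suggests the reader parsed a 512-byte block that was not really a header, although the sketch says nothing about how it got there.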
       
      Running that code with Commons Compress 1.25.0 works as expected, so the regression was probably introduced since November. Note that this is related to parallelism: using an executor service with a single thread does not produce the error. The tgz being decompressed doesn't really matter; a manually created one of a few KB is enough.
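
      For anyone without a tgz at hand, a small one can be generated with the JDK alone. The sketch below is only an illustration and not the archive used for this report: it writes a single ustar header by hand, and the class name TinyTgz, the entry name hello.txt and the output file tiny.tgz are all made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

class TinyTgz {
    private static final int BLOCK = 512; // tar works in 512-byte blocks

    static void putAscii(byte[] header, int offset, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(bytes, 0, header, offset, bytes.length);
    }

    // Zero-padded octal digits; the last byte of the field stays NUL.
    static void putOctal(byte[] header, int offset, int length, long value) {
        putAscii(header, offset, String.format("%0" + (length - 1) + "o", value));
    }

    // Builds one ustar header for a regular file (name assumed shorter than 100 bytes).
    static byte[] header(String name, long size) {
        byte[] h = new byte[BLOCK];
        putAscii(h, 0, name);                 // name
        putOctal(h, 100, 8, 0644);            // mode
        putOctal(h, 108, 8, 0);               // uid
        putOctal(h, 116, 8, 0);               // gid
        putOctal(h, 124, 12, size);           // size
        putOctal(h, 136, 12, 0);              // mtime
        Arrays.fill(h, 148, 156, (byte) ' '); // checksum counts as spaces while summing
        h[156] = '0';                         // typeflag: regular file
        putAscii(h, 257, "ustar");            // magic, followed by a NUL at offset 262
        putAscii(h, 263, "00");               // version
        int sum = 0;
        for (byte b : h) {
            sum += b & 0xFF;
        }
        putAscii(h, 148, String.format("%06o", sum)); // six octal digits
        h[154] = 0;                                   // then NUL, then the remaining space
        return h;
    }

    // Returns a gzip-compressed tar with a single entry.
    static byte[] build(String name, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(header(name, content.length));
            gz.write(content);
            gz.write(new byte[(BLOCK - content.length % BLOCK) % BLOCK]); // pad to block size
            gz.write(new byte[2 * BLOCK]);                                // end-of-archive marker
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] content = new byte[4096];
        Arrays.fill(content, (byte) 'a');
        Files.write(Paths.get("tiny.tgz"), build("hello.txt", content));
    }
}
```

      Putting the resulting file on the test classpath and pointing getResourceAsStream at it should be enough to run the test above.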

            People

              Assignee: Gary D. Gregory (ggregory)
              Reporter: Cosmin Carabet (cosmin79)