Description
We maintain the Bazel (https://bazel.build/) build system, which uses Apache Commons Compress to extract archives. A user reported that a particular sparse tarball consistently fails with an error (https://github.com/bazelbuild/bazel/issues/20269#issuecomment-1821250607). The failure can be reproduced with the following script:
#!/usr/bin/env bash
set -o errexit -o nounset
echo "Downloading commons-compress"
wget https://repo1.maven.org/maven2/org/apache/commons/commons-compress/1.25.0/commons-compress-1.25.0.jar
echo "Downloading sample sparse archive"
wget https://github.com/astral-sh/ruff/releases/download/v0.1.6/ruff-aarch64-apple-darwin.tar.gz
gunzip ruff-aarch64-apple-darwin.tar.gz
echo "Testing with system tar"
tar -tf ruff-aarch64-apple-darwin.tar
echo "Testing with commons-compress"
java -jar commons-compress-1.25.0.jar ruff-aarch64-apple-darwin.tar
Output (download steps omitted):
Testing with system tar
ruff
Testing with commons-compress
Analysing ruff-aarch64-apple-darwin.tar
Created org.apache.commons.compress.archivers.tar.TarArchiveInputStream@17f052a3
ruff
Exception in thread "main" java.io.IOException: Truncated TAR archive
	at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.read(TarArchiveInputStream.java:694)
	at org.apache.commons.compress.utils.IOUtils.readFully(IOUtils.java:244)
	at org.apache.commons.compress.utils.IOUtils.skip(IOUtils.java:355)
	at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:451)
	at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextEntry(TarArchiveInputStream.java:426)
	at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextEntry(TarArchiveInputStream.java:50)
	at org.apache.commons.compress.archivers.Lister.listStream(Lister.java:79)
	at org.apache.commons.compress.archivers.Lister.main(Lister.java:133)
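For triage, it may help to know which sparse representation the archive uses. The following is a minimal sketch, assuming the standard 512-byte tar header layout in which the typeflag byte sits at offset 156. The helper name tar_first_typeflag is hypothetical and not part of the original report; it only inspects the first header block in the file.

```shell
# Print the typeflag byte of the first 512-byte header block in a tar file.
# Assumption: standard ustar/GNU header layout, typeflag at byte offset 156.
#   '0' = regular file
#   'S' = GNU old-style sparse file
#   'x' = pax extended header (pax-style sparse entries store their
#         sparse map in GNU.sparse.* keywords inside this header)
tar_first_typeflag() {
  dd if="$1" bs=1 skip=156 count=1 2>/dev/null
}
```

It could be run against the unpacked archive from the script above, for example: tar_first_typeflag ruff-aarch64-apple-darwin.tar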