Details
Description
I’m finding that commons-compress 1.26.1 recognises a UTF-16 text file as a tar archive, unlike the previous version.
This is the code in ArchiveStreamFactory – public static String detect(final InputStream in) throws ArchiveException – that changed in that release and that differs in detection:
if (signatureLength >= TAR_HEADER_SIZE) {
    try (TarArchiveInputStream inputStream = new TarArchiveInputStream(new ByteArrayInputStream(tarHeader))) {
        // COMPRESS-191 - verify the header checksum
        // COMPRESS-644 - do not allow zero byte file entries
        TarArchiveEntry entry = inputStream.getNextEntry();
        // try to find the first non-directory entry within the first 10 entries.
        int count = 0;
        while (entry != null && entry.isDirectory() && count++ < TAR_TEST_ENTRY_COUNT) {
            entry = inputStream.getNextEntry();
        }
        if (entry != null && entry.isCheckSumOK() && !entry.isDirectory() && entry.getSize() > 0 || count > 0) {
            return TAR;
        }
    } catch (final Exception e) { // NOPMD NOSONAR
        // can generate IllegalArgumentException as well as IOException
        // autodetection, simply not a TAR
        // ignored
    }
}
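To make the leniency concrete: since `&&` binds tighter than `||` in Java, the final condition groups as `(entry != null && entry.isCheckSumOK() && !entry.isDirectory() && entry.getSize() > 0) || (count > 0)`, so once any entry has been read, `count > 0` alone reports TAR. A minimal sketch (the boolean stand-ins and method name are hypothetical, not the commons-compress code) showing this:

```java
public class TarDetectPrecedence {
    // Hypothetical stand-ins for the real entry checks in detect().
    static boolean detectLike(boolean entryNonNull, boolean checksumOk,
                              boolean nonDirectory, boolean sizePositive, int count) {
        // Java precedence: && binds tighter than ||, so this is
        // (entryNonNull && checksumOk && nonDirectory && sizePositive) || (count > 0)
        return entryNonNull && checksumOk && nonDirectory && sizePositive || count > 0;
    }

    public static void main(String[] args) {
        // The situation from the report: entry == null after the loop, count == 1.
        // Every entry-based check is false, yet the condition is still true.
        System.out.println(detectLike(false, false, false, false, 1)); // prints "true"
    }
}
```

With `count == 0` (no entry read at all) the same call returns false, which is why only files whose first 512 bytes happen to parse as some entry slip through.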
I feel this is too lenient. At the final “if” statement, for our test file, entry is null and count == 1. The comment says the code is looking for the first non-directory entry, but it has not found one in our case, yet it still returns TAR.
For comparison, the earlier code at least checked that the checksum was OK for the one entry it examined (it is not for our test file…).
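For reference, the tar header checksum that check relies on is defined as the unsigned sum of all 512 header bytes with the 8-byte checksum field at offset 148 counted as ASCII spaces, stored in the header as an octal string. A hedged sketch of computing it (not the commons-compress implementation; class and method names are illustrative):

```java
public class TarChecksum {
    static final int CHKSUM_OFFSET = 148;
    static final int CHKSUM_LENGTH = 8;

    // Sum of all header bytes, treated as unsigned, with the checksum
    // field itself counted as ASCII spaces (0x20) per the tar format.
    static long computeChecksum(byte[] header) {
        long sum = 0;
        for (int i = 0; i < header.length; i++) {
            boolean inChksumField = i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LENGTH;
            sum += inChksumField ? 0x20 : (header[i] & 0xFF);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] header = new byte[512];
        // An all-NUL header still sums to 8 * 0x20 = 256 for the spaces.
        System.out.println(computeChecksum(header)); // prints 256
    }
}
```

Arbitrary text bytes (such as a UTF-16 file) are very unlikely to carry a matching octal checksum string at offset 148, which is why the checksum test screened out files like this one.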
Issue Links
- is related to TIKA-4220: Commons-compress too lenient on headless tar detection (Resolved)