Details
Description
I’m finding that commons-compress 1.26.1 recognises a UTF-16 text file as a tar archive, unlike the previous version.
This is the code in ArchiveStreamFactory – public static String detect(final InputStream in) throws ArchiveException – that changed in that release and that differs in detection:
if (signatureLength >= TAR_HEADER_SIZE) {
    try (TarArchiveInputStream inputStream = new TarArchiveInputStream(new ByteArrayInputStream(tarHeader))) {
        // COMPRESS-191 - verify the header checksum
        // COMPRESS-644 - do not allow zero byte file entries
        TarArchiveEntry entry = inputStream.getNextEntry();
        // try to find the first non-directory entry within the first 10 entries.
        int count = 0;
        while (entry != null && entry.isDirectory() && count++ < TAR_TEST_ENTRY_COUNT) {
            entry = inputStream.getNextEntry();
        }
        if (entry != null && entry.isCheckSumOK() && !entry.isDirectory() && entry.getSize() > 0 || count > 0) {
            return TAR;
        }
    } catch (final Exception e) { // NOPMD NOSONAR
        // can generate IllegalArgumentException as well as IOException
        // autodetection, simply not a TAR
        // ignored
    }
}
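To make the leniency concrete: since `&&` binds tighter than `||` in Java, the final condition groups as `(entry != null && entry.isCheckSumOK() && !entry.isDirectory() && entry.getSize() > 0) || (count > 0)`, so once any entry has been read, `count > 0` alone reports TAR. A minimal sketch (the boolean stand-ins and method name are hypothetical, not the commons-compress code) showing this:

```java
public class TarDetectPrecedence {
    // Hypothetical stand-ins for the real entry checks in detect().
    static boolean detectLike(boolean entryNonNull, boolean checksumOk,
                              boolean nonDirectory, boolean sizePositive, int count) {
        // Java precedence: && binds tighter than ||, so this is
        // (entryNonNull && checksumOk && nonDirectory && sizePositive) || (count > 0)
        return entryNonNull && checksumOk && nonDirectory && sizePositive || count > 0;
    }

    public static void main(String[] args) {
        // The situation from the report: entry == null after the loop, count == 1.
        // Every entry-based check is false, yet the condition is still true.
        System.out.println(detectLike(false, false, false, false, 1)); // prints "true"
    }
}
```

With `count == 0` (no entry read at all) the same call returns false, which is why only files whose first 512 bytes happen to parse as some entry slip through.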
I feel this is too lenient. At the final “if” statement, for our test file, entry is null and count == 1. The comment says the code is looking for the first non-directory entry, but it has not found one in our case, yet it still returns TAR.
For comparison, the earlier code at least checked that the checksum was OK for the one entry it examined (it is not for our test file…).
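For reference, the tar header checksum that check relies on is defined as the unsigned sum of all 512 header bytes with the 8-byte checksum field at offset 148 counted as ASCII spaces, stored in the header as an octal string. A hedged sketch of computing it (not the commons-compress implementation; class and method names are illustrative):

```java
public class TarChecksum {
    static final int CHKSUM_OFFSET = 148;
    static final int CHKSUM_LENGTH = 8;

    // Sum of all header bytes, treated as unsigned, with the checksum
    // field itself counted as ASCII spaces (0x20) per the tar format.
    static long computeChecksum(byte[] header) {
        long sum = 0;
        for (int i = 0; i < header.length; i++) {
            boolean inChksumField = i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LENGTH;
            sum += inChksumField ? 0x20 : (header[i] & 0xFF);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] header = new byte[512];
        // An all-NUL header still sums to 8 * 0x20 = 256 for the spaces.
        System.out.println(computeChecksum(header)); // prints 256
    }
}
```

Arbitrary text bytes (such as a UTF-16 file) are very unlikely to carry a matching octal checksum string at offset 148, which is why the checksum test screened out files like this one.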
Issue Links
- is related to TIKA-4220: Commons-compress too lenient on headless tar detection (Resolved)