[COMPRESS-164] Cannot Read Winzip Archives With Unicode Extra Fields - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.3
Fix Version/s: 1.4
Component/s: Archivers
Labels: None
Environment: Windows 7, Oracle JDK 6

Description

I have a zip file created with WinZip that contains Unicode extra fields. When I try to extract it with org.apache.commons.compress.archivers.zip.ZipFile, ZipFile.getInputStream() returns null for ZipArchiveEntry instances previously retrieved with ZipFile.getEntry() or even ZipFile.getEntries(). See UTF8ZipFilesTest.patch in the attachments for a test case that exposes the bug. The original test case stopped short of reading the entries, which is why this was not flagged earlier.

The problem is that ZipFile.java stores its entries as keys in a HashMap. After the HashMap has been populated, the Unicode extra fields are read, which changes the ZipArchiveEntry name and therefore its hash code. Because of this, subsequent lookups in the HashMap fail to retrieve the original values.
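
The effect described above can be reproduced without any zip file. The following is a minimal sketch, not code from ZipFile.java: the Entry class is a hypothetical stand-in for ZipArchiveEntry, assumed only to derive equals() and hashCode() from a mutable name, and the file names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MutableKeyDemo {

    /** Hypothetical stand-in for ZipArchiveEntry: identity depends on a mutable name. */
    public static final class Entry {
        private String name;

        public Entry(String name) {
            this.name = name;
        }

        public void setName(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Entry && Objects.equals(name, ((Entry) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(name);
        }
    }

    /** Returns true if the entry is still found after it was renamed while stored as a key. */
    public static boolean foundAfterRename() {
        Map<Entry, Long> offsets = new HashMap<>();
        Entry entry = new Entry("a.txt");
        offsets.put(entry, 42L);       // stored under the hash of "a.txt"

        entry.setName("\u00e4.txt");   // simulates applying a Unicode path extra field

        // The lookup now uses the hash of the new name, so the old slot is never matched.
        return offsets.get(entry) != null;
    }

    public static void main(String[] args) {
        System.out.println("found after rename: " + foundAfterRename());
    }
}
```

Even though the very same object is used for the lookup, the map compares against the hash recorded at insertion time, so the lookup misses.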

ZipFile.patch contains an (admittedly simple-minded) fix for this problem: it reconstructs the entries HashMap after the Unicode extra fields have been parsed. The purpose of this patch is mainly to show that the problem is indeed what I think it is, rather than to provide a well-designed solution.
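
The idea of rebuilding the map can be sketched as follows. This is not the content of ZipFile.patch; it is an illustration under the assumption that copying every mapping into a fresh HashMap is enough, since each key is then hashed again with its current name. The Entry class and the rehash() helper are hypothetical names introduced for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class RebuildMapSketch {

    /** Hypothetical stand-in for ZipArchiveEntry: identity depends on a mutable name. */
    public static final class Entry {
        private String name;

        public Entry(String name) {
            this.name = name;
        }

        public void setName(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Entry && Objects.equals(name, ((Entry) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(name);
        }
    }

    /** Copies all mappings into a new map so every key is hashed with its current state. */
    public static <K, V> Map<K, V> rehash(Map<K, V> stale) {
        Map<K, V> fresh = new HashMap<>();
        // Iterating the stale map does not depend on the keys' hash codes,
        // so every mapping is visited even if lookups on it would fail.
        for (Map.Entry<K, V> mapping : stale.entrySet()) {
            fresh.put(mapping.getKey(), mapping.getValue());
        }
        return fresh;
    }

    public static void main(String[] args) {
        Map<Entry, Long> offsets = new HashMap<>();
        Entry entry = new Entry("a.txt");
        offsets.put(entry, 42L);
        entry.setName("\u00e4.txt");   // name changes after insertion

        System.out.println("before rebuild: " + offsets.get(entry));
        offsets = rehash(offsets);
        System.out.println("after rebuild: " + offsets.get(entry));
    }
}
```

A rebuild like this costs one extra pass over all entries. Keying the map by something that does not change after insertion would avoid the extra pass, but that is a larger design change than the reporter's patch aims for.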

The patches have been tested against revision 1210416.

Attachments

ZipFile.patch, 05/Dec/11 12:32, 1 kB, Volker Leidl
UTF8ZipFilesTest.patch, 05/Dec/11 12:31, 2 kB, Volker Leidl

People

Assignee: Unassigned

Reporter: Volker Leidl

Votes: 0

Watchers: 0

Dates

Created: 05/Dec/11 12:30

Updated: 05/Dec/11 16:33

Resolved: 05/Dec/11 15:43