Commons Compress / COMPRESS-164

Cannot Read Winzip Archives With Unicode Extra Fields

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Component/s: Archivers
    • Labels: None
    • Environment: Windows 7, Oracle JDK 6

    Description

      I have a zip file created with WinZip that contains Unicode extra fields. When I try to extract it with org.apache.commons.compress.archivers.zip.ZipFile, ZipFile.getInputStream() returns null for ZipArchiveEntry instances previously retrieved with ZipFile.getEntry() or even ZipFile.getEntries(). See UTF8ZipFilesTest.patch in the attachments for a test case exposing the bug. The original test case stopped short of trying to read the entries, which is why this was not flagged before.

      The problem is that ZipFile.java stores the entries in a HashMap. After the HashMap has been populated, the Unicode extra fields are read, which changes the name of the ZipArchiveEntry and therefore its hash code. The map still files each entry under the hash code it had at insertion time, so subsequent gets on the HashMap fail to retrieve the original values.
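
      The failure mode can be reproduced with the JDK alone. The following is a minimal sketch, not code from ZipFile.java: Entry is a hypothetical stand-in for ZipArchiveEntry, assumed only to derive equals() and hashCode() from a mutable name, and the offset value 42 is arbitrary.

```java
import java.util.HashMap;
import java.util.Map;

public class MutableKeyDemo {
    /** Hypothetical stand-in for ZipArchiveEntry: identity depends on a mutable name. */
    static final class Entry {
        private String name;

        Entry(String name) { this.name = name; }

        void setName(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Entry && ((Entry) o).name.equals(name);
        }

        @Override
        public int hashCode() { return name.hashCode(); }
    }

    /** Stores an entry as a map key, renames it in place, then looks it up again. */
    static boolean lookupSurvivesRename(String oldName, String newName) {
        Map<Entry, Long> offsets = new HashMap<>();
        Entry entry = new Entry(oldName);
        offsets.put(entry, 42L);      // filed under oldName's hash code
        entry.setName(newName);       // analogous to applying the Unicode extra field
        return offsets.get(entry) != null;
    }

    public static void main(String[] args) {
        System.out.println(lookupSurvivesRename("a.txt", "a.txt"));
        System.out.println(lookupSurvivesRename("?.txt", "\u00e4.txt"));
    }
}
```

      When the name is unchanged the lookup succeeds. When the name changes, the very same object can no longer be found, which matches the null returned by ZipFile.getInputStream().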

      ZipFile.patch contains an (admittedly simple-minded) fix that rebuilds the entries HashMap after the Unicode extra fields have been parsed. The patch is meant mainly to show that the problem is indeed what I think it is, not to provide a well-designed solution.
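
      The idea of rebuilding the map can be sketched in isolation. This is an illustration under assumptions, not the contents of ZipFile.patch: Entry is again a hypothetical stand-in for ZipArchiveEntry, and rebuild() is a helper name invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RehashDemo {
    /** Hypothetical stand-in for ZipArchiveEntry: identity depends on a mutable name. */
    static final class Entry {
        private String name;

        Entry(String name) { this.name = name; }

        void setName(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Entry && ((Entry) o).name.equals(name);
        }

        @Override
        public int hashCode() { return name.hashCode(); }
    }

    /** Copies all mappings into a fresh map, so every key is re-hashed with its current name. */
    static <K, V> Map<K, V> rebuild(Map<K, V> stale) {
        return new LinkedHashMap<>(stale);   // iteration does not rely on the stale hashes
    }

    /** Renames a key in place, rebuilds the map, and checks that the lookup works again. */
    static boolean lookupAfterRebuild(String oldName, String newName) {
        Map<Entry, Long> offsets = new LinkedHashMap<>();
        Entry entry = new Entry(oldName);
        offsets.put(entry, 42L);
        entry.setName(newName);              // the original map can no longer find this key
        Map<Entry, Long> fixed = rebuild(offsets);
        return Long.valueOf(42L).equals(fixed.get(entry));
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterRebuild("?.txt", "\u00e4.txt"));
    }
}
```

      Rebuilding costs one extra pass over all entries. A design that avoids mutable keys altogether, for example by resolving the final name before the entry is inserted, would not need that pass.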

      The patches have been tested against revision 1210416.

      Attachments

        1. ZipFile.patch
          1 kB
          Volker Leidl
        2. UTF8ZipFilesTest.patch
          2 kB
          Volker Leidl

          People

            Assignee: Unassigned
            Reporter: Volker Leidl