Tika / TIKA-3701

ZipDetector on a file should back off to streaming detection on failure to open a zipfile

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.0
    • Component/s: None
    • Labels: None

    Description

      If a file is passed to Tika wrapped as a TikaInputStream with an underlying file, the DefaultZipDetector tries to open a ZipFile. If the file is truncated, or if opening the ZipFile fails for any other reason, the DefaultZipDetector effectively gives up.

      Given that the file is still available, we should fall back to streaming detection by reopening the file as a regular InputStream.

      If we don't do this, detection results differ for some truncated OOXML files depending on whether the user sends in a file or a stream.
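
      The fallback described above can be sketched roughly as follows. This is only an illustration that uses the JDK's java.util.zip classes; it is not Tika's actual DefaultZipDetector code, and the class and method names (ZipFallbackSketch, entryNames, streamEntryNames) are hypothetical. It lists entry names as a stand-in for the real detection logic, which would inspect those entries to decide on a media type.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical sketch: JDK classes only, not Tika's implementation.
final class ZipFallbackSketch {

    /** Lists entry names via random access, falling back to streaming on failure. */
    static List<String> entryNames(Path path) {
        try (ZipFile zip = new ZipFile(path.toFile())) {
            List<String> names = new ArrayList<>();
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
            return names;
        } catch (IOException e) {
            // Opening as a ZipFile failed (for example, the central directory
            // is missing because the file is truncated). The file is still on
            // disk, so reopen it as a plain stream instead of giving up.
            return streamEntryNames(path);
        }
    }

    /** Reads local file headers sequentially; keeps whatever was read before an error. */
    static List<String> streamEntryNames(Path path) {
        List<String> names = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(path));
             ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            // Truncated or corrupt data: return the entries seen so far.
        }
        return names;
    }
}
```

      With this shape, a truncated file and the same bytes supplied as a stream go through the same streaming path, so both inputs should yield the same entries.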

      Attachments

        1. Carved-107429888
          171 kB
          Luís Filipe Nassif

          People

            Assignee: Unassigned
            Reporter: Tim Allison (tallison)
            Votes: 0
            Watchers: 3