Tika / TIKA-3556

DefaultZipContainerDetector returns application/zip for .odt files when OPCPackageDetector is on the classpath


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: detector
    • Labels: None

    Description

      This happens because the OPCPackageDetector.detect method fails and closes the underlying zip stream. When the next detector runs (e.g. OpenDocumentDetector), the stream it receives has already been closed, so it cannot detect anything.

      After all detectors have effectively no-oped, the DefaultZipContainerDetector falls back to application/zip.
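The failure mode described above can be illustrated with a minimal, self-contained sketch. It does not use Tika's classes: `SharedContainer`, `opcLikeDetect`, `odfLikeDetect`, and the `placeholder/opc-match` value are hypothetical stand-ins, and the entry-name checks are a simplification of what the real detectors do. It only shows how one detector closing a shared resource silently turns the next detector into a no-op.

```java
import java.util.Set;

public class ClosedContainerSketch {

    /** Hypothetical stand-in for the zip container shared by all detectors. */
    static final class SharedContainer {
        private final Set<String> entryNames;
        private boolean closed;

        SharedContainer(Set<String> entryNames) {
            this.entryNames = entryNames;
        }

        boolean hasEntry(String name) {
            if (closed) {
                throw new IllegalStateException("container already closed");
            }
            return entryNames.contains(name);
        }

        void close() {
            closed = true;
        }
    }

    /** Assumption: mimics a detector that closes the container when it does not match. */
    static String opcLikeDetect(SharedContainer c, boolean closeOnFailure) {
        if (c.hasEntry("[Content_Types].xml")) {
            return "placeholder/opc-match"; // placeholder value, not a real media type
        }
        if (closeOnFailure) {
            c.close();
        }
        return null;
    }

    /** Assumption: mimics a later detector that swallows errors and reports nothing. */
    static String odfLikeDetect(SharedContainer c) {
        try {
            return c.hasEntry("mimetype") ? "application/vnd.oasis.opendocument.text" : null;
        } catch (IllegalStateException e) {
            return null; // closed container: the detector effectively no-ops
        }
    }

    /** Runs both detectors in order and falls back to application/zip. */
    static String detect(boolean opcClosesOnFailure) {
        SharedContainer odt = new SharedContainer(Set.of("mimetype", "content.xml"));
        String type = opcLikeDetect(odt, opcClosesOnFailure);
        if (type == null) {
            type = odfLikeDetect(odt);
        }
        return type != null ? type : "application/zip";
    }

    public static void main(String[] args) {
        System.out.println(detect(true));  // application/zip
        System.out.println(detect(false)); // application/vnd.oasis.opendocument.text
    }
}
```

In this sketch, the only difference between the two runs is whether the first detector closes the shared container after failing to match.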

      Now, when running with the default CompositeDetector, the next detector is usually the MimeTypes detector. It returns the correct application/vnd.oasis.opendocument.text, but the CompositeDetector ignores that result because the mime type isn't registered as a subclass of application/zip in the registry.
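The "more specific result wins" rule described above can be sketched without Tika on the classpath. The code below is an illustration only: `isSpecializationOf` and `combine` are simplified, hypothetical re-implementations using a plain supertype map, on the assumption that a type with no registered parent defaults to application/octet-stream. It compares the outcome with and without the subclass relationship in place.

```java
import java.util.List;
import java.util.Map;

public class SpecializationSketch {
    static final String OCTET = "application/octet-stream";
    static final String ZIP = "application/zip";
    static final String ODT = "application/vnd.oasis.opendocument.text";

    /**
     * Returns true if b is a strict ancestor of a in the supertype map.
     * Assumption: types without an entry have OCTET as their parent.
     */
    static boolean isSpecializationOf(Map<String, String> supertypes, String a, String b) {
        String current = a;
        while (!current.equals(OCTET)) {
            current = supertypes.getOrDefault(current, OCTET);
            if (current.equals(b)) {
                return true;
            }
        }
        return false;
    }

    /** Keeps a later detector's result only if it specializes the current type. */
    static String combine(Map<String, String> supertypes, List<String> detectorResults) {
        String type = OCTET;
        for (String detected : detectorResults) {
            if (isSpecializationOf(supertypes, detected, type)) {
                type = detected;
            }
        }
        return type;
    }

    public static void main(String[] args) {
        // The zip container detector answers first, the name-based detector second.
        List<String> results = List.of(ZIP, ODT);

        // No parent registered for ODT: the second result is ignored.
        System.out.println(combine(Map.of(), results));         // application/zip

        // ODT registered as a subclass of zip: the second result wins.
        System.out.println(combine(Map.of(ODT, ZIP), results)); // application/vnd.oasis.opendocument.text
    }
}
```

The second call corresponds to the registry change suggested in this report; whether that change is appropriate for the real registry is left open here.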

       

      In short, I think there are potentially two bugs here:

      1. Either the OPCPackageDetector shouldn't close the zip while detecting, or the DefaultZipContainerDetector should re-open it if necessary?
      2. The registry should be updated to mark application/vnd.oasis.opendocument.text as a subclass of application/zip?


          People

            Assignee: Unassigned
            Reporter: Simon Gaeremynck (gaeremyncks)