Tika / TIKA-79

MIME type detection from file header appears to be failing.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.1-incubating, 0.2
    • Fix Version/s: 0.3
    • Component/s: general
    • Labels: None

    Description

      The unit tests for AutoDetectParser fail whenever detection has to rely on the byte header. With the correct resource names and MIME types passed into the Metadata object, the detection results were as shown below. Note that typeFromHeader is null for several of the document types:

      typeFromContentTypeHint = application/vnd.ms-excel
      typeFromResourceName = application/vnd.ms-excel
      typeFromHeader = null
      type = application/vnd.ms-excel

      typeFromContentTypeHint = text/html
      typeFromResourceName = text/html
      typeFromHeader = text/html
      type = text/html

      typeFromContentTypeHint = application/vnd.oasis.opendocument.text
      typeFromResourceName = application/vnd.oasis.opendocument.text
      typeFromHeader = application/vnd.oasis.opendocument.text
      type = application/vnd.oasis.opendocument.text

      typeFromContentTypeHint = application/pdf
      typeFromResourceName = application/pdf
      typeFromHeader = application/pdf
      type = application/pdf

      typeFromContentTypeHint = application/vnd.ms-powerpoint
      typeFromResourceName = application/vnd.ms-powerpoint
      typeFromHeader = null
      type = application/vnd.ms-powerpoint

      log4j:WARN No appenders could be found for logger (root).
      log4j:WARN Please initialize the log4j system properly.

      typeFromContentTypeHint = application/rtf
      typeFromResourceName = application/rtf
      typeFromHeader = null
      type = application/rtf

      typeFromContentTypeHint = text/plain
      typeFromResourceName = text/plain
      typeFromHeader = null
      type = text/plain

      typeFromContentTypeHint = application/msword
      typeFromResourceName = application/msword
      typeFromHeader = null
      type = application/msword

      typeFromContentTypeHint = application/xml
      typeFromResourceName = application/xml
      typeFromHeader = null
      type = application/xml
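
For readers unfamiliar with header-based detection, the general idea can be sketched as comparing a file's leading bytes against known signatures. The class below is a hypothetical, self-contained illustration and is not Tika's implementation: the name `HeaderSniffer`, the method `detect`, and the `application/x-ole-storage` placeholder label are assumptions made for this sketch. It covers only a few well-known signatures, and it returns null when nothing matches, which is the same kind of value reported as typeFromHeader above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Tika's API.
final class HeaderSniffer {

    // OLE2 compound-document signature, shared by legacy .doc, .xls, and .ppt files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Placeholder label (an assumption of this sketch): the container signature
    // alone cannot tell Word, Excel, and PowerPoint documents apart.
    static final String OLE2_CONTAINER = "application/x-ole-storage";

    private HeaderSniffer() {}

    /** Returns a MIME type guessed from the leading bytes, or null if no signature matches. */
    static String detect(byte[] header) {
        if (header == null) return null;
        if (startsWith(header, ascii("%PDF-"))) return "application/pdf";
        if (startsWith(header, ascii("{\\rtf"))) return "application/rtf";
        if (startsWith(header, ascii("<?xml"))) return "application/xml";
        if (startsWith(header, OLE2)) return OLE2_CONTAINER;
        return null; // e.g. plain text, which has no fixed signature
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // True when data begins with exactly the bytes in prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        System.out.println(detect(ascii("%PDF-1.4")));
        System.out.println(detect(ascii("just some text")));
    }
}
```

A check of this kind can distinguish PDF, RTF, and XML by signature, but legacy Office documents all begin with the same container bytes, so telling them apart would need more than the header or a hint such as the resource name.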

      Attachments

        1. AutoDetectParser.patch (4 kB, Keith Bennett)

          People

            Assignee: chrismattmann Chris A. Mattmann
            Reporter: kbennett Keith Bennett
