TIKA-1644: Mime type diffs between 1.8 and 1.9-rc1

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      When running 1.9-rc1 against govdocs1, I found a few files whose MIME types have changed. I'm posting this now so that others can take a look. Some of these changes are for the better, and some are not.
      For further investigation:

      • embedded PICT and WMF files are now sometimes identified as PDF (TIKA-1085)
      • several .doc files are now identified as application/x-msmetafile, and no text is extracted from them
      • several .doc files are now identified as JPEG or PNG, and no text is extracted from them
      • several .ppt files that were previously identified as various types (jpeg, ppt, png, msoffice, word) are now detected as Excel

      Probably for the good:

      • a handful of files that were identified as text are now identified as pdf (TIKA-1085)
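
A comparison like the one described above can be sketched as follows. This is only an illustration, not the script used to produce the attached reports: the function names `diff_mime_types` and `summarize`, the dictionary layout (file name mapped to detected type), and the sample entries are all assumptions made for the example.

```python
from collections import Counter


def diff_mime_types(old, new):
    """Return (file, old_mime, new_mime) for files whose detected type changed.

    Only files present in both runs are compared.
    """
    changed = []
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            changed.append((name, old[name], new[name]))
    return changed


def summarize(changed):
    """Count how often each old -> new transition occurs."""
    return Counter((old_mime, new_mime) for _, old_mime, new_mime in changed)


# Hypothetical sample data; not taken from the attached reports.
old_run = {
    "a.doc": "application/msword",
    "b.ppt": "application/vnd.ms-powerpoint",
    "c.txt": "text/plain",
}
new_run = {
    "a.doc": "application/x-msmetafile",
    "b.ppt": "application/vnd.ms-excel",
    "c.txt": "text/plain",
}

changes = diff_mime_types(old_run, new_run)
for (old_mime, new_mime), count in summarize(changes).most_common():
    print(f"{old_mime} -> {new_mime}: {count}")
```

Grouping the changes by transition makes it easier to separate likely regressions (for example, .doc files turning into image types) from likely improvements.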

      Attachments

        1. mime_diffs_1_8_vs_1_9-rc2.csv
          31 kB
          Tim Allison
        2. mime_diffs.xlsx
          20 kB
          Tim Allison

            People

              Assignee: Unassigned
              Reporter: Tim Allison (tallison)