Tika / TIKA-913

MagicMime detection of msdos executables does not work

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1
    • Fix Version/s: 1.2
    • Component/s: mime
    • Environment: Linux, JDK 1.6

    Description

      MIME detection does not work as I would expect, in contrast to, e.g., the SourceForge mime-util detection or the "file" utility.
      For example, running detection on the PuTTY MS-DOS executable gives wrong results:

      krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/putty
      application/octet-stream
      krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/putty.jpg
      image/jpeg
      krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/putty.exe
      application/x-msdownload

      It is the same binary every time, only under different names.
      In contrast, "file" outputs:

      krah@sf050:~$ file /tmp/putty
      /tmp/putty: PE32 executable for MS Windows (GUI) Intel 80386 32-bit
      krah@sf050:~$ file /tmp/putty.jpg
      /tmp/putty.jpg: PE32 executable for MS Windows (GUI) Intel 80386 32-bit
      krah@sf050:~$ file /tmp/putty.exe
      /tmp/putty.exe: PE32 executable for MS Windows (GUI) Intel 80386 32-bit

      So magic mime detection should be able to detect that this is actually an executable.
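
      As a rough illustration of what content-based detection could look like here, the sketch below checks for the "MZ" marker at the start of the file and, if present, follows the header field at offset 0x3C to look for a "PE\0\0" signature. This is not Tika's implementation; the class and method names (PeSniffer, guessType, looksLikePe) are made up for this example, and it needs Java 7 or later for java.nio.file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tika: sniffs DOS/PE executables by content only.
class PeSniffer {

    /** True if the data starts with the two-byte DOS "MZ" marker. */
    static boolean startsWithMz(byte[] data) {
        return data.length >= 2 && data[0] == 'M' && data[1] == 'Z';
    }

    /** True if the DOS header points at a "PE\0\0" signature inside the data. */
    static boolean looksLikePe(byte[] data) {
        // The DOS header is 64 bytes; the little-endian PE header offset sits at 0x3C.
        if (!startsWithMz(data) || data.length < 0x40) {
            return false;
        }
        int peOffset = ByteBuffer.wrap(data, 0x3C, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        if (peOffset < 0 || peOffset > data.length - 4) {
            return false;
        }
        return data[peOffset] == 'P' && data[peOffset + 1] == 'E'
                && data[peOffset + 2] == 0 && data[peOffset + 3] == 0;
    }

    /** Guesses a type from the content alone; the file name is never consulted. */
    static String guessType(byte[] data) {
        return startsWithMz(data) ? "application/x-msdownload" : "application/octet-stream";
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            // Reads the whole file for simplicity; a real detector would read only a prefix.
            byte[] data = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": " + guessType(data)
                    + (looksLikePe(data) ? " (PE signature found)" : ""));
        }
    }
}
```

      Because only the bytes are inspected, such a check would report the same type for /tmp/putty, /tmp/putty.jpg and /tmp/putty.exe.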

      For a PDF, for example, it does work:

      krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/print.pdf
      application/pdf
      krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/print
      application/pdf
      krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/print.jpg
      application/pdf

      Here Tika detects what is expected.

          People

            Assignee: Unassigned
            Reporter: Torsten Krah (tkrah)
            Votes: 0
            Watchers: 1
