TIKA-913

MagicMime detection of msdos executables does not work

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1
    • Fix Version/s: 1.2
    • Component/s: mime
    • Labels:
    • Environment:

      Linux, JDK 1.6

      Description

      MIME detection does not work as I expected, in contrast to, e.g., the SourceForge mime-util detection or the "file" utility.
      For example, running detection on the PuTTY MS-DOS executable gives wrong results:

      krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/putty
      application/octet-stream
      krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/putty.jpg
      image/jpeg
      krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/putty.exe
      application/x-msdownload

      It is the same binary every time, only under different names.
      In contrast, "file" reports:

      krah@sf050:~$ file /tmp/putty
      /tmp/putty: PE32 executable for MS Windows (GUI) Intel 80386 32-bit
      krah@sf050:~$ file /tmp/putty.jpg
      /tmp/putty.jpg: PE32 executable for MS Windows (GUI) Intel 80386 32-bit
      krah@sf050:~$ file /tmp/putty.exe
      /tmp/putty.exe: PE32 executable for MS Windows (GUI) Intel 80386 32-bit

      So magic mime detection should be able to detect that this is actually an executable.
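
The content-based check asked for here can be sketched in plain Java. This is only an illustration and not Tika's detector: the class name `PeMagic` and the method `looksLikePe` are hypothetical. It relies on the PE file layout, where the file starts with the DOS signature "MZ" and the 32-bit little-endian value at offset 0x3C points to the "PE\0\0" signature. The MIME type printed for a match is taken from the putty.exe output above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper (not part of Tika): checks the DOS and PE signatures. */
final class PeMagic {
    private PeMagic() {}

    /** True if data starts with "MZ" and has "PE\0\0" at the offset stored at 0x3C. */
    static boolean looksLikePe(byte[] data) {
        if (data.length < 0x40 || data[0] != 'M' || data[1] != 'Z') {
            return false;
        }
        // Offset of the PE header: 32-bit little-endian value at 0x3C.
        long peOffset = (data[0x3C] & 0xFFL)
                | (data[0x3D] & 0xFFL) << 8
                | (data[0x3E] & 0xFFL) << 16
                | (data[0x3F] & 0xFFL) << 24;
        if (peOffset + 4 > data.length) {
            return false;
        }
        int p = (int) peOffset;
        return data[p] == 'P' && data[p + 1] == 'E'
                && data[p + 2] == 0 && data[p + 3] == 0;
    }

    public static void main(String[] args) throws IOException {
        // Reads the whole file for simplicity; the file name plays no role.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(looksLikePe(data)
                ? "application/x-msdownload"
                : "application/octet-stream");
    }
}
```

Because only the bytes are inspected, /tmp/putty, /tmp/putty.jpg and /tmp/putty.exe would all be classified the same way by this sketch.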

      For a PDF, for example, it does work:

      krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/print.pdf
      application/pdf
      krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/print
      application/pdf
      krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/print.jpg
      application/pdf

      Here Tika detects what is expected.
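
The two sets of outputs are consistent with a detector that tries magic bytes first and falls back to the file name only when no magic rule matches. The following is a minimal sketch under that assumption, not Tika's actual implementation. The class `MagicFirstDetector`, its rule set, and the mapping of "MZ" to application/x-msdownload are hypothetical and chosen to mirror the outputs above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical magic-first detector with a file-name fallback (not Tika code). */
final class MagicFirstDetector {
    private MagicFirstDetector() {}

    static String detect(String name, byte[] head) {
        // 1. Content rules win over the file name.
        if (startsWith(head, "%PDF-")) {
            return "application/pdf";
        }
        if (startsWith(head, "MZ")) {
            // Assumed rule: without it, detection falls through to the name.
            return "application/x-msdownload";
        }
        // 2. No magic match: fall back to the extension.
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".pdf")) return "application/pdf";
        if (lower.endsWith(".jpg")) return "image/jpeg";
        if (lower.endsWith(".exe")) return "application/x-msdownload";
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] data, String ascii) {
        byte[] magic = ascii.getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With the "MZ" rule present, an executable named putty.jpg is reported as application/x-msdownload. Removing that rule reproduces the name-dependent results shown for putty, putty.jpg and putty.exe, while the PDF results stay unchanged.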

        Activity

        Nick Burch made changes:

        Field            Original Value   New Value
        Status           Open [ 1 ]       Resolved [ 5 ]
        Fix Version/s                     1.2 [ 12320169 ]
        Resolution                        Fixed [ 1 ]

        Torsten Krah created issue.

          People

          • Assignee: Unassigned
          • Reporter: Torsten Krah
          • Votes: 0
          • Watchers: 1
