Tika / TIKA-879

Detection problem: message/rfc822 file is detected as text/plain.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.0, 1.1, 1.2
    • Fix Version/s: 1.18, 2.0.0
    • Component/s: metadata, mime
    • Environment: linux 3.2.9
      oracle jdk7, openjdk7, sun jdk6

    Description

      When using DefaultDetector, the detected MIME type for .eml files is inconsistent (you can test this with testRFC822 and testRFC822_base64 in tika-parsers/src/test/resources/test-documents/).

      The main reason for this behavior is that only the magic detector really works for such files, even if you set CONTENT_TYPE in the metadata or an .eml file name in RESOURCE_NAME_KEY.

      I found that MediaTypeRegistry.isSpecializationOf("message/rfc822", "text/plain") returns false, so detection by MimeTypes.detect(...) works only by magic.
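
To make the reasoning above concrete, here is a minimal, self-contained sketch of how a name or content-type hint might be combined with a magic result. It is a simplified model, not Tika's actual code: the class `DetectionSketch` and its methods are hypothetical stand-ins, and it assumes that a hint is accepted only when it is a specialization of the magic-detected type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of hint resolution; not the Tika implementation.
class DetectionSketch {
    // Maps a media type to its declared supertype (child -> parent).
    private final Map<String, String> supertypes = new HashMap<>();

    void addSuperType(String type, String supertype) {
        supertypes.put(type, supertype);
    }

    // True if 'type' has 'candidate' somewhere in its supertype chain.
    boolean isSpecializationOf(String type, String candidate) {
        String current = supertypes.get(type);
        while (current != null) {
            if (current.equals(candidate)) {
                return true;
            }
            current = supertypes.get(current);
        }
        return false;
    }

    // Assumption: the hint (from file name or declared content type) wins
    // only if it specializes the type found by magic; otherwise magic wins.
    String detect(String magicType, String hintType) {
        if (hintType != null && isSpecializationOf(hintType, magicType)) {
            return hintType;
        }
        return magicType;
    }

    public static void main(String[] args) {
        DetectionSketch registry = new DetectionSketch();
        registry.addSuperType("text/html", "text/plain");

        // message/rfc822 has no declared relation to text/plain, so the hint is dropped.
        System.out.println(registry.detect("text/plain", "message/rfc822"));

        // text/html specializes text/plain, so the hint is kept.
        System.out.println(registry.detect("text/plain", "text/html"));

        // Illustration only, not a claim about how this issue was resolved:
        // once the relation is declared in the model, the hint is accepted.
        registry.addSuperType("message/rfc822", "text/plain");
        System.out.println(registry.detect("text/plain", "message/rfc822"));
    }
}
```

In this model, an .eml file whose content matches only a plain-text magic pattern stays text/plain regardless of the hints supplied, which mirrors the behavior described in this report.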

      Attachments

        1. TIKA-879-thunderbird.eml
          0.7 kB
          Sebastian Nagel
        2. mime_diffs_A_to_B.html
          1 kB
          Tim Allison
        3. mbox_email_section.txt
          2 kB
          Matthew Caruana Galizia

            People

              Assignee: Unassigned
              Reporter: Konstantin Gribov (grossws)
              Votes: 1
              Watchers: 9
