Tika / TIKA-879

Detection problem: message/rfc822 file is detected as text/plain.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.0, 1.1, 1.2
    • Fix Version/s: 2.0, 1.18
    • Component/s: metadata, mime
    • Labels:
    • Environment:

      Linux 3.2.9
      Oracle JDK 7, OpenJDK 7, Sun JDK 6

      Description

      When using DefaultDetector, the MIME type detected for .eml files is inconsistent: a message/rfc822 file can be detected as text/plain (you can test this with testRFC822 and testRFC822_base64 in tika-parsers/src/test/resources/test-documents/).

      The main reason for this behavior is that only the magic detector really works for such files, even if you set CONTENT_TYPE in the metadata or put an .eml file name in RESOURCE_NAME_KEY.

      As far as I can tell, MediaTypeRegistry.isSpecializationOf("message/rfc822", "text/plain") returns false, so detection by MimeTypes.detect(...) effectively relies on magic alone.
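
      The reasoning above can be illustrated with a small self-contained model. This is not Tika's code: HintDemo, declareSubtype, and applyHint are hypothetical names, and the rule "a hint replaces the magic result only when it equals or specializes that result" is an assumption based on the behavior described in this report.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of hint handling during MIME detection (hypothetical, not Tika's implementation).
class HintDemo {
    // child type -> declared parent type; a stand-in for a media type registry
    private final Map<String, String> parents = new HashMap<>();

    // Record that 'child' is a subtype of 'parent'. Callers must not declare cycles.
    public void declareSubtype(String child, String parent) {
        parents.put(child, parent);
    }

    // True if 'a' is a strict descendant of 'b' in the declared hierarchy.
    public boolean isSpecializationOf(String a, String b) {
        String p = parents.get(a);
        while (p != null) {
            if (p.equals(b)) {
                return true;
            }
            p = parents.get(p);
        }
        return false;
    }

    // Assumed rule: the hint (from a content type or a file name) wins only
    // if it equals or specializes the type already found by magic.
    public String applyHint(String magicType, String hint) {
        if (hint == null) {
            return magicType;
        }
        if (hint.equals(magicType) || isSpecializationOf(hint, magicType)) {
            return hint;
        }
        return magicType;
    }

    public static void main(String[] args) {
        HintDemo registry = new HintDemo();

        // No relation declared: the message/rfc822 hint is ignored.
        System.out.println(registry.applyHint("text/plain", "message/rfc822"));

        // Once the subtype relation exists, the same hint is accepted.
        registry.declareSubtype("message/rfc822", "text/plain");
        System.out.println(registry.applyHint("text/plain", "message/rfc822"));
    }
}
```

      In this model the hint is dropped as long as message/rfc822 is unrelated to text/plain, which matches the symptom reported here. Whether declaring such a relation is the right fix for Tika is a separate question that this sketch does not settle.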

        Attachments

        1. TIKA-879-thunderbird.eml
          0.7 kB
          Sebastian Nagel
        2. mime_diffs_A_to_B.html
          1 kB
          Tim Allison
        3. mbox_email_section.txt
          2 kB
          Matthew Caruana Galizia

              People

              • Assignee: Unassigned
              • Reporter: grossws Konstantin Gribov
              • Votes: 1
              • Watchers: 9
