Tika / TIKA-3106

Tika Fails to detect some EML files if extension is not .eml

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.24
    • Fix Version/s: None
    • Component/s: metadata, mime
    • Labels:
      None

      Description

      I have an EML file that is detected as message/rfc822 only when its file extension is .eml; otherwise it is detected as text/plain. The following is the code I use to detect the file type and extension.

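             import java.io.FileInputStream;
             import org.apache.tika.config.TikaConfig;
             import org.apache.tika.detect.Detector;
             import org.apache.tika.io.TikaInputStream;
             import org.apache.tika.metadata.Metadata;
             import org.apache.tika.mime.MediaType;
             import org.apache.tika.mime.MimeType;

             // TikaConfigFactory is presumably the reporter's own helper;
             // TikaConfig.getDefaultConfig() is the stock alternative.
             TikaConfig config = TikaConfigFactory.getTikaConfig();
             Detector detector = config.getDetector();
             Metadata metadata = new Metadata();
             // Passing the resource name lets the detector consider the file extension.
             metadata.add(Metadata.RESOURCE_NAME_KEY, filePath);
             // try-with-resources closes the stream; the original left fis undeclared and unclosed.
             try (TikaInputStream stream = TikaInputStream.get(new FileInputStream(filePath))) {
                 MediaType mediaType = detector.detect(stream, metadata);
                 MimeType mimeType = config.getMimeRepository().forName(mediaType.toString());
                 String tikaExtension = mimeType.getExtension();
             }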

      When the sample file has the .eml extension, mimeType is message/rfc822 and tikaExtension is .eml. When I change the extension to .txt, mimeType is text/plain and tikaExtension is .txt.

      The same mimeType and tikaExtension should be detected regardless of the file extension.
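
      To make the expected behavior concrete, here is a minimal, stdlib-only sketch of a content-based check that never looks at the file name. It is not Tika's detection logic: the class name Rfc822Sniffer, the method looksLikeRfc822, the minHits threshold, and the list of header names are all hypothetical, chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: illustrates extension-independent detection, not Tika's algorithm.
final class Rfc822Sniffer {
    // Illustrative header names only; this is NOT the list Tika matches on.
    private static final Set<String> MAIL_HEADERS = new HashSet<>(Arrays.asList(
            "from", "to", "subject", "date", "received",
            "message-id", "return-path", "mime-version"));

    private Rfc822Sniffer() {
    }

    /**
     * Returns true when the leading header block of the content contains
     * at least minHits distinct header names from MAIL_HEADERS.
     */
    static boolean looksLikeRfc822(byte[] content, int minHits) {
        String text = new String(content, StandardCharsets.US_ASCII);
        Set<String> seen = new HashSet<>();
        for (String line : text.split("\r?\n")) {
            if (line.isEmpty()) {
                break; // a blank line ends the header section
            }
            char first = line.charAt(0);
            if (first == ' ' || first == '\t') {
                continue; // folded continuation of the previous header
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                break; // not a "Name: value" field, so stop scanning
            }
            String name = line.substring(0, colon).toLowerCase(Locale.ROOT);
            if (MAIL_HEADERS.contains(name)) {
                seen.add(name);
            }
        }
        return seen.size() >= minHits;
    }
}
```

      Because the check only reads the bytes, calling Rfc822Sniffer.looksLikeRfc822(bytes, 2) on the start of the sample file would give the same answer whether the file is named with .eml or .txt, which is the property this issue asks for.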


        Attachments

        1. EmlFile.txt
          18 kB
          Xiaohong Yang

            People

            • Assignee: Unassigned
            • Reporter: Xiaohong Yang (xyang200)
            • Votes: 0
            • Watchers: 4
