  Tika / TIKA-3106

Tika fails to detect some EML files if the extension is not .eml


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.24
    • Fix Version/s: None
    • Component/s: metadata, mime
    • Labels: None

    Description

      I have an EML file that is detected as message/rfc822 only if its file extension is .eml; otherwise it is detected as text/plain. The following code is what I use to detect the file type and extension.

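             TikaConfig config = TikaConfigFactory.getTikaConfig();
             Detector detector = config.getDetector();

             // The resource name, including its extension, is handed to the detector as a hint.
             Metadata metadata = new Metadata();
             metadata.add(Metadata.RESOURCE_NAME_KEY, filePath);

             // Wrap the file so the detector can read the leading bytes of the content.
             FileInputStream fis = new FileInputStream(filePath);
             TikaInputStream stream = TikaInputStream.get(fis);
             MediaType mediaType = detector.detect(stream, metadata);

             // Look up the registered type to obtain its preferred extension.
             MimeType mimeType = config.getMimeRepository().forName(mediaType.toString());
             String tikaExtension = mimeType.getExtension();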

       

      When the sample file has the .eml extension, mimeType is message/rfc822 and tikaExtension is .eml. When I change the extension to .txt, mimeType is text/plain and tikaExtension is .txt.

       

      The same mimeType and tikaExtension should be detected regardless of the file extension.
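
      To illustrate what extension-independent detection could look like, here is a minimal sketch that inspects only the leading lines of the content. It is not Tika's implementation: the class Rfc822Sniffer, the method looksLikeRfc822, the header list, and the minHits threshold are all hypothetical names and values chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Tika: guesses whether content is an
// RFC 822 style message from its leading header lines, ignoring the file name.
final class Rfc822Sniffer {

    // Assumed header names; a real detector may use a different list.
    private static final Set<String> KNOWN_HEADERS = new HashSet<>(Arrays.asList(
            "received", "return-path", "delivered-to", "message-id",
            "mime-version", "from", "to", "subject", "date"));

    // Returns true when at least minHits leading lines are headers with known names.
    static boolean looksLikeRfc822(byte[] head, int minHits) {
        String text = new String(head, StandardCharsets.US_ASCII);
        int hits = 0;
        for (String line : text.split("\r?\n")) {
            if (line.isEmpty()) {
                break; // a blank line ends the header section
            }
            if (line.charAt(0) == ' ' || line.charAt(0) == '\t') {
                continue; // folded continuation of the previous header
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                break; // not a "Name: value" line, so stop scanning
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            if (KNOWN_HEADERS.contains(name)) {
                hits++;
            }
        }
        return hits >= minHits;
    }
}
```

      Requiring more than one known header (for example minHits = 2) is a design choice meant to reduce false positives on ordinary text files that happen to start with a single "To:" or "Date:" line.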


      Attachments

        1. EmlFile.txt
          18 kB
          Xiaohong Yang


          People

            Assignee: Unassigned
            Reporter: Xiaohong Yang (xyang200)
            Votes: 0
            Watchers: 3
