Tika / TIKA-3687

Email file detected as text/html

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: None
    • Labels: None

    Description

      The attached email (which I redacted from a real email received from Office365) is detected as text/html instead of as an email message.

      This happens because the message contains ARC-* headers, but they are not the first headers in the file, so the matcher that looks for ARC headers fails. The matcher for the regular 'From' header also fails, because the 'From' header occurs after the first 1024 characters.
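
      The failure mode described above can be sketched in a few lines. This is only a simplified model written from the description in this report, not Tika's actual detection code: the function names, the sample headers, and the way the 1024-character window is applied are all assumptions made for illustration.

```python
# Simplified model of the two header matchers described in this report.
# All names here are hypothetical; this is not Tika's implementation.

WINDOW = 1024  # assumed search window, based on the 1024-character limit in the report


def starts_with_arc_header(data: bytes) -> bool:
    """Succeeds only when the very first header is an ARC-* header."""
    return data.startswith(b"ARC-")


def has_from_header_in_window(data: bytes, window: int = WINDOW) -> bool:
    """Looks for a 'From:' header line within the first `window` bytes."""
    return b"\nFrom:" in data[:window]


def looks_like_email(data: bytes) -> bool:
    """The message is recognised as email if either matcher succeeds."""
    return starts_with_arc_header(data) or has_from_header_in_window(data)


def build_message(first_header: bytes) -> bytes:
    """Build a made-up message whose 'From:' header sits past the window."""
    padding = b"X-Padding: " + b"a" * 1100 + b"\r\n"
    return (
        first_header
        + b"ARC-Seal: i=1; a=rsa-sha256\r\n"
        + padding
        + b"From: sender@example.com\r\n"
        + b"\r\n"
        + b"<html><body>hi</body></html>"
    )


# ARC header present but not first, 'From:' beyond the window: both matchers miss.
arc_not_first = build_message(b"Received: from mail.example.com\r\n")

# Same message with the ARC header first: the ARC matcher succeeds.
arc_first = build_message(b"")
```

      In this model, `looks_like_email(arc_not_first)` is false, so a detector would fall through to whatever matches the body (here, HTML), while `looks_like_email(arc_first)` is true. Widening the window or letting the ARC matcher search beyond the first header would both change the outcome in this sketch; which approach the actual fix took is not stated in this report.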

      Attachments

        1. testRFC822-ARC.eml
          6 kB
          Thierry Guérin

            People

              Assignee: Unassigned
              Reporter: Thierry Guérin (tguerin)
              Votes: 0
              Watchers: 5