TIKA-1771: Lower xhtml magic priority to ensure emails are detected as message/rfc822

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.11
    • Component/s: detector
    • Labels: None

    Description

      Some emails I have (happy to share them if you want) contain XHTML as one part of a multipart message. Prior to this pull request, the priority on the application/xhtml+xml magic detector was 50, equal to the priority on the message/rfc822 detector. Because of the relative position of the two detectors in tika-mimetypes.xml, the emails were incorrectly detected as XHTML documents.

      With this PR, by downgrading the priority of application/xhtml+xml to 40, the more-sensitive email magic detectors take precedence, causing the emails to be properly detected as message/rfc822.
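The tie and its fix can be sketched with a toy model. This is not Tika's implementation: `MagicRule`, `detect`, the byte markers, and the sample message are all hypothetical, and the sketch assumes (following the description above) that rules with equal priority are tried in declaration order.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MagicRule:
    mime_type: str
    priority: int
    marker: bytes  # hypothetical byte pattern, not Tika's real magic


def detect(data: bytes, rules: Sequence[MagicRule]) -> str:
    # sorted() is stable, so rules with equal priority keep declaration order
    # (an assumption of this model, mirroring the behavior described above).
    for rule in sorted(rules, key=lambda r: -r.priority):
        if rule.marker in data:
            return rule.mime_type
    return "application/octet-stream"


# A made-up multipart email whose body contains an XHTML part.
EMAIL = (
    b"Received: from mail.example.org\r\n"
    b"Content-Type: multipart/alternative\r\n\r\n"
    b'<html xmlns="http://www.w3.org/1999/xhtml"><body>hi</body></html>\r\n'
)


def rules(xhtml_priority: int) -> list:
    # The XHTML rule is declared first, as assumed for tika-mimetypes.xml.
    return [
        MagicRule("application/xhtml+xml", xhtml_priority, b"<html xmlns="),
        MagicRule("message/rfc822", 50, b"Received:"),
    ]


before = detect(EMAIL, rules(50))  # both rules at 50: declaration order decides
after = detect(EMAIL, rules(40))   # XHTML lowered to 40: the email rule runs first
print(before, after)
```

In this model the tie at priority 50 lets the earlier-declared XHTML rule win, while lowering it to 40 lets the message/rfc822 rule match first.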

      I have not run this through the govdocs tester or against anything other than my own documents, so, full disclosure, this could cause false-negative XHTML detections elsewhere.

      I should note this occurs on trunk, from GitHub, up to date as of Tuesday-ish.

          People

            Assignee: Chris A. Mattmann (chrismattmann)
            Reporter: Jeremy B. Merrill (jeremybmerrill)