  1. Tika
  2. TIKA-1771

Lower XHTML magic priority to ensure emails are detected as message/rfc822

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.11
    • Component/s: detector
    • Labels:
      None

      Description

      Some emails I have (happy to share if you want) contain XHTML as one part of a multipart message. Before this pull request, the application/xhtml+xml magic detector had priority 50, the same as the message/rfc822 detector. Because of the relative position of the two detectors in tika-mimetypes.xml, the tie went to XHTML and the emails were incorrectly detected as XHTML documents.

      This PR lowers the priority of application/xhtml+xml to 40, so the more sensitive email magic detectors take precedence and the emails are correctly detected as message/rfc822.
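      The tie-breaking described above can be illustrated with a simplified, hypothetical Java model. This is not Tika's actual MimeTypes code: it assumes rules are tried in descending priority order, that rules with equal priority keep their declaration order, and it replaces each real magic pattern with a plain substring search. The Rule record, the two patterns, and the detect and rules helpers are all illustrative names, not Tika APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical, simplified model of priority-based magic detection.
 * NOT Tika's implementation: rules are sorted by priority (highest first),
 * equal priorities keep declaration order, and the first matching rule wins.
 */
public class MagicPriorityDemo {

    // Illustrative rule: a MIME type, a priority, and a substring standing in for real magic.
    record Rule(String type, int priority, String pattern) {
        boolean matches(String header) {
            return header.contains(pattern);
        }
    }

    static String detect(List<Rule> declared, byte[] data) {
        // Copy so the caller's list is untouched; List.sort is stable,
        // so rules with equal priority stay in declaration order.
        List<Rule> sorted = new ArrayList<>(declared);
        sorted.sort(Comparator.comparingInt(Rule::priority).reversed());
        String header = new String(data, StandardCharsets.ISO_8859_1);
        for (Rule rule : sorted) {
            if (rule.matches(header)) {
                return rule.type();
            }
        }
        return "application/octet-stream";
    }

    static List<Rule> rules(int xhtmlPriority) {
        // XHTML is declared before the email rule, modelling the ordering problem in the report.
        return List.of(
            new Rule("application/xhtml+xml", xhtmlPriority, "http://www.w3.org/1999/xhtml"),
            new Rule("message/rfc822", 50, "Received:"));
    }

    public static void main(String[] args) {
        byte[] email = ("Received: from mail.example.com\r\n"
            + "Content-Type: multipart/alternative; boundary=b\r\n\r\n"
            + "--b\r\nContent-Type: application/xhtml+xml\r\n\r\n"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\"></html>\r\n--b--\r\n")
            .getBytes(StandardCharsets.ISO_8859_1);
        // Equal priorities: the earlier-declared XHTML rule wins the tie.
        System.out.println("xhtml priority 50: " + detect(rules(50), email));
        // Lowered XHTML priority: the email rule is tried first and matches.
        System.out.println("xhtml priority 40: " + detect(rules(40), email));
    }
}
```

      In this model, running main prints application/xhtml+xml for priority 50 and message/rfc822 for priority 40, which mirrors the before/after behaviour reported here. The model also shows the trade-off raised below: any file that matches both patterns would now resolve to message/rfc822.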

      Full disclosure: I have not run this through the govdocs tester or on anything other than my own documents, so it could cause false-negative XHTML detections elsewhere.

      Note that this occurs on trunk, from GitHub, up to date as of roughly Tuesday.

        Activity

        Chris A. Mattmann added a comment:

        Thanks Jeremy B. Merrill!

        [chipotle:~/tmp/tika1.11] mattmann% svn commit -m "Fix for TIKA-1771 lower magic priority xhtml magic priority to ensure emails detected as message/rfc822 contributed by Jeremy B. Merrill <jeremy.merrill@nytimes.com> this closes #58."
        Sending        CHANGES.txt
        Sending        tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
        Transmitting file data ..
        Committed revision 1709301.
        [chipotle:~/tmp/tika1.11] mattmann% 
        
        Hudson added a comment:

        SUCCESS: Integrated in tika-trunk-jdk1.7 #872 (See https://builds.apache.org/job/tika-trunk-jdk1.7/872/)
        Fix for TIKA-1771 lower magic priority xhtml magic priority to ensure emails detected as message/rfc822 contributed by Jeremy B. Merrill <jeremy.merrill@nytimes.com> this closes #58. (mattmann: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1709301)

        • trunk/CHANGES.txt
        • trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml

          People

          • Assignee:
            chrismattmann Chris A. Mattmann
            Reporter:
            jeremybmerrill Jeremy B. Merrill
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:
