TIKA-2471

Tab-prefixed message body lines in Mbox interpreted as headers

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.16
    • Fix Version/s: None
    • Component/s: parser

    Description

      The mbox parser code is overly optimistic. It scans the entire message, body included, for anything that matches a header pattern, wherever the match occurs in a line. As a result, tab-prefixed lines in the message body are interpreted as headers.
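
      To illustrate the behaviour I would expect instead, here is a minimal sketch. It is not the actual MboxParser code; the class name `MboxHeaderSketch` and the method `parseHeaders` are hypothetical, and it assumes the message has already been split into lines. The idea is that header matching stops at the first blank line, the pattern is anchored to the start of the line, and a line starting with a tab or space is only ever treated as a continuation of the preceding header.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only; this is not Tika's MboxParser implementation.
class MboxHeaderSketch {

    // Whole-line match: a field name (printable ASCII except ':'),
    // a colon, optional whitespace, then the value.
    private static final Pattern HEADER =
            Pattern.compile("^([!-9;-~]+):[ \\t]*(.*)$");

    // Collects headers from the lines of a single message.
    // Simplification: a repeated header name overwrites the earlier value.
    static Map<String, String> parseHeaders(List<String> messageLines) {
        Map<String, String> headers = new LinkedHashMap<>();
        String current = null; // name of the header a folded line would extend
        for (String line : messageLines) {
            if (line.isEmpty()) {
                break; // first blank line ends the header block; the rest is body
            }
            char first = line.charAt(0);
            if (first == ' ' || first == '\t') {
                // Folded continuation line: append to the previous header, if any.
                if (current != null) {
                    headers.merge(current, line.trim(), (a, b) -> a + " " + b);
                }
                continue;
            }
            Matcher m = HEADER.matcher(line);
            if (m.matches()) {
                current = m.group(1);
                headers.put(current, m.group(2));
            } else {
                current = null; // e.g. the mbox "From " separator line
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        List<String> message = Arrays.asList(
                "Subject: hello",
                "\tworld",
                "To: a@example.com",
                "",
                "\tNote: this is body text, not a header");
        System.out.println(parseHeaders(message));
    }
}
```

      With this approach the tab-prefixed `Note:` line in the body is never examined, because scanning stops at the blank line that separates headers from body.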

      It looks to me like the parsing logic is in desperate need of a refactor. But more to the point, what is the idea behind setting the headers in the MboxParser if they're going to be set by the RFC822Parser in any case?

      Also, out of curiosity, why does the parser force Windows-1252 as the charset?

      Attachments

        1. mbox
          8 kB
          Matthew Caruana Galizia

            People

              Assignee: Unassigned
              Reporter: Matthew Caruana Galizia
              Votes: 0
              Watchers: 4
