[TIKA-461] RFC822 messages not parsed

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7
    • Fix Version/s: 0.10
    • Component/s: parser
    • Labels: None

    Description

      Presented with an RFC822 message exported from Thunderbird, AutoDetectParser produces an empty body and a Metadata object containing only one key-value pair: "Content-Type=message/rfc822". Calling MboxParser directly likewise gives an empty body, but with two metadata pairs: "Content-Encoding=us-ascii" and "Content-Type=application/mbox".

      A quick peek at the source of MboxParser shows that the implementation is pretty naive. If the wiring can be sorted out, something like Apache James' mime4j might be a better bet.
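
      For reference, here is a rough sketch of the minimum a parser would need to do to get headers and a body out of a single RFC822 message: split at the first blank line and unfold continuation lines. It uses only the JDK. The class Rfc822Sketch and its Message holder are hypothetical names made up for this illustration; this is not how MboxParser or mime4j work, and it ignores MIME multipart, encoded words, and repeated headers, which is exactly the kind of thing a real library such as mime4j would be expected to handle.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: a naive RFC822 header/body splitter (hypothetical class). */
public class Rfc822Sketch {

    /** Hypothetical result holder; not a Tika or mime4j type. */
    public static final class Message {
        public final Map<String, String> headers = new LinkedHashMap<>();
        public String body = "";
    }

    public static Message parse(String raw) {
        Message msg = new Message();
        // Normalize CRLF to LF and keep trailing empty lines.
        String[] lines = raw.replace("\r\n", "\n").split("\n", -1);

        int i = 0;
        String currentName = null;
        for (; i < lines.length; i++) {
            String line = lines[i];
            if (line.isEmpty()) {
                i++;            // skip the blank separator line
                break;          // everything after it is the body
            }
            char first = line.charAt(0);
            if ((first == ' ' || first == '\t') && currentName != null) {
                // Folded header: append the continuation to the previous value.
                msg.headers.put(currentName,
                        msg.headers.get(currentName) + " " + line.trim());
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue;       // not a "Name: value" line; ignored in this sketch
            }
            currentName = line.substring(0, colon).trim();
            // Simplification: a repeated header overwrites the earlier value.
            msg.headers.put(currentName, line.substring(colon + 1).trim());
        }

        StringBuilder body = new StringBuilder();
        for (; i < lines.length; i++) {
            body.append(lines[i]);
            if (i < lines.length - 1) {
                body.append('\n');
            }
        }
        msg.body = body.toString();
        return msg;
    }

    public static void main(String[] args) {
        Message m = parse("From: a@example.com\r\nSubject: Hello\r\n world\r\n\r\nBody text\r\n");
        System.out.println(m.headers);
        System.out.println(m.body);
    }
}
```

      Even this much would yield a non-empty body and some From/Subject metadata for a plain-text message, which is more than the behaviour reported above; anything involving attachments or encodings needs a proper MIME parser.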

      Attachments

        1. TIKA-461.patch
          13 kB
          Julien Nioche
        2. testRFC822-multipart
          7 kB
          Julien Nioche
        3. TIKA-461-plus-tests-1.patch
          25 kB
          Benjamin Douglas
        4. TIKA-461-parse.patch
          30 kB
          Benjamin Douglas
        5. TIKA-461-config.patch
          1 kB
          Benjamin Douglas
        6. testRFC822-CC-BCC
          4 kB
          Sjoerd Smeets
        7. testRFC822-big
          6 kB
          Sjoerd Smeets
        8. extra_metadata.patch
          9 kB
          Sjoerd Smeets

          People

            jnioche Julien Nioche
            jturner Joshua Turner