Tika / TIKA-4248

Improve PST handling of attachments

Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      The PST parser doesn't handle attachments in quite the same way as other parsers, which hinders analysis of attachments.

      The problem is twofold. First, the PST parser itself handles both the text content of each email and its embedded attachments. Second, it processes the attachments before the main body. Together, these make the normal patterns for embedded attachments break down in the RecursiveParserWrapper. For example, while the attachments are being processed, the RecursiveParserWrapper can't work out what their path through the "body" will be, because the body hasn't been parsed yet.

      We should probably create a PSTMailItemParser that handles the content and the attachments like other parsers so that embedded paths can be maintained.
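
      To make the ordering issue concrete, here is a minimal toy sketch in plain Java. It is not Tika code: PathTracker, attachmentsFirst, messageFirst, and the sample names are all hypothetical. It only assumes that an embedded path is built from the chain of containers that are open when an item starts.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of embedded-path tracking. Hypothetical; not Tika's implementation. */
public class EmbeddedPathSketch {

    /** Records a path for each item from the containers open when it starts. */
    static final class PathTracker {
        private final Deque<String> open = new ArrayDeque<>();
        final List<String> recorded = new ArrayList<>();

        void start(String name) {
            open.addLast(name);
            // ArrayDeque iterates head to tail, so outer containers come first.
            recorded.add("/" + String.join("/", open));
        }

        void end() {
            open.removeLast();
        }
    }

    /** Models the ordering described above: attachments are visited before the message. */
    static List<String> attachmentsFirst(String message, List<String> attachments) {
        PathTracker tracker = new PathTracker();
        for (String attachment : attachments) {
            tracker.start(attachment); // the message is not open yet, so it is missing from the path
            tracker.end();
        }
        tracker.start(message);
        tracker.end();
        return tracker.recorded;
    }

    /** Models a per-message parser: the message is opened, then its attachments inside it. */
    static List<String> messageFirst(String message, List<String> attachments) {
        PathTracker tracker = new PathTracker();
        tracker.start(message);
        for (String attachment : attachments) {
            tracker.start(attachment); // the message is open, so the path runs through it
            tracker.end();
        }
        tracker.end();
        return tracker.recorded;
    }

    public static void main(String[] args) {
        List<String> attachments = List.of("report.pdf", "photo.jpg");
        System.out.println(attachmentsFirst("msg-1", attachments));
        // prints [/report.pdf, /photo.jpg, /msg-1]
        System.out.println(messageFirst("msg-1", attachments));
        // prints [/msg-1, /msg-1/report.pdf, /msg-1/photo.jpg]
    }
}
```

      In the first ordering the attachments end up with paths that do not pass through their message; in the second, which is roughly what a per-item parser would give, each attachment path is nested under the message that contains it.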

      This will be a breaking change, and I'm targeting it only at the 3.x branch.

            People

              Assignee: Unassigned
              Reporter: Tim Allison (tallison)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated: