Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 1.18
- None
- None
Description
I have a number of email messages that are reports of undeliverable emails, each containing the original email message as an attachment.
The original emails are parts with "Content-Type: message/rfc822" but are not being recognized as such.
Attached is an example email:
- Subject: Undeliverable: SRE Agent Out of Space Source:WindowsApp
- Subject: SRE Agent Out of Space Source:WindowsApp
I would like to see 2 separate emails parsed out (top level undeliverable report, 1st level attached original email), but I get 1 email and 2 unnamed text attachments:
$ java -jar tika-app-1.18.jar -m -J /tmp/undeliverable.eml | python -m json.tool
[
    {
        "Author": "postmaster@bank.com",
        "Content-Length": "17356",
        "Content-Type": "message/rfc822",
        "Creation-Date": "2017-11-04T16:00:11Z",
        "Message-From": "postmaster@bank.com",
        "Message-To": "UATAlerting@logscape.com",
        "Message:From-Email": "postmaster@bank.com",
        "Message:Raw-Header:Auto-Submitted": "auto-generated",
        "Message:Raw-Header:MIME-Version": "1.0",
        "Message:Raw-Header:Message-ID": "<936a3c2c-49e5-46a0-b58c-151c024b80fe@journal.report.generator>",
        "Message:Raw-Header:Return-Path": "<>",
        "Message:Raw-Header:Sender": "<MicrosoftExchange329e71ec88ae4615bbc36ab6ce41109e@bank.com>",
        "Message:Raw-Header:X-MS-Exchange-Generated-Message-Source": "Journal Agent",
        "Message:Raw-Header:X-MS-Exchange-Message-Is-Ndr": "",
        "Message:Raw-Header:X-MS-Exchange-Parent-Message-Id": "\t<1451b918-770a-4d83-b1f9-0c9c0668f1d6@BXTS124020.eu.banknet.com>",
        "Message:Raw-Header:X-MS-Journal-Report": "",
        "Multipart-Boundary": "_5a8d7320-7cd6-4c1b-8e30-9616634562b2_",
        "Multipart-Subtype": "mixed",
        "X-Parsed-By": [
            "org.apache.tika.parser.DefaultParser",
            "org.apache.tika.parser.mail.RFC822Parser"
        ],
        "X-TIKA:parse_time_millis": "326",
        "creator": "postmaster@bank.com",
        "dc:creator": "postmaster@bank.com",
        "dc:title": "Undeliverable: SRE Agent Out of Space Source:WindowsApp",
        "dcterms:created": "2017-11-04T16:00:11Z",
        "meta:author": "postmaster@bank.com",
        "meta:creation-date": "2017-11-04T16:00:11Z",
        "resourceName": "undeliverable.eml",
        "subject": "Undeliverable: SRE Agent Out of Space Source:WindowsApp"
    },
    {
        "Content-Encoding": "windows-1252",
        "Content-Type": "text/plain; charset=windows-1252",
        "Multipart-Boundary": "_dd8c2c7d-5333-4f9a-a282-d2056075e7aa_",
        "Multipart-Subtype": "report",
        "X-Parsed-By": [
            "org.apache.tika.parser.DefaultParser",
            "org.apache.tika.parser.txt.TXTParser"
        ],
        "X-TIKA:embedded_resource_path": "/embedded-1",
        "X-TIKA:parse_time_millis": "4",
        "embeddedResourceType": "ATTACHMENT"
    },
    {
        "Content-Encoding": "US-ASCII",
        "Content-Type": "text/html; charset=US-ASCII",
        "Multipart-Boundary": "_dd8c2c7d-5333-4f9a-a282-d2056075e7aa_",
        "Multipart-Subtype": "report",
        "X-Parsed-By": [
            "org.apache.tika.parser.DefaultParser",
            "org.apache.tika.parser.html.HtmlParser"
        ],
        "X-TIKA:embedded_resource_path": "/embedded-2",
        "X-TIKA:parse_time_millis": "7",
        "embeddedResourceType": "ATTACHMENT"
    }
]
Attachments
Issue Links
- relates to TIKA-2680 Email attachments to an email are not extracted (Resolved)