Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 1.18
- None
- None
Description
I have a number of email messages that contain other email messages as attachments (with multiple levels of nesting).
The email attachments are parts with "Content-Type: message/rfc822" but are not being recognized as such.
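To make the expected structure concrete, here is a minimal sketch that uses only Python's standard `email` package, not Tika. It builds a small three-level message and walks it, treating every "message/rfc822" part as a separate email. The helper names `build_nested` and `list_emails` are hypothetical, and the generated message is a stand-in for the attached example, not a copy of it.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_nested(subjects):
    """Build a message whose attachment chain follows `subjects`, outermost first."""
    inner = None
    for subject in reversed(subjects):
        msg = EmailMessage()
        msg["Subject"] = subject
        msg.set_content("body of " + subject)
        if inner is not None:
            # Attaching a message object produces a message/rfc822 part.
            msg.add_attachment(inner)
        inner = msg
    return inner


def _scan(part, depth):
    """Collect emails attached below `part`, descending through multipart containers."""
    found = []
    if not part.is_multipart():
        return found
    for child in part.get_payload():
        if child.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the attached message.
            found.extend(list_emails(child.get_payload(0), depth + 1))
        else:
            found.extend(_scan(child, depth))
    return found


def list_emails(msg, depth=0):
    """Return (depth, subject) for `msg` and every email nested inside it."""
    return [(depth, str(msg["Subject"]))] + _scan(msg, depth)


# Serialize and re-parse, as if the message had been read from an .eml file.
raw = build_nested(
    ["Test email within email", "Email within email test", "Stand-up today"]
).as_bytes()
emails = list_emails(message_from_bytes(raw, policy=policy.default))

for depth, subject in emails:
    print("  " * depth + "- Subject: " + subject)
```

The walk reports one entry per nesting level, which is the shape of result this report asks Tika to produce.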
Attached is an example email with multiple levels of attachments:
- Subject: Test email within email
- Subject: Email within email test
- Subject: Stand-up today
- Subject: Email within email test
I would like to see 3 separate emails parsed out (top level, 1st level attached email, 2nd level attached email), but I only get 1 email and 1 unnamed text attachment:
$ java -jar tika-app-1.18.jar -m -J nested.eml | python -m json.tool
[
    {
        "Author": "Smith Van der, H (Henry) <Henry.Van.der.Smith@bank.com>",
        "Content-Length": "16649",
        "Content-Type": "message/rfc822",
        "Creation-Date": "2018-04-25T12:46:41Z",
        "Message-From": "Smith Van der, H (Henry) <Henry.Van.der.Smith@bank.com>",
        "Message-To": [
            "fm.SAN Management Team <fm.SANManagementTeam@bank.com>",
            "Smith Van der, H (Henry) <Henry.Van.der.Smith@bank.com>"
        ],
        "Message:From-Email": "Henry.Van.der.Smith@bank.com",
        "Message:From-Name": "Smith Van der, H (Henry)",
        "Message:Raw-Header:Auto-Submitted": "auto-generated",
        "Message:Raw-Header:Content-Transfer-Encoding": "binary",
        "Message:Raw-Header:Keywords": "",
        "Message:Raw-Header:MIME-Version": "1.0",
        "Message:Raw-Header:Message-ID": "<ab2078ea-fd2f-4b28-bc8d-451916369b3c@journal.report.generator>",
        "Message:Raw-Header:Return-Path": "<>",
        "Message:Raw-Header:Sender": "<MicrosoftExchange329e71ec88ae4615bbc36ab6ce41109e@bank.com>",
        "Message:Raw-Header:X-MS-Exchange-Generated-Message-Source": "Journal Agent",
        "Message:Raw-Header:X-MS-Exchange-Parent-Message-Id": "<0fab98cd190c41f199a25c73f78a2070@BSTS124002.eu.banknet.com>",
        "Message:Raw-Header:X-MS-Journal-Report": "",
        "Multipart-Boundary": "_728aa617-16cf-4d95-8bc2-9f1868397202_",
        "Multipart-Subtype": "mixed",
        "X-Parsed-By": [
            "org.apache.tika.parser.DefaultParser",
            "org.apache.tika.parser.mail.RFC822Parser"
        ],
        "X-TIKA:parse_time_millis": "325",
        "creator": "Smith Van der, H (Henry) <Henry.Van.der.Smith@bank.com>",
        "dc:creator": "Smith Van der, H (Henry) <Henry.Van.der.Smith@bank.com>",
        "dc:title": "Test email within email",
        "dcterms:created": "2018-04-25T12:46:41Z",
        "meta:author": "Smith Van der, H (Henry) <Henry.Van.der.Smith@bank.com>",
        "meta:creation-date": "2018-04-25T12:46:41Z",
        "resourceName": "nested.eml",
        "subject": "Test email within email"
    },
    {
        "Content-Encoding": "US-ASCII",
        "Content-Type": "text/plain; charset=US-ASCII",
        "Multipart-Boundary": "_004_8075737674787666767166806676697476787366657271727266777_",
        "Multipart-Subtype": "mixed",
        "X-Parsed-By": [
            "org.apache.tika.parser.DefaultParser",
            "org.apache.tika.parser.txt.TXTParser"
        ],
        "X-TIKA:embedded_resource_path": "/embedded-1",
        "X-TIKA:parse_time_millis": "5",
        "embeddedResourceType": "ATTACHMENT"
    }
]
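A quick way to check this kind of output is to count how many records Tika typed as "message/rfc822". The sketch below is an illustration only: `count_emails` is a hypothetical helper, the `reported` records are trimmed to a few keys from the output above, and it assumes "Content-Type" is a single string in each record.

```python
import json


def count_emails(records):
    """Count records whose Content-Type marks them as an email message."""
    return sum(
        1
        for record in records
        # Assumption: Content-Type is a plain string, as in the reported output.
        if record.get("Content-Type", "").startswith("message/rfc822")
    )


# Trimmed copy of the two records reported above (most keys omitted).
reported = json.loads("""
[
    {
        "Content-Type": "message/rfc822",
        "resourceName": "nested.eml",
        "subject": "Test email within email"
    },
    {
        "Content-Type": "text/plain; charset=US-ASCII",
        "X-TIKA:embedded_resource_path": "/embedded-1",
        "embeddedResourceType": "ATTACHMENT"
    }
]
""")

found = count_emails(reported)
expected = 3  # top level, 1st level attached email, 2nd level attached email
print("emails found:", found, "expected:", expected)
```

To run the same check against real output, the JSON printed by the command above could be loaded with `json.load` from a saved file in place of the `reported` literal.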
Attachments
Issue Links
- is related to TIKA-2685 Email attached to an undeliverable email report are not extracted (Resolved)