Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
None
None
None
Description
What
I discovered that the main body part, which holds the text of an email and is already indexed through the textBody/htmlBody properties, is also indexed as an attachment.
This behaviour is functionally wrong: searches on attachment content return hits for terms that only appear in the body of the message.
It also increases the index size, which means higher disk costs and higher latencies.
Definition of done
Unit tests demonstrating that main bodies are no longer indexed as attachments in ElasticSearch.
How
When flattening child subparts into attachments, only keep MIME parts that explicitly declare a Content-Disposition (either inline or attachment).
As a side effect, this also stops multipart containers from being indexed as attachments (they were not filtered out before).
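A minimal sketch of the flattening rule above, assuming a simplified, hypothetical `Part` model (not the mime4j or James API) in which `contentDisposition` holds only the disposition type, without parameters such as `filename`. `flattenAll` approximates the previous behaviour and `flattenAttachments` applies the proposed Content-Disposition filter. All names are illustrative and are not taken from the pull request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class AttachmentFlattening {

    // Hypothetical, simplified MIME part model (not the mime4j or James API).
    // contentDisposition holds only the disposition type, e.g. "inline" or "attachment".
    record Part(String contentType, Optional<String> contentDisposition, List<Part> children) {
        static Part leaf(String contentType, String disposition) {
            return new Part(contentType, Optional.ofNullable(disposition), List.of());
        }

        static Part multipart(String contentType, Part... children) {
            return new Part(contentType, Optional.empty(), List.of(children));
        }
    }

    // Previous behaviour (sketch): every descendant of the root becomes an "attachment",
    // including the text/html bodies and intermediate multipart containers.
    static List<Part> flattenAll(Part root) {
        List<Part> result = new ArrayList<>();
        for (Part child : root.children()) {
            result.add(child);
            result.addAll(flattenAll(child));
        }
        return result;
    }

    // Proposed behaviour (sketch): still walk the whole tree, so attachments nested in
    // multiparts are found, but only keep parts with an explicit inline/attachment disposition.
    static List<Part> flattenAttachments(Part root) {
        List<Part> result = new ArrayList<>();
        for (Part child : root.children()) {
            if (hasExplicitDisposition(child)) {
                result.add(child);
            }
            result.addAll(flattenAttachments(child));
        }
        return result;
    }

    static boolean hasExplicitDisposition(Part part) {
        return part.contentDisposition()
            .map(d -> d.equalsIgnoreCase("inline") || d.equalsIgnoreCase("attachment"))
            .orElse(false);
    }

    // Sample message: a text/html alternative body, one attached PDF and one inline image.
    static Part sampleMessage() {
        return Part.multipart("multipart/mixed",
            Part.multipart("multipart/alternative",
                Part.leaf("text/plain", null),
                Part.leaf("text/html", null)),
            Part.leaf("application/pdf", "attachment"),
            Part.leaf("image/png", "inline"));
    }

    public static void main(String[] args) {
        Part message = sampleMessage();
        System.out.println("naive: " + flattenAll(message).size());
        System.out.println("filtered: " + flattenAttachments(message).size());
    }
}
```

On the sample message, the naive walk returns five parts (the multipart/alternative container plus the text/plain and text/html bodies, alongside the two real attachments), while the filtered walk keeps only the PDF and the inline PNG. The recursion deliberately continues below parts without a disposition, so attachments nested inside multipart containers are still collected.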
Proposed fix: https://github.com/linagora/james-project/pull/4152