Description
The MIME type detected by Tika's Detect() API is never added to a Parse's ContentMetaData or ParseMetaData. As a result, incorrect Content-Types end up in the documents.
Attachments
Issue Links
- is related to NUTCH-1258 MoreIndexingFilter should be able to read Content-Type from both parse metadata and content metadata (Closed)
- relates to NUTCH-1293 IndexingFiltersChecker to store detected content type in crawldatum metadata (Closed)