The MIME type detected by Tika's detect() API is never added to a Parse's ContentMetaData or ParseMetaData. Because of this, bad Content-Type values end up in the documents.
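
To illustrate the intended behaviour, here is a minimal, self-contained sketch. It does not use the Tika or Nutch APIs: the `DetectedTypeSketch` class, its `detect` stand-in (a simple magic-byte check) and the plain `Map` used as metadata are all hypothetical names introduced only for this example. It only shows the idea of writing the detected type back into the metadata so that later consumers no longer see the wrong declared type.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not the Nutch or Tika API.
class DetectedTypeSketch {
    static final String CONTENT_TYPE = "Content-Type";

    // Returns true if the data begins with the given ASCII prefix.
    static boolean startsWith(byte[] data, String prefix) {
        byte[] p = prefix.getBytes(StandardCharsets.US_ASCII);
        if (data.length < p.length) {
            return false;
        }
        for (int i = 0; i < p.length; i++) {
            if (data[i] != p[i]) {
                return false;
            }
        }
        return true;
    }

    // Stand-in for a real detector (assumption): checks a couple of
    // magic prefixes and otherwise falls back to the declared type.
    static String detect(byte[] data, String declaredType) {
        if (startsWith(data, "%PDF-")) {
            return "application/pdf";
        }
        if (startsWith(data, "<html") || startsWith(data, "<!DOCTYPE html")) {
            return "text/html";
        }
        return declaredType != null ? declaredType : "application/octet-stream";
    }

    // Returns a copy of the metadata with Content-Type set to the detected type.
    static Map<String, String> withDetectedType(Map<String, String> metadata,
                                                String detectedType) {
        Map<String, String> copy = new HashMap<>(metadata);
        copy.put(CONTENT_TYPE, detectedType);
        return copy;
    }

    public static void main(String[] args) {
        byte[] body = "%PDF-1.4 example".getBytes(StandardCharsets.US_ASCII);
        Map<String, String> parseMeta = new HashMap<>();
        parseMeta.put(CONTENT_TYPE, "text/plain"); // wrong type declared by the server

        String detected = detect(body, parseMeta.get(CONTENT_TYPE));
        Map<String, String> fixed = withDetectedType(parseMeta, detected);
        System.out.println(parseMeta.get(CONTENT_TYPE) + " -> " + fixed.get(CONTENT_TYPE));
    }
}
```

Without the write-back step in `withDetectedType`, the metadata in this sketch would keep the declared `text/plain` value, which mirrors the problem described above.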
- is related to: NUTCH-1258 MoreIndexingFilter should be able to read Content-Type from both parse metadata and content metadata (Closed)
- relates to: NUTCH-1293 IndexingFiltersChecker to store detected content type in crawldatum metadata (Closed)