Details
Improvement
Status: Closed
Minor
Resolution: Fixed
1.4
None
None
Patch Available
Description
Similar or duplicate content types can end up as different values in the index. With both application/xhtml+xml and text/html, for example, it is impossible to use a single filter to select `web pages`.
See also: http://lucene.472066.n3.nabble.com/application-xhtml-xml-gt-text-html-td3699942.html
Content-Type mapping is disabled by default and can be enabled with the moreIndexingFilter.mapMimeTypes property. An example mapping file is provided in conf/.
# target MIME-type <TAB> type1 [<TAB> type2 ...]
# Map XHTML to HTML
text/html	application/xhtml+xml
# Map XHTML and HTML to something else
Web page	text/html	application/xhtml+xml
# Map some office documents to each other
Office document	application/vnd.oasis.opendocument.text	application/x-tika-msoffice
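A minimal Python sketch of how a tab-separated mapping file in the format above could be read and applied to a document's content type. This is not Nutch's implementation: `parse_mime_mapping` and `map_mime_type` are hypothetical names, and the sketch assumes that lines starting with `#` are comments, that unmapped types pass through unchanged, and that when a source type appears on more than one line the later line wins.

```python
from typing import Dict, Iterable


def parse_mime_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Build a source-type -> target-type lookup (hypothetical helper).

    Each non-comment line is: target <TAB> type1 [<TAB> type2 ...].
    Splitting on tabs only keeps targets such as "Web page" intact.
    """
    mapping: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Assumption: blank lines and '#' lines are ignored
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = [field.strip() for field in line.split("\t")]
        target, sources = fields[0], fields[1:]
        for source in sources:
            if source:
                # Assumption: a later line overrides an earlier one
                mapping[source] = target
    return mapping


def map_mime_type(content_type: str, mapping: Dict[str, str]) -> str:
    """Return the mapped type, or the original type if it is not listed."""
    return mapping.get(content_type, content_type)


if __name__ == "__main__":
    example = [
        "# Map XHTML and HTML to something else",
        "Web page\ttext/html\tapplication/xhtml+xml",
    ]
    table = parse_mime_mapping(example)
    print(map_mime_type("application/xhtml+xml", table))  # Web page
    print(map_mime_type("image/png", table))  # image/png (not mapped)
```

With a mapping like this in place, both text/html and application/xhtml+xml documents would be indexed under one value, so a single filter on that value selects all web pages.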