Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
3.4.0
None
Description
When Tika is disabled, the DefaultTextExtractor is used, which does not extract text from HTML.
As a result, raw HTML markup is indexed as-is: it pollutes the index, which reduces search precision and makes the index much larger than necessary.
Proposal:
CassandraGuice should default to JsoupTextExtractor when Tika is disabled.
This will allow HTML text extraction to actually happen.
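
A minimal sketch of the proposed fallback, for illustration only. It does not use the James or jsoup classes: `Extractor`, `PassThroughExtractor`, `HtmlStrippingExtractor` and `chooseExtractor` are hypothetical stand-ins, and the regex-based tag stripping is a naive placeholder for what an HTML parser such as jsoup would do properly.

```java
import java.util.regex.Pattern;

// Hypothetical stand-ins, NOT the James classes: only a sketch of the selection logic.
final class TextExtractorSelection {

    /** Simplified extractor contract: raw content in, indexable text out. */
    interface Extractor {
        String extract(String content, String contentType);
    }

    /** Mimics the behaviour described in the ticket: content is returned untouched. */
    static final class PassThroughExtractor implements Extractor {
        @Override
        public String extract(String content, String contentType) {
            return content;
        }
    }

    /** Naive HTML-aware extractor (assumption: a real one would delegate to an HTML parser). */
    static final class HtmlStrippingExtractor implements Extractor {
        private static final Pattern TAG = Pattern.compile("<[^>]*>");
        private static final Pattern WHITESPACE = Pattern.compile("\\s+");

        @Override
        public String extract(String content, String contentType) {
            if (!"text/html".equalsIgnoreCase(contentType)) {
                return content; // non-HTML content is left as-is
            }
            String withoutTags = TAG.matcher(content).replaceAll(" ");
            return WHITESPACE.matcher(withoutTags).replaceAll(" ").trim();
        }
    }

    /** Proposed default: use the HTML-aware extractor whenever Tika is disabled. */
    static Extractor chooseExtractor(boolean tikaEnabled, Extractor tikaExtractor) {
        return tikaEnabled ? tikaExtractor : new HtmlStrippingExtractor();
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b></p>";
        // Current behaviour with Tika disabled: markup ends up in the index.
        System.out.println(new PassThroughExtractor().extract(html, "text/html"));
        // Proposed behaviour with Tika disabled: only the text is indexed.
        System.out.println(chooseExtractor(false, null).extract(html, "text/html"));
    }
}
```

In the real code base the choice would presumably be made in the Guice module that binds the text extractor, based on the Tika configuration; the exact module and configuration names are not stated in this ticket.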