Details
-
New Feature
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
For text extraction, the cli currently provides both --text and --text-main options. For html files, --text will return the body, while --text-main will only return the title. There is currently no cli option that gives all text content. However, the Tika API has the WriteOutContentHandler which does the trick.