Description
As discussed in TIKA-171, it would be a good idea to make the XHTMLContentHandler automatically add extra whitespace to separate block-level elements from each other. This would prevent extracted words from being accidentally concatenated in clients that only care about the character events.
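To illustrate the problem and the proposed behaviour, here is a minimal sketch using only the JDK's SAX classes. It is not the actual XHTMLContentHandler code: the class names (BlockWhitespaceSketch, SeparatingHandler, TextCollector), the set of block-level element names, and the choice to emit a newline through characters() after each closing tag are all assumptions made for the example.

```java
import java.util.Set;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class BlockWhitespaceSketch {

    // Hypothetical list of block-level element names; a real handler would choose its own.
    static final Set<String> BLOCK = Set.of("p", "div", "h1", "h2", "h3", "li", "tr", "table");

    /** Forwards events to a delegate and adds a newline after each block-level element closes. */
    static class SeparatingHandler extends DefaultHandler {
        private final ContentHandler delegate;

        SeparatingHandler(ContentHandler delegate) {
            this.delegate = delegate;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            delegate.startElement(uri, local, qName, atts);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            delegate.characters(ch, start, length);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            delegate.endElement(uri, local, qName);
            if (BLOCK.contains(local)) {
                // Assumption: the separator is sent as a character event so that
                // text-only clients see it.
                delegate.characters(new char[] {'\n'}, 0, 1);
            }
        }
    }

    /** A client that only cares about character events. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Feeds two adjacent paragraphs through the handler chain and returns the collected text. */
    public static String extract(boolean separate) throws SAXException {
        TextCollector collector = new TextCollector();
        ContentHandler handler = separate ? new SeparatingHandler(collector) : collector;
        AttributesImpl none = new AttributesImpl();
        for (String word : new String[] {"Hello", "World"}) {
            handler.startElement("", "p", "p", none);
            handler.characters(word.toCharArray(), 0, word.length());
            handler.endElement("", "p", "p");
        }
        return collector.text.toString();
    }

    public static void main(String[] args) throws SAXException {
        System.out.println(extract(false)); // words run together: HelloWorld
        System.out.println(extract(true));  // one word per line
    }
}
```

Without the separating wrapper the two paragraphs collapse into "HelloWorld"; with it, each paragraph's text is followed by a newline.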
Issue Links
- is related to: TIKA-189 Text extraction from Excel files juxtaposes cells (Closed)