Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 0.8
- Fix Version/s: None
- Component/s: None
Description
BodyContentHandler extracts the text from pages correctly,
except for this page:
http://www.lucidimagination.com/developers/whitepapers/whats-new-solr-14
The page contains a selection list, and
the text returned by BodyContentHandler contains
"...Country: *
– Select a Country – United StatesCanadaArgentinaAustraliaBrazilChinaFranceGermanyIndiaIndonesiaItalyJapanMexicoRussiaSaudi"
The country names run together with no separator; a space between them would be preferable.
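The run-together output is what one would expect when a SAX content handler appends the character events of adjacent elements without any separator. The sketch below illustrates that mechanism using only the JDK's SAX parser; it is not Tika's code. `OptionSpacingSketch`, `TextCollector`, and the `spaceAfterElement` flag are hypothetical names, and the input is a three-option stand-in for the page's country list. Appending a space at each element end is one possible workaround, not necessarily how TIKA-394 was resolved.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class OptionSpacingSketch {

    /** Hypothetical handler: collects character data, optionally adding a space after each element. */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private final boolean spaceAfterElement;

        TextCollector(boolean spaceAfterElement) {
            this.spaceAfterElement = spaceAfterElement;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text is appended exactly as delivered; nothing separates adjacent elements.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Assumed workaround: emit a space whenever an element closes.
            if (spaceAfterElement) {
                text.append(' ');
            }
        }

        String text() {
            return text.toString().trim();
        }
    }

    /** Parses a well-formed XHTML fragment and returns the collected text. */
    static String extract(String xhtml, boolean spaceAfterElement) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TextCollector collector = new TextCollector(spaceAfterElement);
            parser.parse(new InputSource(new StringReader(xhtml)), collector);
            return collector.text();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<select><option>United States</option>"
                + "<option>Canada</option><option>Argentina</option></select>";
        System.out.println(extract(xhtml, false)); // prints "United StatesCanadaArgentina"
        System.out.println(extract(xhtml, true));  // prints "United States Canada Argentina"
    }
}
```

With the flag off, the output reproduces the symptom reported above; with it on, each option's text is followed by a space, and the trailing spaces are trimmed.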
Issue Links
- duplicates TIKA-394 Missing spaces on html parsing (Resolved)