Details
-
Bug
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
1.2
-
None
-
None
-
Operating System: All
Platform: All
-
19253
Description
When parsing HTML code such as "abc</td><dt>xyz", the HTML parser skips over the
elements and concatenates the text around them without inserting white space,
in this case producing abcxyz. A search of the resulting index will then not
find abc.
At least for the tags <td>, <p>, <br>, <blockquote>, <dt>, <h1> - <h6>, <li>, and
<q>, the parser should separate the strings on both sides of the tag with a
space. Using square brackets, "[" or "]", to separate the strings would also
work, as they are already used for the text in the ALT attribute of images.
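
The requested behavior can be illustrated with a small sketch. It is not the
parser this report is filed against; it uses Python's standard html.parser
module purely for illustration. The names SEPARATING_TAGS, TextExtractor, and
extract_text are hypothetical. The sketch assumes that a single space is an
acceptable separator and that runs of white space may be collapsed.

```python
from html.parser import HTMLParser

# Tags named in this report as needing a separator (<h1> - <h6> expanded).
SEPARATING_TAGS = {
    "td", "p", "br", "blockquote", "dt", "li", "q",
    "h1", "h2", "h3", "h4", "h5", "h6",
}


class TextExtractor(HTMLParser):
    """Collects text content, emitting a space at each separating tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def _maybe_separate(self, tag):
        # HTMLParser passes tag names in lower case.
        if tag in SEPARATING_TAGS:
            self.parts.append(" ")

    def handle_starttag(self, tag, attrs):
        self._maybe_separate(tag)

    def handle_endtag(self, tag):
        self._maybe_separate(tag)

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(html):
    """Return the text of `html` with separating tags replaced by a space."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of white space so adjacent tags yield a single space.
    return " ".join("".join(parser.parts).split())


print(extract_text("abc</td><dt>xyz"))
```

Inline tags such as <b> are deliberately left out of the set, so text like
"<b>ab</b>c" is still joined into one token.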
A workaround for this bug is to add spaces when authoring the HTML code, but
that is not always possible when the documents are created by somebody else.