Description
In TIKA-2791, we added extraction of structure tags. If parsing Tika's XHTML fails, the initial implementation backs off to treating the full XHTML as a string of text that happens to include markup.
It would be better to back off to the HTML parser, so that content comparisons still work accurately even when there is a tag failure such as mis-nested tags: <b><i></b></i>
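
A minimal sketch of the proposed backoff, for illustration only. It is not Tika's code: it uses Python's standard library as a stand-in for the strict XML parse and the lenient HTML parse, and the names `extract_text_and_tags` and `_TextCollector` are hypothetical. The idea is to try the strict parse first and, only if that fails, re-parse the same string leniently so the markup is still stripped from the text.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Lenient parser: records text and start-tag names, tolerating bad nesting."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_parts = []
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text_parts.append(data)


def extract_text_and_tags(xhtml):
    """Return (text, tags, parser_used) for an XHTML string.

    Hypothetical helper: strict XML parse first, lenient HTML parse as backoff.
    The sample input is assumed to carry no XML namespace; with a namespace,
    ElementTree would report tags as '{namespace}name'.
    """
    try:
        root = ET.fromstring(xhtml)
        text = "".join(root.itertext())
        tags = [el.tag for el in root.iter()]
        return text, tags, "xml"
    except ET.ParseError:
        # Backoff: the markup is not well-formed, so parse it as HTML
        # instead of treating the whole string as plain text.
        collector = _TextCollector()
        collector.feed(xhtml)
        collector.close()  # flush any buffered text
        return "".join(collector.text_parts), collector.tags, "html"
```

With this backoff, a document containing `<b><i>hello</b></i> world` yields the same text as its well-formed equivalent, which is what the content comparison needs; treating the string as raw text would leave the tags in the compared content.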