Affects Version/s: 1.4.1
Fix Version/s: None
Component/s: Schema and Analysis
Some kinds of 'bad' HTML are missed by HTMLStripCharFilter. For example, the following invalid HTML:
Is filtered to:
I understand the challenge here: without the closing '>', it's hard to know what to do. It turns out that real-world web pages are full of this kind of garbage HTML, and browsers (impressively!) seem to handle it quite gracefully.
Plus, as a result, users in my app can search for 'href' and get lots of matches whose visible text doesn't appear to contain 'href' at all.
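To make the "hard to know what to do" point concrete, here is a minimal sketch of one possible recovery heuristic. It is not how HTMLStripCharFilter or any browser actually behaves. The class LenientTagStripper and its strip method are hypothetical. The assumed rule is that if a tag runs into another '<' before its closing '>', the unterminated tag is discarded and scanning resumes at the new '<'.

```java
/** Hypothetical illustration only; not part of Solr/Lucene. */
public final class LenientTagStripper {

    private LenientTagStripper() {}

    /**
     * Strips tags from the input. Assumed heuristic: a tag that hits another
     * '<' before its closing '>' is treated as broken and dropped, and scanning
     * restarts at that '<'. An unterminated tag at the end of input is dropped.
     */
    public static String strip(String html) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = html.length();
        while (i < n) {
            char c = html.charAt(i);
            if (c != '<') {
                out.append(c); // ordinary text is kept
                i++;
                continue;
            }
            // Inside a tag: look for '>', but stop early at the next '<'.
            int j = i + 1;
            while (j < n && html.charAt(j) != '>' && html.charAt(j) != '<') {
                j++;
            }
            if (j < n && html.charAt(j) == '>') {
                i = j + 1; // well-formed tag: skip past '>'
            } else {
                i = j;     // broken tag: drop it and resume at the next '<' (or end)
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The <a ...> tag below is missing its closing '>'.
        System.out.println(strip("<a href=\"http://example.com/\"Click <b>here</b>"));
        // prints "here"
    }
}
```

Note the tradeoff: the word "Click " is lost along with the broken tag, and the rule would also mangle a literal '<' in text (e.g. "a < b") or a '<' inside a quoted attribute value. Those ambiguities are why a general fix is not obvious.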