Description
NekoSimplifiedHtmlParser$DocumentHandler.startImportantElement can throw a DOMException when parsing malformed HTML.
Given this HTML:
<div id="div_super" class="div_super" valign:"middle"></div>
You get an exception like this:
org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.
org.apache.xerces.dom.CoreDocumentImpl.createAttribute(Unknown Source)
org.apache.xerces.dom.ElementImpl.setAttribute(Unknown Source)
org.apache.shindig.gadgets.parse.nekohtml.NekoSimplifiedHtmlParser$DocumentHandler.startImportantElement(NekoSimplifiedHtmlParser.java:292)
org.apache.shindig.gadgets.parse.nekohtml.NekoSimplifiedHtmlParser$DocumentHandler.startElement(NekoSimplifiedHtmlParser.java:242)
org.apache.shindig.gadgets.parse.nekohtml.SocialMarkupHtmlParser$SocialMarkupDocumentHandler.startElement(SocialMarkupHtmlParser.java:130)
The exception is thrown from the attribute-copying loop in startImportantElement:
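for (int i = 0; i < xmlAttributes.getLength(); i++) {
  if (xmlAttributes.getURI(i) != null) {
    // namespaced attributes are handled here (branch body not included in this report)
  } else {
    // a non-XML-name attribute from malformed HTML makes this call throw
    element.setAttribute(xmlAttributes.getLocalName(i), xmlAttributes.getValue(i));
  }
}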
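It fails because we're trying to set an attribute whose name comes from the malformed valign:"middle" token (note the colon where "=" should be), which is not a legal XML attribute name.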
We should add error checking here (for example, catching the DOMException and reporting the offending attribute name and element) so that the offending HTML can be identified without a debugger.
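A minimal sketch of that error checking, assuming we still want to throw a DOMException (so existing callers are unaffected) but with a message that names the bad attribute and its element. SafeAttributes, setAttributeChecked and newDocument are hypothetical names, not existing Shindig API; in startImportantElement the element.setAttribute(...) call in the else branch would be routed through a helper like this. The demo uses the JDK's built-in DOM, which rejects an attribute name containing quote characters with INVALID_CHARACTER_ERR.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SafeAttributes {

  /**
   * Hypothetical helper: sets an attribute, and if the DOM rejects it,
   * rethrows a DOMException (same code) whose message names the offending
   * attribute and element, keeping the original exception as the cause.
   */
  static void setAttributeChecked(Element element, String name, String value) {
    try {
      element.setAttribute(name, value);
    } catch (DOMException e) {
      DOMException detailed = new DOMException(e.code,
          "Cannot set attribute '" + name + "'=\"" + value + "\" on <"
              + element.getTagName() + ">: " + e.getMessage());
      detailed.initCause(e);
      throw detailed;
    }
  }

  /** Creates an empty DOM document, wrapping the checked configuration exception. */
  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Element div = newDocument().createElement("div");
    setAttributeChecked(div, "class", "div_super"); // legal name: succeeds
    try {
      // quotes are not legal in XML names, so the DOM rejects this name
      setAttributeChecked(div, "\"middle\"", "");
    } catch (DOMException e) {
      System.out.println(e.getMessage()); // message now identifies the bad attribute
    }
  }
}
```

Rethrowing rather than silently dropping the attribute keeps the current failure behaviour; skipping invalid names and logging them would be an alternative if we'd rather render the gadget anyway.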