SHINDIG-987

NekoParser returns cryptic error messages when parsing bad html

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1-BETA1
    • Fix Version/s: 1.1-BETA1
    • Component/s: Java
    • Labels: None

    Description

      startImportantElement can throw an exception when parsing malformed HTML.

      Given this HTML:

      <div id="div_super" class="div_super" valign:"middle"></div>

      You get an exception like this:

      org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.
      org.apache.xerces.dom.CoreDocumentImpl.createAttribute(Unknown Source)
      org.apache.xerces.dom.ElementImpl.setAttribute(Unknown Source)
      org.apache.shindig.gadgets.parse.nekohtml.NekoSimplifiedHtmlParser$DocumentHandler.startImportantElement(NekoSimplifiedHtmlParser.java:292)
      org.apache.shindig.gadgets.parse.nekohtml.NekoSimplifiedHtmlParser$DocumentHandler.startElement(NekoSimplifiedHtmlParser.java:242)
      org.apache.shindig.gadgets.parse.nekohtml.SocialMarkupHtmlParser$SocialMarkupDocumentHandler.startElement(SocialMarkupHtmlParser.java:130)

      The exception is raised in this loop:

      for (int i = 0; i < xmlAttributes.getLength(); i++) {
        if (xmlAttributes.getURI(i) != null) {
          element.setAttributeNS(xmlAttributes.getURI(i), xmlAttributes.getQName(i), xmlAttributes.getValue(i));
        } else {
          element.setAttribute(xmlAttributes.getLocalName(i), xmlAttributes.getValue(i));
        }
      }

      This happens because the attribute name passed to setAttribute contains a colon.

      We should probably add some error checking here so that we can more easily identify the offending HTML without using a debugger.
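
One possible shape for that error checking is sketched below. It is an illustration only, not the attached patch: the class `AttributeGuard` and the helpers `setAttributeChecked` and `newDocument` are hypothetical names, and the attribute name used in `main` assumes the parser hands the malformed token through as the attribute name. The idea is to catch the `DOMException` and rethrow it with the element and attribute name in the message, so the offending markup can be found from the log alone.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not part of Shindig.
class AttributeGuard {

  // Sets an attribute; on failure, rethrows with the element and attribute name attached.
  static void setAttributeChecked(Element element, String name, String value) {
    try {
      element.setAttribute(name, value);
    } catch (DOMException e) {
      throw new IllegalArgumentException(
          "Could not set attribute '" + name + "' on <" + element.getTagName()
              + ">: " + e.getMessage(), e);
    }
  }

  // Creates an empty DOM document with the JDK's default parser implementation.
  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Element div = newDocument().createElement("div");
    setAttributeChecked(div, "id", "div_super");
    try {
      // Assumed attribute name: the malformed token from the report, quotes included.
      setAttributeChecked(div, "valign:\"middle\"", "");
    } catch (IllegalArgumentException e) {
      // The message now names both the element and the bad attribute.
      System.out.println(e.getMessage());
    }
  }
}
```

In the parser itself, the same try/catch could wrap both the `setAttributeNS` and `setAttribute` calls in the loop above. Whether to rethrow, log and skip the attribute, or surface a parse error to the caller is a design choice this sketch leaves open.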

      Attachments

        1. SHINDIG-987.patch
          6 kB
          Vincent Siveton

          People

            Assignee: Unassigned
            Reporter: Paul Lindner (plindner)
            Votes: 0
            Watchers: 0
