Sling / SLING-6783

Updates for Commons HTML


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: Commons HTML 1.0.2
    • Component/s: Commons
    • Labels: None

    Description

      This issue covers the following updates.

      Updated the TagSoup library to 1.2.1, which includes the following changes:

      • DOCTYPE is now recognized even in lower case.
      • We make sure to buffer the reader, eliminating a long-standing bug that would crash on certain inputs, such as & followed by CR+LF.
      • The HTML scanner's table is precompiled at build time for efficiency, causing a 4x speedup on large input documents.
      • ]] within a CDATA section no longer causes input to be discarded.
      • Remove bogus newline after printing children of the root element.
      • Allow the noscript element anywhere, the same as the script element.
      • Updated to the 2011 edition of the W3C character entity list.

      Additionally:

      • Updated the license with the new home page for TagSoup.
      • Migrated the annotations to the OSGi annotations.
      • Added the ability to specify additional features and properties for the parser.
      • Documented the available settings.
      • Fixed the Javadoc.
      • Prepared for different parsers by renaming HtmlParserImpl and adding component properties.
      • Improved the configuration.
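
      The "additional features and properties" item can be illustrated with the standard SAX XMLReader interface, which TagSoup's parser implements. What follows is a minimal sketch, not the code from the patch: it assumes the features arrive as a map of feature URI to boolean, the names ParserFeatureSketch and applyFeatures are hypothetical, and the JDK's built-in SAX parser stands in for TagSoup so the example runs without extra dependencies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of Sling Commons HTML.
class ParserFeatureSketch {

    // Applies each configured feature to the reader and returns the names
    // of the features the reader did not recognize or could not support.
    static List<String> applyFeatures(XMLReader reader, Map<String, Boolean> features) {
        List<String> rejected = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : features.entrySet()) {
            try {
                reader.setFeature(entry.getKey(), entry.getValue());
            } catch (SAXNotRecognizedException | SAXNotSupportedException ex) {
                // Collect the bad name instead of failing the whole configuration.
                rejected.add(entry.getKey());
            }
        }
        return rejected;
    }

    public static void main(String[] args) throws Exception {
        // The JDK parser is a stand-in; any XMLReader implementation would do.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        Map<String, Boolean> features = new LinkedHashMap<>();
        features.put("http://xml.org/sax/features/namespaces", true);
        // Made-up feature URI, included to show how an unknown name is handled.
        features.put("http://example.invalid/features/unknown", true);

        System.out.println("Rejected features: " + applyFeatures(reader, features));
    }
}
```

      Collecting rejected names rather than throwing is one possible design choice: a single mistyped feature in the configuration would then not prevent the parser from starting.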

      Attachments

        1. sling.patch
          8 kB
          Jason E Bailey

            People

              Assignee: Oliver Lietz (olli)
              Reporter: Jason E Bailey (jebailey)
              Votes: 0
              Watchers: 4
