  1. Xerces2-J
  2. XERCESJ-970

Large comments are extremely slow to parse


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.2.0, 2.2.1, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.6.1, 2.6.2
    • Fix Version/s: None
    • Component/s: XNI
    • Labels: None
    • Environment: Windows XP running Java 1.4.2

    Description

      Very large comments drastically increase parsing time for both the SAX and DOM implementations. Running the sax.Counter and dom.Counter samples against a 410 KB file with nothing commented out gives parse times in the 100 ms to 300 ms range. However, if I comment out 95% of the file and run the same samples, the parse times jump to between 40 and 50 seconds. I ran the same samples using the Aelfred parser shipped with Saxon 7.9 and, while the file with the large comment was slower to parse than the one without it, the difference was only 100 ms or so.

      I briefly compared the comment-handling code in the two parsers, and it doesn't look significantly different. The main difference I noticed was in the low/high byte character checks. I suspect an inefficiency in the XMLStringBuffer class, but I haven't found anything specific.
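
      The comparison described above can be approximated with a small standalone harness. This is only a sketch, not the sax.Counter sample itself: the class name CommentParseTiming, its helper methods, and the generated document layout are all made up for illustration, and the test file is synthesized rather than taken from the attachment. It uses whichever SAX parser JAXP's SAXParserFactory resolves to, so an affected Xerces2-J release would have to be on the classpath to observe the reported slowdown; a different parser may show no difference at all.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical harness: compares SAX parse time for the same document
// with and without most of its content wrapped in a single comment.
final class CommentParseTiming {

    /**
     * Builds a document with `items` child elements under <root>.
     * The first `commentedItems` of them are enclosed in one large comment.
     */
    public static String buildDocument(int items, int commentedItems) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<root>\n");
        if (commentedItems > 0) {
            sb.append("<!--\n");
        }
        for (int i = 0; i < items; i++) {
            if (commentedItems > 0 && i == commentedItems) {
                sb.append("-->\n"); // close the comment before the remaining items
            }
            sb.append("  <item id=\"").append(i).append("\">value ")
              .append(i).append("</item>\n");
        }
        if (commentedItems >= items && commentedItems > 0) {
            sb.append("-->\n"); // every item was commented out
        }
        sb.append("</root>\n");
        return sb.toString();
    }

    /** Parses the document with the default JAXP SAX parser and counts start tags. */
    public static int countElements(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                count[0]++;
            }
        };
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(bytes), handler);
        return count[0];
    }

    /** Returns the wall-clock time of one parse, in milliseconds. */
    public static long timeParseMillis(String xml) throws Exception {
        long start = System.nanoTime();
        countElements(xml);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        int items = 11_000;                 // assumption: roughly 400 KB of markup
        int commented = items * 95 / 100;   // comment out 95% of the items
        String plain = buildDocument(items, 0);
        String withComment = buildDocument(items, commented);
        System.out.println("plain:     " + plain.length() + " chars, "
                + timeParseMillis(plain) + " ms");
        System.out.println("commented: " + withComment.length() + " chars, "
                + timeParseMillis(withComment) + " ms");
    }
}
```

      A single timed parse per document is noisy (JIT warm-up alone can dominate at the 100 ms scale), so repeating each measurement a few times and discarding the first run would give a fairer comparison.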

      Attachments

        1. comments.txt
          3 kB
          Chris Simmons


          People

            Assignee: Unassigned
            Reporter: Sean Griffin (trenchguinea)
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated: