Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 2.2.0, 2.2.1, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.6.1, 2.6.2
None
None
Environment: Windows XP running Java 1.4.2
Description
Very large comments drastically increase parsing time in both the SAX and DOM implementations. Running the sax.Counter and dom.Counter samples against a 410KB file with nothing commented out gives parse times of 100 to 300 ms. If I comment out 95% of the file and run the same samples, the parse time jumps to 40 to 50 seconds. I ran the same files through the Aelfred parser shipped with Saxon 7.9: the file with the large comment was slower than the one without, but only by about 100 ms.
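A minimal reproduction sketch, assuming a modern JDK (it uses StringBuilder and StandardCharsets, so it will not compile on Java 1.4.2) and whatever SAX parser the JAXP SAXParserFactory resolves to; with a Xerces 2.x jar on the classpath, the JAXP service lookup should select it. The class name CommentParseTiming, the filler element `<item>`, and the sizes are hypothetical, and the commented variant wraps all of the filler in a single comment rather than 95% of a real file. It counts elements so the two runs can be sanity-checked against each other, and prints the elapsed time for each.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class CommentParseTiming {

    /** Builds a document of roughly targetBytes of filler; if commented, the filler sits inside one comment. */
    static String buildDocument(int targetBytes, boolean commented) {
        String item = "<item>some filler text</item>\n"; // contains no "--", so it is legal comment content
        StringBuilder body = new StringBuilder(targetBytes + 64);
        while (body.length() < targetBytes) {
            body.append(item);
        }
        StringBuilder doc = new StringBuilder(body.length() + 64);
        doc.append("<?xml version=\"1.0\"?>\n<root>\n");
        if (commented) {
            doc.append("<!--\n").append(body).append("-->\n");
        } else {
            doc.append(body);
        }
        doc.append("</root>\n");
        return doc.toString();
    }

    /** Parses with the JAXP default SAX parser and returns the number of elements reported. */
    static int countElements(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        parser.parse(in, handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        int size = 410 * 1024; // roughly the size of the file in the report
        for (boolean commented : new boolean[] {false, true}) {
            String xml = buildDocument(size, commented);
            long start = System.currentTimeMillis();
            int elements = countElements(xml);
            long elapsed = System.currentTimeMillis() - start;
            System.out.println((commented ? "commented:   " : "uncommented: ")
                    + elements + " elements, " + elapsed + " ms");
        }
    }
}
```

The first parse also pays class-loading and JIT warm-up costs, so repeating each variant a few times gives steadier numbers.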
I briefly compared the two parsers' code, and their comment handling does not look significantly different. The only notable difference I found was in the low/high byte character checks. I suspect an inefficiency in the XMLStringBuffer class, but I haven't been able to pinpoint it.
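If the suspected XMLStringBuffer inefficiency were a buffer-growth problem, one way it could show up is quadratic copying: a buffer that grows by a fixed increment re-copies everything it already holds on nearly every append, while a buffer that doubles copies each character only a constant number of times on average. The sketch below is a hypothetical buffer written for illustration, not the Xerces XMLStringBuffer source; the 32-character initial size and slack and the 2048-character chunk size are assumptions. It counts the characters carried over on each resize under both policies.

```java
import java.util.Arrays;

/** Hypothetical growth-policy comparison; NOT the Xerces XMLStringBuffer implementation. */
public class GrowthPolicyDemo {

    static final class GrowableBuffer {
        private char[] ch = new char[32]; // assumed initial capacity
        private int length = 0;
        private final boolean doubling;
        long charsCopied = 0; // live characters carried over into a new array on resize

        GrowableBuffer(boolean doubling) {
            this.doubling = doubling;
        }

        void append(char[] src, int offset, int len) {
            if (length + len > ch.length) {
                int newLength = doubling
                        ? Math.max(ch.length * 2, length + len) // geometric growth
                        : length + len + 32;                    // fixed slack of 32 chars
                charsCopied += length;
                ch = Arrays.copyOf(ch, newLength);
            }
            System.arraycopy(src, offset, ch, length, len);
            length += len;
        }

        int length() {
            return length;
        }
    }

    /** Appends chunks of the given size until totalChars is reached; returns chars copied by resizes. */
    static long copiedWhileAppending(boolean doubling, int totalChars, int chunk) {
        GrowableBuffer buf = new GrowableBuffer(doubling);
        char[] piece = new char[chunk];
        Arrays.fill(piece, 'x');
        while (buf.length() < totalChars) {
            buf.append(piece, 0, chunk);
        }
        return buf.charsCopied;
    }

    public static void main(String[] args) {
        int total = 400 * 1024; // comparable to the large comment in the report
        int chunk = 2048;       // hypothetical size of each scanned piece
        System.out.println("fixed increment: " + copiedWhileAppending(false, total, chunk));
        System.out.println("doubling:        " + copiedWhileAppending(true, total, chunk));
    }
}
```

This sketch does not establish whether Xerces 2.6.x grows its buffers this way, or whether the effect alone could account for a 40-second parse. With 2048-character chunks the extra copying for a 400KB comment is modest; under the fixed-increment policy it grows as the appended chunks get smaller.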