Bug 25767 - Performance issues in JAXPUtils' use of SAXParser & SAXParserFactory
Summary: Performance issues in JAXPUtils' use of SAXParser & SAXParserFactory
Status: NEW
Alias: None
Product: Ant
Classification: Unclassified
Component: Core (show other bugs)
Version: 1.6.0
Hardware: PC Linux
: P3 enhancement with 1 vote (vote)
Target Milestone: ---
Assignee: Ant Notifications List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-12-26 17:51 UTC by Jesse Glick
Modified: 2008-11-24 03:58 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jesse Glick 2003-12-26 17:51:06 UTC
Comments on when JAXPUtils creates new factory / parser / document builder
objects and when it caches them.

1. Correctness:

private static SAXParserFactory parserFactory = null;
public static synchronized SAXParserFactory getParserFactory() {
    if (parserFactory == null) {
        parserFactory = newParserFactory();
    }
    return parserFactory;
}

(and similar methods) is likely not correct. An instance of SAXParserFactory is
not thread-safe and might not be reentrant, according to its Javadoc. If more
than one thread in the JVM is calling this method and using the same parser,
random corruption might result - this might happen to someone using <parallel>,
for example; or someone running several builds at once using an IDE integration.
Or if this method is called in the middle of a parse for some reason - to parse
some referenced file? - there might be corruption; at least SAX1's Parser says
it is not reentrant.

2. Performance: casual examination of the performance of running a tiny build
script w/ several antlib.xml's over and over using a profiler shows some 20-30%
of the CPU time being spent in JAXPUtils.getNamespaceXMLReader(), specifically
calling Xerces' SAXParserImpl.<init>(). Not an especially cheap call since it
sets up a lot of objects of various kinds; you are supposed to cache the result
from call to call if you want good performance and might be parsing a lot of
files. (E.g. a lot of small antlib.xml's, or many little build.xml's with
<subant>, etc.) Note that this time is not spent actually parsing the XML or
even opening the input stream to it - just getting a parser object.

Suggested fix to both problems:

1. Rework the helper methods in JAXPUtils to cache not just the factory objects
but the parsers & document builders themselves. I don't know how expensive
getting the factory is - JAXPUtils already caches them, and my profiling only
took place on a "warmed-up" JVM - but invoking the factories' new*() methods is
too expensive to do every time an XML file is parsed.

2. Leave uncached impls as e.g. SAXParser createSAXParser() and so on - these
would always create new ones. Or for compatibility, keep the current method
names, but deprecate them in favor of properly cached versions (see #3).

3. For cached usage, use e.g. SAXParser obtainSAXParser() and void
releaseSAXParser(SAXParser) methods. These should use java.lang.ThreadLocal:

private static final ThreadLocal/*<SAXParser>*/ saxParser = new ThreadLocal();
public static SAXParser obtainSAXParser() {
    SAXParser p = (SAXParser)saxParser.get();
    if (p != null) {
        saxParser.set(null);
    } else {
        p = createSAXParser();
    }
    return p;
}
public static void releaseSAXParser(SAXParser p) {
    saxParser.set(p);
}

// Usage:
SAXParser p = obtainSAXParser();
try {
   // parse...
} finally {
    releaseSAXParser(p);
}

Forgetting to release a parser is not a big deal, it will just need to be
recreated on the next obtain*() call. Using ThreadLocal ensures that different
threads never share the same object. Setting the ref to null in obtain*()
ensures that reentrant calls (i.e. calling obtain*() while the last-obtained
parser is still in use) will be safe.

One issue is that several utility methods do not return the original cacheable
object, but rather something derived from it, e.g.

return newSAXParser(getNSParserFactory()).getXMLReader();

For this usage you should probably have obtainNamespaceXMLReader +
releaseNamespaceXMLReader, etc. - i.e. keep a separate cache for each kind of
object - since you can't get back to the SAXParser from the XMLReader.

Could try to provide a patch if there is interest - at least modifications to
JAXPUtils, and usage of the new methods in e.g. ProjectHelper2.parse().
Comment 1 Jesse Glick 2003-12-27 03:05:03 UTC
Correction from the JAXP 1.2 spec - for factories it is only the configuration
methods which are not thread-safe; the new*() constructor methods are
thread-safe. So there is probably no correctness error in this regard in
JAXPUtils since the only cached objects are factories. Performance note still holds.