Bug 44426

Summary: Please make catalog use default instead of an afterthought
Product: XmlCommons - Now in JIRA
Reporter: Ted Guild <ted>
Component: Resolver
Assignee: Commons Developers Mailing List <commons-dev>
Status: NEW ---
Severity: normal
Keywords: ErrorMessage, RFC, Xerces2, XSLTBug
Priority: P2    
Version: 1.x   
Target Milestone: ---   
Hardware: Other   
OS: other   
URL: http://xerces.apache.org/xerces2-j/faq-xcatalogs.html

Description Ted Guild 2008-02-14 17:51:12 UTC
W3C receives an immense amount of DTD traffic, often from user agents that
identify themselves only as Python or Java.

http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic

In a number of cases, people affected by our automated blocking have told us
that they are running Xalan and/or Xerces to validate XML or perform XSL
transforms.  We have directed some of those we have corresponded with to your
catalog instructions.

http://xerces.apache.org/xerces2-j/faq-xcatalogs.html

The vast majority of Xalan/Xerces installations most likely use neither
catalogs nor caching of external DTDs and other schemata. The resolver also
appears to ignore both HTTP response codes and caching directives.

http://www.ietf.org/rfc/rfc2616.txt

Better than a default catalog would be a caching XML Catalog resolver, which I
understand is part of GlassFish.

http://norman.walsh.name/2007/09/07/treadLightly

There are other Java libraries contributing to this traffic as well. Xalan and
Xerces are widely used, important libraries.  Your assistance in reducing this
excessive traffic to W3C and others hosting standards schemata would be greatly
appreciated.
Comment 1 Michael Glavassevich 2010-03-07 16:06:53 UTC
Ted, I'm not sure what you're suggesting we do. Changing default behaviours has the potential to break many applications. We simply cannot do that.

There are well-documented ways for applications to avoid or reduce network access, including the use of XML Catalogs, custom entity resolvers and the grammar caching facilities supported by Xerces and also the JAXP standard. The tools are there. People should be using them.

I believe improving the situation is a matter of education. The more folks you block with a 503 response the more they'll realize that they need to do something and will have to change their application for it to work again.
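
Editorial illustration (not part of the original thread): one of the options named in Comment 1, a custom entity resolver, might look roughly like the sketch below. It uses only the standard SAX `EntityResolver` interface and the JAXP `DocumentBuilder` API. The class name `LocalDtdResolver`, its `register` and `parse` methods, and the idea of holding DTD text in memory are assumptions made for this example; a real application would more likely load DTDs from bundled files or use an XML Catalog.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver: serves known DTDs from local copies instead of the network. */
class LocalDtdResolver implements EntityResolver {
    // Maps a public identifier to DTD text shipped with the application (assumed storage).
    private final Map<String, String> dtdByPublicId = new ConcurrentHashMap<>();

    /** Registers a local copy of a DTD under its public identifier. */
    void register(String publicId, String dtdText) {
        dtdByPublicId.put(publicId, dtdText);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String dtd = (publicId == null) ? null : dtdByPublicId.get(publicId);
        if (dtd == null) {
            // Unknown entity: returning null lets the parser use its default lookup,
            // which may still go to the network.
            return null;
        }
        // Known entity: hand the parser the local text so no HTTP request is made.
        InputSource source = new InputSource(new StringReader(dtd));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    /** Parses XML text with this resolver installed on a JAXP DocumentBuilder. */
    Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(this);
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

An application would call `register` once per DTD it depends on and then route all parsing through a builder that has the resolver installed. Returning `null` for unknown identifiers keeps existing behaviour for everything else, which is one way to reduce traffic without changing parser defaults.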