Bug 44426 - Please make catalog use default instead of an afterthought
Summary: Please make catalog use default instead of an afterthought
Status: NEW
Alias: None
Product: XmlCommons - Now in JIRA
Classification: Unclassified
Component: Resolver
Version: 1.x
Hardware: Other other
Importance: P2 normal
Target Milestone: ---
Assignee: Commons Developers Mailing List
URL: http://xerces.apache.org/xerces2-j/fa...
Keywords: ErrorMessage, RFC, Xerces2, XSLTBug
Depends on:
Blocks:
 
Reported: 2008-02-14 17:51 UTC by Ted Guild
Modified: 2010-03-07 16:06 UTC
CC List: 0 users



Description Ted Guild 2008-02-14 17:51:12 UTC
W3C receives an immense amount of DTD traffic from user agents that often
identify themselves only as Python or Java.

http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic

In a number of cases we have heard back from people affected by our automated
blocking who indicated that they are running Xalan and/or Xerces for tasks such
as validating XML or performing XSL transformations. We have directed some of
those we have corresponded with to your catalog instructions.

http://xerces.apache.org/xerces2-j/faq-xcatalogs.html
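
As an illustration of the catalog approach, here is a minimal sketch that uses the JAXP Catalog API (`javax.xml.catalog`, JDK 9 or later) rather than the Xerces `XMLCatalogResolver` described in the FAQ. The catalog contents, the local file name `xhtml1-strict.dtd` and the class name `CatalogExample` are illustrative assumptions. Setting the `RESOLVE` feature to `strict` makes unmapped identifiers raise a `CatalogException` instead of letting the parser fetch them from the network.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

public class CatalogExample {
    // Illustrative OASIS catalog: maps the XHTML 1.0 Strict system ID to a local
    // file named xhtml1-strict.dtd, resolved relative to the catalog's location.
    static final String CATALOG =
        "<?xml version=\"1.0\"?>\n"
        + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
        + "  <system systemId=\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"\n"
        + "          uri=\"xhtml1-strict.dtd\"/>\n"
        + "</catalog>\n";

    /** Builds a resolver that throws CatalogException for identifiers it cannot map. */
    static CatalogResolver strictResolver(Path catalogFile) {
        CatalogFeatures features = CatalogFeatures.builder()
            .with(CatalogFeatures.Feature.RESOLVE, "strict")
            .build();
        return CatalogManager.catalogResolver(features, catalogFile.toUri());
    }

    public static void main(String[] args) throws Exception {
        // Write the catalog to a temporary directory so the example is self-contained.
        Path dir = Files.createTempDirectory("catalog-demo");
        Path catalog = dir.resolve("catalog.xml");
        Files.write(catalog, CATALOG.getBytes(StandardCharsets.UTF_8));

        CatalogResolver resolver = strictResolver(catalog);
        InputSource local = resolver.resolveEntity(
            null, "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd");
        // The system ID now points at the local file instead of www.w3.org.
        System.out.println(local.getSystemId());
    }
}
```

A `CatalogResolver` is also a SAX `EntityResolver`, so the same object can be handed to `DocumentBuilder.setEntityResolver` or `XMLReader.setEntityResolver`.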

The vast majority of Xalan/Xerces installations most likely use neither
catalogs nor caching of external DTDs and other schemata. The resolver also
appears to ignore HTTP response codes and caching directives.

http://www.ietf.org/rfc/rfc2616.txt

Even better than a default catalog would be a caching XML Catalog resolver,
which I understand is part of Glassfish.

http://norman.walsh.name/2007/09/07/treadLightly
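
A caching resolver of the kind suggested above can be sketched with a plain SAX `EntityResolver`. This is not GlassFish's implementation: the class `CachingEntityResolver`, its pluggable `fetcher` function and the `http://example.org/example.dtd` identifier are hypothetical, and the cache simply keeps each entity in memory for the life of the resolver, without looking at HTTP caching directives.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical in-memory caching resolver: each system ID is fetched at most once. */
public class CachingEntityResolver implements EntityResolver {
    private final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> fetcher; // assumption: supplied by the application
    private final AtomicInteger fetches = new AtomicInteger(); // counts fetch attempts

    public CachingEntityResolver(Function<String, byte[]> fetcher) {
        this.fetcher = fetcher;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null) {
            return null; // nothing to cache; let the parser apply its default behaviour
        }
        byte[] bytes = cache.computeIfAbsent(systemId, id -> {
            fetches.incrementAndGet();
            return fetcher.apply(id); // a null result is not stored in the map
        });
        if (bytes == null) {
            return null;
        }
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setPublicId(publicId);
        source.setSystemId(systemId); // keep the original ID so relative references resolve
        return source;
    }

    public int fetchCount() {
        return fetches.get();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in fetcher: returns a tiny DTD instead of making an HTTP request.
        CachingEntityResolver resolver = new CachingEntityResolver(
            id -> "<!ELEMENT a (#PCDATA)>".getBytes(StandardCharsets.UTF_8));
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(resolver);
        String xml = "<!DOCTYPE a SYSTEM \"http://example.org/example.dtd\"><a>hi</a>";
        for (int i = 0; i < 2; i++) {
            builder.parse(new InputSource(new StringReader(xml)));
        }
        System.out.println("fetch attempts: " + resolver.fetchCount());
    }
}
```

A production version would presumably persist the cache across runs and respect expiry headers; the sketch only shows the core idea that repeated parses reuse one copy of the DTD.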

There are other Java libraries contributing to this traffic as well. Xalan and
Xerces are widely used, important libraries.  Your assistance in reducing this
excessive traffic to W3C and others hosting standards schemata would be greatly
appreciated.
Comment 1 Michael Glavassevich 2010-03-07 16:06:53 UTC
Ted, I'm not sure what you're suggesting we do. Changing default behaviours has the potential to break many applications. We simply cannot do that.

There are well-documented ways for applications to avoid or reduce network access, including the use of XML Catalogs, custom entity resolvers and the grammar caching facilities supported by Xerces and also the JAXP standard. The tools are there. People should be using them.
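
To illustrate the custom entity resolver option, here is a minimal sketch of a resolver that serves DTDs bundled with the application and refuses every other external entity, so a missing local copy fails fast instead of generating traffic. The class `OfflineEntityResolver`, its `bundled` map and the `http://example.org/note.dtd` identifier are hypothetical; only the standard JAXP and SAX interfaces are real.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical resolver: serves bundled DTDs and refuses every other external entity. */
public class OfflineEntityResolver implements EntityResolver {
    private final Map<String, String> bundled; // system ID -> DTD text shipped with the app

    public OfflineEntityResolver(Map<String, String> bundled) {
        this.bundled = bundled;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws SAXException {
        String dtd = bundled.get(systemId);
        if (dtd == null) {
            // Fail loudly instead of silently downloading from the network.
            throw new SAXException("No local copy of external entity: " + systemId);
        }
        InputSource source = new InputSource(new StringReader(dtd));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    /** Parses an XML string with the given resolver installed. */
    static Document parse(String xml, EntityResolver resolver) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(resolver);
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        OfflineEntityResolver resolver = new OfflineEntityResolver(
            Map.of("http://example.org/note.dtd", "<!ELEMENT note (#PCDATA)>"));
        Document doc = parse(
            "<!DOCTYPE note SYSTEM \"http://example.org/note.dtd\"><note>hi</note>", resolver);
        System.out.println(doc.getDocumentElement().getTextContent()); // prints "hi"
    }
}
```

Refusing unknown entities is a deliberate choice for this sketch; returning `null` instead would let the parser fall back to its default behaviour of fetching the system ID.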

I believe improving the situation is a matter of education. The more folks you block with a 503 response, the more they'll realize that they need to do something and that they will have to change their applications for them to work again.