stack added a comment - 13/Oct/05 09:19 AM
The attached patch runs all XML text through a check for bad XML characters. This patch is brutal: it silently drops illegal characters. I made it after hunting through Xalan, the JDK, and Nutch itself for a method that would do this filtering, but I was unable to find one. Perhaps an oversight on my part?
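For illustration only, here is a minimal Java sketch of the kind of filter described above, which silently drops every character outside the XML 1.0 `Char` production. The class and method names (`XmlCharFilter`, `stripIllegal`) are hypothetical; this is not the code in the attached patch. It walks the string by code point so that valid surrogate pairs (characters above U+FFFF) survive while lone surrogates are dropped.

```java
/**
 * Hypothetical helper (not part of Nutch, Xalan, or the JDK): silently drops
 * every code point that the XML 1.0 Char production does not allow.
 */
final class XmlCharFilter {

    private XmlCharFilter() {}

    /** XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String stripIllegal(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // codePointAt joins a valid surrogate pair; a lone surrogate comes
            // back unchanged (0xD800-0xDFFF) and is rejected by isLegalXmlChar.
            int cp = text.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

For example, `XmlCharFilter.stripIllegal("ok\u0000text")` returns `"oktext"`; tabs, newlines, and carriage returns are kept because XML allows them.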
Patch version 2. This patch benefits from the discussion on the Nutch dev list. It differs from the first in that it handles ALL illegal XML characters: it entity-encodes the five 'special characters' AND (silently) drops characters outside the legal XML character range. The previous patch did only the latter, leaving entity escaping to the configured transformer/DOM serializer.
This patch also differs from patch version 1 in that it moves the method that processes the XML out into util.StringUtil, on the assumption that OpenSearchServlet is not the only class that needs to make text safe to include in XML. The core method, StringUtil#toValidXmlText, was authored by Dawid Weiss and taken from carrot2's XMLSerializerHelper. Below is an excerpt from a mail on the Nutch dev list in which he grants permission to copy toValidXmlText. Message-ID: <434F5368.6040202@cs.put.poznan.pl> ... > So, will I amend the patch in Copy the method's contents. It doesn't really make sense to copy the D. Scrub
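A rough sketch of the version 2 behaviour, entity-encoding the five special characters and silently dropping illegal ones in a single loop, might look like this. It is not Dawid Weiss's `toValidXmlText` nor the carrot2 `XMLSerializerHelper` code; `XmlTextEscaper` and `toXmlSafe` are hypothetical names, and the legal range assumed is the XML 1.0 `Char` production.

```java
/**
 * Hypothetical helper (not the actual StringUtil#toValidXmlText): escapes the
 * five XML special characters and silently drops illegal code points.
 */
final class XmlTextEscaper {

    private XmlTextEscaper() {}

    /** XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String toXmlSafe(String text) {
        // Extra capacity because each escape grows the output.
        StringBuilder out = new StringBuilder(text.length() + 16);
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    // Keep legal characters, silently drop everything else.
                    if (isLegalXmlChar(cp)) {
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }
}
```

One design caveat: output from a method like this is already escaped, so it should be written to the stream as-is. If it is instead placed in a DOM text node and run through a serializer or transformer that escapes text itself, `&` would be escaped twice (e.g. `&lt;` becoming `&amp;lt;`).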
Use the original patch, fixIllegalXmlChars.patch, to address the problem described in this issue. Since the original patch didn't apply cleanly for me on 0.8-dev (nightly-2006-05-20), I re-did it for 0.8 ...
With this patch the XML is fine. Without it, I had big trouble parsing the RSS feed in another application. Stefan's patch didn't apply cleanly for me on svn revision 413155, so I re-did it.
This patch fixes the illegal XML characters and prevents OpenSearch clients from choking on the bad XML previously emitted. However, this patch processes the String twice if it contains some illegal characters!
A version of the patch that doesn't "...process the String twice if it contains some illegal characters!". Its name is fixIllegalXmlChars08-v3.patch (be careful, it's not the last patch in the list). It was made against 414852.
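The v3 patch itself isn't reproduced here, so the following is only a guess at the general idea of avoiding a second pass: a hypothetical `LazyXmlSanitizer.stripIllegal` (not Nutch code) scans once, returns the original String untouched when nothing is illegal, and only starts copying when it hits a bad character, without re-scanning the already-checked prefix. It only drops illegal characters; escaping is left out to keep it short.

```java
/**
 * Hypothetical sketch (not the fixIllegalXmlChars08-v3.patch code): drops
 * illegal XML 1.0 code points without scanning the clean prefix twice.
 */
final class LazyXmlSanitizer {

    private LazyXmlSanitizer() {}

    /** XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String stripIllegal(String text) {
        int i = 0;
        // Fast path: scan forward until the first illegal code point, if any.
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!isLegalXmlChar(cp)) {
                break;
            }
            i += Character.charCount(cp);
        }
        if (i == text.length()) {
            return text; // Clean input: no copy, same String instance returned.
        }
        // Slow path: copy the already-checked prefix once, then filter the rest.
        StringBuilder out = new StringBuilder(text.length());
        out.append(text, 0, i);
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Since most search-result text contains no illegal characters, the common case allocates nothing extra.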
Going by the comments in this issue, at least three different people have run into this awkward problem. I petition that this is sufficient to earn a commit. Thanks.
In method addAttribute(...) line: intentional?
I just committed this with small changes (moved the test into a test case). Thanks.
Closing issues for released versions.