Details
Type: Bug
Status: Resolved
Resolution: Incomplete
Version: 2.6.0
None
None
Operating System: Linux
Platform: PC
Bugzilla Id: 27807
Description
Platforms tested are AIX and Gentoo Linux.
I have a Java parser that implements ContentHandler and uses SAXParser
to write a tab-delimited file containing a subset of the information in an XML file.
My problem is that a small percentage of the values produced by this code are being
"beheaded": the returned string is a truncated version of what is actually in
the XML, with characters missing from the front.
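For readers without access to the linked files, the general approach described above (a SAX handler that turns selected elements into tab-delimited rows) might look like the minimal sketch below. The class name TabRowHandler, its constructor parameters, and the assumption that each record element maps to exactly one output line are hypothetical; they are not taken from BindParserInter.java or BindHandlerInter.java. The sketch extends DefaultHandler (which implements ContentHandler), matches on qName because the default SAXParserFactory is not namespace-aware, and collects each field's text in a StringBuilder until the field's end tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical sketch, not BindHandlerInter: builds one tab-delimited row per
 * record element, with one field per selected child element.
 */
public class TabRowHandler extends DefaultHandler {
    private final String recordElement;      // assumed record element, e.g. "BIND-Interaction"
    private final Set<String> fieldElements; // assumed field elements, e.g. "Org-ref_taxname"
    private final List<String> rows = new ArrayList<>();
    private final List<String> currentRow = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inField = false;

    public TabRowHandler(String recordElement, Set<String> fieldElements) {
        this.recordElement = recordElement;
        this.fieldElements = fieldElements;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (fieldElements.contains(qName)) {
            inField = true;
            text.setLength(0); // fresh buffer for this field's text
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // One text node may arrive over several calls, so append rather than assign.
        if (inField) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (fieldElements.contains(qName)) {
            currentRow.add(text.toString()); // the field is complete only at its end tag
            inField = false;
        } else if (qName.equals(recordElement)) {
            rows.add(String.join("\t", currentRow)); // one line per record
            currentRow.clear();
        }
    }

    public List<String> rows() {
        return rows;
    }

    /** Parses an XML string and returns the tab-delimited rows. */
    public static List<String> parse(String xml, String recordElement, Set<String> fieldElements)
            throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TabRowHandler handler = new TabRowHandler(recordElement, fieldElements);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.rows();
    }
}
```

For example, calling TabRowHandler.parse(xml, "BIND-Interaction", fields) with a field set containing "Org-ref_taxname" would return one row per BIND-Interaction element. With inputs the size of the original 566 MB file, a real tool would more likely print each row as it is completed than keep all rows in memory.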
My original XML file is over 566 MB. I have managed to pare it down to a file
of about 4 MB, but haven't yet found a way to reproduce the problem with a smaller
file.
The following URLs link to the XML file and the two Java files used to parse the
XML into the tab-delimited output:
227.xml
https://www.slashtmp.iu.edu/public/download.php?FILE=aarenson/7404E5qOli
BindParserInter.java
https://www.slashtmp.iu.edu/public/download.php?FILE=aarenson/26035zBIzer
BindHandlerInter.java
https://www.slashtmp.iu.edu/public/download.php?FILE=aarenson/897296MWT3R
The following should compile the code and parse the XML:
> javac BindParserInter.java
> javac BindHandlerInter.java
> java BindParserInter 227.xml > 227.txt
The 227.xml file has 227 BIND-Interaction elements. The last one has the
following subelement:
<Org-ref_taxname>Mus musculus</Org-ref_taxname>
In the resulting tab-delimited file, the last line contains only 'ulus'
in the 7th field instead of 'Mus musculus'.
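The handler source is only available through the links above, so what follows is an assumption, not a diagnosis: one common cause of exactly this symptom is a characters() callback that assigns its chunk to a field instead of appending it. The SAX ContentHandler contract allows a parser to deliver the contiguous text of one element over several characters() calls, and where the split falls typically depends on the parser's internal buffers, which could explain why the problem only shows up in large files. The demo below (the class and method names are hypothetical) calls two handlers directly with "Mus musc" and "ulus", a hypothetical split point, to show how the overwriting pattern keeps only the last chunk.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkedCharactersDemo {
    /** Assumed buggy pattern (not taken from BindHandlerInter.java): each call overwrites the value. */
    static class OverwritingHandler extends DefaultHandler {
        String value = "";

        @Override
        public void characters(char[] ch, int start, int length) {
            value = new String(ch, start, length); // earlier chunks are lost
        }
    }

    /** Safe pattern: each call appends to a buffer. */
    static class AppendingHandler extends DefaultHandler {
        final StringBuilder value = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            value.append(ch, start, length); // chunks are concatenated
        }
    }

    /** Feeds text to a handler in two chunks, as a parser may do at a buffer boundary. */
    static void feedInTwoChunks(DefaultHandler handler, String text, int split) {
        char[] buf = text.toCharArray();
        try {
            handler.characters(buf, 0, split);
            handler.characters(buf, split, buf.length - split);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String taxname = "Mus musculus";
        int split = taxname.length() - 4; // hypothetical boundary just before "ulus"

        OverwritingHandler bad = new OverwritingHandler();
        feedInTwoChunks(bad, taxname, split);
        AppendingHandler good = new AppendingHandler();
        feedInTwoChunks(good, taxname, split);

        System.out.println(bad.value);  // prints "ulus"
        System.out.println(good.value); // prints "Mus musculus"
    }
}
```

If BindHandlerInter.java does assign in characters(), accumulating the text and reading it in endElement() would be the usual fix; if it already appends, the cause lies elsewhere.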