
SAXParseException when delimiter at end of buffer


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.2
    • Fix Version/s: None
    • Component/s: SAX
    • Labels: None
    • Environment: AIX 5.2 and Windows XP

    Description

      XML documents that parse successfully with a Xerces1 parser fail to parse with Xerces 2.3.0 and Xerces 2.6.2 unless DEFAULT_BUFFER_SIZE in org.apache.xerces.impl.XMLEntityManager is increased. A SAXParseException "XML document structures must start and end within the same entity" is thrown. The row and column in the exception always point to the end of the document.

      The problem can be reproduced with the Foo class below. If a character is removed (e.g. the last '\n' in the XML document in constructDocumentString()), the document parses successfully. A SAXParseException is thrown when the last processing instruction fits into the 2048-character buffer except for (1) the whole delimiter "?>" or (2) the final character ">" of the delimiter. Here are excerpts from the 'debug buffer' traces:

      (1) the delimiter "?>" not included in the 2048-character buffer
      (scanName:
      )scanName: -> sanchez
      (skipSpaces:
      )skipSpaces: -> true
      (scanData:
      )scanData: -> false
      (scanData:
      (load, 0:
      length to try to read: 2048
      length actually read: 2
      )load, 0:
      (load, 2:
      length to try to read: 2046
      length actually read: -1
      )load, 2:
      (load, 0:
      length to try to read: 2048
      length actually read: -1
      [Fatal Error] :99:46: XML document structures must start and end within the same entity.

      (2) the final character ">" of the delimiter not included in the 2048-character buffer
      (scanName:
      )scanName: -> sanchez
      (skipSpaces:
      )skipSpaces: -> true
      (scanData:
      )scanData: -> false
      (scanData:
      (load, 1:
      length to try to read: 2047
      length actually read: 1
      )load, 1:
      (load, 2:
      length to try to read: 2046
      length actually read: -1
      )load, 2:
      (load, 0:
      length to try to read: 2048
      length actually read: -1
      [Fatal Error] :55:46: XML document structures must start and end within the same entity.
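
      Both traces show the same pattern: the delimiter straddles a refill of the fixed-size buffer, and end of input is reported before the carried-over characters are matched. The following is a minimal sketch of a boundary-safe delimiter search over a fixed-size buffer. It is not the Xerces implementation; the class ChunkedDelimiterScan and the method readUntil are hypothetical names chosen for illustration. The idea is to carry the last delimiter.length() - 1 characters over to the next refill and to decide on end of input only after the carried-over characters have been searched.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical illustration of a boundary-safe delimiter search; not Xerces code. */
final class ChunkedDelimiterScan {

    /**
     * Reads from {@code in} through a buffer of {@code bufferSize} chars and
     * returns the text before the first occurrence of {@code delimiter},
     * or null if the input ends before the delimiter is found.
     */
    static String readUntil(Reader in, String delimiter, int bufferSize)
            throws IOException {
        if (bufferSize <= delimiter.length()) {
            throw new IllegalArgumentException("buffer must be larger than the delimiter");
        }
        char[] buf = new char[bufferSize];
        StringBuilder data = new StringBuilder();
        int count = 0; // number of valid chars currently in buf

        while (true) {
            // Refill behind whatever was carried over from the previous pass.
            int n = in.read(buf, count, buf.length - count);
            boolean eof = n < 0;
            if (n > 0) {
                count += n;
            }

            // Search the whole window, including carried-over chars,
            // BEFORE acting on end of input.
            String window = new String(buf, 0, count);
            int hit = window.indexOf(delimiter);
            if (hit >= 0) {
                data.append(window, 0, hit);
                return data.toString();
            }
            if (eof) {
                return null; // input really ended without the delimiter
            }

            // Keep the tail that could be the start of a straddling delimiter.
            int keep = Math.min(count, delimiter.length() - 1);
            data.append(buf, 0, count - keep);
            System.arraycopy(buf, count - keep, buf, 0, keep);
            count = keep;
        }
    }
}
```

      For example, readUntil(new StringReader("abcdef?>"), "?>", 7) splits the delimiter after the '?', which corresponds to case (2) above, and still returns "abcdef".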

      Changing the following code in the scanData method in the org.apache.xerces.impl.XMLEntityScanner class FROM:

      if (fCurrentEntity.position >= fCurrentEntity.count - delimLen) {

      TO:
      if (fCurrentEntity.position >= fCurrentEntity.count - delimLen
      && fCurrentEntity.ch[fCurrentEntity.position] != charAt0 && fCurrentEntity.count != delimLen) {

      appears to resolve this problem.
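
      The two conditions can be compared in isolation on the state the traces suggest. The sketch below is a hypothetical model, not Xerces code: the class ScanDataCondition and its method names are illustrative, and only position, count, ch, delimLen and charAt0 are taken from the snippets above. It assumes, as both traces suggest, that the buffer holds exactly the two delimiter characters (count = 2, position = 0) when the check runs.

```java
/** Hypothetical model of the two conditions quoted in this report; not Xerces code. */
final class ScanDataCondition {

    /** The original check from scanData, as quoted in the report. */
    static boolean original(int position, int count, int delimLen) {
        return position >= count - delimLen;
    }

    /** The proposed check from the report, with the two extra clauses. */
    static boolean proposed(char[] ch, int position, int count, String delimiter) {
        int delimLen = delimiter.length();
        char charAt0 = delimiter.charAt(0);
        return position >= count - delimLen
                && ch[position] != charAt0
                && count != delimLen;
    }

    public static void main(String[] args) {
        // Assumed state: the buffer contains only the delimiter "?>".
        char[] ch = {'?', '>'};
        int position = 0;
        int count = 2;

        System.out.println("original: " + original(position, count, 2));
        System.out.println("proposed: " + proposed(ch, position, count, "?>"));
    }
}
```

      In this state the original condition holds, which matches the extra load attempts in the traces that return -1, while the proposed condition does not hold, so the guarded branch is skipped and the delimiter already in the buffer can be matched.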

      Foo class:

      import java.io.*;
      import javax.xml.parsers.*;
      import java.net.*;
      import org.w3c.dom.*;
      import org.xml.sax.*;
      import org.apache.xml.serialize.*;

      /**
       * @author KOwen
       *
       * To change this generated comment edit the template variable "typecomment":
       * Window>Preferences>Java>Templates.
       * To enable and disable the creation of type comments go to
       * Window>Preferences>Java>Code Generation.
       */
      public class Foo {

          public static void main(String[] args) {
              Foo foo = new Foo();
              try {
                  foo.parseDocumentString();
              } catch (Throwable t) {
                  t.toString();
              }
          }

          private void parseDocumentString() throws Exception {
              // get the document string
              String response = constructDocumentString();
              // convert to bytes
              byte[] byteArray = response.getBytes();
              // convert to input stream
              ByteArrayInputStream is = new ByteArrayInputStream(byteArray);
              // parse document
              Document doc = parse(new InputSource(is));
              System.out.println("Size of response: " + byteArray.length);
              System.out.println(format(doc));
          }

          public String format(Document document) throws IOException {
              StringWriter stringWriter = new StringWriter(2000);
              OutputFormat apacheOutputFormat = new OutputFormat();
              apacheOutputFormat.setIndenting( true );
              Serializer serializer = SerializerFactory.getSerializerFactory( Method.XML ).makeSerializer(stringWriter, apacheOutputFormat );
              serializer.asDOMSerializer().serialize( document );
              return stringWriter.toString();
          }

          public Document parse(InputSource inputSource) throws SAXException, IOException {
              DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
              docBuilderFactory.setValidating( false );

              DocumentBuilder docBuilder = null;
              try {
                  docBuilder = docBuilderFactory.newDocumentBuilder();
              } catch (ParserConfigurationException excp) {
                  throw new SAXException( "JAXP Parser Configuration Error", excp );
              }

              return docBuilder.parse( inputSource );
          }

          public static String constructDocumentString() {
              StringBuffer buffer = new StringBuffer("<?xml version=\"1.0\" encoding=\"UTF-8\"?>")
                  .append("\r\n<?bns version=\"2.0\" msgid=\"2004-10-04 14:31:20.332UO\"?>")
                  .append("<COMServiceReply>")
                  .append("<ReplyValue type=\"ResultSet\">")
                  .append("<ResultSet>\n")
                  .append("<Item name=\"Account\" value=\"1105016\">\n")
                  .append("<Status>8</Status>\n")
                  .append("<Account>\n")
                  .append("<FUNDEDSTATUS>N</FUNDEDSTATUS>\n")
                  .append("<STATUS>330</STATUS>\n")
                  .append("<COMPLETESTATUS>N</COMPLETESTATUS>\n")
                  .append("<PAYMENTFREQUENCY>MONTHLY 1</PAYMENTFREQUENCY>\n")
                  .append("<INTERESTRATE>5.82876</INTERESTRATE>\n")
                  .append("<CUSTOMERCODE>99</CUSTOMERCODE>\n")
                  .append("<COMMERCIALFLAG>N</COMMERCIALFLAG>\n")
                  .append("<CLASSTYPE>61</CLASSTYPE>\n")
                  .append("<PRODUCTTYPE>904</PRODUCTTYPE>\n")
                  .append("<TERM>24M</TERM>\n")
                  .append("<AMORTIZATION>300M</AMORTIZATION>\n")
                  .append("<FIRSTPAYMENTDATE>01-Aug-2004 00:00:00</FIRSTPAYMENTDATE>\n")
                  .append("<LOANAMOUNT>225000.00</LOANAMOUNT>\n")
                  .append("<MATURITYDATE>03-May-2006 00:00:00</MATURITYDATE>\n")
                  .append("<CLOSINGORPOSESSIONDATE>01-May-2004 00:00:00</CLOSINGORPOSESSIONDATE>\n")
                  .append("<CLOSINGORPOSESSIONDATEINDICATOR>N</CLOSINGORPOSESSIONDATEINDICATOR>\n")
                  .append("<SERVICINGBRANCHTRANSIT>80002</SERVICINGBRANCHTRANSIT>\n")
                  .append("<PAYMENTESCROWAMOUNT>0.00</PAYMENTESCROWAMOUNT>\n")
                  .append("<PAYMENTCURRENTAMOUNT>750.00</PAYMENTCURRENTAMOUNT>\n")
                  .append("<TOTALESCROWBALANCE>0.00</TOTALESCROWBALANCE>\n")
                  .append("<PRINCIPALANDINTEREST>1426.22</PRINCIPALANDINTEREST>\n")
                  .append("<SWITCHCODE>1</SWITCHCODE>\n")
                  .append("<REMAININGAMORTIZATION>300M</REMAININGAMORTIZATION>\n")
                  .append("<EFFECTIVEINTERESTRATE>5.90000</EFFECTIVEINTERESTRATE>\n")
                  .append("<DATELOANAUTHORIZED>12-Apr-2004 00:00:00</DATELOANAUTHORIZED>\n")
                  .append("<REPAYMENTSTARTDATE>03-May-2004 00:00:00</REPAYMENTSTARTDATE>\n")
                  .append("<VRMINDICATOR>N</VRMINDICATOR>\n")
                  .append("<PROCESSINGBRANCHTRANSIT>80002</PROCESSINGBRANCHTRANSIT>\n")
                  .append("<DUEONSALE>1</DUEONSALE>\n")
                  .append("<REPAYMENTTYPE>B</REPAYMENTTYPE>\n")
                  .append("<MORTGAGENUMBER>1105016</MORTGAGENUMBER>\n")
                  .append("<IPLINDICATOR>N</IPLINDICATOR>\n")
                  .append("<FIXEDRATEINDICATOR>Y</FIXEDRATEINDICATOR>\n")
                  .append("<SPRMINDICATOR>N</SPRMINDICATOR>\n")
                  .append("<AUTHORIZEDINPUTTRANSIT>80002</AUTHORIZEDINPUTTRANSIT>\n")
                  .append("<MORTGAGERATETYPE>F</MORTGAGERATETYPE>\n")
                  .append("<VRMBASERATE>5.82876</VRMBASERATE>\n")
                  .append("<DATEOFAPPLICATION>10-Apr-2004 00:00:00</DATEOFAPPLICATION>\n")
                  .append("<Address>\n")
                  .append("<ADDRESSLINE1>231 SANDHURST</ADDRESSLINE1>\n")
                  .append("<CITY>SCARBOROUGH</CITY>\n")
                  .append("<STATE>9</STATE>\n")
                  .append("<POSTALCODE>M1B 2B2</POSTALCODE>\n")
                  .append("<COUNTRY>37</COUNTRY>\n")
                  .append("</Address>\n")
                  .append("<FeePlan FEEPLANDESC=\"Appraisal Fee\" FEETYP=\"APPRFEE\">\n")
                  .append("<FEEASSESSEDTOCUSTOMERPRIORYEAR>0.00</FEEASSESSEDTOCUSTOMERPRIORYEAR>\n")
                  .append("<FEEASSESSEDTOCUSTOMERLIFE>225.00</FEEASSESSEDTOCUSTOMERLIFE>\n")
                  .append("<FEEASSESSEDTOCUSTOMERYTD>225.00</FEEASSESSEDTOCUSTOMERYTD>\n")
                  .append("<FEEPAIDYTD>0.00</FEEPAIDYTD>\n")
                  .append("<FEEPAIDPRIORYEAR>0.00</FEEPAIDPRIORYEAR>\n")
                  .append("<FEEWAIVEDLIFE>0.00</FEEWAIVEDLIFE>\n")
                  .append("<FEEPAIDLIFE>0.00</FEEPAIDLIFE>\n")
                  .append("</FeePlan>\n")
                  .append("<FeePlan FEEPLANDESC=\"BC Discharge Fee\" FEETYP=\"BCDISCH\">\n")
                  .append("<FEEASSESSEDTOCUSTOMERPRIORYEAR>0.00</FEEASSESSEDTOCUSTOMERPRIORYEAR>\n")
                  .append("<FEEASSESSEDTOCUSTOMERLIFE>325.00</FEEASSESSEDTOCUSTOMERLIFE>\n")
                  .append("<FEEASSESSEDTOCUSTOMERYTD>325.00</FEEASSESSEDTOCUSTOMERYTD>\n")
                  .append("<FEEPAIDYTD>0.00</FEEPAIDYTD>\n")
                  .append("<FEEPAIDPRIORYEAR>0.00</FEEPAIDPRIORYEAR>\n")
                  .append("<FEEWAIVEDLIFE>0.00</FEEWAIVEDLIFE>\n")
                  .append("<FEEPAIDLIFE>0.00</FEEPAIDLIFE>\n")
                  .append("</FeePlan>\n")
                  .append("<FeePlan FEEPLANDESC=\"CMHC Application Fee\" FEETYP=\"NHAAPP\">\n")
                  .append("<FEEASSESSEDTOCUSTOMERPRIORYEAR>0.00</FEEASSESSEDTOCUSTOMERPRIORYEAR>\n")
                  .append("<FEEASSESSEDTOCUSTOMERLIFE>275.00</FEEASSESSEDTOCUSTOMERLIFE>\n")
                  .append("<FEEASSESSEDTOCUSTOMERYTD>275.00</FEEASSESSEDTOCUSTOMERYTD>\n")
                  .append("<FEEPAIDYTD>0.00</FEEPAIDYTD>\n")
                  .append("<FEEPAIDPRIORYEAR>0.00</FEEPAIDPRIORYEAR>\n")
                  .append("<FEEWAIVEDLIFE>0.00</FEEWAIVEDLIFE>\n")
                  .append("<FEEPAIDLIFE>0.00</FEEPAIDLIFE>\n")
                  .append("</FeePlan>\n")
                  .append("<Collateral COLLATERALCODE=\"10\">\n")
                  .append("<PROPERTYDESCRIPTIONLINE1>222 Main street</PROPERTYDESCRIPTIONLINE1>\n")
                  .append("<PROPERTYADDRESSLINE3>Burneby</PROPERTYADDRESSLINE3>\n")
                  .append("<PROPERTYADDRESSSTATE>2</PROPERTYADDRESSSTATE>\n")
                  .append("<PROPERTYADDRESSCOUNTRY>37</PROPERTYADDRESSCOUNTRY>\n")
                  .append("<DATEBUILDINGAPPRAISEDANDDATELANDAPPRAISED>2004-04-08 00:00:00.0</DATEBUILDINGAPPRAISEDANDDATELANDAPPRAISED>\n")
                  .append("<BUILDINGLENDINGVALUE>125000.00</BUILDINGLENDINGVALUE>\n")
                  .append("<COLDESC>BC</COLDESC>\n")
                  .append("<LANDLENDINGVALUE>125000.00</LANDLENDINGVALUE>\n")
                  .append("<TYPEOFSECURITY>1</TYPEOFSECURITY>\n")
                  .append("<APPRAISEDBUILDINGVALUE>125000.00</APPRAISEDBUILDINGVALUE>\n")
                  .append("<APPRAISEDLANDVALUE>125000.00</APPRAISEDLANDVALUE>\n")
                  .append("<LENDINGVALUE>250000.00</LENDINGVALUE>\n")
                  .append("</Collateral>\n")
                  .append("<Phone>\n")
                  .append("<HOMEPHONENUMBER>4166155465</HOMEPHONENUMBER>\n")
                  .append("<WORKPHONENUMBER>4166455465</WORKPHONENUMBER>\n")
                  .append("</Phone>\n")
                  .append("</Account>\n")
                  .append("</Item>\n")
                  .append("</ResultSet>\n")
                  .append("</ReplyValue>")
                  .append("</COMServiceReply>")
                  .append("<?sanchez msgid=\"2004-10-04 14:31:20.332UO\"?>");
              return buffer.toString();
          }

      }
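
      Because removing a single character makes the document parse, the failure depends on where the trailing processing instruction falls relative to the buffer boundary. The following is a rough sketch of a probe that sweeps document lengths across that boundary using only standard JAXP calls. BufferBoundaryProbe, buildDocument and failingLengths are hypothetical names, the padded test document is made up for illustration, and whether any length fails depends on the parser version on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical probe for length-dependent parse failures; not part of Foo.java. */
final class BufferBoundaryProbe {

    /** Builds an ASCII document of exactly totalLength chars ending in a PI. */
    static String buildDocument(int totalLength) {
        String head = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><r>";
        String tail = "</r><?sanchez msgid=\"x\"?>";
        int pad = totalLength - head.length() - tail.length();
        if (pad < 0) {
            throw new IllegalArgumentException("totalLength too small");
        }
        StringBuilder sb = new StringBuilder(totalLength).append(head);
        for (int i = 0; i < pad; i++) {
            sb.append('a'); // padding text inside the root element
        }
        return sb.append(tail).toString();
    }

    /** Returns every length in [from, to] whose document fails to parse. */
    static List<Integer> failingLengths(int from, int to) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        List<Integer> failures = new ArrayList<>();
        for (int len = from; len <= to; len++) {
            byte[] bytes = buildDocument(len).getBytes(StandardCharsets.UTF_8);
            DocumentBuilder builder = factory.newDocumentBuilder();
            try {
                builder.parse(new InputSource(new ByteArrayInputStream(bytes)));
            } catch (SAXException e) {
                failures.add(len); // record the length, keep sweeping
            }
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        // Sweep a few lengths on either side of the 2048-character buffer size.
        System.out.println("Failing lengths: " + failingLengths(2040, 2056));
    }
}
```

      Running the sweep against an affected and an unaffected parser build and comparing the two lists is one way to confirm that a failure is tied to the buffer boundary rather than to the document content.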

      Thank you!

      Attachments

        1. Foo.java
          9 kB
          Karen Owen

          People

            Assignee: Unassigned
            Reporter: Karen Owen (kowen)