Cocoon
  1. Cocoon
  2. COCOON-2158

XMLByteStreamCompiler hard-coded limits of 0xffff Strings prevents large XML documents from being handled in Cocoon

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 2.1.6, 2.1.7, 2.1.8, 2.1.9, 2.1.10, 2.1.11, 2.1.12
    • Fix Version/s: 2.1.12, 2.2
    • Component/s: * Cocoon Core
    • Labels:
      None
    • Urgency:
      Urgent
    • Other Info:
      Patch available
    • Affects version (Component):
      Cocoon Core
    • Fix version (Component):
      Cocoon Core

      Description

      The hard-coded limits in XMLByteStreamCompiler prevent Cocoon from handling large XML documents.

      See the methods writeString and writeAttributes for the hard coded arbitrary maximums:

      if (i > 0xFFFF) throw new SAXException("Index too large");

      if (attributes > 0xFFFF) throw new SAXException("Too many attributes");

      Additionally, the hand-coded bit manipulation is pretty difficult to change in order to work around this.

      I am attaching a patch for 2.1.11 that updates the existing JUnit test case to reproduce the problem, as well as a fix to the problem that uses the DataInputStream and DataOutputStream for the low-level bit manipulation.

        Activity

        Hide
        Antonio Gallardo added a comment -
        Code from 2.2 is already backported. Also the testcase was extended to show this issue (in AbstractXMLTestCase.java replace line 59 with 58 - currently commented).
        Show
        Antonio Gallardo added a comment - Code from 2.2 is already backported. Also the testcase was extended to show this issue (in AbstractXMLTestCase.java replace line 59 with 58 - currently commented).
        Hide
        Eric Meyer added a comment -
        Updated to change string length limit to Integer.MAX_VALUE (0x7fffffff).
        Updated test case to show that there was a string length limit.
        Show
        Eric Meyer added a comment - Updated to change string length limit to Integer.MAX_VALUE (0x7fffffff). Updated test case to show that there was a string length limit.
        Hide
        Eric Meyer added a comment -
        Vadim - thank you for pointing out the 65k character limit - you saved me from finding that the hard way. Definitely a problem. That nuance sure escaped me in the original code... but that's why I gave up trying to rework the original code - it was darn challenging to see what exactly was going on at the bit low level. While it's not higher math, it wasn't obvious what it was doing or why. Also thanks for pointing out the work on the 2.2. branch. You applied many of the same refactorings that I did.

        Since the format is not binary backward compatible, I changed the prolog to CXML11. I couldn't find that defined as a standard anywhere.

        I'm attaching a new patch that removes the 65k limit by instead writing the integer length of the string followed by the characters. I also updated the test to show that it can successfully handle more than 65k of \uffff characters.

        Kind regards,
        Eric
        Show
        Eric Meyer added a comment - Vadim - thank you for pointing out the 65k character limit - you saved me from finding that the hard way. Definitely a problem. That nuance sure escaped me in the original code... but that's why I gave up trying to rework the original code - it was darn challenging to see what exactly was going on at the bit low level. While it's not higher math, it wasn't obvious what it was doing or why. Also thanks for pointing out the work on the 2.2. branch. You applied many of the same refactorings that I did. Since the format is not binary backward compatible, I changed the prolog to CXML11. I couldn't find that defined as a standard anywhere. I'm attaching a new patch that removes the 65k limit by instead writing the integer length of the string followed by the characters. I also updated the test to show that it can successfully handle more than 65k of \uffff characters. Kind regards, Eric
        Hide
        Vadim Gritsenko added a comment -
        I'd like to point out that attached patch introduces new limit (which is more likely to occur): text size is now limited to 65k bytes (32k characters or less of UTF-8).

        Another two issues with the patch are:
        * It does not take into account that this code was refactored in trunk
        * It does not recognize CXML10, introduces CXML11 as replacement.

        IMHO It would be easier - and will be compatible - to extend handling of the large index instead of applying patch as is.
        Show
        Vadim Gritsenko added a comment - I'd like to point out that attached patch introduces new limit (which is more likely to occur): text size is now limited to 65k bytes (32k characters or less of UTF-8). Another two issues with the patch are: * It does not take into account that this code was refactored in trunk * It does not recognize CXML10, introduces CXML11 as replacement. IMHO It would be easier - and will be compatible - to extend handling of the large index instead of applying patch as is.
        Hide
        Eric Meyer added a comment -
        Patch file includes both an update to the test cases to show the problem as well as the fix for the problem.
        Show
        Eric Meyer added a comment - Patch file includes both an update to the test cases to show the problem as well as the fix for the problem.

          People

          • Assignee:
            Antonio Gallardo
            Reporter:
            Eric Meyer
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development