Derby / DERBY-428

NetworkClient PreparedStatement.executeBatch() hangs if batch is too large (ArrayIndexOutOfBoundsException in Network Server)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 10.1.3.2, 10.2.1.6
    • Labels:
      None
    • Environment:
      Linux atum01 2.4.20-31.9 #1 Tue Apr 13 18:04:23 EDT 2004 i686 i686 i386 GNU/Linux
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_03-b07)
      Java HotSpot(TM) Client VM (build 1.5.0_03-b07, mixed mode, sharing)

      Description

      When running:

      s.executeUpdate("create table t (i integer)");
      PreparedStatement p = c.prepareStatement("insert into t values (?)");
      for (int i = 0; i < N; i++) {
          p.setInt(1, i);
          p.addBatch();
      }
      System.out.println("Ok");
      p.executeBatch();

      If N is 9000, the server reports:

      524272
      java.lang.ArrayIndexOutOfBoundsException: 524272
      at org.apache.derby.impl.drda.DDMWriter.startDdm(DDMWriter.java:315)
      at org.apache.derby.impl.drda.DRDAConnThread.writeSQLCARD(DRDAConnThread.java:4937)
      at org.apache.derby.impl.drda.DRDAConnThread.writeSQLCARDs(DRDAConnThread.java:4898)
      at org.apache.derby.impl.drda.DRDAConnThread.writeSQLCARDs(DRDAConnThread.java:4888)
      at org.apache.derby.impl.drda.DRDAConnThread.checkWarning(DRDAConnThread.java:7239)
      at org.apache.derby.impl.drda.DRDAConnThread.parseEXCSQLSTT(DRDAConnThread.java:3605)
      at org.apache.derby.impl.drda.DRDAConnThread.processCommands(DRDAConnThread.java:859)
      at org.apache.derby.impl.drda.DRDAConnThread.run(DRDAConnThread.java:214)
      agentThread[DRDAConnThread_3,5,main]

      While the client hangs in executeBatch().

      If N is 8000, the client gets the following Exception:
      Exception in thread "main" org.apache.derby.client.am.BatchUpdateException: Non-atomic batch failure. The batch was submitted, but at least one exception occurred on an individual member of the batch. Use getNextException() to retrieve the exceptions for specific batched elements.
      at org.apache.derby.client.am.Agent.endBatchedReadChain(Agent.java:267)
      at org.apache.derby.client.am.PreparedStatement.executeBatchRequestX(PreparedStatement.java:1596)
      at org.apache.derby.client.am.PreparedStatement.executeBatchX(PreparedStatement.java:1467)
      at org.apache.derby.client.am.PreparedStatement.executeBatch(PreparedStatement.java:945)
      at AOIB.main(AOIB.java:24)

      1. derby428_10_1.stat
        0.2 kB
        Kathey Marsden
      2. derby428_10_1.diff
        5 kB
        Kathey Marsden
      3. b428.java
        1 kB
        Bryan Pendleton
      4. derby-428.diff
        5 kB
        Bryan Pendleton


          Activity

          Bernt M. Johnsen added a comment -

          BTW: When the BatchUpdateException is thrown, the connection is closed down (probably due to the ArrayIndexOutOfBoundsException on the server)!

          Gregory W. Inns added a comment -

          We're getting one similar to this one, but only sporadically.

          Does anyone know a workaround for it?

          Bryan Pendleton added a comment -

          startDDM() is missing a call to ensureLength(). Many of the other problems that Bernt was seeing were due to DERBY-125 and DERBY-491, I believe.

          With the ensureLength call added to startDDM, I can get up to batch size 65535, at which point I get a wholly different exception. I'll study that and see if it is related, or if it is an independent bug that I'm just seeing now due to clearing the other bugs that prevented us from getting this far.

          Bryan Pendleton added a comment -

          The crash on batch element 65535 seems pretty straightforward: correlation IDs in DSS blocks are 2-byte unsigned integers, so of course there can only be 65536 total values. And since the values 0 and "-1" (65535) are apparently reserved for special purposes, that seems to mean that there is a hard limit in the DRDA protocol itself: there can be no more than 65534 elements in a single batch chain.

          From DRDA v.3, page 15:

          The value of the request correlation identifier is a unique non-negative binary number. Each RQSDSS
          in a DSS chain must have a unique correlation identifier. The correlation identifier is sent to the target
          agent that receives the request.

          And on page 772:

          The request correlator can be set to any positive (number greater than zero) binary number for the
          first request, or only request, in an RQSDSS chain. Each RQSDSS in an RQSDSS chain after the
          first one must have a request correlator that is greater than the previous RQSDSS. The request
          correlator can be set to any value.

          So, that leaves me with the following thoughts:
          1) Am I reading the DRDA spec correctly here? (I'll study it some more.)
          2) Is it OK for our implementation to have a hard limit of 65534 elements in a single batch chain?
          3) If so, what should the behavior be if an application tries to add more elements than that? Throw an exception?
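
          The wraparound behind this hard limit can be sketched in a few lines of arithmetic. This is purely illustrative (the class and method names below are invented for the sketch, not Derby code): a correlation ID held in a 2-byte unsigned field wraps back through zero once it passes 0xFFFF, which breaks DRDA's rule that correlators in a chain must strictly increase.

```java
// Illustrative sketch: DRDA DSS correlation IDs are 2-byte unsigned
// integers, so incrementing past 0xFFFF wraps back to zero.
public class CorrelationIdDemo {
    static final int MAX_UNSIGNED_SHORT = 0xFFFF; // 65535

    // Next correlation ID, masked to 16 bits the way a 2-byte field wraps.
    static int next(int id) {
        return (id + 1) & MAX_UNSIGNED_SHORT;
    }

    public static void main(String[] args) {
        // Incrementing past the top of the range wraps to 0, so the
        // "always increasing" requirement can no longer be satisfied.
        System.out.println(next(65534)); // 65535
        System.out.println(next(65535)); // 0
    }
}
```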

          Bryan Pendleton added a comment -

          I believe that the 2-byte limit to the number of correlated commands in a single DRDA request is a hard limit.

          I've thought of two possible ways to proceed:
          1) The network client could refuse to execute a batch of more than 64K commands.
          2) The network client could break such a giant batch into multiple DRDA requests.

          The second one is substantially more work, but potentially could support an arbitrarily
          large batch size. However, I'm not sure if it introduces some subtle semantic changes
          because the single user-level logical batch is now being decomposed into multiple
          physical batches.

          To misquote Bill Gates, it seems like 64K commands in a single batch ought to be
          enough for anyone, but I'd sure appreciate some more opinions on this topic.

          Kathey Marsden added a comment -

          Bryan said
          >I've thought of two possible ways to proceed:
          >1) The network client could refuse to execute a batch of more than 64K commands.
          >2) The network client could break such a giant batch into multiple DRDA requests.

          If I understand this correctly, if we go with 1) would the workaround just be for the user to break up their batches into a few batches each with less than 64K statements? If so, I think 1) is fine for now. We have bigger fish to fry with Network Server.
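
          The workaround Kathey describes — splitting one giant batch into several sub-batches — can be sketched as a small helper. This is a hypothetical application-side utility (not part of Derby): it just computes how many statements go in each sub-batch so that every sub-batch stays under the 65534-element limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not Derby code): split a large batch into
// sub-batches that each stay within the DRDA correlation-ID limit.
public class BatchChunker {
    static final int DRDA_BATCH_LIMIT = 65534;

    // Returns the size of each sub-batch needed to run `total` statements
    // with at most `limit` statements per executeBatch() call.
    static List<Integer> chunkSizes(int total, int limit) {
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = total; remaining > 0; remaining -= limit) {
            sizes.add(Math.min(remaining, limit));
        }
        return sizes;
    }
}
```

          An application would call addBatch() up to each computed boundary and then executeBatch(), rather than accumulating the entire logical batch in one call.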

          Bryan Pendleton added a comment -

          Attached is a standalone test program, b428.java, for experimenting with the bug, and a patch proposal, derby-428.diff.

          The patch contains a server-side change, a client-side change, and a regression test.

          The server-side change is to call ensureLength() in DDMWriter.startDDM(). The DDMWriter working buffer is designed to dynamically grow to accommodate the data being written; this dynamic growth is implemented using a coding rule which requires that all DDMWriter internal routines must call ensureLength to communicate the buffer size requirements of that routine prior to writing bytes into the buffer. startDDM was missing the call to ensureLength. It was just luck that this hadn't caused any problems in the past; this particular bug exposed the problem in startDDM by causing the server to write a tremendous number of very small DDM records in a single correlated chain, which meant that eventually (around batch element 9000), startDDM tried to write past the end of the buffer without calling ensureLength first. Simple change, even if my explanation is not so clear.
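
          The buffer-growth rule described above can be sketched as follows. This is a minimal illustration under invented names, not Derby's actual DDMWriter: every write routine reserves space via ensureLength() before appending bytes, and skipping that call is exactly what produced the ArrayIndexOutOfBoundsException.

```java
import java.util.Arrays;

// Illustrative sketch of the coding rule (names invented, not Derby's
// DDMWriter): writers must call ensureLength() before appending bytes
// so the backing array can grow first.
public class GrowableBuffer {
    private byte[] bytes = new byte[16];
    private int offset = 0;

    // Grow the backing array, if needed, so `length` more bytes will fit.
    void ensureLength(int length) {
        if (offset + length > bytes.length) {
            int newSize = Math.max(bytes.length * 2, offset + length);
            bytes = Arrays.copyOf(bytes, newSize);
        }
    }

    // Correct write routine: reserves space first, like the fixed startDDM().
    // Omitting the ensureLength call would eventually write past the end of
    // the array and throw ArrayIndexOutOfBoundsException.
    void writeByte(byte b) {
        ensureLength(1);
        bytes[offset++] = b;
    }

    int size() { return offset; }
}
```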

          The client-side change is due to the fact that DRDA imposes a hard limit of 65535 elements in a single correlated request because the correlation identifier is a two byte unsigned integer. Without this change, what happens is that the correlation identifier wraps around when we go to write the 65536th element in the batch, and we start breaking DRDA protocol rules since DRDA requires that the correlation IDs in a single request be always increasing. The change in this patch proposal causes the client to throw an exception if it is asked to execute a batch containing more than 65534 elements. The reason for the number 65534, rather than 65535, is that the value 0xFFFF seems to be reserved for some special purpose.

          Experimenting with the JCC driver, I discovered that it seems to reserve more than just 0xFFFF, but also 0xFFFE and 0xFFFD as special values; the largest number of elements that I could successfully execute in a single batch with the JCC driver is 65532. I don't know what is going on with those special values, unfortunately.

          The regression test verifies that we can successfully execute a batch containing 65532 elements with both the Network Client and JCC drivers. The test also verifies that, if we are using the Network Client, then we get the expected exception if we try to execute a batch with more than 65534 elements.

          Comments, suggestions, and feedback are welcome!

          Bryan Pendleton added a comment -

          I've committed this patch: http://svn.apache.org/viewcvs?rev=387895&view=rev

          Bryan Pendleton added a comment -

          Bernt, can you please verify this fix and close the bug if appropriate?

          Bernt M. Johnsen added a comment -

          Verified. Works perfectly up to 65534 "commands" in a batch. Above that, it correctly raises org.apache.derby.client.am.BatchUpdateException: No more than 65,534 commands may be added to a single batch.

          Kathey Marsden added a comment -

          Reopen to port to 10.1.

          Kathey Marsden added a comment -

          reassign for port.

          Kathey Marsden added a comment -

          Attached are the patch derby428_10_1.diff and derby428_10_1.stat to port this fix to 10.1. I am seeing derbynetmats hang when running derbynet/testProperties.java. Earlier it hung on NSInSameJVM but ran OK on rerun. I am investigating that now. This patch should not be committed to 10.1 yet.

          Kathey Marsden added a comment -

          After rebooting my computer I was no longer able to reproduce the testProperties hang. I think it may have been at least triggered by my firewall software, and the code path for testProperties should not really trigger any problems with this change, which is only relevant when the DDMWriter buffer exceeds 32K.

          I think the hang may have been related to some interaction with my firewall software. I will post separately to derby-dev about that as I think there may be a Network Server issue there.

          Date: Thu Jul 20 04:47:54 2006
          New Revision: 423910

          URL: http://svn.apache.org/viewvc?rev=423910&view=rev
          Log:
          DERBY-428 NetworkClient PreparedStatement.executeBatch() hangs if batch is too large (ArrayIndexOutOfBoundsException in Network Server)

          Kathey Marsden added a comment -

          Reopen to port to 10.1

          Kathey Marsden added a comment -

          Reopened this issue in error.


            People

            • Assignee:
              Bryan Pendleton
            • Reporter:
              Bernt M. Johnsen
            • Votes: 2
            • Watchers: 1
