Derby
  1. Derby
  2. DERBY-3379

"No Current connection" on PooledConnection.getConnection() if pooled connection is reused during connectionClosed processing

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.2.0, 10.0.2.1, 10.1.1.0, 10.1.2.1, 10.1.3.1, 10.2.1.6, 10.2.2.0, 10.3.1.4, 10.3.2.1
    • Fix Version/s: 10.3.3.0, 10.4.1.3, 10.5.1.1
    • Component/s: Network Client
    • Labels:
      None

      Description

      This is the client version of bug DERBY-2142. It can be reproduced by enabling the test, DataSourceTest.testPooledReuseOnClose() for client. I am opening a new issue for client as the embedded fix was backported to 10.1 and I am guessing the client fix won't be backported that far. Better to keep it as a separate issue.

      1. derby-3379-1a-enforce_no_chaining.diff
        7 kB
        Kristian Waagan
      2. derby-3379-1a-enforce_no_chaining.diff
        7 kB
        Kristian Waagan
      3. derby-3379-1a-enforce_no_chaining.stat
        0.3 kB
        Kristian Waagan
      4. derby-3379-1a-enforce_no_chaining.stat
        0.3 kB
        Kristian Waagan
      5. derby-3379-1b-enforce_no_chaining.diff
        6 kB
        Kristian Waagan
      6. derby-3379-1c-enforce_no_chaining.diff
        6 kB
        Kristian Waagan
      7. derby-3379-1c-enforce_no_chaining.stat
        0.3 kB
        Kristian Waagan
      8. derby-3379-2a-comment_and_JavaDoc.diff
        3 kB
        Kristian Waagan
      9. derby-3379-3a-minimal_fix_and_test_enabling.diff
        3 kB
        Kristian Waagan

        Issue Links

          Activity

          Hide
          Kathey Marsden added a comment -

          Committed revision 651835 to 10.3

          Show
          Kathey Marsden added a comment - Committed revision 651835 to 10.3
          Hide
          Kristian Waagan added a comment -

          Added link to DERBY-3421. If you want to backport the fix for 3379 to the 10.3 branch, the fix for 3421 must be included as well. Failing to do so will cause Derby to hang instead of throwing the "No current connection" exception.

          I do not plan to backport to 10.3 unless someone makes a request to do so.

          Show
          Kristian Waagan added a comment - Added link to DERBY-3421 . If you want to backport the fix for 3379 to the 10.3 branch, the fix for 3421 must be included as well. Failing to do so will cause Derby to hang instead of throwing the "No current connection" exception. I do not plan to backport to 10.3 unless someone makes a request to do so.
          Hide
          Kristian Waagan added a comment -

          Committed 'derby-3379-3a-minimal_fix_and_test_enabling.diff' to trunk with revision 642942 and merged it to the 10.4 branch with revision 642944.

          It turned out that the deleted method 'cacheConnectBytes' changed the state in the request object, which caused the hang. The reason for the state change was that it needed the correct offsets to copy the raw connect bytes, and it obtained these offsets by finalizing the DSS. The chaining bits were set as part of this, but when there are no more commands to send, the server will hang waiting for the next command and the client will hang waiting for the reply to the first (and only) command. The bug will not be triggered if the statement being closed has open result sets, as that will cause another command to be written to the chain and the chaining bits are set correctly in this case.

          I'm of the opinion that the patch makes the driver behave more correctly, and therefore I proceeded with committing patch 3a.

          For reference, here is the stack trace for the client during the hang. The server stack trace doesn't show anything useful for this bug.
          "main" prio=10 tid=0x08070000 nid=0x2 runnable [0xfe4fe000..0xfe4fed48]
          java.lang.Thread.State: RUNNABLE
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(SocketInputStream.java:129)
          at org.apache.derby.client.net.Reply.fill(Reply.java:174)
          at org.apache.derby.client.net.Reply.ensureALayerDataInBuffer(Reply.java:215)
          at org.apache.derby.client.net.Reply.readDssHeader(Reply.java:317)
          at org.apache.derby.client.net.Reply.startSameIdChainParse(Reply.java:1147)
          at org.apache.derby.client.net.NetConnectionReply.readExchangeServerAttributes(NetConnectionReply.java:54)
          at org.apache.derby.client.net.NetConnection.readServerAttributesAndKeyExchange(NetConnection.java:862)
          at org.apache.derby.client.net.NetConnection.readAllConnectCommandsChained(NetConnection.java:832)
          at org.apache.derby.client.net.NetConnection.readDeferredReset(NetConnection.java:994)
          at org.apache.derby.client.net.NetAgent.readDeferredResetConnection(NetAgent.java:474)
          at org.apache.derby.client.net.NetAgent.beginReadChainOutsideUOW(NetAgent.java:490)
          at org.apache.derby.client.am.Agent.flowOutsideUOW(Agent.java:197)
          at org.apache.derby.client.am.Statement.flowCloseOutsideUOW(Statement.java:1686)
          at org.apache.derby.client.am.Statement.closeX(Statement.java:604)
          at org.apache.derby.client.am.Statement.close(Statement.java:564)

          • locked <0xf4157648> (a org.apache.derby.client.net.NetConnection40)
            at d3379.main(d3379.java:30)
          Show
          Kristian Waagan added a comment - Committed 'derby-3379-3a-minimal_fix_and_test_enabling.diff' to trunk with revision 642942 and merged it to the 10.4 branch with revision 642944. It turned out that the deleted method 'cacheConnectBytes' changed the state in the request object, which caused the hang. The reason for the state change was that it needed the correct offsets to copy the raw connect bytes, and it obtained these offsets by finalizing the DSS. The chaining bits were set as part of this, but when there are no more commands to send, the server will hang waiting for the next command and the client will hang waiting for the reply to the first (and only) command. The bug will not be triggered if the statement being closed has open result sets, as that will cause another command to be written to the chain and the chaining bits are set correctly in this case. I'm of the opinion that the patch makes the driver behave more correctly, and therefore I proceeded with committing patch 3a. For reference, here is the stack trace for the client during the hang. The server stack trace doesn't show anything useful for this bug. "main" prio=10 tid=0x08070000 nid=0x2 runnable [0xfe4fe000..0xfe4fed48] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at org.apache.derby.client.net.Reply.fill(Reply.java:174) at org.apache.derby.client.net.Reply.ensureALayerDataInBuffer(Reply.java:215) at org.apache.derby.client.net.Reply.readDssHeader(Reply.java:317) at org.apache.derby.client.net.Reply.startSameIdChainParse(Reply.java:1147) at org.apache.derby.client.net.NetConnectionReply.readExchangeServerAttributes(NetConnectionReply.java:54) at org.apache.derby.client.net.NetConnection.readServerAttributesAndKeyExchange(NetConnection.java:862) at org.apache.derby.client.net.NetConnection.readAllConnectCommandsChained(NetConnection.java:832) at org.apache.derby.client.net.NetConnection.readDeferredReset(NetConnection.java:994) at org.apache.derby.client.net.NetAgent.readDeferredResetConnection(NetAgent.java:474) at org.apache.derby.client.net.NetAgent.beginReadChainOutsideUOW(NetAgent.java:490) at org.apache.derby.client.am.Agent.flowOutsideUOW(Agent.java:197) at org.apache.derby.client.am.Statement.flowCloseOutsideUOW(Statement.java:1686) at org.apache.derby.client.am.Statement.closeX(Statement.java:604) at org.apache.derby.client.am.Statement.close(Statement.java:564) locked <0xf4157648> (a org.apache.derby.client.net.NetConnection40) at d3379.main(d3379.java:30)
          Hide
          Kristian Waagan added a comment -

          Thanks for the verification and the detective work Knut Anders!

          I'll look into the patch I committed and see if I understand why it changes the behavior. That was not the intention, so I have to determine if the removed code was buggy or there is another reason for the changed behavior.

          Show
          Kristian Waagan added a comment - Thanks for the verification and the detective work Knut Anders! I'll look into the patch I committed and see if I understand why it changes the behavior. That was not the intention, so I have to determine if the removed code was buggy or there is another reason for the changed behavior.
          Hide
          Knut Anders Hatlen added a comment -

          It seems like the hang disappeared after this commit:

          ------------------------------------------------------------------------
          r628679 | kristwaa | 2008-02-18 11:56:23 +0100 (Mon, 18 Feb 2008) | 3 lines

          DERBY-3421: Remove unused code for caching of connect bytes.
          Patch file: derby-3421-1b-removal.diff

          ------------------------------------------------------------------------

          Show
          Knut Anders Hatlen added a comment - It seems like the hang disappeared after this commit: ------------------------------------------------------------------------ r628679 | kristwaa | 2008-02-18 11:56:23 +0100 (Mon, 18 Feb 2008) | 3 lines DERBY-3421 : Remove unused code for caching of connect bytes. Patch file: derby-3421-1b-removal.diff ------------------------------------------------------------------------
          Hide
          Knut Anders Hatlen added a comment -

          J2EEDataSourceTest ran cleanly in my environment, and testPooledReuseOnClose was invoked twice (embedded and client).

          Show
          Knut Anders Hatlen added a comment - J2EEDataSourceTest ran cleanly in my environment, and testPooledReuseOnClose was invoked twice (embedded and client).
          Hide
          Kristian Waagan added a comment -

          'derby-3379-3a-minimal_fix_and_test_enabling.diff' is a minimal fix (basically a one-liner) and it also enables the regression test that was commented out.

          Seems some other change has somehow changed the characteristics of the problem. Earlier I got a hang with only the fix for ClientPooledConnection. Now it works fine. Without the CPC-fix, I get a "No current connection".
          It is not clear to me what has caused the change. I have run the test both with and without statement pooling enabled, and also both with trunk and the 10.4 branch.

          Can someone please try this patch out and verify that jdbcapi.J2EEDataSourceTest runs cleanly, and that the repro is actually run?
          I have started the full regression suite, and will report back with the results tomorrow.
          If I get the cycles, I'll try to hunt down the commit that made the hang go away. Would be nice to know...

          Show
          Kristian Waagan added a comment - 'derby-3379-3a-minimal_fix_and_test_enabling.diff' is a minimal fix (basically a one-liner) and it also enables the regression test that was commented out. Seems some other change has somehow changed the characteristics of the problem. Earlier I got a hang with only the fix for ClientPooledConnection. Now it works fine. Without the CPC-fix, I get a "No current connection". It is not clear to me what has caused the change. I have run the test both with and without statement pooling enabled, and also both with trunk and the 10.4 branch. Can someone please try this patch out and verify that jdbcapi.J2EEDataSourceTest runs cleanly, and that the repro is actually run? I have started the full regression suite, and will report back with the results tomorrow. If I get the cycles, I'll try to hunt down the commit that made the hang go away. Would be nice to know...
          Hide
          Kristian Waagan added a comment -

          'derby-3379-2a-comment_and_JavaDoc.diff' cleans up some comments in buildDss.
          One of the comments confused me at first, because it said chaining bits were always set, followed by two 0xFF assignments in the byte array. Turned out this was the length bytes, and that the chaining bits are not unconditionally set (as part of this method).
          It also converts comments for method finalizeDssLength into JavaDoc.

          I committed 2a to trunk with revision 637805.
          I plan to merge all commits for this issue to the 10.4 branch when the issue has been resolved.

          My further work here looks like this:

          • Address Knut's comments and upload a new "base patch".
          • Upload two enabling patches, one broad (current) and one isolated fix.

          I will spend a little more time looking into the issue and see if the fix can be made smaller. If not, I hope the isolated fix can go in. The plan is to only change the chaining flag if there are no result sets to close, as opposed to checking and possibly resetting the chaining flag on every flush (broad approach).

          My opinion is that even if the isolated fix may not be perfect, it should go in. We have users seeing the problem, and the patch fixes it without breaking any of the regression tests.
          If others want to pursue a different fix, they are welcome to do so
          (but please let me know so we don't duplicate our efforts)

          I do plan to get this in for the 10.4 release.

          Show
          Kristian Waagan added a comment - 'derby-3379-2a-comment_and_JavaDoc.diff' cleans up some comments in buildDss. One of the comments confused me at first, because it said chaining bits were always set, followed by two 0xFF assignments in the byte array. Turned out this was the length bytes, and that the chaining bits are not unconditionally set (as part of this method). It also converts comments for method finalizeDssLength into JavaDoc. I committed 2a to trunk with revision 637805. I plan to merge all commits for this issue to the 10.4 branch when the issue has been resolved. My further work here looks like this: Address Knut's comments and upload a new "base patch". Upload two enabling patches, one broad (current) and one isolated fix. I will spend a little more time looking into the issue and see if the fix can be made smaller. If not, I hope the isolated fix can go in. The plan is to only change the chaining flag if there are no result sets to close, as opposed to checking and possibly resetting the chaining flag on every flush (broad approach). My opinion is that even if the isolated fix may not be perfect, it should go in. We have users seeing the problem, and the patch fixes it without breaking any of the regression tests. If others want to pursue a different fix, they are welcome to do so (but please let me know so we don't duplicate our efforts) I do plan to get this in for the 10.4 release.
          Hide
          Knut Anders Hatlen added a comment -

          I think the approach is OK, although it sounds better if the chaining bits were set correctly in the first place. It's not quite clear to me where/how the chaining bits are set the first time, but if I understand correctly, Statement.flowCloseOutsideUOW() will only be called if nothing needs to be sent to the server on Statement.close(). Could we somehow use this to send a flag down the call tree so that the chaining bits are set correctly?

          Should previousUsedDssMark have been reset in clearBuffer()?

          Should endDssChain() also check that (dssLengthLocation_ == offset_) so that we're completely sure the last DSS is empty/missing?

          Show
          Knut Anders Hatlen added a comment - I think the approach is OK, although it sounds better if the chaining bits were set correctly in the first place. It's not quite clear to me where/how the chaining bits are set the first time, but if I understand correctly, Statement.flowCloseOutsideUOW() will only be called if nothing needs to be sent to the server on Statement.close(). Could we somehow use this to send a flag down the call tree so that the chaining bits are set correctly? Should previousUsedDssMark have been reset in clearBuffer()? Should endDssChain() also check that (dssLengthLocation_ == offset_) so that we're completely sure the last DSS is empty/missing?
          Hide
          Kristian Waagan added a comment -

          'derby-3379-1c-enforce_no_chaining.diff' patches the new test (J2EEDataSourceTest) instead of the old one.
          No changes to the logic/functionality in the patch.

          Still awaiting comments on the approach taken.

          Show
          Kristian Waagan added a comment - 'derby-3379-1c-enforce_no_chaining.diff' patches the new test (J2EEDataSourceTest) instead of the old one. No changes to the logic/functionality in the patch. Still awaiting comments on the approach taken.
          Hide
          Knut Anders Hatlen added a comment -

          The 1b patch conflicts with the latest changes in DataSourceTest (testPooledReuseOnClose() has been moved to J2EEDataSourceTest) and doesn't apply cleanly.

          Show
          Knut Anders Hatlen added a comment - The 1b patch conflicts with the latest changes in DataSourceTest (testPooledReuseOnClose() has been moved to J2EEDataSourceTest) and doesn't apply cleanly.
          Hide
          Kristian Waagan added a comment -

          suites.All and derbyall ran successfully with Sun JDK 1.4.2, 1.5 and 1.6.

          Show
          Kristian Waagan added a comment - suites.All and derbyall ran successfully with Sun JDK 1.4.2, 1.5 and 1.6.
          Hide
          Kristian Waagan added a comment -

          'derby-3379-1b-enforce_no_chaining.diff' deprecates version 1a, which due to a last minute change broke quite a few tests...

          I'm still asking for comments on the approach, as there are other ways to solve it as well.
          I'm rerunning all tests, and expect these to be done by the end of the day.

          Show
          Kristian Waagan added a comment - 'derby-3379-1b-enforce_no_chaining.diff' deprecates version 1a, which due to a last minute change broke quite a few tests... I'm still asking for comments on the approach, as there are other ways to solve it as well. I'm rerunning all tests, and expect these to be done by the end of the day.
          Hide
          Kristian Waagan added a comment -

          Forgot to tick off the ASF license grant...

          Show
          Kristian Waagan added a comment - Forgot to tick off the ASF license grant...
          Hide
          Kristian Waagan added a comment -

          'derby-3379-1a-enforce_no_chaining.diff' is an attempt to fix both the NPE and the bug hiding underneath.

          The basic strategy is to make sure the last DSS has the chaining bits set to no chaining when endWriteChain is called and the data is flushed (sent over the wire).

          I am not sure if this is the best approach, but it fixes the bug. I have run suites.All successfully with an earlier version of the patch, and I'm now running both derbyall and suites.All with this version to confirm that nothing breaks.

          Patch is ready for review, but not for commit. There is an unnecessary method rename in there that will go into a separate commit.

          Show
          Kristian Waagan added a comment - 'derby-3379-1a-enforce_no_chaining.diff' is an attempt to fix both the NPE and the bug hiding underneath. The basic strategy is to make sure the last DSS has the chaining bits set to no chaining when endWriteChain is called and the data is flushed (sent over the wire). I am not sure if this is the best approach, but it fixes the bug. I have run suites.All successfully with an earlier version of the patch, and I'm now running both derbyall and suites.All with this version to confirm that nothing breaks. Patch is ready for review, but not for commit. There is an unnecessary method rename in there that will go into a separate commit.
          Hide
          Kristian Waagan added a comment -

          I had a look at this one.

          After some DRDA debugging and various ad-hoc tests, I think I have a basic idea of what is happening.
          Note that there are several problems with the current implementation.

          First of all, the "No current connection" exception is thrown in the client because the client side state of the connection is not maintained/handled properly. The variable 'open_' in the connection is false (but 'availableForReuse' is true) and the exception is raised when you try to create a new logical connection.

          Secondly, the client driver has the same bug as the embedded driver regarding nulling out the connection pointers.

          Finally, when getting past the obstacles above, the test causes Derby to hang.
          I found out that the reason is that the chaining bits indicate there is more to come from the client, so the server doesn't send the reply.
          This is in fact wrong, and both sides end up waiting for each other.

          After some more testing and code review, I found out that opening a ResultSet before closing the statement in the test makes the hang go away. This is because the client has to send a command to the server asking it to close the ResultSet, and this time the chaining bit is apparently set correctly.

          I'm a bit unsure how to proceed with fixing this bug. I'm assuming the reset is required, which indicates we somehow have to make sure the chaining bit of the deferred reset is set correctly. The most obvious fix now I can see right away, is to make sure we end the chain if there are no other commands going to the server.
          For instance, can we force the last chaining bit to indicate no chaining in Agent.endWriteChain, or will this break something?
          (transmission of large continued objects, bigger than 32K?)
          Maybe we have to take the continuation bits into account as well?

          If someone wants to have a closer look, the reset commands are constructed as part of the ClientPooledConnection.close method, and the commands are pushed onto the wire as part of Statement.close in the test (DataSourceTest.testPooledReuseOnClose).

          I haven't worked much with the client code, so some advice would be helpful.

          Show
          Kristian Waagan added a comment - I had a look at this one. After some DRDA debugging and various ad-hoc tests, I think I have a basic idea of what is happening. Note that there are several problems with the current implementation. First of all, the "No current connection" exception is thrown in the client because the client side state of the connection is not maintained/handled properly. The variable 'open_' in the connection is false (but 'availableForReuse' is true) and the exception is raised when you try to create a new logical connection. Secondly, the client driver has the same bug as the embedded driver regarding nulling out the connection pointers. Finally, when getting past the obstacles above, the test causes Derby to hang. I found out that the reason is that the chaining bits indicate there is more to come from the client, so the server doesn't send the reply. This is in fact wrong, and both sides end up waiting for each other. After some more testing and code review, I found out that opening a ResultSet before closing the statement in the test makes the hang go away. This is because the client has to send a command to the server asking it to close the ResultSet, and this time the chaining bit is apparently set correctly. I'm a bit unsure how to proceed with fixing this bug. I'm assuming the reset is required, which indicates we somehow have to make sure the chaining bit of the deferred reset is set correctly. The most obvious fix now I can see right away, is to make sure we end the chain if there are no other commands going to the server. For instance, can we force the last chaining bit to indicate no chaining in Agent.endWriteChain, or will this break something? (transmission of large continued objects, bigger than 32K?) Maybe we have to take the continuation bits into account as well? If someone wants to have a closer look, the reset commands are constructed as part of the ClientPooledConnection.close method, and the commands are pushed onto the wire as part of Statement.close in the test (DataSourceTest.testPooledReuseOnClose). I haven't worked much with the client code, so some advice would be helpful.

            People

            • Assignee:
              Kristian Waagan
              Reporter:
              Kathey Marsden
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development