Derby
  1. Derby
  2. DERBY-666

Enhance derby.locks.deadlockTrace to print stack traces for all threads involved in a deadlock

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 10.1.1.0
    • Fix Version/s: None
    • Component/s: Store
    • Labels:
      None
    • Issue & fix info:
      Newcomer

      Description

      I was reading http://www.linux-mag.com/content/view/2134/ (good article, btw!), and it says:

      > The next two properties are needed to diagnose concurrency (locking and deadlock) problems.
      >
      > *derby.locks.monitor=true logs all deadlocks that occur in the system.
      > *derby.locks.deadlockTrace=true log a stack trace of all threads involved in lock-related rollbacks.

      It seems, that, in my environment, the deadlockTrace property does not log a stack trace of all threads involved in the deadlock.

      Instead, it only logs a stack trace of the victim thread involved in the deadlock.

      I think it would be very useful if the derby.locks.deadlockTrace setting could in fact log a stack trace of all involved threads.

      In a posting to derby-dev, Mike Matrigali noted that an earlier implementation of a similar feature had to be removed because it was too expensive in both time and space, but he suggested that there might be several possible ways to implement this in an acceptably efficient manner:

      > A long time ago there use to be room in each lock to point at a
      > stack trace for each lock, but that was removed to optimize the size
      > of the lock data structure which can have many objects outstanding.
      > And creating and storing the stack for every lock was incredibly slow
      > and just was not very useful for any very active application. I think
      > I was the only one who ever used it.
      >
      > The plan was sometime to add a per user data structure which could be
      > filled in when it was about to wait on a lock, which would give most of what is interesting in a deadlock.
      >
      > The current deadlockTrace is meant to dump the lock table out to derby.log when a deadlock is encountered.
      >
      > I agree getting a dump of all stack traces would be very useful, and
      > with the later jvm debug interfaces may now be possible - in earlier
      > JVM's there weren't any java interfaces to do so. Does anyone have
      > the code to donate to dump all thread stacks to a buffer?

      Mike also suggested a manual technique as a workaround; it would be useful to put this into the documentation somewhere, perhaps on the page which documents derby.locks.deadlockTrace? Here's Mike's suggestion:

      > What I do if I can reproduce easily is set try to catch the wait by
      > hand and then depending on the environment either send the magic
      > signal or hit ctrl-break in the server window which will send the JVM
      > specific thread dumps to derby.log.

      The magic signal, btw, is 'kill -QUIT', at least with Sun JVMs in my experience.

      1. experimental_jdk_1_5.diff
        1 kB
        Bryan Pendleton
      2. repro.java
        3 kB
        Bryan Pendleton

        Issue Links

          Activity

          Hide
          Kathey Marsden added a comment -

          I think that using reflection we could have the stack traces print for JVM's > 1.5 like we do for the stress.multi test.

          Show
          Kathey Marsden added a comment - I think that using reflection we could have the stack traces print for JVM's > 1.5 like we do for the stress.multi test.
          Hide
          Bryan Pendleton added a comment -

          Attached is experimental_jdk_1_5.diff, which may be useful to others.

          THIS PATCH IS NOT FOR COMMIT.

          This patch contains JDK 1.5-specific code, copied almost verbatim
          from the comments above, to use Thread.getAllStackTraces() to include
          complete JVM stack trace information in the deadlock exception.

          To build and run with this code, here's what I did:
          1) Built a vanilla copy of the trunk using the standard (JDK 1.4-based) build environment
          2) Applied the patch to create a modified Deadlock.java
          3) Compiled just Deadlock.java using JDK 1.5 using the command:

          javac -d classes -classpath classes -sourcepath java/engine -g java/engine/org/apache/derby/impl/services/locks/Deadlock.java

          4) Then, if you are running using JAR files, you can run 'ant buildjars' to build the jars.
          5) When you run your Derby application, you need to run with JDK 1.5, since Deadlock.class has been build with JDK 1.5 and requires JDK 1.5.

          By using the attached Repro.java program, I verified that the deadlock exception which is provoked by the test program does indeed contain the stack traces of all the threads in the JVM at the instant that the deadlock is detected.

          Running the test program, first, is a good way to verify that you've built the patch correctly and have your JDK 1.5 environment operational. When you run the test program you should get a deadlock exception and the exception text should contain a set of stack traces that makes sense (i.e., they show the Thread1 and Thread0 from the test program involved in the deadlock).

          Assuming that you are running in an Embedded environment, so that the stack traces will be meaningful, and can run with JDK 1.5, in order to use this code, this should be a useful tool for diagnosing application deadlocks.

          I'd be glad to help explain this process further, if these steps aren't clear.

          Show
          Bryan Pendleton added a comment - Attached is experimental_jdk_1_5.diff, which may be useful to others. THIS PATCH IS NOT FOR COMMIT. This patch contains JDK 1.5-specific code, copied almost verbatim from the comments above, to use Thread.getAllStackTraces() to include complete JVM stack trace information in the deadlock exception. To build and run with this code, here's what I did: 1) Built a vanilla copy of the trunk using the standard (JDK 1.4-based) build environment 2) Applied the patch to create a modified Deadlock.java 3) Compiled just Deadlock.java using JDK 1.5 using the command: javac -d classes -classpath classes -sourcepath java/engine -g java/engine/org/apache/derby/impl/services/locks/Deadlock.java 4) Then, if you are running using JAR files, you can run 'ant buildjars' to build the jars. 5) When you run your Derby application, you need to run with JDK 1.5, since Deadlock.class has been build with JDK 1.5 and requires JDK 1.5. By using the attached Repro.java program, I verified that the deadlock exception which is provoked by the test program does indeed contain the stack traces of all the threads in the JVM at the instant that the deadlock is detected. Running the test program, first, is a good way to verify that you've built the patch correctly and have your JDK 1.5 environment operational. When you run the test program you should get a deadlock exception and the exception text should contain a set of stack traces that makes sense (i.e., they show the Thread1 and Thread0 from the test program involved in the deadlock). Assuming that you are running in an Embedded environment, so that the stack traces will be meaningful, and can run with JDK 1.5, in order to use this code, this should be a useful tool for diagnosing application deadlocks. I'd be glad to help explain this process further, if these steps aren't clear.
          Hide
          Bryan Pendleton added a comment -

          Kurt, I'm sorry, I no longer have the experimental changes that I was working with in 2005. As I recall, I pretty much took the code from Mike's 11-Nov-2005 comment above, and pasted it into Deadlock.java, as he suggested, and it worked just fine, so I should think you'd be able to build a custom version of the Derby code for yourself with such a change without too much work. Give it a try, and if you run into trouble, I'm sure the community will be glad to help.

          Show
          Bryan Pendleton added a comment - Kurt, I'm sorry, I no longer have the experimental changes that I was working with in 2005. As I recall, I pretty much took the code from Mike's 11-Nov-2005 comment above, and pasted it into Deadlock.java, as he suggested, and it worked just fine, so I should think you'd be able to build a custom version of the Derby code for yourself with such a change without too much work. Give it a try, and if you run into trouble, I'm sure the community will be glad to help.
          Hide
          Kurt Huwig added a comment -

          Bryan, I am using embedded Derby and have this issue. Could you post what you did, please?

          Show
          Kurt Huwig added a comment - Bryan, I am using embedded Derby and have this issue. Could you post what you did, please?
          Hide
          Bryan Pendleton added a comment -

          I'm not actively working on this. There were two problems with the solution I was pursuing:
          1) It required JDK 1.5 features
          2) It didn't work in the Network Server case, only in the embedded case

          Mike suggested some other implementations that might have resolved these problems, but I'm not currently able to spend enough time to pursue these solutions, so I'm unassigning myself from the issue in the hopes that somebody else might be interested.

          If, after a while, nobody else appears to be interested, I'll close this as Wont-Fix.

          Show
          Bryan Pendleton added a comment - I'm not actively working on this. There were two problems with the solution I was pursuing: 1) It required JDK 1.5 features 2) It didn't work in the Network Server case, only in the embedded case Mike suggested some other implementations that might have resolved these problems, but I'm not currently able to spend enough time to pursue these solutions, so I'm unassigning myself from the issue in the hopes that somebody else might be interested. If, after a while, nobody else appears to be interested, I'll close this as Wont-Fix.
          Hide
          Bryan Pendleton added a comment -

          The strange thing is, when I run the program in Embedded mode, and I get the lock timeout, the program is sensitive to the value of derby.locks.deadlockTimeout! If I set derby.locks.deadlockTimeout to 5, the program gives the "A lock could not be obtained within the time requested." error after 5 seconds, rather than after the standard 20 seconds. So it seems that the Embedded configuration is detecting the problem as a deadlock, fundamentally; it's just reporting it back out to my program as a lock timeout rather than as a deadlock.

          I think I'd better stop now. I've succeeded in getting myself thoroughly confused

          Show
          Bryan Pendleton added a comment - The strange thing is, when I run the program in Embedded mode, and I get the lock timeout, the program is sensitive to the value of derby.locks.deadlockTimeout! If I set derby.locks.deadlockTimeout to 5, the program gives the "A lock could not be obtained within the time requested." error after 5 seconds, rather than after the standard 20 seconds. So it seems that the Embedded configuration is detecting the problem as a deadlock, fundamentally; it's just reporting it back out to my program as a lock timeout rather than as a deadlock. I think I'd better stop now. I've succeeded in getting myself thoroughly confused
          Hide
          Bryan Pendleton added a comment -

          Simple (I think) program to reproduce a deadlock. Except, it seems to only work for me in client/server mode. In embedded mode, I always get a lock timeout instead.

          Show
          Bryan Pendleton added a comment - Simple (I think) program to reproduce a deadlock. Except, it seems to only work for me in client/server mode. In embedded mode, I always get a lock timeout instead.
          Hide
          Bryan Pendleton added a comment -

          Hmmm.. I have no trouble writing a program which causes a deadlock to be detected when running in a Client/Server configuration, but when I run in an Embedded configuration I simply cannot cause a deadlock to be detected. I always get a lock timeout instead. I get a lock timeout in Embedded mode even when I set derby.locks.waitTimeout to -1.

          I'll attach the program I'm using to try to force a deadlock in Embedded mode; maybe somebody will spot my bug.

          Show
          Bryan Pendleton added a comment - Hmmm.. I have no trouble writing a program which causes a deadlock to be detected when running in a Client/Server configuration, but when I run in an Embedded configuration I simply cannot cause a deadlock to be detected. I always get a lock timeout instead. I get a lock timeout in Embedded mode even when I set derby.locks.waitTimeout to -1. I'll attach the program I'm using to try to force a deadlock in Embedded mode; maybe somebody will spot my bug.
          Hide
          Bryan Pendleton added a comment -

          Argh. I hate getting old. Anyway, after following about 50 dead-end ratholes involving tweaking the build scripts, I finally took Mike's advice and just hand-compiled the one file with the suggested code to call Thread.getAllStackTraces, and built a system that I could play with.

          The good news is that this worked, and I was able to run this code, and sure enough it produces the stack traces!

          The bad news is that, in doing the experiment, I realized that in my environment this feature is of no use, for I am running in a Network Server environment, and so the result is that I get the stack traces of the server, and what I want are the stack traces of my various client threads.

          Duh. Should have been obvious, but, as I say, I'm getting old.

          I'm going to try writing a simple JDBC embedded app to verify that this does actually work correctly in the embedded case, and that it gets useful stack traces in that case, and then I'll at least have done something useful, even if I now realize that I can't use this feature for the case that I really cared about (my production system, which runs in NetworkServer mode).

          Thanks to everybody for their help so far.

          Show
          Bryan Pendleton added a comment - Argh. I hate getting old. Anyway, after following about 50 dead-end ratholes involving tweaking the build scripts, I finally took Mike's advice and just hand-compiled the one file with the suggested code to call Thread.getAllStackTraces, and built a system that I could play with. The good news is that this worked, and I was able to run this code, and sure enough it produces the stack traces! The bad news is that, in doing the experiment, I realized that in my environment this feature is of no use, for I am running in a Network Server environment, and so the result is that I get the stack traces of the server , and what I want are the stack traces of my various client threads. Duh. Should have been obvious, but, as I say, I'm getting old. I'm going to try writing a simple JDBC embedded app to verify that this does actually work correctly in the embedded case, and that it gets useful stack traces in that case, and then I'll at least have done something useful, even if I now realize that I can't use this feature for the case that I really cared about (my production system, which runs in NetworkServer mode). Thanks to everybody for their help so far.
          Hide
          Bryan Pendleton added a comment -

          Thanks Mike – I'm thinking about giving this a try.

          How do I write some JDK 1.5 dependent code such that Derby can still build and run successfully in JDK 1.3 and 1.4 environments? Do I have to use Reflection to do this? Is there some better technique used elsewhere in Derby?

          Also, does adding JDK 1.5 dependent code into Derby mean that I'd need to adjust the build and test instructions and Ant scripts so that developers could build and test Derby with JDK 1.5?

          Show
          Bryan Pendleton added a comment - Thanks Mike – I'm thinking about giving this a try. How do I write some JDK 1.5 dependent code such that Derby can still build and run successfully in JDK 1.3 and 1.4 environments? Do I have to use Reflection to do this? Is there some better technique used elsewhere in Derby? Also, does adding JDK 1.5 dependent code into Derby mean that I'd need to adjust the build and test instructions and Ant scripts so that developers could build and test Derby with JDK 1.5?
          Hide
          Mike Matrigali added a comment -

          Here are some pointers for anyone interested in picking up this project. It seems like a good 1st project for someone interested in doing derby development:

          1) use the interfaces as described by Francois's comments above.
          2) The code must be within a securiity block, I believe Dan will be providing some more info on this area soon.
          3) See posting by Dan on list under subject "Re: is getAllStackTraces() something we are allowed to call from the server given the recent SecurityManager changes?"
          4) I would sugest coding a try/catch block around the security exception and not fail the operation if we get a security exception.
          In this case add the extra information to the lock error message if you can, otherwise let the system continue. You might
          print a message saying stack traces not available due to not enough permission granted by security manager.
          5) The code wants to go in opensource/java/engine/org/apache/derby/impl/services/locks/Deadlock.java, in the buildException() routine.

          Lars posted some code, it would need to get the traces into a string buffer:
          The following ought to do it:

          public static void dumpTraces(PrintStream out) {
          Map<Thread,StackTraceElement[]> threadtraces =
          Thread.getAllStackTraces();
          for (Map.Entry<Thread,StackTraceElement[]> threadtrace
          : threadtraces.entrySet()) {
          out.println(threadtrace.getKey().getName());
          StackTraceElement[] traces = threadtrace.getValue();
          for (StackTraceElement trace : traces)

          { out.println(trace); }

          }
          }

          The locktable could possibly remember which thread corresponds to each
          transaction and just dump those.

          Mapping threads to transactions is hard as it is possible for the thread of a transaction to change with every
          jdbc interaction if a connection pool is involved. The interesting stack traces are the ones that are waiting and those transactions could note their thread id before waiting.

          Show
          Mike Matrigali added a comment - Here are some pointers for anyone interested in picking up this project. It seems like a good 1st project for someone interested in doing derby development: 1) use the interfaces as described by Francois's comments above. 2) The code must be within a securiity block, I believe Dan will be providing some more info on this area soon. 3) See posting by Dan on list under subject "Re: is getAllStackTraces() something we are allowed to call from the server given the recent SecurityManager changes?" 4) I would sugest coding a try/catch block around the security exception and not fail the operation if we get a security exception. In this case add the extra information to the lock error message if you can, otherwise let the system continue. You might print a message saying stack traces not available due to not enough permission granted by security manager. 5) The code wants to go in opensource/java/engine/org/apache/derby/impl/services/locks/Deadlock.java, in the buildException() routine. Lars posted some code, it would need to get the traces into a string buffer: The following ought to do it: public static void dumpTraces(PrintStream out) { Map<Thread,StackTraceElement[]> threadtraces = Thread.getAllStackTraces(); for (Map.Entry<Thread,StackTraceElement[]> threadtrace : threadtraces.entrySet()) { out.println(threadtrace.getKey().getName()); StackTraceElement[] traces = threadtrace.getValue(); for (StackTraceElement trace : traces) { out.println(trace); } } } The locktable could possibly remember which thread corresponds to each transaction and just dump those. Mapping threads to transactions is hard as it is possible for the thread of a transaction to change with every jdbc interaction if a connection pool is involved. The interesting stack traces are the ones that are waiting and those transactions could note their thread id before waiting.
          Hide
          Francois Orsini added a comment -

          The new J2SE 5.0 has some new API to dump individual or all threads' stracktrace running in a JVM - There is a new notion of StackTraceElement object which represent a stack frame and can be output'ed as a String....

          So you can get all frames of a particular thread's stack dump as well as for all threads in the JVM.

          http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Thread.html#getStackTrace()
          http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Thread.html#getAllStackTraces()

          http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StackTraceElement.html

          Thread(s) stack dumps can also be performed on the command line using 'jstack' (1.5) utility to dump all the JVM's thread stack traces given a JVM pid.
          http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jstack.html

          fyi.

          Show
          Francois Orsini added a comment - The new J2SE 5.0 has some new API to dump individual or all threads' stracktrace running in a JVM - There is a new notion of StackTraceElement object which represent a stack frame and can be output'ed as a String.... So you can get all frames of a particular thread's stack dump as well as for all threads in the JVM. http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Thread.html#getStackTrace( ) http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Thread.html#getAllStackTraces( ) http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StackTraceElement.html Thread(s) stack dumps can also be performed on the command line using 'jstack' (1.5) utility to dump all the JVM's thread stack traces given a JVM pid. http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jstack.html fyi.

            People

            • Assignee:
              Unassigned
              Reporter:
              Bryan Pendleton
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:

                Development