Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.1
    • Fix Version/s: 1.1.2
    • Labels:
      None
    • Environment:

      Java 1.5, Windows

      Description

      Reached a deadlock inside commons-logging while concurrently re-deploying 2 WARs.
      In each WAR there is an attempt to get a logger:

      private final Log logger = LogFactory.getLog(ContextLoader.class);

      Thread dump:

      [deadlocked thread] Thread-96:
      -----------------------------
      Thread 'Thread-96' is waiting to acquire lock 'java.lang.ref.ReferenceQueue@5266e0' that is held by thread 'Thread-102'
      Stack trace:
      ------------
      org.apache.commons.logging.impl.WeakHashtable.purge(WeakHashtable.java:323)
      org.apache.commons.logging.impl.WeakHashtable.rehash(WeakHashtable.java:312)
      java.util.Hashtable.put(Hashtable.java:414)
      org.apache.commons.logging.impl.WeakHashtable.put(WeakHashtable.java:242)
      org.apache.commons.logging.LogFactory.cacheFactory(LogFactory.java:1004)
      org.apache.commons.logging.LogFactory.getFactory(LogFactory.java:657)
      org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
      org.springframework.web.context.ContextLoader.<init>(ContextLoader.java:145)

      [deadlocked thread] Thread-102:
      ------------------------------
      Thread 'Thread-102' is waiting to acquire lock 'org.apache.commons.logging.impl.WeakHashtable@1e02138' that is held by thread 'Thread-96'
      Stack trace:
      ------------
      java.util.Hashtable.remove(Hashtable.java:437)
      org.apache.commons.logging.impl.WeakHashtable.purgeOne(WeakHashtable.java:338)
      org.apache.commons.logging.impl.WeakHashtable.put(WeakHashtable.java:238)
      org.apache.commons.logging.LogFactory.cacheFactory(LogFactory.java:1004)
      org.apache.commons.logging.LogFactory.getFactory(LogFactory.java:657)
      org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
      org.springframework.web.context.ContextLoader.<init>(ContextLoader.java:145)
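
      For reference, a rough sketch of the kind of concurrent getLog() traffic involved follows. This is not the attached BugDeadlock.java; the class name, thread count and iteration count are illustrative only.

      import java.net.URL;
      import java.net.URLClassLoader;

      import org.apache.commons.logging.LogFactory;

      // Illustrative only -- NOT the attached BugDeadlock.java. Many threads call
      // LogFactory.getLog() under ever-changing context class loaders, so LogFactory's
      // internal WeakHashtable keeps growing while garbage-collected loaders queue up
      // for purging, exercising the put()/purge()/purgeOne() race shown above.
      public class DeadlockSketch {
          public static void main(String[] args) throws Exception {
              Thread[] threads = new Thread[50];
              for (int i = 0; i < threads.length; i++) {
                  threads[i] = new Thread(new Runnable() {
                      public void run() {
                          for (int j = 0; j < 100000; j++) {
                              // a fresh loader becomes a new key in the factory cache
                              ClassLoader cl = new URLClassLoader(new URL[0]);
                              Thread.currentThread().setContextClassLoader(cl);
                              LogFactory.getLog(DeadlockSketch.class);
                              // the loader becomes unreachable here, so its cache entry
                              // will eventually be enqueued for purging
                          }
                      }
                  });
                  threads[i].start();
              }
              for (int i = 0; i < threads.length; i++) {
                  threads[i].join();
              }
          }
      }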

      Attachments

      1. BugDeadlock.java
        0.8 kB
        Philippe Mouawad
      2. Patch-WeakHashtable-1.1.1.txt
        2 kB
        Philippe Mouawad

        Issue Links

          Activity

          Jesse Glick added a comment -

          1.1.2 seems to be released, thanks. Will be in Wagon: https://jira.codehaus.org/browse/WAGON-394

          Thomas Neidhart added a comment -

          Well, for 1.1.2 it does not make much of a difference if we change the target. We have to stick to Hashtables for now to keep binary compatibility. The problem here should be solved, and for future versions (1.2?, 2.0) we will surely use existing stuff like WeakHashMap.

          Gary Gregory added a comment -

          and it's February 2013!

          I think we can bump the target to 1.4 without controversy.

          The super conservative route would be to

          1. release 1.1.2 as is, no platform changes.
          2. update the platform requirement to whatever we agree on for 1.2, Java 1.4, 5, 6, or 7.
          Olivier Lamy (*$^¨%`£) added a comment -

          JDK 1.1 target? Really, in 2013?

          Thomas Neidhart added a comment - - edited

          commons-logging was created for, and still targets, JDK 1.1. Map and WeakHashMap did not exist at the time, so it needed its own WeakHashtable implementation.

          It was decided to keep the compatibility settings as is for the 1.1.2 bugfix release, especially to keep binary compatibility.
          Changing the internal factories to a Map would break compatibility.

          What will happen after 1.1.2 is unclear. Either commons-logging will die, or we will modernize it and release a non binary compatible version 2.

          Jesse Glick added a comment -

          FYI, affects Jenkins with embedded Maven: https://issues.jenkins-ci.org/browse/JENKINS-15846; might affect command-line Maven but the trigger conditions may not be present.

          Two points related to this bug:

          • Why does LogFactory require the value of its ….LogFactory.HashtableImpl property to be a Hashtable rather than simply a Map?
          • If the answer to the first question is “no good reason”, why do you not simply use java.util.WeakHashMap (possibly with a synchronization wrapper) rather than writing your own WeakHashtable?
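
          For illustration, a sketch of what that second suggestion would look like; the class and field names here are hypothetical, not LogFactory's actual internals:

          import java.util.Collections;
          import java.util.Map;
          import java.util.WeakHashMap;

          // Hypothetical sketch of the suggestion: a JDK weak map behind a
          // synchronization wrapper instead of the home-grown WeakHashtable.
          // Class and field names are illustrative, not LogFactory's real internals.
          class FactoryCacheSketch {
              // keys are class loaders; entries vanish once a loader becomes unreachable
              private static final Map factories =
                      Collections.synchronizedMap(new WeakHashMap());

              static void cacheFactory(ClassLoader loader, Object factory) {
                  // caveat: if the cached value held a strong reference back to the loader,
                  // the entry would never be collected -- the real WeakHashtable wraps its
                  // values in weak references for exactly this reason
                  factories.put(loader, factory);
              }

              static Object getCachedFactory(ClassLoader loader) {
                  return factories.get(loader);
              }
          }
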
          Thomas Neidhart added a comment -

          Just FYI: the release is in the pipeline for the coming days/weeks, but we first have to sort out our site publish procedure (it has changed in the last few weeks).

          Please discuss / follow up such requests on the dev mailing list, as this will reach a bigger audience.

          Philippe Mouawad added a comment -

          +1

          Jesse Glick added a comment -

          Since this is a serious bug, and it has apparently been over five years since the last release, what about releasing 1.1.2-beta-1 or similar?

          Thomas Neidhart added a comment -

          Applied a modified patch in r1435077.

          The following changes have been made:

          • synchronize the put and remove methods to prevent issues with changeCount and the actual bug wrt purge / purgeOne. The call to super.put() will be synchronized anyway, so there should be no performance hit. This already solves the bug.
          • apply the idea from the patch to the purge method, as it is also used in non-synchronized methods like values(), elements(). These methods are not actually used in commons-logging, but this has been added to prevent any future problems in case this changes. The method purgeOne has been kept as is, as it is only used in the now synchronized put and remove methods.
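
          A rough sketch of the shape of that change inside WeakHashtable (this is not the literal r1435077 diff, and the wrapping of keys and values in weak references is omitted):

          // Rough sketch, not the literal r1435077 diff: put and remove are now
          // synchronized, so the changeCount bookkeeping and the purge / purgeOne
          // calls can no longer interleave between threads. The weak-reference
          // wrapping of keys and values is omitted for brevity.
          public synchronized Object put(Object key, Object value) {
              // for performance reasons, only do a full purge every MAX_CHANGES_BEFORE_PURGE changes
              if (changeCount++ > MAX_CHANGES_BEFORE_PURGE) {
                  purge();
                  changeCount = 0;
              }
              // do a partial purge more often
              else if (changeCount % PARTIAL_PURGE_COUNT == 0) {
                  purgeOne();
              }
              return super.put(key, value);
          }

          public synchronized Object remove(Object key) {
              // same bookkeeping as put(), elided here
              return super.remove(key);
          }
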
          Thomas Neidhart added a comment -

          Looking at the patch, the original deadlock problem should be fixed by it, but there are other flaws in WeakHashtable that should be addressed:

          • put and remove have the following code snippet:
                  // for performance reasons, only purge every
                  // MAX_CHANGES_BEFORE_PURGE times
                  if (changeCount++ > MAX_CHANGES_BEFORE_PURGE) {
                      purge();
                      changeCount = 0;
                  }
                  // do a partial purge more often
                  else if (changeCount % PARTIAL_PURGE_COUNT == 0) {
                      purgeOne();
                  }
          

          which is not synchronized; thus the changeCount check may succeed in two concurrent threads at the same time, calling purge / purgeOne multiple times.

          • the underlying Hashtable is synchronized, while none of the overridden methods are
          radai rosenblatt added a comment - - edited

          I've managed to reproduce the issue on my machine (Windows 7 x64, Java 6u33 x64) with the trunk version of commons-logging (checked out of svn).
          I can also confirm that the attached patch resolves the issue.
          I did have to add a few zeros to the thread count and per-thread try count in the test case to reproduce it reliably, though.

          Sebb added a comment -

          Also fails on Gump.

          Sebb added a comment -

          An alternative fix would be to drop the direct synch on queue, and synchronise the methods that call it.

          Eclipse complains about many of the methods, for example:

          The method WeakHashtable.get(Object) is overriding a synchronized method without being synchronized

          I don't think there are any real issues here, since most of the methods don't do much except delegate to the super method, at which point the synch. lock will be acquired anyway. So the additional synch. code would be minimal.

          Fixing the Eclipse warnings leaves the following methods that call purge without being synchronized:

          entrySet()
          keySet()
          values()
          rehash()

          Synchronising these methods seems unlikely to adversely affect performance compared with synchronising just the queue.
          There is not much additional code included in the synch. block.

          In the case of rehash(), it is only called by Hashtable#put() which is synch. anyway.
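
          For reference, a sketch of what that alternative could look like; only the four methods listed above are shown, and the bodies are illustrative:

          // Illustrative sketch of the alternative: no synchronized (queue) blocks at all;
          // instead, the methods that call purge() synchronize on the table itself,
          // matching the locking of java.util.Hashtable.
          public synchronized Set entrySet() {
              purge();
              return super.entrySet();
          }

          public synchronized Set keySet() {
              purge();
              return super.keySet();
          }

          public synchronized Collection values() {
              purge();
              return super.values();
          }

          protected synchronized void rehash() {
              // rehash() is only called from Hashtable.put(), which already holds the
              // table lock, so the extra synchronized costs nothing here
              purge();
              super.rehash();
          }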

          ==

          It's not clear to me that the existing patch is 100% thread-safe.
          There is a window between removing the item from the reference queue (poll) and removing the referenced item from the hash table.
          If another thread modifies the hashtable between the poll and the remove, can that cause any problems?
          Also, not all references to the queue field are protected by the same lock; the put() method references the queue without first acquiring the queue lock. Potentially this can cause a data publication issue due to the Java Memory Model.

          Sebb added a comment -

          Fails on my WinXP system with Java 1.4/1.5/1.6/1.7.

          Also tried a test on Jenkins (ubuntu) that failed too.

          However, so far the Continuum builds don't seem to trigger the problem.

          It remains to be seen how Gump reacts.

          Sebb added a comment -

          @Stefan

          Thanks.

          I've been able to reproduce the error now as well using a slightly modified version of the test code and Java 1.4.2_19-b04.
          I'll tidy up my test code and add to SVN later.

          Stefan Eissing added a comment -

          @Sebb I used the code attached to this ticket. Run it from the shell in an endless loop. Process deadlocked for me about once every 80 invocations. Host: WinXP on Parallels, Java VM 1.4.2_17-b06 (yeah, I know it's antique).

          Customer experienced the deadlock on his production system, running 1.4 VM on Solaris (don't know the exact version).

          Sebb added a comment -

          @Stefan

          What code did you use to show the error?
          Which version of Java was it exactly?

          Stefan Eissing added a comment -

          I can confirm that this issue happens with a Java 1.4 runtime (tested on WinXP). I am unable to reproduce it with a Java 6 runtime (on WinXP and OS X).

          The attached patch fixes this issue for me.

          Anthony Orapallo added a comment -

          Has anyone been able to reliably reproduce the issue, or get a patch working? We appear to be experiencing this issue as well and need a fix. Will upgrading commons-logging.jar fix the issue?

          All threads get blocked waiting for the WeakHashtable object. The thread holding it is this:

          "btpool0-193" prio=10 tid=0x00002aaabc7e7800 nid=0x468 waiting for monitor entry [0x0000000050a7d000]
             java.lang.Thread.State: BLOCKED (on object monitor)
             at org.apache.commons.logging.impl.WeakHashtable.purge(WeakHashtable.java:323)
             - waiting to lock <0x00000006e37d3a18> (a java.lang.ref.ReferenceQueue)
             at org.apache.commons.logging.impl.WeakHashtable.rehash(WeakHashtable.java:312)
             at java.util.Hashtable.put(Hashtable.java:412)
             - locked <0x00000006e37cf4d0> (a org.apache.commons.logging.impl.WeakHashtable)

          Note that it is waiting to lock a ReferenceQueue object. The ReferenceQueue object is held by:

          "btpool0-229" prio=10 tid=0x00002aaac0c69800 nid=0x6b07 waiting for monitor entry [0x0000000043fd2000]
             java.lang.Thread.State: BLOCKED (on object monitor)
             at java.util.Hashtable.remove(Hashtable.java:435)
             - waiting to lock <0x00000006e37cf4d0> (a org.apache.commons.logging.impl.WeakHashtable)
             at org.apache.commons.logging.impl.WeakHashtable.purgeOne(WeakHashtable.java:338)
             - locked <0x00000006e37d3a18> (a java.lang.ref.ReferenceQueue)

          Note that this one is waiting to lock the WeakHashtable object, so the deadlock will never break, and all the other btpool threads will back up behind it.

          Ralph Goers added a comment -

          I also ran the test case on OSX 10.6 with Java 1.6 and don't get a deadlock. I also upped the number of threads and the loop count to try to make it happen, but was still unable to reproduce it. Of course, that doesn't mean the bug isn't valid.

          Dennis Lundberg added a comment -

          I've tested the test case, but it runs without deadlock for me on Java 1.4, 1.5 and 1.6 on Windows.

          Philippe Mouawad added a comment -

          The bug is due to two different orders of taking the locks:

          purge or purgeOne:
          1) locks queue
          2) locks this through super.remove()

          put:
          1) locks this through super.put()
          2) super.put() calls rehash()
          3) rehash() calls purge(), which locks queue => deadlock

          The solution is to call super#remove in purge and purgeOne outside of the synchronized (queue) lock.
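
          A minimal sketch of that idea follows. WeakKey and getReferenced() refer to WeakHashtable's internal helpers; the bodies are illustrative rather than the literal attached patch.

          // Minimal sketch of the idea: collect the enqueued weak references while
          // holding the queue lock, but call super.remove() only after leaving it,
          // so the queue lock is never held while waiting for the Hashtable lock.
          // WeakKey / getReferenced() are WeakHashtable's internal helpers; the exact
          // bodies here are illustrative, not the literal attached patch.
          private void purge() {
              final List toRemove = new ArrayList();
              synchronized (queue) {
                  WeakKey key;
                  while ((key = (WeakKey) queue.poll()) != null) {
                      toRemove.add(key.getReferenced());
                  }
              }
              // super.remove() takes the table lock on its own; 'queue' is no longer held here
              for (int i = 0; i < toRemove.size(); i++) {
                  super.remove(toRemove.get(i));
              }
          }

          private void purgeOne() {
              Object referenced = null;
              synchronized (queue) {
                  WeakKey key = (WeakKey) queue.poll();
                  if (key != null) {
                      referenced = key.getReferenced();
                  }
              }
              if (referenced != null) {
                  super.remove(referenced);
              }
          }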

          Philippe Mouawad added a comment -

          Here is a patch for the deadlock bug.
          Philippe.
          www.ubik-ingenierie.com

          Philippe Mouawad added a comment -

          Here is a test case to reproduce it.
          I launch it on JDK 1.6 and it deadlocks every 6 to 10 runs.


            People

            • Assignee:
              Unassigned
            • Reporter:
              Nitzan Niv
            • Votes:
              3
            • Watchers:
              10

              Dates

              • Created:
                Updated:
                Resolved:

                Development