Tapestry
  1. Tapestry
  2. TAPESTRY-2473

Application locks up, with all threads waiting for read lock inside ConcurrentBarrier

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 5.0.11, 5.0.12, 5.0.13
    • Fix Version/s: 5.0.14
    • Component/s: Framework
    • Labels:
      None
    • Environment:
      Red Hat, Tomcat, java version "1.5.0_10"

      Description

      We are running in production, and pretty regularly the system seems to hang up. After I do a jstack it looks like all the threads are tied up against some lock.

      Can you help me figure out the root cause?

      I will be attaching the full jstack the the stuck threads all look like:

      Thread 25014: (state = BLOCKED)

      • sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
      • java.util.concurrent.locks.LockSupport.park() @bci=5, line=118 (Compiled frame)
      • java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(int) @bci=10, line=1124 (Compiled frame)
      • org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=27, line=70 (Compiled frame)
      • $RequestHandler_11aa7f96e6c.service(org.apache.tapestry.services.Request, org.apache.tapestry.services.Response) @bci=10 (Compiled frame)
      • org.apache.tapestry.services.TapestryModule$12.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=40, line=924 (Compiled frame)
      • $HttpServletRequestHandler_11aa7f96e65.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=10 (Compiled frame)
      • org.apache.tapestry.internal.services.IgnoredPathsFilter.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, org.apache.tapestry.services.HttpServletRequestHandler) @bci
        =95, line=62 (Compiled frame)
      • $HttpServletRequestFilter_11aa7f96e63.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, org.apache.tapestry.services.HttpServletRequestHandler) @bci=7 (Compiled frame)
      • $HttpServletRequestHandler_11aa7f96e65.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=10 (Compiled frame)
      • $HttpServletRequestHandler_11aa7f96e65.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=10 (Compiled frame)
      • org.apache.tapestry.TapestryFilter.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain) @bci=12, line=168 (Compiled frame)
        ......
      1. js2.txt
        2.52 MB
        Fernando Padilla
      2. js.txt
        2.20 MB
        Fernando Padilla

        Activity

        Hide
        Fernando Padilla added a comment -

        a jstack when the server is stuck.

        Show
        Fernando Padilla added a comment - a jstack when the server is stuck.
        Hide
        Fernando Padilla added a comment -

        another stack trace that just happened

        Show
        Fernando Padilla added a comment - another stack trace that just happened
        Hide
        Howard M. Lewis Ship added a comment -

        What's odd here is that all the threads are blocked in withRead() (i.e., acquiring the shared read lock). That's very odd because I don't see any thread inside withWrite() (i.e., with the exclusive write lock). I remember there was a earlier hickup where Tapestry's use of ReentrantReadWriteLock tripped across a JDK bug.

        Show
        Howard M. Lewis Ship added a comment - What's odd here is that all the threads are blocked in withRead() (i.e., acquiring the shared read lock). That's very odd because I don't see any thread inside withWrite() (i.e., with the exclusive write lock). I remember there was a earlier hickup where Tapestry's use of ReentrantReadWriteLock tripped across a JDK bug.
        Hide
        Howard M. Lewis Ship added a comment -

        Another possibility is the ThreadLocal bug (in the JDK 1.5). May have to synchronize access to the ThreadLocal object inside ConcurrentBarrier.

        Show
        Howard M. Lewis Ship added a comment - Another possibility is the ThreadLocal bug (in the JDK 1.5). May have to synchronize access to the ThreadLocal object inside ConcurrentBarrier.
        Hide
        Howard M. Lewis Ship added a comment -

        The above trace omits a key detail:

        Thread 20716: (state = BLOCKED)

        • sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
        • java.util.concurrent.locks.LockSupport.park() @bci=5, line=118 (Compiled frame)
        • org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=27, line=70 (Compiled frame)
        • org.apache.tapestry.internal.services.CheckForUpdatesFilter.service(org.apache.tapestry.services.Request, org.apache.tapestry.services.Response, org.apache.tapestry.services.RequestHandler) @bci=30, line=106 (Compiled frame)
        • $RequestHandler_11aa3e523f4.service(org.apache.tapestry.services.Request, org.apache.tapestry.services.Response) @bci=10 (Compiled frame)
        • $RequestHandler_11aa3e523ec.service(org.apache.tapestry.services.Request, org.apache.tapestry.services.Response) @bci=6 (Compiled frame)
        • org.apache.tapestry.services.TapestryModule$12.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=40, line=924 (Compiled frame)

        The problem is certainly inside CheckForUpdatesFilter.

        Show
        Howard M. Lewis Ship added a comment - The above trace omits a key detail: Thread 20716: (state = BLOCKED) sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise) java.util.concurrent.locks.LockSupport.park() @bci=5, line=118 (Compiled frame) org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=27, line=70 (Compiled frame) org.apache.tapestry.internal.services.CheckForUpdatesFilter.service(org.apache.tapestry.services.Request, org.apache.tapestry.services.Response, org.apache.tapestry.services.RequestHandler) @bci=30, line=106 (Compiled frame) $RequestHandler_11aa3e523f4.service(org.apache.tapestry.services.Request, org.apache.tapestry.services.Response) @bci=10 (Compiled frame) $RequestHandler_11aa3e523ec.service(org.apache.tapestry.services.Request, org.apache.tapestry.services.Response) @bci=6 (Compiled frame) org.apache.tapestry.services.TapestryModule$12.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=40, line=924 (Compiled frame) The problem is certainly inside CheckForUpdatesFilter.
        Hide
        Fernando Padilla added a comment -

        I did a simple grep/sort/count on the two files:

        It does look like they are also stuck on a few restoreReadLock, so that means that the writeLocks aren't releasing properly??

        hala:~> grep ConcurrentBarrier js.txt | sort | uniq -c
        2 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.restoreReadLock(boolean) @bci=13, line=158 (Compiled frame)
        354 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=27, line=70 (Compiled frame)
        3 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=44, line=77 (Compiled frame)

        hala:~> grep ConcurrentBarrier js2.txt | sort | uniq -c
        5 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.restoreReadLock(boolean) @bci=13, line=158 (Compiled frame)
        393 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=27, line=70 (Compiled frame)
        6 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=44, line=77 (Compiled frame)

        Show
        Fernando Padilla added a comment - I did a simple grep/sort/count on the two files: It does look like they are also stuck on a few restoreReadLock, so that means that the writeLocks aren't releasing properly?? hala:~> grep ConcurrentBarrier js.txt | sort | uniq -c 2 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.restoreReadLock(boolean) @bci=13, line=158 (Compiled frame) 354 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=27, line=70 (Compiled frame) 3 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=44, line=77 (Compiled frame) hala:~> grep ConcurrentBarrier js2.txt | sort | uniq -c 5 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.restoreReadLock(boolean) @bci=13, line=158 (Compiled frame) 393 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=27, line=70 (Compiled frame) 6 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=44, line=77 (Compiled frame)
        Hide
        Fernando Padilla added a comment -

        Yes, CheckForUpdatesFilter is partially responsible, but I think mostly because it is the only user of the tryWithRead function.

        At the moment the code in ConcurrentBarrier just looks too complicated for me.. I would propose for you to put some thought reviewing that code and see if you still want locking code that is so complicated ( interactions between read lock, write lock, read lock Flag and a read to write lock upgrade support ). I bet you've already put lots of thought on it and it's most likely correct, but it might just be brittle for long term support.

        here is some food for thought:

        Does the checking for updates really need to be done inline for every request (and upgrading a read lock to a write lock)? Could it be launched as a side thread with it's own locking, then only block requests while it's rebuilding caches or while it's reseting page caches...

        Show
        Fernando Padilla added a comment - Yes, CheckForUpdatesFilter is partially responsible, but I think mostly because it is the only user of the tryWithRead function. At the moment the code in ConcurrentBarrier just looks too complicated for me.. I would propose for you to put some thought reviewing that code and see if you still want locking code that is so complicated ( interactions between read lock, write lock, read lock Flag and a read to write lock upgrade support ). I bet you've already put lots of thought on it and it's most likely correct, but it might just be brittle for long term support. here is some food for thought: Does the checking for updates really need to be done inline for every request (and upgrading a read lock to a write lock)? Could it be launched as a side thread with it's own locking, then only block requests while it's rebuilding caches or while it's reseting page caches...
        Hide
        Fernando Padilla added a comment -

        but if synchronizing access to the threadLocal fixes it, and you're happy with that. then we're all set!!

        Show
        Fernando Padilla added a comment - but if synchronizing access to the threadLocal fixes it, and you're happy with that. then we're all set!!
        Hide
        Howard M. Lewis Ship added a comment -

        This message (about a year old) looks related:

        http://markmail.org/message/gzx5akfhcjcpis3z

        Which leads to this bug:

        http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6571733

        which claims not to exist in JDK 1.5, just in 1.6 and above.

        Show
        Howard M. Lewis Ship added a comment - This message (about a year old) looks related: http://markmail.org/message/gzx5akfhcjcpis3z Which leads to this bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6571733 which claims not to exist in JDK 1.5, just in 1.6 and above.
        Hide
        Fernando Padilla added a comment -

        hmm. you know your test cases better, could you somehow create a test case in tapestry to try to catch this? I know if it's VM related you might not hit it, but maybe someone will..

        Show
        Fernando Padilla added a comment - hmm. you know your test cases better, could you somehow create a test case in tapestry to try to catch this? I know if it's VM related you might not hit it, but maybe someone will..
        Hide
        Howard M. Lewis Ship added a comment -

        I've checked in an attempt at a fix that synchronized access to the ThreadLocal.

        The file, tapestry-ioc-5.0.14-SNAPSHOT.jar is available here: http://senduit.com/767cd1

        Please give it a try and get back to us about whether it resolves the problem.

        As to your question about the complexity: I believe a later JDK understands the need to upgrade a read-lock to a write-lock, which would simplify this code a bit. Having a single filter than can reduce the system momentarily to a single thread (one that checks for underlying file updates and, as necessary, clears caches) is very important: it simplifies the rest of the code base and ensures that the application operates in a predictable way.

        Show
        Howard M. Lewis Ship added a comment - I've checked in an attempt at a fix that synchronized access to the ThreadLocal. The file, tapestry-ioc-5.0.14-SNAPSHOT.jar is available here: http://senduit.com/767cd1 Please give it a try and get back to us about whether it resolves the problem. As to your question about the complexity: I believe a later JDK understands the need to upgrade a read-lock to a write-lock, which would simplify this code a bit. Having a single filter than can reduce the system momentarily to a single thread (one that checks for underlying file updates and, as necessary, clears caches) is very important: it simplifies the rest of the code base and ensures that the application operates in a predictable way.
        Hide
        Howard M. Lewis Ship added a comment -

        Let's just assume this is the fix, and re-open or create a new bug if it is not.

        Show
        Howard M. Lewis Ship added a comment - Let's just assume this is the fix, and re-open or create a new bug if it is not.

          People

          • Assignee:
            Howard M. Lewis Ship
            Reporter:
            Fernando Padilla
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development