Tapestry
  1. Tapestry
  2. TAPESTRY-2473

Application locks up, with all threads waiting for read lock inside ConcurrentBarrier

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 5.0.11, 5.0.12, 5.0.13
    • Fix Version/s: 5.0.14
    • Component/s: Framework
    • Labels:
      None
    • Environment:
      Red Hat, Tomcat, java version "1.5.0_10"

      Description

      We are running in production, and pretty regularly the system seems to hang up. After I do a jstack it looks like all the threads are tied up against some lock.

      Can you help me figure out the root cause?

      I will be attaching the full jstack the the stuck threads all look like:

      Thread 25014: (state = BLOCKED)

      • sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
      • java.util.concurrent.locks.LockSupport.park() @bci=5, line=118 (Compiled frame)
      • java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(int) @bci=10, line=1124 (Compiled frame)
      • org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=27, line=70 (Compiled frame)
      • $RequestHandler_11aa7f96e6c.service(org.apache.tapestry.services.Request, org.apache.tapestry.services.Response) @bci=10 (Compiled frame)
      • org.apache.tapestry.services.TapestryModule$12.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=40, line=924 (Compiled frame)
      • $HttpServletRequestHandler_11aa7f96e65.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=10 (Compiled frame)
      • org.apache.tapestry.internal.services.IgnoredPathsFilter.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, org.apache.tapestry.services.HttpServletRequestHandler) @bci
        =95, line=62 (Compiled frame)
      • $HttpServletRequestFilter_11aa7f96e63.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, org.apache.tapestry.services.HttpServletRequestHandler) @bci=7 (Compiled frame)
      • $HttpServletRequestHandler_11aa7f96e65.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=10 (Compiled frame)
      • $HttpServletRequestHandler_11aa7f96e65.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=10 (Compiled frame)
      • org.apache.tapestry.TapestryFilter.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain) @bci=12, line=168 (Compiled frame)
        ......
      1. js2.txt
        2.52 MB
        Fernando Padilla
      2. js.txt
        2.20 MB
        Fernando Padilla

        Activity

        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open In Progress In Progress
        2h 27m 1 Howard M. Lewis Ship 23/Jun/08 17:34
        In Progress In Progress Closed Closed
        8d 22h 30m 1 Howard M. Lewis Ship 02/Jul/08 16:05
        Mark Thomas made changes -
        Workflow Default workflow, editable Closed status [ 12568425 ] jira [ 12591480 ]
        Mark Thomas made changes -
        Workflow jira [ 12433875 ] Default workflow, editable Closed status [ 12568425 ]
        Howard M. Lewis Ship made changes -
        Fix Version/s 5.0.14 [ 12313214 ]
        Status In Progress [ 3 ] Closed [ 6 ]
        Resolution Fixed [ 1 ]
        Hide
        Howard M. Lewis Ship added a comment -

        Let's just assume this is the fix, and re-open or create a new bug if it is not.

        Show
        Howard M. Lewis Ship added a comment - Let's just assume this is the fix, and re-open or create a new bug if it is not.
        Hide
        Howard M. Lewis Ship added a comment -

        I've checked in an attempt at a fix that synchronized access to the ThreadLocal.

        The file, tapestry-ioc-5.0.14-SNAPSHOT.jar is available here: http://senduit.com/767cd1

        Please give it a try and get back to us about whether it resolves the problem.

        As to your question about the complexity: I believe a later JDK understands the need to upgrade a read-lock to a write-lock, which would simplify this code a bit. Having a single filter than can reduce the system momentarily to a single thread (one that checks for underlying file updates and, as necessary, clears caches) is very important: it simplifies the rest of the code base and ensures that the application operates in a predictable way.

        Show
        Howard M. Lewis Ship added a comment - I've checked in an attempt at a fix that synchronized access to the ThreadLocal. The file, tapestry-ioc-5.0.14-SNAPSHOT.jar is available here: http://senduit.com/767cd1 Please give it a try and get back to us about whether it resolves the problem. As to your question about the complexity: I believe a later JDK understands the need to upgrade a read-lock to a write-lock, which would simplify this code a bit. Having a single filter than can reduce the system momentarily to a single thread (one that checks for underlying file updates and, as necessary, clears caches) is very important: it simplifies the rest of the code base and ensures that the application operates in a predictable way.
        Hide
        Fernando Padilla added a comment -

        hmm. you know your test cases better, could you somehow create a test case in tapestry to try to catch this? I know if it's VM related you might not hit it, but maybe someone will..

        Show
        Fernando Padilla added a comment - hmm. you know your test cases better, could you somehow create a test case in tapestry to try to catch this? I know if it's VM related you might not hit it, but maybe someone will..
        Hide
        Howard M. Lewis Ship added a comment -

        This message (about a year old) looks related:

        http://markmail.org/message/gzx5akfhcjcpis3z

        Which leads to this bug:

        http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6571733

        which claims not to exist in JDK 1.5, just in 1.6 and above.

        Show
        Howard M. Lewis Ship added a comment - This message (about a year old) looks related: http://markmail.org/message/gzx5akfhcjcpis3z Which leads to this bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6571733 which claims not to exist in JDK 1.5, just in 1.6 and above.
        Hide
        Fernando Padilla added a comment -

        but if synchronizing access to the threadLocal fixes it, and you're happy with that. then we're all set!!

        Show
        Fernando Padilla added a comment - but if synchronizing access to the threadLocal fixes it, and you're happy with that. then we're all set!!
        Hide
        Fernando Padilla added a comment -

        Yes, CheckForUpdatesFilter is partially responsible, but I think mostly because it is the only user of the tryWithRead function.

        At the moment the code in ConcurrentBarrier just looks too complicated for me.. I would propose for you to put some thought reviewing that code and see if you still want locking code that is so complicated ( interactions between read lock, write lock, read lock Flag and a read to write lock upgrade support ). I bet you've already put lots of thought on it and it's most likely correct, but it might just be brittle for long term support.

        here is some food for thought:

        Does the checking for updates really need to be done inline for every request (and upgrading a read lock to a write lock)? Could it be launched as a side thread with it's own locking, then only block requests while it's rebuilding caches or while it's reseting page caches...

        Show
        Fernando Padilla added a comment - Yes, CheckForUpdatesFilter is partially responsible, but I think mostly because it is the only user of the tryWithRead function. At the moment the code in ConcurrentBarrier just looks too complicated for me.. I would propose for you to put some thought reviewing that code and see if you still want locking code that is so complicated ( interactions between read lock, write lock, read lock Flag and a read to write lock upgrade support ). I bet you've already put lots of thought on it and it's most likely correct, but it might just be brittle for long term support. here is some food for thought: Does the checking for updates really need to be done inline for every request (and upgrading a read lock to a write lock)? Could it be launched as a side thread with it's own locking, then only block requests while it's rebuilding caches or while it's reseting page caches...
        Hide
        Fernando Padilla added a comment -

        I did a simple grep/sort/count on the two files:

        It does look like they are also stuck on a few restoreReadLock, so that means that the writeLocks aren't releasing properly??

        hala:~> grep ConcurrentBarrier js.txt | sort | uniq -c
        2 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.restoreReadLock(boolean) @bci=13, line=158 (Compiled frame)
        354 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=27, line=70 (Compiled frame)
        3 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=44, line=77 (Compiled frame)

        hala:~> grep ConcurrentBarrier js2.txt | sort | uniq -c
        5 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.restoreReadLock(boolean) @bci=13, line=158 (Compiled frame)
        393 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=27, line=70 (Compiled frame)
        6 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=44, line=77 (Compiled frame)

        Show
        Fernando Padilla added a comment - I did a simple grep/sort/count on the two files: It does look like they are also stuck on a few restoreReadLock, so that means that the writeLocks aren't releasing properly?? hala:~> grep ConcurrentBarrier js.txt | sort | uniq -c 2 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.restoreReadLock(boolean) @bci=13, line=158 (Compiled frame) 354 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=27, line=70 (Compiled frame) 3 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=44, line=77 (Compiled frame) hala:~> grep ConcurrentBarrier js2.txt | sort | uniq -c 5 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.restoreReadLock(boolean) @bci=13, line=158 (Compiled frame) 393 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=27, line=70 (Compiled frame) 6 - org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=44, line=77 (Compiled frame)
        Hide
        Howard M. Lewis Ship added a comment -

        The above trace omits a key detail:

        Thread 20716: (state = BLOCKED)

        • sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
        • java.util.concurrent.locks.LockSupport.park() @bci=5, line=118 (Compiled frame)
        • org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=27, line=70 (Compiled frame)
        • org.apache.tapestry.internal.services.CheckForUpdatesFilter.service(org.apache.tapestry.services.Request, org.apache.tapestry.services.Response, org.apache.tapestry.services.RequestHandler) @bci=30, line=106 (Compiled frame)
        • $RequestHandler_11aa3e523f4.service(org.apache.tapestry.services.Request, org.apache.tapestry.services.Response) @bci=10 (Compiled frame)
        • $RequestHandler_11aa3e523ec.service(org.apache.tapestry.services.Request, org.apache.tapestry.services.Response) @bci=6 (Compiled frame)
        • org.apache.tapestry.services.TapestryModule$12.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=40, line=924 (Compiled frame)

        The problem is certainly inside CheckForUpdatesFilter.

        Show
        Howard M. Lewis Ship added a comment - The above trace omits a key detail: Thread 20716: (state = BLOCKED) sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise) java.util.concurrent.locks.LockSupport.park() @bci=5, line=118 (Compiled frame) org.apache.tapestry.ioc.internal.util.ConcurrentBarrier.withRead(org.apache.tapestry.ioc.internal.util.Invokable) @bci=27, line=70 (Compiled frame) org.apache.tapestry.internal.services.CheckForUpdatesFilter.service(org.apache.tapestry.services.Request, org.apache.tapestry.services.Response, org.apache.tapestry.services.RequestHandler) @bci=30, line=106 (Compiled frame) $RequestHandler_11aa3e523f4.service(org.apache.tapestry.services.Request, org.apache.tapestry.services.Response) @bci=10 (Compiled frame) $RequestHandler_11aa3e523ec.service(org.apache.tapestry.services.Request, org.apache.tapestry.services.Response) @bci=6 (Compiled frame) org.apache.tapestry.services.TapestryModule$12.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=40, line=924 (Compiled frame) The problem is certainly inside CheckForUpdatesFilter.
        Howard M. Lewis Ship made changes -
        Summary Thread Lock Ups, unkonwn cause Application locks up, with all threads waiting for read lock inside ConcurrentBarrier
        Howard M. Lewis Ship made changes -
        Status Open [ 1 ] In Progress [ 3 ]
        Howard M. Lewis Ship made changes -
        Assignee Howard M. Lewis Ship [ hlship ]
        Hide
        Howard M. Lewis Ship added a comment -

        Another possibility is the ThreadLocal bug (in the JDK 1.5). May have to synchronize access to the ThreadLocal object inside ConcurrentBarrier.

        Show
        Howard M. Lewis Ship added a comment - Another possibility is the ThreadLocal bug (in the JDK 1.5). May have to synchronize access to the ThreadLocal object inside ConcurrentBarrier.
        Hide
        Howard M. Lewis Ship added a comment -

        What's odd here is that all the threads are blocked in withRead() (i.e., acquiring the shared read lock). That's very odd because I don't see any thread inside withWrite() (i.e., with the exclusive write lock). I remember there was a earlier hickup where Tapestry's use of ReentrantReadWriteLock tripped across a JDK bug.

        Show
        Howard M. Lewis Ship added a comment - What's odd here is that all the threads are blocked in withRead() (i.e., acquiring the shared read lock). That's very odd because I don't see any thread inside withWrite() (i.e., with the exclusive write lock). I remember there was a earlier hickup where Tapestry's use of ReentrantReadWriteLock tripped across a JDK bug.
        Fernando Padilla made changes -
        Attachment js2.txt [ 12384501 ]
        Hide
        Fernando Padilla added a comment -

        another stack trace that just happened

        Show
        Fernando Padilla added a comment - another stack trace that just happened
        Fernando Padilla made changes -
        Field Original Value New Value
        Attachment js.txt [ 12384500 ]
        Hide
        Fernando Padilla added a comment -

        a jstack when the server is stuck.

        Show
        Fernando Padilla added a comment - a jstack when the server is stuck.
        Fernando Padilla created issue -

          People

          • Assignee:
            Howard M. Lewis Ship
            Reporter:
            Fernando Padilla
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development