Tapestry
  1. Tapestry
  2. TAPESTRY-2530

Possible performance bottleneck in PageSource.makeObject()

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 4.1.2, 4.1.5
    • Fix Version/s: 4.1.6
    • Component/s: None
    • Labels:
      None
    • Environment:
      JDK 1.5.0_14 64 bit, Tomcat 5.5.26

      Description

      We recently migrated from Tapestry 3 to Tapestry 4.1.2 and are experiencing severe performance problems with mediocre load.

      At some point our servers start creating new threads until the Tomcat reaches the maximum of 500 live threads. When examining the thread dump, we find one thread holding a lock at the GenericKeyedObjectPool, while hundreds of other threads are waiting to aquire just this lock.

      It seems the locking thread is constructig a page from a synchronized context and needs some time to do so, and this leads to the complete application locking up in no time. See the threaddump attached.

      We already tried an upgrade to commons-pool 1.4 with no real effect.

      1. dump_415.zip
        21 kB
        Stefan Koelmann
      2. page-pool-patch-1.0-SNAPSHOT.jar
        17 kB
        Marcus Schulte
      3. page-pool-patch-1.0-SNAPSHOT-sources.jar
        17 kB
        Marcus Schulte
      4. threaddump.zip
        29 kB
        Stefan Koelmann
      5. threaddumps_with_poolfix.zip
        266 kB
        Stefan Koelmann

        Issue Links

          Activity

          Hide
          Stefan Koelmann added a comment -

          Thread dump illustrating the problem with the locked threads.

          Show
          Stefan Koelmann added a comment - Thread dump illustrating the problem with the locked threads.
          Hide
          Andreas Andreou added a comment -

          Do you know / is it possible to check with T4.1.5 or T4.1.6-SNAPSHOT (which also update the versions of several dependencies)?

          I think i remember some performance problems mentioned in the ML (in certain situations) a year ago with 4.1.2,
          but 4.1.3 had fixed on them.

          Show
          Andreas Andreou added a comment - Do you know / is it possible to check with T4.1.5 or T4.1.6-SNAPSHOT (which also update the versions of several dependencies)? I think i remember some performance problems mentioned in the ML (in certain situations) a year ago with 4.1.2, but 4.1.3 had fixed on them.
          Hide
          Stefan Koelmann added a comment -

          Upgrading the application to 4.1.5 would be some effort. We once tried, but were getting OGNL-Exceptions, and then had to stick with the old version because time was pressing. Since we were not able to reproduce the issue in a testing environment yet, we would only be able to check the new version in production.

          I did not find any performance-related bugs in Release 4.1.3 or 4.1.5. The only interesting bug I found in the mailing list was Hivemind-162, which was fixed in Hivemind 1.1.1 - already used by T4.1.2

          Show
          Stefan Koelmann added a comment - Upgrading the application to 4.1.5 would be some effort. We once tried, but were getting OGNL-Exceptions, and then had to stick with the old version because time was pressing. Since we were not able to reproduce the issue in a testing environment yet, we would only be able to check the new version in production. I did not find any performance-related bugs in Release 4.1.3 or 4.1.5. The only interesting bug I found in the mailing list was Hivemind-162, which was fixed in Hivemind 1.1.1 - already used by T4.1.2
          Hide
          Andreas Andreou added a comment -

          I went over the commit logs - possible performance related changes between 4.1.3 and now are at revisions:
          r595338, r613299, r614722 and r646069

          Looking at them from http://svn.apache.org/viewvc?view=rev&revision=646069

          Show
          Andreas Andreou added a comment - I went over the commit logs - possible performance related changes between 4.1.3 and now are at revisions: r595338, r613299, r614722 and r646069 Looking at them from http://svn.apache.org/viewvc?view=rev&revision=646069
          Hide
          Stefan Koelmann added a comment -

          Hello,

          we were able to reproduce this problem on our testing machine. Although the machine does not run out of threads, we can see that a lot of threads get blocked at the GenericKeyedObjectPool and get stuck there. Some threads are blocked for more than 90 seconds, which would lead to users refreshing the page and making the situation even worse.

          An additional problem seems to be that one has to aquire the lock at the object pool to return a page to the tool; so if one thread renders a page; the whole application gets stuck immediately.

          We also retested with Tapestry 4.1.5, it does not solve the problem. I attached a threaddump of the tapestry 4.1.5 version, if you need more information, don't bother to ask.

          Show
          Stefan Koelmann added a comment - Hello, we were able to reproduce this problem on our testing machine. Although the machine does not run out of threads, we can see that a lot of threads get blocked at the GenericKeyedObjectPool and get stuck there. Some threads are blocked for more than 90 seconds, which would lead to users refreshing the page and making the situation even worse. An additional problem seems to be that one has to aquire the lock at the object pool to return a page to the tool; so if one thread renders a page; the whole application gets stuck immediately. We also retested with Tapestry 4.1.5, it does not solve the problem. I attached a threaddump of the tapestry 4.1.5 version, if you need more information, don't bother to ask.
          Hide
          Andreas Andreou added a comment -

          Is it possible that you've limited the maximum number of total active pages in the pool?

          http://tapestry.apache.org/tapestry4.1/developmentguide/performancetuning.html

          Show
          Andreas Andreou added a comment - Is it possible that you've limited the maximum number of total active pages in the pool? http://tapestry.apache.org/tapestry4.1/developmentguide/performancetuning.html
          Hide
          Stefan Koelmann added a comment -

          We are using the default settings. I guess that are those defined in the tapestry.page.xml; if so, the max-active and max-idle values are set to -1 (unlimited).

          Show
          Stefan Koelmann added a comment - We are using the default settings. I guess that are those defined in the tapestry.page.xml; if so, the max-active and max-idle values are set to -1 (unlimited).
          Hide
          Marcus Schulte added a comment -

          From looking at dump_415.zip, one can see that thread "http-11080-Processor198" holds a lock on object <0x7bf1ecb0> (the page-pool) while constructing a new instance of that page.
          A whole lot of threads can be seen waiting for that exact lock. The reason for this is that all the relevant code in GenericKeyedObjectPool.borrowObject() as well as in GKOP.returnObject() is synchronized on the pool itself.
          Thus, the construction of a new page currently blocks any other thread from borrowing any other page, existant or not, from the pool. This is obviously unnecessary and I'd expect it to be fatal under very high load.
          Stefan, I could provide you with a quick patch without any warranty to try out. Are you interested?

          Show
          Marcus Schulte added a comment - From looking at dump_415.zip, one can see that thread "http-11080-Processor198" holds a lock on object <0x7bf1ecb0> (the page-pool) while constructing a new instance of that page. A whole lot of threads can be seen waiting for that exact lock. The reason for this is that all the relevant code in GenericKeyedObjectPool.borrowObject() as well as in GKOP.returnObject() is synchronized on the pool itself. Thus, the construction of a new page currently blocks any other thread from borrowing any other page, existant or not, from the pool. This is obviously unnecessary and I'd expect it to be fatal under very high load. Stefan, I could provide you with a quick patch without any warranty to try out. Are you interested?
          Hide
          Stefan Koelmann added a comment -

          Hi Marcus,

          you are right, the real bottleneck is located in the GenericKeyedObjectPool. I would be glad to retest with your patch and will post the results asap.

          What I do not understand is why other tapestry users are not experiencing the problems we do - there have to be websites built on tapestry with much higher load. Maybe there is a problem with our page-Structure leading to long loading times.

          Show
          Stefan Koelmann added a comment - Hi Marcus, you are right, the real bottleneck is located in the GenericKeyedObjectPool. I would be glad to retest with your patch and will post the results asap. What I do not understand is why other tapestry users are not experiencing the problems we do - there have to be websites built on tapestry with much higher load. Maybe there is a problem with our page-Structure leading to long loading times.
          Hide
          Marcus Schulte added a comment -

          Hi Stefan, yes, maybe you can check your page-loading times by enabling the appropriate logger ... . Nevertheless I'll provide the patch, because there definitively is unnecassary lock-contention which probably impacts all Tap-4 users, if only to a lesser extent.

          Show
          Marcus Schulte added a comment - Hi Stefan, yes, maybe you can check your page-loading times by enabling the appropriate logger ... . Nevertheless I'll provide the patch, because there definitively is unnecassary lock-contention which probably impacts all Tap-4 users, if only to a lesser extent.
          Hide
          Marcus Schulte added a comment -

          A quick, exploratory fix. Drop into WEB-INF/lib to try it.

          Show
          Marcus Schulte added a comment - A quick, exploratory fix. Drop into WEB-INF/lib to try it.
          Hide
          Stefan Koelmann added a comment -

          Hi Marcus,

          it looks a lot better with your fix. I attached a new Logfile containing 9 Threaddumps of the application during our load test. Dump #1 is in idle state. We used samurai to visualize the results.

          It is quite good to see that we got no more threads waiting for the lock of the GKOP, but we have not tested if our software is still functional (i.e. if pages get mixed up).

          Could you please provide the source code of your fix? This would allow us and others to counter-check it.

          I guess you split the lock in the GKOP to make the call to the Factory unsynchronized. This will most likely lead to concurrent accesses to the PageSource; are PageSource and the dependent classes designed to be thread-safe?

          Thanks a lot,

          Stefan

          Show
          Stefan Koelmann added a comment - Hi Marcus, it looks a lot better with your fix. I attached a new Logfile containing 9 Threaddumps of the application during our load test. Dump #1 is in idle state. We used samurai to visualize the results. It is quite good to see that we got no more threads waiting for the lock of the GKOP, but we have not tested if our software is still functional (i.e. if pages get mixed up). Could you please provide the source code of your fix? This would allow us and others to counter-check it. I guess you split the lock in the GKOP to make the call to the Factory unsynchronized. This will most likely lead to concurrent accesses to the PageSource; are PageSource and the dependent classes designed to be thread-safe? Thanks a lot, Stefan
          Hide
          Marcus Schulte added a comment -

          Attached sources for the exploratory patch

          Show
          Marcus Schulte added a comment - Attached sources for the exploratory patch
          Hide
          Marcus Schulte added a comment -

          PageSource is definitively thread-safe. It actually was used concurrently before commons-pool was introduced at some point in 4.1. So please check, whether your problems are gone. If so, I will incorporate a cleaner version of the patch into the Tapestry trunk.

          Show
          Marcus Schulte added a comment - PageSource is definitively thread-safe. It actually was used concurrently before commons-pool was introduced at some point in 4.1. So please check, whether your problems are gone. If so, I will incorporate a cleaner version of the patch into the Tapestry trunk.
          Hide
          Stefan Koelmann added a comment -

          Hi Marcus,

          we are using the patch you provided; it definitely fixed our problem. Thanks a lot, it would be a great bugfix to include for the next 4.1 release.

          Show
          Stefan Koelmann added a comment - Hi Marcus, we are using the patch you provided; it definitely fixed our problem. Thanks a lot, it would be a great bugfix to include for the next 4.1 release.

            People

            • Assignee:
              Marcus Schulte
              Reporter:
              Stefan Koelmann
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development