Jetspeed 2
  1. Jetspeed 2
  2. JS2-682

Jetspeed thread waiting to lock for infinity time

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.1-dev
    • Fix Version/s: 2.1.2
    • Component/s: Components Core
    • Labels:
      None
    • Environment:
      Apache Tomcat 5.5.20; Suse Linux 10.1; log4j 1.2.11

      Description

      Sometimes our production server falls down.

      I have analyzed full thread stack dump after fall down, and I have found that server falls down because of one of thread waits for synchronized code in IsolatedLog4JLogger.getLogger (<0x00002b35dd5ba6c8>). And other threads are waiting for lock that was locked by this thread (<0x00002b35ec0aa1e8>).

      The dump of this thread (in infinitly waiting state):
      "http-8080-Processor499" daemon prio=1 tid=0x00002aaab0a38d10 nid=0x23b3 waiting for monitor entry [0x00000000554a0000..0x00000000554a2da0]
      at org.apache.jetspeed.webapp.logging.IsolatedLog4JLogger.getLogger(IsolatedLog4JLogger.java:158)

      • waiting to lock <0x00002b35dd5ba6c8> (a java.lang.Class)
        at org.apache.jetspeed.webapp.logging.IsolatedLog4JLogger.debug(IsolatedLog4JLogger.java:166)
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.triggerAfterCompletion(AbstractPlatformTransactionManager.java:568)
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:429)
        at org.springframework.transaction.interceptor.TransactionAspectSupport.doCommitTransactionAfterReturning(TransactionAspectSupport.java:243)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:66)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:144)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:174)
        at $Proxy7.getChildren(Unknown Source)
        at org.apache.jetspeed.prefs.impl.PreferencesImpl.childrenNamesSpi(PreferencesImpl.java:129)
        at java.util.prefs.AbstractPreferences.childrenNames(AbstractPreferences.java:699)
      • locked <0x00002b35ec0aa1e8> (a java.lang.Object)
        at org.apache.jetspeed.om.preference.impl.PrefsPreferenceSetImpl.size(PrefsPreferenceSetImpl.java:271)
        at org.apache.jetspeed.om.preference.impl.PrefsPreferenceSetImpl$PortletPrefsIterator.hasNext(PrefsPreferenceSetImpl.java:364)
        at org.apache.jetspeed.om.preference.impl.FragmentPortletPreferenceSet.<init>(FragmentPortletPreferenceSet.java:62)
        at org.apache.jetspeed.om.portlet.impl.FragmentPortletDefinition.getPreferenceSet(FragmentPortletDefinition.java:229)
        at org.apache.pluto.core.impl.PortletPreferencesImpl.<init>(PortletPreferencesImpl.java:73)
        at org.apache.pluto.factory.impl.PortletPreferencesFactoryImpl.getPortletPreferences(PortletPreferencesFactoryImpl.java:39)
        at org.apache.pluto.factory.PortletObjectAccess.getPortletPreferences(PortletObjectAccess.java:127)
        at org.apache.pluto.core.impl.RenderRequestImpl.getPreferences(RenderRequestImpl.java:74)
        at $PortletRequest_111c59420ea.getPreferences($PortletRequest_111c59420ea.java)
        at $PortletRequest_111c59420e7.getPreferences($PortletRequest_111c59420e7.java)
        at net.mycompany.portal.news.newslist.NewsListPage.pageBeginRender(NewsListPage.java:792)
        at org.apache.tapestry.AbstractPage.firePageBeginRender(AbstractPage.java:478)
        at org.apache.tapestry.AbstractPage.renderPage(AbstractPage.java:268)
        at org.apache.tapestry.engine.RequestCycle.renderPage(RequestCycle.java:366)
        at org.apache.tapestry.portlet.PortletRendererImpl.renderPage(PortletRendererImpl.java:76)
        at $PortletRenderer_111c59420d9.renderPage($PortletRenderer_111c59420d9.java)
        at org.apache.tapestry.portlet.PortletHomeService.service(PortletHomeService.java:80)
        at $IEngineService_111c59420b9.service($IEngineService_111c59420b9.java)
        at org.apache.tapestry.services.impl.EngineServiceOuterProxy.service(EngineServiceOuterProxy.java:66)
        at org.apache.tapestry.engine.AbstractEngine.service(AbstractEngine.java:248)
        at org.apache.tapestry.services.impl.InvokeEngineTerminator.service(InvokeEngineTerminator.java:60)
        at $WebRequestServicer_111c5942059.service($WebRequestServicer_111c5942059.java)
        at $WebRequestServicer_111c5942057.service($WebRequestServicer_111c5942057.java)
        at org.apache.tapestry.portlet.RenderRequestServicerToWebRequestServicerBridge.service(RenderRequestServicerToWebRequestServicerBridge.java:49)
        at $RenderRequestServicer_111c5942051.service($RenderRequestServicer_111c5942051.java)
        at $RenderRequestServicer_111c594204b.service($RenderRequestServicer_111c594204b.java)
        at org.apache.tapestry.portlet.ApplicationPortlet.render(ApplicationPortlet.java:161)
        at org.apache.jetspeed.factory.JetspeedPortletInstance.render(JetspeedPortletInstance.java:102)
        at org.apache.jetspeed.container.JetspeedContainerServlet.doGet(JetspeedContainerServlet.java:230)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:672)
        at org.apache.catalina.core.ApplicationDispatcher.doInclude(ApplicationDispatcher.java:574)
        at org.apache.catalina.core.ApplicationDispatcher.include(ApplicationDispatcher.java:499)
        at org.apache.jetspeed.container.invoker.ServletPortletInvoker.invoke(ServletPortletInvoker.java:215)
        at org.apache.jetspeed.container.invoker.ServletPortletInvoker.render(ServletPortletInvoker.java:126)
        at org.apache.pluto.PortletContainerImpl.renderPortlet(PortletContainerImpl.java:119)
        at org.apache.jetspeed.container.JetspeedPortletContainerWrapper.renderPortlet(JetspeedPortletContainerWrapper.java:120)
        at org.apache.jetspeed.aggregator.impl.RenderingJobImpl.execute(RenderingJobImpl.java:121)
        at org.apache.jetspeed.aggregator.impl.PortletRendererImpl.renderNow(PortletRendererImpl.java:120)
        at org.apache.jetspeed.aggregator.impl.PageAggregatorImpl.aggregateAndRender(PageAggregatorImpl.java:199)
        at org.apache.jetspeed.aggregator.impl.PageAggregatorImpl.aggregateAndRender(PageAggregatorImpl.java:182)
        at org.apache.jetspeed.aggregator.impl.PageAggregatorImpl.aggregateAndRender(PageAggregatorImpl.java:182)
        at org.apache.jetspeed.aggregator.impl.PageAggregatorImpl.build(PageAggregatorImpl.java:106)
        at org.apache.jetspeed.aggregator.AggregatorValve.invoke(AggregatorValve.java:48)
        at org.apache.jetspeed.pipeline.JetspeedPipeline$Invocation.invokeNext(JetspeedPipeline.java:166)
        at org.apache.jetspeed.decoration.DecorationValve.invoke(DecorationValve.java:110)
        at org.apache.jetspeed.pipeline.JetspeedPipeline$Invocation.invokeNext(JetspeedPipeline.java:166)
        at org.apache.jetspeed.pipeline.valve.impl.ActionValveImpl.invoke(ActionValveImpl.java:147)
        at org.apache.jetspeed.pipeline.JetspeedPipeline$Invocation.invokeNext(JetspeedPipeline.java:166)
        at org.apache.jetspeed.container.ContainerValve.invoke(ContainerValve.java:76)
        at org.apache.jetspeed.pipeline.JetspeedPipeline$Invocation.invokeNext(JetspeedPipeline.java:166)
        at org.apache.jetspeed.profiler.impl.ProfilerValveImpl.invoke(ProfilerValveImpl.java:255)
        at org.apache.jetspeed.pipeline.JetspeedPipeline$Invocation.invokeNext(JetspeedPipeline.java:166)
        at org.apache.jetspeed.security.impl.LoginValidationValveImpl.invoke(LoginValidationValveImpl.java:159)
        at org.apache.jetspeed.pipeline.JetspeedPipeline$Invocation.invokeNext(JetspeedPipeline.java:166)
        at org.apache.jetspeed.security.impl.PasswordCredentialValveImpl.invoke(PasswordCredentialValveImpl.java:149)
        at org.apache.jetspeed.pipeline.JetspeedPipeline$Invocation.invokeNext(JetspeedPipeline.java:166)
        at org.apache.jetspeed.localization.impl.LocalizationValveImpl.invoke(LocalizationValveImpl.java:169)
        at org.apache.jetspeed.pipeline.JetspeedPipeline$Invocation.invokeNext(JetspeedPipeline.java:166)
        at org.apache.jetspeed.security.impl.AbstractSecurityValve$1.run(AbstractSecurityValve.java:117)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAsPrivileged(Subject.java:454)
        at org.apache.jetspeed.security.impl.AbstractSecurityValve.invoke(AbstractSecurityValve.java:111)
        at org.apache.jetspeed.pipeline.JetspeedPipeline$Invocation.invokeNext(JetspeedPipeline.java:166)
        at org.apache.jetspeed.container.url.impl.PortalURLValveImpl.invoke(PortalURLValveImpl.java:67)
        at org.apache.jetspeed.pipeline.JetspeedPipeline$Invocation.invokeNext(JetspeedPipeline.java:166)
        at org.apache.jetspeed.capabilities.impl.CapabilityValveImpl.invoke(CapabilityValveImpl.java:128)
        at org.apache.jetspeed.pipeline.JetspeedPipeline$Invocation.invokeNext(JetspeedPipeline.java:166)
        at org.apache.jetspeed.pipeline.JetspeedPipeline.invoke(JetspeedPipeline.java:145)
        at org.apache.jetspeed.engine.JetspeedEngine.service(JetspeedEngine.java:214)
        at org.apache.jetspeed.engine.JetspeedServlet.doGet(JetspeedServlet.java:238)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at com.opensymphony.oscache.web.filter.CacheFilter.doFilter(CacheFilter.java:168)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:432)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
        at java.lang.Thread.run(Thread.java:595)

      So, this thread is waiting for <0x00002b35dd5ba6c8> lock, but there is no one other thread use this lock! (Full thread stack dump is attached).

      Method IsolatedLog4JLogger.getLogger, when thread inifinitly waiting for synchronized object::
      private Log4JLogger getLogger()
      {
      synchronized (IsolatedLog4JLogger.class)

      { return logger; }

      }

      IsolatedLog4JLogger class with all parts of code, that use "synchronized (IsolatedLog4JLogger.class)":
      public class IsolatedLog4JLogger implements Log
      {
      private static Hierarchy hierarchy;
      private static HashMap notIsolatedLoggers = new HashMap();

      private Log4JLogger logger; // the wrapped Log4JLogger

      public static void setHierarchy(Hierarchy hierarchy)
      {
      synchronized (IsolatedLog4JLogger.class)
      {
      if ( IsolatedLog4JLogger.hierarchy == null )
      {
      IsolatedLog4JLogger.hierarchy = hierarchy;
      if ( notIsolatedLoggers.size() > 0 )
      {
      // Reroute existing IsolatedLog4JLogger instances
      // which were created before the new LoggerRepository.
      // Note: This situation should be prevented as much as
      // possible by calling setHierarchy from
      // a ServletContextListener or a Servlet its init method
      // which has a load-on-startup value of 0 (zero).
      Iterator iter = notIsolatedLoggers.entrySet().iterator();
      while (iter.hasNext())

      { Map.Entry entry = (Map.Entry)iter.next(); IsolatedLog4JLogger logger = (IsolatedLog4JLogger)entry.getKey(); logger.setLogger(new Log4JLogger(hierarchy.getLogger((String)entry.getValue()))); }

      }
      notIsolatedLoggers = null;
      }
      }
      }

      public IsolatedLog4JLogger(String name)
      {
      synchronized (IsolatedLog4JLogger.class)
      {
      if ( hierarchy == null )

      { // A LogFactory.getLog(name) is called before // our ContextClassLoader Hierarchy could be set. // Temporarily save this instance so it can be // rerouted one the Hierarchy is set. logger = new Log4JLogger(name); notIsolatedLoggers.put(this,name); }

      else

      { logger = new Log4JLogger(hierarchy.getLogger(name)); }

      }
      }
      ................
      }

      1. stackdump.zip
        73 kB
        Vitaly Baranovsky

        Activity

        Hide
        David Sean Taylor added a comment -

        this is passing through our load tests here for a while now

        Show
        David Sean Taylor added a comment - this is passing through our load tests here for a while now
        Hide
        Vitaly Baranovsky added a comment -

        Sorry, I had not time for rebuild jetspeed with this patch.

        It's looks like Ate right with his comment:
        > But the issue I'm really interested in is the fact that Vitaly saw no locks on the object and still the application crashed while waiting for one...
        > So I'm still not sure the real problem lies with this code itself but could be caused by a completely unrelated issue, like possibly a OS kernel
        > configuration problem.
        >That is why I responsed initially to look at the kernel semaphore sets configuration.

        But I would try your suggestion when I will find time for this.

        Show
        Vitaly Baranovsky added a comment - Sorry, I had not time for rebuild jetspeed with this patch. It's looks like Ate right with his comment: > But the issue I'm really interested in is the fact that Vitaly saw no locks on the object and still the application crashed while waiting for one... > So I'm still not sure the real problem lies with this code itself but could be caused by a completely unrelated issue, like possibly a OS kernel > configuration problem. >That is why I responsed initially to look at the kernel semaphore sets configuration. But I would try your suggestion when I will find time for this.
        Hide
        David Sean Taylor added a comment -

        Vitaly, Im reviewing this issue. Is it still open or did our patch solve this problem?

        Show
        David Sean Taylor added a comment - Vitaly, Im reviewing this issue. Is it still open or did our patch solve this problem?
        Hide
        Vitaly Baranovsky added a comment -

        2 Ate Douma:
        > That is why I responsed initially to look at the kernel semaphore sets configuration.

        Admin of our portal have looked at the kernel semaphore sets configuration. It was configured by default (SuSe Linux) as optimized for highly concurrenced application. And lock count setting is bigger that recommended setting for Oracle 10G and WebSphere. So, it seems like lock count isn't a problem.

        But he have increased lock count. So, I will say about results.

        Show
        Vitaly Baranovsky added a comment - 2 Ate Douma: > That is why I responsed initially to look at the kernel semaphore sets configuration. Admin of our portal have looked at the kernel semaphore sets configuration. It was configured by default (SuSe Linux) as optimized for highly concurrenced application. And lock count setting is bigger that recommended setting for Oracle 10G and WebSphere. So, it seems like lock count isn't a problem. But he have increased lock count. So, I will say about results.
        Hide
        Ate Douma added a comment -

        The current synchronization inside the getLogger method is done because of the Log4J reconfiguration done in the setHierarchy method:

        public static void setHierarchy(Hierarchy hierarchy)
        {
        synchronized (IsolatedLog4JLogger.class)
        {
        if ( IsolatedLog4JLogger.hierarchy == null )
        {
        IsolatedLog4JLogger.hierarchy = hierarchy;
        if ( notIsolatedLoggers.size() > 0 )
        {
        // Reroute existing IsolatedLog4JLogger instances
        // which were created before the new LoggerRepository.
        // Note: This situation should be prevented as much as
        // possible by calling setHierarchy from
        // a ServletContextListener or a Servlet its init method
        // which has a load-on-startup value of 0 (zero).
        Iterator iter = notIsolatedLoggers.entrySet().iterator();
        while (iter.hasNext())

        { Map.Entry entry = (Map.Entry)iter.next(); IsolatedLog4JLogger logger = (IsolatedLog4JLogger)entry.getKey(); logger.setLogger(new Log4JLogger(hierarchy.getLogger((String)entry.getValue()))); }

        }
        notIsolatedLoggers = null;
        }
        }
        }

        If a logger is retrieved (and used) at the same time that the reconfiguration takes places, exceptions might take place.
        But I see the synchronization actually isn't properly guarding against that anyway, so it just as well can be removed indeed
        So +1 on removing the synchronization.

        But the issue I'm really interested in is the fact that Vitaly saw no locks on the object and still the application crashed while waiting for one...
        So I'm still not sure the real problem lies with this code itself but could be caused by a completely unrelated issue, like possibly a OS kernel configuration problem.
        That is why I responsed initially to look at the kernel semaphore sets configuration.

        Show
        Ate Douma added a comment - The current synchronization inside the getLogger method is done because of the Log4J reconfiguration done in the setHierarchy method: public static void setHierarchy(Hierarchy hierarchy) { synchronized (IsolatedLog4JLogger.class) { if ( IsolatedLog4JLogger.hierarchy == null ) { IsolatedLog4JLogger.hierarchy = hierarchy; if ( notIsolatedLoggers.size() > 0 ) { // Reroute existing IsolatedLog4JLogger instances // which were created before the new LoggerRepository. // Note: This situation should be prevented as much as // possible by calling setHierarchy from // a ServletContextListener or a Servlet its init method // which has a load-on-startup value of 0 (zero). Iterator iter = notIsolatedLoggers.entrySet().iterator(); while (iter.hasNext()) { Map.Entry entry = (Map.Entry)iter.next(); IsolatedLog4JLogger logger = (IsolatedLog4JLogger)entry.getKey(); logger.setLogger(new Log4JLogger(hierarchy.getLogger((String)entry.getValue()))); } } notIsolatedLoggers = null; } } } If a logger is retrieved (and used) at the same time that the reconfiguration takes places, exceptions might take place. But I see the synchronization actually isn't properly guarding against that anyway, so it just as well can be removed indeed So +1 on removing the synchronization. But the issue I'm really interested in is the fact that Vitaly saw no locks on the object and still the application crashed while waiting for one... So I'm still not sure the real problem lies with this code itself but could be caused by a completely unrelated issue, like possibly a OS kernel configuration problem. That is why I responsed initially to look at the kernel semaphore sets configuration.
        Hide
        Santiago Gala added a comment -

        Not sure what sense makes the synchronized there (in both cases). What is it guarding against?

        I mean, synchronizing an atomic op (get or set an object reference) does not seem to make any sense.
        The only thing it does is ensuring that only one thread is getting a reference to the loader/setting a new logger at the same time.

        I guess removing the synchronization would be a noop, but I can be missing something.

        Show
        Santiago Gala added a comment - Not sure what sense makes the synchronized there (in both cases). What is it guarding against? I mean, synchronizing an atomic op (get or set an object reference) does not seem to make any sense. The only thing it does is ensuring that only one thread is getting a reference to the loader/setting a new logger at the same time. I guess removing the synchronization would be a noop, but I can be missing something.
        Hide
        David Sean Taylor added a comment -

        I think there is a simple fix here.
        I propose changing the code from:

        private void setLogger(Log4JLogger logger)

        { this.logger = logger; }

        private Log4JLogger getLogger()
        {
        synchronized (IsolatedLog4JLogger.class)

        { return logger; }

        }

        to

        private void setLogger(Log4JLogger logger)
        {
        synchronized (IsolatedLog4JLogger.class)

        { this.logger = logger; }

        }

        private Log4JLogger getLogger()

        { return logger; }
        Show
        David Sean Taylor added a comment - I think there is a simple fix here. I propose changing the code from: private void setLogger(Log4JLogger logger) { this.logger = logger; } private Log4JLogger getLogger() { synchronized (IsolatedLog4JLogger.class) { return logger; } } to private void setLogger(Log4JLogger logger) { synchronized (IsolatedLog4JLogger.class) { this.logger = logger; } } private Log4JLogger getLogger() { return logger; }
        Hide
        Ate Douma added a comment -

        The first advise I received here from a collegue is checking/increasing your kernel semaphore sets: he has had problems like this before with the OS running out of available locks.
        So maybe you can check/improve your system configuration at /proc/sys/kernel/sem
        No garantee this will solve it or even is causing it, but it might be worth a try.

        Show
        Ate Douma added a comment - The first advise I received here from a collegue is checking/increasing your kernel semaphore sets: he has had problems like this before with the OS running out of available locks. So maybe you can check/improve your system configuration at /proc/sys/kernel/sem No garantee this will solve it or even is causing it, but it might be worth a try.
        Hide
        Vitaly Baranovsky added a comment -

        So, what can Jetspeed people say about this waiting to lock for infinity time?

        Show
        Vitaly Baranovsky added a comment - So, what can Jetspeed people say about this waiting to lock for infinity time?
        Hide
        Vitaly Baranovsky added a comment -

        There is full thread stack dump attached to this issue.
        As You can see, no one thread locks <0x00002b35dd5ba6c8> object.

        Show
        Vitaly Baranovsky added a comment - There is full thread stack dump attached to this issue. As You can see, no one thread locks <0x00002b35dd5ba6c8> object.

          People

          • Assignee:
            David Sean Taylor
            Reporter:
            Vitaly Baranovsky
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development