Project: Ambari, Issue: AMBARI-20646

Large Long Running Requests Can Slow Down the ActionScheduler


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.5.1
    • Component/s: ambari-server
    • Labels: None

    Description

      When a massive request is created (for example, a rolling upgrade on a 1,000-node cluster), the size of the request appears to slow down the ActionScheduler. Each command was taking between 1 and 2 minutes to run, even server-side tasks.

      The cause of this can be seen in the following two stack traces:

      ActionSchedulerImpl
      	at org.apache.ambari.server.orm.dao.DaoUtils.selectList(DaoUtils.java:60)
      	at org.apache.ambari.server.orm.dao.HostRoleCommandDAO.findByPKs(HostRoleCommandDAO.java:293)
      	at org.apache.ambari.server.orm.dao.HostRoleCommandDAO$$EnhancerByGuice$$21789cd1.CGLIB$findByPKs$7(<generated>)
      	at org.apache.ambari.server.orm.dao.HostRoleCommandDAO$$EnhancerByGuice$$21789cd1$$FastClassByGuice$$aa975e7f.invoke(<generated>)
      	at com.google.inject.internal.cglib.proxy.$MethodProxy.invokeSuper(MethodProxy.java:228)
      	at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)
      	at org.apache.ambari.server.orm.AmbariLocalSessionInterceptor.invoke(AmbariLocalSessionInterceptor.java:53)
      	at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)
      	at com.google.inject.internal.InterceptorStackCallback.intercept(InterceptorStackCallback.java:52)
      	at org.apache.ambari.server.orm.dao.HostRoleCommandDAO$$EnhancerByGuice$$21789cd1.findByPKs(<generated>)
      	at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:700)
      	at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
      	at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
      	at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
      	at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getStagesInProgress(ActionDBAccessorImpl.java:303)
      	at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:341)
      	at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:302)
      	at java.lang.Thread.run(Thread.java:745)
      
      Server Action Executor
      	at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:700)
      	at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
      	at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
      	at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
      	at org.apache.ambari.server.actionmanager.Request.<init>(Request.java:199)
      	at org.apache.ambari.server.actionmanager.Request$$FastClassByGuice$$9071e03.newInstance(<generated>)
      	at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
      	at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
      	at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
      	at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
      	at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
      	at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
      	at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
      	at com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:632)
      	at com.sun.proxy.$Proxy26.createExisting(Unknown Source)
      	at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getRequests(ActionDBAccessorImpl.java:784)
      	at org.apache.ambari.server.serveraction.ServerActionExecutor.cleanRequestShareDataContexts(ServerActionExecutor.java:259)
      	- locked <0x00007ff0a14083c8> (a java.util.HashMap)
      	at org.apache.ambari.server.serveraction.ServerActionExecutor.doWork(ServerActionExecutor.java:454)
      	at org.apache.ambari.server.serveraction.ServerActionExecutor$1.run(ServerActionExecutor.java:160)
      	at java.lang.Thread.run(Thread.java:745)
      

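      It's clear from these stacks that every PENDING stage (roughly 15,000 of them) was being loaded into memory every second, together with all of its tasks. Both paths build a Stage object for each stage through StageFactoryImpl.createExisting, and each Stage constructor loads its tasks through ActionDBAccessorImpl.getTasks. This makes no sense, because these methods don't need all of the stages, only the next one: the stages within a single request run synchronously, one after another.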
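      To illustrate the reasoning (this is not Ambari code), the sketch below assumes that a request's stages run strictly in ascending stage-ID order and picks the first stage that has not yet completed. Nothing after that stage can start until it finishes, so nothing after it needs to be loaded. The Status enum, the Stage record, and nextStage are hypothetical simplifications; Ambari itself tracks a status per task rather than a single status per stage.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch: choosing the next runnable stage of a single request,
 * assuming stages run strictly one after another in ascending stage-ID order.
 */
public class NextStageSketch {

    // Simplified per-stage status (assumption; Ambari tracks a status per task).
    enum Status { COMPLETED, IN_PROGRESS, PENDING }

    // A stage reduced to its ID and an aggregate status (hypothetical type).
    record Stage(long stageId, Status status) {}

    /**
     * Returns the first stage that is not COMPLETED. Later stages are blocked
     * until this one finishes, so the scan stops here.
     * Assumes the list is already sorted by stageId.
     */
    static Optional<Stage> nextStage(List<Stage> stagesInOrder) {
        for (Stage stage : stagesInOrder) {
            if (stage.status() != Status.COMPLETED) {
                return Optional.of(stage); // stop: everything after this is blocked
            }
        }
        return Optional.empty(); // every stage has finished
    }

    public static void main(String[] args) {
        List<Stage> stages = List.of(
            new Stage(1, Status.COMPLETED),
            new Stage(2, Status.IN_PROGRESS),
            new Stage(3, Status.PENDING),
            new Stage(4, Status.PENDING));
        // Only stage 2 matters to the scheduler; stages 3 and 4 could stay in the database.
        System.out.println(nextStage(stages).map(Stage::stageId).orElse(-1L)); // prints 2
    }
}
```

      The proposed fix pushes this selection into the database query instead, so the later stages and their tasks are never materialized at all.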

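      The proposed solution is to fix the StageEntity.findByCommandStatuses query so that it no longer returns every stage. Instead, for each request it should return only the lowest stage ID that has a task in one of the given statuses: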

      SELECT stage.requestId,
             MIN(stage.stageId)
      FROM   StageEntity stage,
             HostRoleCommandEntity hrc
      WHERE  hrc.status IN :statuses
             AND hrc.stageId = stage.stageId
             AND hrc.requestId = stage.requestId
      GROUP  BY stage.requestId
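      The grouping semantics of the proposed query can also be mirrored in memory, which may help when reasoning about it or writing unit tests. This is a sketch under stated assumptions, not Ambari code: TaskRow is a hypothetical flattened row standing in for the join of StageEntity and HostRoleCommandEntity, and statuses are plain strings rather than Ambari's status enum. For each request it keeps the minimum stage ID among tasks whose status is in the requested set, mirroring the query's MIN over stage IDs grouped by request ID.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical in-memory analogue of the "first stage per request" query. */
public class FirstStagePerRequest {

    // Hypothetical row of the stage/task join: one task and the stage it belongs to.
    record TaskRow(long requestId, long stageId, String status) {}

    /**
     * Mirrors: SELECT requestId, MIN(stageId) ... WHERE status IN :statuses GROUP BY requestId.
     * Returns requestId -> lowest matching stageId, ordered by requestId.
     */
    static Map<Long, Long> firstStageByStatus(List<TaskRow> rows, Set<String> statuses) {
        Map<Long, Long> result = new TreeMap<>();
        for (TaskRow row : rows) {
            if (statuses.contains(row.status())) {                       // WHERE status IN :statuses
                result.merge(row.requestId(), row.stageId(), Long::min); // GROUP BY + MIN
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<TaskRow> rows = List.of(
            new TaskRow(10, 1, "COMPLETED"),
            new TaskRow(10, 2, "PENDING"),
            new TaskRow(10, 3, "PENDING"),
            new TaskRow(11, 7, "IN_PROGRESS"),
            new TaskRow(11, 8, "PENDING"));
        // Status names modeled on Ambari's HostRoleStatus values (assumption).
        Set<String> active = Set.of("PENDING", "QUEUED", "IN_PROGRESS");
        // One stage per request instead of every matching stage.
        System.out.println(firstStageByStatus(rows, active)); // prints {10=2, 11=7}
    }
}
```

      In the proposed fix this reduction happens in the database, so only one row per request is returned no matter how many stages are still pending.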
      

      Note that this issue might not reproduce on trunk because of AMBARI-18868.

      Attachments

        1. AMBARI-20646.patch (47 kB, Jonathan Hurley)


            People

              Assignee: Jonathan Hurley (jonathanhurley)
              Reporter: Jonathan Hurley (jonathanhurley)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: