STRATOS-1293

Stratos should remove instances in ERROR state before trying to re-launch an instance

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0, 4.1.0 Alpha
    • Fix Version/s: None
    • Component/s: Cloud Controller
    • Labels: None
    • Environment: OpenStack Icehouse, Stratos 4.0.0 GA

      Description

      On my setup with OpenStack Icehouse and Stratos 4.0.0 GA, I observed one cartridge with one running instance and multiple instances in ERROR state. In wso2carbon.log I found several occurrences of the exception below. It looks like the instance Stratos launched for the cartridge did not reach the running state, so Stratos launched another one. This repeated until one instance of the cartridge finally reached the running state.

      When this condition occurs, Stratos should remove the instances that are in ERROR state before attempting to re-launch. Instances left in ERROR state can exhaust resources on the underlying IaaS cluster (OpenStack in this case).
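
The requested behaviour can be sketched roughly as follows. This is only an illustration of the idea, not Stratos or jclouds code: the `Iaas` interface, `LaunchFailed`, `launchWithCleanup`, and `FakeIaas` are all hypothetical names made up for this example, and it assumes the failed launch reports the id of the node that ended in ERROR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: destroy a failed node before retrying the launch. */
public class ErrorNodeCleanup {

    /** Hypothetical stand-in for the IaaS layer; not a Stratos or jclouds API. */
    interface Iaas {
        /** Starts a node and returns its id; throws LaunchFailed if it ends in ERROR. */
        String launch() throws LaunchFailed;
        void destroy(String nodeId);
    }

    /** Carries the id of the node that was created but never reached RUNNING. */
    static class LaunchFailed extends Exception {
        final String nodeId;
        LaunchFailed(String nodeId) {
            super("node(" + nodeId + ") ended in status ERROR");
            this.nodeId = nodeId;
        }
    }

    /** Tries up to maxAttempts launches, destroying each failed node before the next try. */
    static String launchWithCleanup(Iaas iaas, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return iaas.launch();
            } catch (LaunchFailed e) {
                // Free the resources held by the ERROR node before re-launching.
                iaas.destroy(e.nodeId);
            }
        }
        throw new IllegalStateException(
                "no node reached RUNNING after " + maxAttempts + " attempts");
    }

    /** In-memory fake that fails the first `failures` launches; for demonstration only. */
    static class FakeIaas implements Iaas {
        final List<String> live = new ArrayList<>();
        private int remainingFailures;
        private int counter;

        FakeIaas(int failures) {
            this.remainingFailures = failures;
        }

        public String launch() throws LaunchFailed {
            String id = "node-" + (++counter);
            live.add(id); // the node exists on the IaaS even if it ends in ERROR
            if (remainingFailures-- > 0) {
                throw new LaunchFailed(id);
            }
            return id;
        }

        public void destroy(String nodeId) {
            live.remove(nodeId);
        }
    }

    public static void main(String[] args) {
        FakeIaas iaas = new FakeIaas(2);
        String id = launchWithCleanup(iaas, 5);
        System.out.println(id + " running; live nodes: " + iaas.live);
    }
}
```

With the fake above, two launches fail and are destroyed, so only the third node is left on the IaaS. Without the `destroy` call in the catch block, the two ERROR nodes would stay in `live`, which mirrors the leftover instances described in this report.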

      at org.drools.common.AbstractWorkingMemory.fireAllRules(AbstractWorkingMemory.java:674)
      at org.drools.impl.StatefulKnowledgeSessionImpl.fireAllRules(StatefulKnowledgeSessionImpl.java:230)
      at org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.evaluateMinCheck(AutoscalerRuleEvaluator.java:94)
      at org.apache.stratos.autoscaler.monitor.ClusterMonitor.monitor(ClusterMonitor.java:157)
      at org.apache.stratos.autoscaler.monitor.ClusterMonitor.run(ClusterMonitor.java:86)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.axis2.AxisFault: Failed to start an instance. MemberContext [memberId=lb01.lb01.domaincde68af9-d82f-42c3-9c01-4fd8da565867, nodeId=null, clusterId=lb01.lb01.domain, cartridgeType=lb01, privateIpAddresses=null, publicIpAddresses=null, allocatedIpAddress=null, initTime=1427177492261, lbClusterId=null, networkPartitionId=N1] Cause: error running 1 node group(lb01lb01) location(RegionOne) image(0299be4d-a743-4424-ae28-f40bd4faa669) size(2e2e2b47-9f40-4bd8-9777-83e802f5f1cd) options({inboundPorts=[], autoAssignFloatingIp=false, securityGroupNames=[default], keyPairName=phoenix, userData=[B@14edd531, configDrive=false, novaNetworks=[Network{networkUuid=42c4a88d-0d59-4fbb-90f0-9b9806f9c17c, portUuid=null, fixedIp=172.16.2.201}, Network{networkUuid=6a2615e4-760c-4c93-895d-b4b16e550193, portUuid=null, fixedIp=10.81.69.201}, Network{networkUuid=670550f0-67fc-48ff-a33c-e184a7908247, portUuid=null, fixedIp=10.13.5.81}]})
      Execution failures:

      0 error[s]
      Node failures:

      1) IllegalStateException on node RegionOne/887351d5-2c16-48b9-927e-0ca5f13fcc10:
      java.lang.IllegalStateException: node(RegionOne/887351d5-2c16-48b9-927e-0ca5f13fcc10) didn't achieve the status running; aborting after 1 seconds with final status: ERROR
      at org.jclouds.compute.functions.PollNodeRunning.apply(PollNodeRunning.java:72)
      at org.jclouds.compute.functions.PollNodeRunning.apply(PollNodeRunning.java:45)
      at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:121)
      at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.apply(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:146)
      at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.apply(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:53)
      at com.google.common.util.concurrent.Futures$1.apply(Futures.java:711)
      at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:849)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)


      1 error[s]
      at org.apache.axis2.util.Utils.getInboundFaultFromMessageContext(Utils.java:531)
      at org.apache.axis2.description.OutInAxisOperationClient.handleResponse(OutInAxisOperation.java:370)
      at org.apache.axis2.description.OutInAxisOperationClient.send(OutInAxisOperation.java:445)
      at org.apache.axis2.description.OutInAxisOperationClient.executeImpl(OutInAxisOperation.java:225)
      at org.apache.axis2.client.OperationClient.execute(OperationClient.java:149)
      at org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.startInstance(CloudControllerServiceStub.java:1407)
      at org.apache.stratos.autoscaler.client.cloud.controller.CloudControllerClient.spawnAnInstance(CloudControllerClient.java:162)
      ... 23 more
      TID: [0] [STRATOS] [2015-03-24 06:11:34,509] ERROR {org.apache.stratos.autoscaler.monitor.ClusterMonitor} - Cluster monitor: Monitor failed.ClusterMonitor [clusterId=lb01.lb01.domain, serviceId=lb01, deploymentPolicy=Deployment Policy [id]static-1-Core [partitions] [org.apache.stratos.cloud.controller.stub.deployment.partition.Partition@fb6144], autoscalePolicy=ASPolicy [id=economyPolicy, displayName=null, description=null], lbReferenceType=null, hasPrimary=false ] {org.apache.stratos.autoscaler.monitor.ClusterMonitor}
      Exception executing consequence for rule "Minimum Rule" in org.apache.stratos.autoscaler.rule: java.lang.RuntimeException: cannot invoke method: delegateSpawn
      at org.drools.runtime.rule.impl.DefaultConsequenceExceptionHandler.handleException(DefaultConsequenceExceptionHandler.java:39)
      at org.drools.common.DefaultAgenda.fireActivation(DefaultAgenda.java:1297)
      at org.drools.common.DefaultAgenda.fireNextItem(DefaultAgenda.java:1221)
      at org.drools.common.DefaultAgenda.fireAllRules(DefaultAgenda.java:1456)
      at org.drools.common.AbstractWorkingMemory.fireAllRules(AbstractWorkingMemory.java:710)
      at org.drools.common.AbstractWorkingMemory.fireAllRules(AbstractWorkingMemory.java:674)
      at org.drools.impl.StatefulKnowledgeSessionImpl.fireAllRules(StatefulKnowledgeSessionImpl.java:230)
      at org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.evaluateMinCheck(AutoscalerRuleEvaluator.java:94)
      at org.apache.stratos.autoscaler.monitor.ClusterMonitor.monitor(ClusterMonitor.java:157)
      at org.apache.stratos.autoscaler.monitor.ClusterMonitor.run(ClusterMonitor.java:86)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.RuntimeException: cannot invoke method: delegateSpawn
      at org.mvel2.optimizers.impl.refl.nodes.MethodAccessor.getValue(MethodAccessor.java:63)
      at org.mvel2.optimizers.impl.refl.nodes.VariableAccessor.getValue(VariableAccessor.java:37)
      at org.mvel2.ast.ASTNode.getReducedValueAccelerated(ASTNode.java:108)
      at org.mvel2.MVELRuntime.execute(MVELRuntime.java:85)
      at org.mvel2.compiler.CompiledExpression.getDirectValue(CompiledExpression.java:123)
      at org.mvel2.compiler.CompiledExpression.getValue(CompiledExpression.java:119)
      at org.mvel2.MVEL.executeExpression(MVEL.java:930)
      at org.drools.base.mvel.MVELConsequence.evaluate(MVELConsequence.java:104)
      at org.drools.common.DefaultAgenda.fireActivation(DefaultAgenda.java:1287)
      ... 9 more
      Caused by: java.lang.reflect.InvocationTargetException
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.mvel2.optimizers.impl.refl.nodes.MethodAccessor.getValue(MethodAccessor.java:48)
      ... 17 more
      Caused by: java.lang.RuntimeException: Cannot spawn an instance
      at org.apache.stratos.autoscaler.rule.RuleTasksDelegator.delegateSpawn(RuleTasksDelegator.java:107)
      ... 22 more
      Caused by: org.apache.stratos.autoscaler.exception.SpawningException: Failed to start an instance. MemberContext [memberId=lb01.lb01.domaincde68af9-d82f-42c3-9c01-4fd8da565867, nodeId=null, clusterId=lb01.lb01.domain, cartridgeType=lb01, privateIpAddresses=null, publicIpAddresses=null, allocatedIpAddress=null, initTime=1427177492261, lbClusterId=null, networkPartitionId=N1] Cause: error running 1 node group(lb01lb01) location(RegionOne) image(0299be4d-a743-4424-ae28-f40bd4faa669) size(2e2e2b47-9f40-4bd8-9777-83e802f5f1cd) options({inboundPorts=[], autoAssignFloatingIp=false, securityGroupNames=[default], keyPairName=phoenix, userData=[B@14edd531, configDrive=false, novaNetworks=[Network{networkUuid=42c4a88d-0d59-4fbb-90f0-9b9806f9c17c, portUuid=null, fixedIp=172.16.2.201}, Network{networkUuid=6a2615e4-760c-4c93-895d-b4b16e550193, portUuid=null, fixedIp=10.81.69.201}, Network{networkUuid=670550f0-67fc-48ff-a33c-e184a7908247, portUuid=null, fixedIp=10.13.5.81}]})
      Execution failures:

      0 error[s]
      Node failures:

      1) IllegalStateException on node RegionOne/887351d5-2c16-48b9-927e-0ca5f13fcc10:
      java.lang.IllegalStateException: node(RegionOne/887351d5-2c16-48b9-927e-0ca5f13fcc10) didn't achieve the status running; aborting after 1 seconds with final status: ERROR
      at org.jclouds.compute.functions.PollNodeRunning.apply(PollNodeRunning.java:72)
      at org.jclouds.compute.functions.PollNodeRunning.apply(PollNodeRunning.java:45)
      at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:121)
      at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.apply(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:146)
      at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.apply(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:53)
      at com.google.common.util.concurrent.Futures$1.apply(Futures.java:711)
      at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:849)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)

      1 error[s]
      at org.apache.stratos.autoscaler.client.cloud.controller.CloudControllerClient.spawnAnInstance(CloudControllerClient.java:174)
      at org.apache.stratos.autoscaler.rule.RuleTasksDelegator.delegateSpawn(RuleTasksDelegator.java:87)
      ... 22 more
      Caused by: org.apache.axis2.AxisFault: Failed to start an instance. MemberContext [memberId=lb01.lb01.domaincde68af9-d82f-42c3-9c01-4fd8da565867, nodeId=null, clusterId=lb01.lb01.domain, cartridgeType=lb01, privateIpAddresses=null, publicIpAddresses=null, allocatedIpAddress=null, initTime=1427177492261, lbClusterId=null, networkPartitionId=N1] Cause: error running 1 node group(lb01lb01) location(RegionOne) image(0299be4d-a743-4424-ae28-f40bd4faa669) size(2e2e2b47-9f40-4bd8-9777-83e802f5f1cd) options({inboundPorts=[], autoAssignFloatingIp=false, securityGroupNames=[default], keyPairName=phoenix, userData=[B@14edd531, configDrive=false, novaNetworks=[Network{networkUuid=42c4a88d-0d59-4fbb-90f0-9b9806f9c17c, portUuid=null, fixedIp=172.16.2.201}, Network{networkUuid=6a2615e4-760c-4c93-895d-b4b16e550193, portUuid=null, fixedIp=10.81.69.201}, Network{networkUuid=670550f0-67fc-48ff-a33c-e184a7908247, portUuid=null, fixedIp=10.13.5.81}]})
      Execution failures:

      0 error[s]
      Node failures:

      1) IllegalStateException on node RegionOne/887351d5-2c16-48b9-927e-0ca5f13fcc10:
      java.lang.IllegalStateException: node(RegionOne/887351d5-2c16-48b9-927e-0ca5f13fcc10) didn't achieve the status running; aborting after 1 seconds with final status: ERROR
      at org.jclouds.compute.functions.PollNodeRunning.apply(PollNodeRunning.java:72)
      at org.jclouds.compute.functions.PollNodeRunning.apply(PollNodeRunning.java:45)
      at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:121)
      at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.apply(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:146)
      at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.apply(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:53)
      at com.google.common.util.concurrent.Futures$1.apply(Futures.java:711)
      at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:849)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)

      1 error[s]
      at org.apache.axis2.util.Utils.getInboundFaultFromMessageContext(Utils.java:531)
      at org.apache.axis2.description.OutInAxisOperationClient.handleResponse(OutInAxisOperation.java:370)
      at org.apache.axis2.description.OutInAxisOperationClient.send(OutInAxisOperation.java:445)
      at org.apache.axis2.description.OutInAxisOperationClient.executeImpl(OutInAxisOperation.java:225)
      at org.apache.axis2.client.OperationClient.execute(OperationClient.java:149)
      at org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.startInstance(CloudControllerServiceStub.java:1407)
      at org.apache.stratos.autoscaler.client.cloud.controller.CloudControllerClient.spawnAnInstance(CloudControllerClient.java:162)
      ... 23 more
      TID: [0] [STRATOS] [2015-03-24 06:13:04,511] INFO {org.apache.stratos.autoscaler.client.cloud.controller.CloudControllerClient} - Trying to spawn an instance via cloud controller: [cluster] lb01.lb01.domain [partition] RegionOne-Core [lb-cluster] null [network-partition-id] N1 {org.apache.stratos.autoscaler.client.cloud.controller.CloudControllerClient}
      TID: [0] [STRATOS] [2015-03-24 06:13:09,114] INFO {org.wso2.carbon.databridge.core.DataBridge} - admin connected {org.wso2.carbon.databridge.core.DataBridge}
      TID: [0] [STRATOS] [2015-03-24 06:13:16,902] INFO {org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl} - Instance is successfully starting up. MemberContext [memberId=lb01.lb01.domain64df2bd7-ee48-4ca2-9dec-362b72543d86, nodeId=RegionOne/aa1a0e56-a722-444a-99f5-080ef844fb2d, clusterId=lb01.lb01.domain, cartridgeType=lb01, privateIpAddresses=null, publicIpAddresses=null, allocatedIpAddress=null, initTime=1427177584511, lbClusterId=null, networkPartitionId=N1] {org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl}

            People

            • Assignee: Unassigned
            • Reporter: Jeffrey Nguyen (jeffrngu)
            • Votes: 0
            • Watchers: 3
