Uploaded image for project: 'CloudStack'
  1. CloudStack
  2. CLOUDSTACK-7489

Unable to expunge VM due to failing to revoke all static nat rules

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 4.4.0
    • None
    • None
    • Security Level: Public (Anyone can view this level - this is the default.)
    • None

    Description

      I've seen these errors a few times in the management-server.log now. It has happened on a few different networks and ips, but we have not located a root cause for it yet. It basically prevents the VM from being expunged.

      # grep 'ctx-2a7eee77' /var/log/cloudstack/management/management-server.log
      2014-09-04 10:00:41,982 INFO  [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Found 1 vms to expunge.
      2014-09-04 10:00:41,994 WARN  [o.a.c.f.j.AsyncJobExecutionContext] (UserVm-Scavenger-10:ctx-2a7eee77) Job is executed without a context, setup psudo job for the executing thread
      2014-09-04 10:00:42,160 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Sync job-170172 execution on object VmWorkJobQueue.6715
      2014-09-04 10:00:42,170 WARN  [c.c.u.d.Merovingian2] (UserVm-Scavenger-10:ctx-2a7eee77) Was unable to find lock for the key vm_instance6715 and thread id 745113246
      2014-09-04 10:00:45,678 DEBUG [c.c.c.CapacityManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) VM state transitted from :Expunging to Expunging with event: ExpungeOperationvm's original host id: 12 new host id: null host id before state transition: null
      2014-09-04 10:00:45,680 DEBUG [c.c.v.VirtualMachineManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Destroying vm VM[User|i-258-6715-VM]
      2014-09-04 10:00:45,681 DEBUG [c.c.v.VirtualMachineManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Cleaning up NICS
      2014-09-04 10:00:45,681 DEBUG [o.a.c.e.o.NetworkOrchestrator] (UserVm-Scavenger-10:ctx-2a7eee77) Cleaning network for vm: 6715
      2014-09-04 10:00:45,688 DEBUG [c.c.v.VirtualMachineManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Cleaning up hypervisor data structures (ex. SRs in XenServer) for managed storage
      2014-09-04 10:00:45,745 DEBUG [o.a.c.e.o.VolumeOrchestrator] (UserVm-Scavenger-10:ctx-2a7eee77) Cleaning storage for vm: 6715
      2014-09-04 10:00:45,759 DEBUG [c.c.v.VirtualMachineManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Expunged VM[User|i-258-6715-VM]
      2014-09-04 10:00:45,759 DEBUG [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Starting cleaning up vm VM[User|i-258-6715-VM] resources...
      2014-09-04 10:00:45,830 DEBUG [c.c.n.f.FirewallManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) No firewall rules are found for vm id=6715
      2014-09-04 10:00:45,885 DEBUG [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Firewall rules are removed successfully as a part of vm id=6715 expunge
      2014-09-04 10:00:45,896 DEBUG [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) No port forwarding rules are found for vm id=6715
      2014-09-04 10:00:45,896 DEBUG [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Port forwarding rules are removed successfully as a part of vm id=6715 expunge
      2014-09-04 10:00:45,898 DEBUG [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Removed vm id=6715 from all load balancers as a part of expunge process
      2014-09-04 10:00:45,914 DEBUG [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Revoking all Firewallrules as a part of disabling static nat for public IP id=235
      2014-09-04 10:00:45,940 DEBUG [c.c.n.f.FirewallManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Releasing 0 firewall rules for ip id=235
      2014-09-04 10:00:45,943 DEBUG [c.c.n.f.FirewallManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) There are no firewall rules to apply
      2014-09-04 10:00:45,945 DEBUG [c.c.n.f.FirewallManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Successfully released firewall rules for ip id=235 and # of rules now = 0
      2014-09-04 10:00:45,969 DEBUG [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Releasing 0 port forwarding rules for ip id=235
      2014-09-04 10:00:45,972 DEBUG [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Releasing 0 static nat rules for ip id=235
      2014-09-04 10:00:45,999 DEBUG [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) There are no port forwarding rules to apply for ip id=235
      2014-09-04 10:00:46,001 DEBUG [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) There are no static nat rules to apply for ip id=235
      2014-09-04 10:00:46,056 INFO  [o.a.c.e.o.NetworkOrchestrator] (UserVm-Scavenger-10:ctx-2a7eee77) Let VirtualRouter handle StaticNat in network 1734
      2014-09-04 10:00:46,069 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Applying static nat rules in network Ntwk[effabcf3-9e09-4402-8340-52e0a6b32d60|Guest|8]
      2014-09-04 10:00:46,090 DEBUG [c.c.a.m.ClusteredAgentAttache] (UserVm-Scavenger-10:ctx-2a7eee77) Seq 14-202943458208397152: Forwarding Seq 14-202943458208397152:  { Cmd , MgmtId: 182571079363322, via: 14(hv-11-1.phx), Ver: v1, Flags: 100001, [{"com.cloud.agent.api.routing.SetStaticNatRulesCommand":{"rules":[{"dstIp":"10.244.0.162","id":0,"srcIp":"100.100.100.242","revoked":true,"alreadyAdded":false,"purpose":"StaticNat","icmpType":0,"icmpCode":0,"defaultEgressPolicy":false}],"accessDetails":{"zone.network.type":"Advanced","router.name":"r-6578-VM","router.ip":"169.254.3.202","router.guest.ip":"10.244.0.1"},"wait":0}}] } to 33862771676063
      2014-09-04 10:00:47,068 DEBUG [c.c.a.t.Request] (UserVm-Scavenger-10:ctx-2a7eee77) Seq 14-202943458208397152: Received:  { Ans: , MgmtId: 182571079363322, via: 14, Ver: v1, Flags: 0, { Answer } }
      2014-09-04 10:00:47,121 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Applying ip association in network Ntwk[effabcf3-9e09-4402-8340-52e0a6b32d60|Guest|8]
      2014-09-04 10:00:47,188 DEBUG [c.c.a.m.ClusteredAgentAttache] (UserVm-Scavenger-10:ctx-2a7eee77) Seq 14-202943458208397153: Forwarding Seq 14-202943458208397153:  { Cmd , MgmtId: 182571079363322, via: 14(hv-11-1.phx), Ver: v1, Flags: 100001, [{"com.cloud.agent.api.routing.IpAssocCommand":{"ipAddresses":[{"accountId":258,"publicIp":"100.100.176.192","sourceNat":true,"add":true,"oneToOneNat":false,"firstIP":true,"broadcastUri":"vlan://310","vlanGateway":"100.100.176.1","vlanNetmask":"255.255.255.0","vifMacAddress":"06:67:cc:00:3b:cf","networkRate":1000,"trafficType":"Public","networkName":"cloud-mgmt","newNic":false},{"accountId":258,"publicIp":"100.100.176.43","sourceNat":false,"add":true,"oneToOneNat":true,"firstIP":false,"broadcastUri":"vlan://310","vlanGateway":"100.100.176.1","vlanNetmask":"255.255.255.0","vifMacAddress":"06:8e:ae:00:3b:cf","networkRate":1000,"trafficType":"Public","networkName":"cloud-mgmt","newNic":false},{"accountId":258,"publicIp":"100.100.176.41","sourceNat":false,"add":true,"oneToOneNat":true,"firstIP":false,"broadcastUri":"vlan://310","vlanGateway":"100.100.176.1","vlanNetmask":"255.255.255.0","vifMacAddress":"06:8e:ae:00:3b:cf","networkRate":1000,"trafficType":"Public","networkName":"cloud-mgmt","newNic":false},{"accountId":258,"publicIp":"100.100.176.42","sourceNat":false,"add":true,"oneToOneNat":true,"firstIP":false,"broadcastUri":"vlan://310","vlanGateway":"100.100.176.1","vlanNetmask":"255.255.255.0","vifMacAddress":"06:8e:ae:00:3b:cf","networkRate":1000,"trafficType":"Public","networkName":"cloud-mgmt","newNic":false}],"accessDetails":{"zone.network.type":"Advanced","router.name":"r-6578-VM","router.ip":"169.254.3.202","router.guest.ip":"10.244.0.1"},"wait":0}},{"com.cloud.agent.api.routing.IpAssocCommand":{"ipAddresses":[{"accountId":258,"publicIp":"100.100.100.242","sourceNat":true,"add":false,"oneToOneNat":true,"firstIP":true,"broadcastUri":"vlan://31","vlanGateway":"100.100.100.1","vlanNetmask":"255.255.255.0","vifMacAddress":"06:8c:cf:00:3b:cf","networkRate":1000,"trafficType":"Public","networkName":"cloud-mgmt","newNic":false}],"accessDetails":{"zone.network.type":"Advanced","router.name":"r-6578-VM","router.ip":"169.254.3.202","router.guest.ip":"10.244.0.1"},"wait":0}}] } to 33862771676063
      2014-09-04 10:00:55,319 DEBUG [c.c.a.t.Request] (UserVm-Scavenger-10:ctx-2a7eee77) Seq 14-202943458208397153: Received:  { Ans: , MgmtId: 182571079363322, via: 14, Ver: v1, Flags: 0, { GroupAnswer, Answer } }
      2014-09-04 10:00:55,319 WARN  [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Failed to create static nat rule due to
      2014-09-04 10:00:55,325 WARN  [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Unable to revoke all static nat rules for ip Ip[100.100.100.242-1]
      2014-09-04 10:00:55,325 WARN  [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Failed to disable one to one nat for the ip address id235
      2014-09-04 10:00:55,325 WARN  [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Failed to disable static nat for ip address Ip[100.100.100.242-1] as a part of vm id=6715 expunge
      2014-09-04 10:00:55,325 WARN  [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Failed to cleanup resources as a part of vm VM[User|i-258-6715-VM] expunge
      2014-09-04 10:00:55,326 WARN  [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Unable to expunge VM[User|i-258-6715-VM]
      

      The two exceptions that also happen but aren't visible with the above grep:

      2014-09-04 10:00:55,319 WARN  [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Failed to create static nat rule due to
      com.cloud.exception.ResourceUnavailableException: Resource [DataCenter:1] is unreachable: Unable to apply ip association on router
          at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.applyRules(VirtualNetworkApplianceManagerImpl.java:4018)
          at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.associatePublicIP(VirtualNetworkApplianceManagerImpl.java:3812)
          at com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl.associatePublicIP(VpcVirtualNetworkApplianceManagerImpl.java:496)
          at sun.reflect.GeneratedMethodAccessor392.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:606)
          at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
          at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
          at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
          at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
          at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
          at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
          at com.sun.proxy.$Proxy206.associatePublicIP(Unknown Source)
          at com.cloud.network.element.VirtualRouterElement.applyIps(VirtualRouterElement.java:473)
          at com.cloud.network.IpAddressManagerImpl.applyIpAssociations(IpAddressManagerImpl.java:977)
          at com.cloud.network.IpAddressManagerImpl.applyStaticNats(IpAddressManagerImpl.java:1750)
          at com.cloud.network.rules.RulesManagerImpl.applyStaticNatForIp(RulesManagerImpl.java:1325)
          at com.cloud.network.rules.RulesManagerImpl.revokeAllPFAndStaticNatRulesForIp(RulesManagerImpl.java:1104)
          at com.cloud.network.rules.RulesManagerImpl.disableStaticNat(RulesManagerImpl.java:1269)
          at sun.reflect.GeneratedMethodAccessor346.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:606)
          at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
          at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
          at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
          at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
          at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
          at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
          at com.sun.proxy.$Proxy122.disableStaticNat(Unknown Source)
          at com.cloud.vm.UserVmManagerImpl.cleanupVmResources(UserVmManagerImpl.java:1742)
          at com.cloud.vm.UserVmManagerImpl.expunge(UserVmManagerImpl.java:1681)
          at com.cloud.vm.UserVmManagerImpl.expungeVm(UserVmManagerImpl.java:3683)
          at com.cloud.vm.UserVmManagerImpl$ExpungeTask.runInContext(UserVmManagerImpl.java:1817)
          at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
          at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
          at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
          at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
          at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          at java.lang.Thread.run(Thread.java:745)
      
      2014-09-04 10:00:55,326 WARN  [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Unable to expunge VM[User|i-258-6715-VM]
      com.cloud.utils.exception.CloudRuntimeException: Failed to expunge vm with specified vmId
          at com.cloud.vm.UserVmManagerImpl.expungeVm(UserVmManagerImpl.java:3687)
          at com.cloud.vm.UserVmManagerImpl$ExpungeTask.runInContext(UserVmManagerImpl.java:1817)
          at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
          at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
          at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
          at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
          at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          at java.lang.Thread.run(Thread.java:745)
      

      The i-258-6715-VM vm mentioned is stuck in the state 'Expunging'. If I log in to the virtual router for this network, I do still see the .242 IP listed when I do 'ip addr show' so it indeed did not get removed from the VR. The IP was/is a static_nat for the vm trying to be expunged and also seems to be stuck in a 'Releasing' state.

      I do not, however, see any nat rules for this public ip in 'iptables -t nat -vnL'.

      Rebooting the virtual router might fix this, but I'm not sure what other steps to take from here to get a permanent fix in place.

      Attachments

        Activity

          People

            Unassigned Unassigned
            justyns Justyn Shull
            Votes:
            1 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: