Uploaded image for project: 'CloudStack'
  1. CloudStack
  2. CLOUDSTACK-5688

NPE when the KVM host is rebooted on the upgraded environment

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • 4.3.0
    • 4.4.0
    • KVM, Upgrade
    • Security Level: Public (Anyone can view this level - this is the default.)
    • None
    • upgraded from 4.2.1 to 4.3

    Description

      Steps:
      1. Have CS 4.2.1 with KVM host.---advanced zone.
      2. Upgrade the CS to 4.3.
      3. Reboot the KVM host(did not enable maintenance)
      4. The KVM host is connected and then disconnecting from the CS.
      Observing the following NPE in CS logs:

      2013-12-31 18:03:38,127 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to BaremetalDhcpManagerImpl
      2013-12-31 18:03:38,127 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to NetworkUsageManagerImpl
      2013-12-31 18:03:38,127 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to BaremetalPxeManagerImpl
      2013-12-31 18:03:38,128 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to PaloAltoExternalFirewallElement
      2013-12-31 18:03:38,128 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to KvmServerDiscoverer
      2013-12-31 18:03:38,172 DEBUG [c.c.r.ResourceState] (AgentConnectTaskPool-194:ctx-00137ad5) Resource state update: [id = 1; name = RHLE64; old state = Enabled; event = InternalCreated; new state = Enabled]
      2013-12-31 18:03:38,173 DEBUG [c.c.h.Status] (AgentConnectTaskPool-194:ctx-00137ad5) Transition:[Resource state = Enabled, Agent event = AgentConnected, Host id = 1, name = RHLE64]
      2013-12-31 18:03:38,185 DEBUG [c.c.h.Status] (AgentConnectTaskPool-194:ctx-00137ad5) Agent status update: [id = 1; name = RHLE64; old status = Alert; event = AgentConnected; new status = Connecting; old update count = 390; new update count = 391]
      2013-12-31 18:03:38,185 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) create ClusteredAgentAttache for 1
      2013-12-31 18:03:38,188 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: XcpServerDiscoverer
      2013-12-31 18:03:38,189 DEBUG [c.c.h.x.d.XcpServerDiscoverer] (AgentConnectTaskPool-194:ctx-00137ad5) Not XenServer so moving on.
      2013-12-31 18:03:38,189 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: HypervServerDiscoverer
      2013-12-31 18:03:38,189 DEBUG [c.c.h.h.d.HypervServerDiscoverer] (AgentConnectTaskPool-194:ctx-00137ad5) Not Hyper-V hypervisor, so moving on.
      2013-12-31 18:03:38,189 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: StoragePoolMonitor
      2013-12-31 18:03:38,202 DEBUG [c.c.s.l.StoragePoolMonitor] (AgentConnectTaskPool-194:ctx-00137ad5) Host 1 connected, sending down storage pool information ...
      2013-12-31 18:03:38,205 DEBUG [c.c.s.StorageManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Adding pool null to host 1
      2013-12-31 18:03:38,215 DEBUG [c.c.a.t.Request] (AgentConnectTaskPool-194:ctx-00137ad5) Seq 1-920387585: Sending { Cmd , MgmtId: 6758231703598, via: 1(RHLE64), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.ModifyStoragePoolCommand":{"add":true,"pool":

      {"id":1,"uuid":"efbe1d5a-12c4-3a30-bef6-a219dfe81249","host":"10.147.40.27","path":"/export/home/manasa/primaryKVM","port":2049,"type":"NetworkFilesystem"}

      ,"localPath":"/mnt//efbe1d5a-12c4-3a30-bef6-a219dfe81249","wait":0}}] }
      2013-12-31 18:03:38,220 DEBUG [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-6:null) Ping from 1
      2013-12-31 18:03:38,228 DEBUG [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-6:null) Not processing PingRoutingCommand for agent id=0; can't find the host in the DB
      2013-12-31 18:03:38,233 DEBUG [c.c.a.t.Request] (AgentManager-Handler-4:null) Seq 1-920387585: Processing: { Ans: , MgmtId: 6758231703598, via: 1, Ver: v1, Flags: 10, [{"com.cloud.agent.api.ModifyStoragePoolAnswer":{"poolInfo":

      {"host":"10.147.40.27","localPath":"/mnt//efbe1d5a-12c4-3a30-bef6-a219dfe81249","hostPath":"/export/home/manasa/primaryKVM","poolType":"NetworkFilesystem","capacityBytes":52844687360,"availableBytes":40225341440}

      ,"templateInfo":{},"result":true,"wait":0}}] }
      2013-12-31 18:03:38,234 DEBUG [c.c.a.t.Request] (AgentConnectTaskPool-194:ctx-00137ad5) Seq 1-920387585: Received: { Ans: , MgmtId: 6758231703598, via: 1, Ver: v1, Flags: 10,

      { ModifyStoragePoolAnswer }

      }
      2013-12-31 18:03:38,273 INFO [o.a.c.s.d.p.DefaultHostListener] (AgentConnectTaskPool-194:ctx-00137ad5) Connection established between org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@22daf59c host + 1
      2013-12-31 18:03:38,283 DEBUG [c.c.s.StorageManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Successfully set Capacity - 105689374720 for capacity type - 3 , DataCenterId - 1, HostOrPoolId - 1, PodId null
      2013-12-31 18:03:38,283 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: DeploymentPlanningManagerImpl
      2013-12-31 18:03:38,284 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: NetworkOrchestrator
      2013-12-31 18:03:38,286 DEBUG [o.a.c.e.o.NetworkOrchestrator] (AgentConnectTaskPool-194:ctx-00137ad5) Host's hypervisorType is: KVM
      2013-12-31 18:03:38,293 DEBUG [o.a.c.e.o.NetworkOrchestrator] (AgentConnectTaskPool-194:ctx-00137ad5) Sending CheckNetworkCommand to check the Network is setup correctly on Agent
      2013-12-31 18:03:38,306 DEBUG [c.c.a.t.Request] (AgentConnectTaskPool-194:ctx-00137ad5) Seq 1-920387586: Sending { Cmd , MgmtId: 6758231703598, via: 1(RHLE64), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.CheckNetworkCommand":{"networkInfoList":[

      {"physicalNetworkId":200}

      ],"wait":0}}] }
      2013-12-31 18:03:38,360 DEBUG [c.c.a.t.Request] (AgentManager-Handler-9:null) Seq 1-920387586: Processing: { Ans: , MgmtId: 6758231703598, via: 1, Ver: v1, Flags: 110, [{"com.cloud.agent.api.CheckNetworkAnswer":{"_reconnect":false,"result":true,"wait":0}}] }
      2013-12-31 18:03:38,360 DEBUG [c.c.a.t.Request] (AgentConnectTaskPool-194:ctx-00137ad5) Seq 1-920387586: Received: { Ans: , MgmtId: 6758231703598, via: 1, Ver: v1, Flags: 110,

      { CheckNetworkAnswer }

      }
      2013-12-31 18:03:38,360 DEBUG [o.a.c.e.o.NetworkOrchestrator] (AgentConnectTaskPool-194:ctx-00137ad5) Network setup is correct on Agent
      2013-12-31 18:03:38,361 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: VmwareManagerImpl
      2013-12-31 18:03:38,362 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: SecondaryStorageListener
      2013-12-31 18:03:38,362 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: ClusteredVirtualMachineManagerImpl
      2013-12-31 18:03:38,362 DEBUG [c.c.v.VirtualMachineManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Received startup command from hypervisor host. host id: 1
      2013-12-31 18:03:38,362 INFO [c.c.v.VirtualMachinePowerStateSyncImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Reset VM power state sync for host: 1
      2013-12-31 18:03:38,361 DEBUG [c.c.a.m.AgentAttache] (AgentManager-Handler-9:null) Seq 1-920387586: No more commands found
      2013-12-31 18:03:38,383 DEBUG [c.c.v.VirtualMachineManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Found 8 VMs for host 1
      2013-12-31 18:03:38,385 DEBUG [c.c.v.VirtualMachineManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) VM s-1-VM: cs state = Running and realState = Stopped
      2013-12-31 18:03:38,385 DEBUG [c.c.v.VirtualMachineManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) VM s-1-VM: cs state = Running and realState = Stopped
      2013-12-31 18:03:38,386 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) VM does not require investigation so I'm marking it as Stopped: VM[SecondaryStorageVm|s-1-VM]
      2013-12-31 18:03:38,386 WARN [o.a.c.alerts] (AgentConnectTaskPool-194:ctx-00137ad5) alertType:: 19 // dataCenterId:: 1 // podId:: 1 // clusterId:: null // message:: VM (name: s-1-VM, id: 1) stopped unexpectedly on host id:1, availability zone id:1, pod id:1
      2013-12-31 18:03:38,396 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) VM is not HA enabled so we're done.
      2013-12-31 18:03:38,396 ERROR [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Monitor ClusteredVirtualMachineManagerImpl says there is an error in the connect process for 1 due to null
      java.lang.NullPointerException
      at com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1264)
      at com.cloud.ha.HighAvailabilityManagerImpl.scheduleRestart(HighAvailabilityManagerImpl.java:346)
      at com.cloud.vm.VirtualMachineManagerImpl.compareState(VirtualMachineManagerImpl.java:2715)
      at com.cloud.vm.VirtualMachineManagerImpl.fullHostSync(VirtualMachineManagerImpl.java:2278)
      at com.cloud.vm.VirtualMachineManagerImpl.processConnect(VirtualMachineManagerImpl.java:2929)
      at com.cloud.agent.manager.AgentManagerImpl.notifyMonitorsOfConnection(AgentManagerImpl.java:495)
      at com.cloud.agent.manager.AgentManagerImpl.handleConnectedAgent(AgentManagerImpl.java:999)
      at com.cloud.agent.manager.AgentManagerImpl.access$000(AgentManagerImpl.java:117)
      at com.cloud.agent.manager.AgentManagerImpl$HandleAgentConnectTask.runInContext(AgentManagerImpl.java:1082)
      at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
      at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
      at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
      at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
      at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
      at java.lang.Thread.run(Thread.java:722)
      2013-12-31 18:03:38,398 INFO [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Host 1 is disconnecting with event AgentDisconnected
      2013-12-31 18:03:38,401 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) The next status of agent 1is Alert, current status is Connecting
      2013-12-31 18:03:38,402 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Deregistering link for 1 with state Alert
      2013-12-31 18:03:38,402 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Remove Agent : 1
      2013-12-31 18:03:38,402 DEBUG [c.c.a.m.ConnectedAgentAttache] (AgentConnectTaskPool-194:ctx-00137ad5) Processing Disconnect.
      2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.hypervisor.xen.discoverer.XcpServerDiscoverer
      2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer
      2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.storage.listener.StoragePoolMonitor
      2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.deploy.DeploymentPlanningManagerImpl
      2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
      2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.hypervisor.vmware.manager.VmwareManagerImpl
      2013-12-31 18:03:38,404 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.storage.secondary.SecondaryStorageListener
      2013-12-31 18:03:38,404 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
      2013-12-31 18:03:38,404 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.network.security.SecurityGroupListener
      2013-12-31 18:03:38,404 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.consoleproxy.ConsoleProxyListener
      2013-12-31 18:03:38,407 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
      2013-12-31 18:03:38,407 DEBUG [c.c.n.NetworkUsageManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Disconnected called on 1 with status Alert
      2013-12-31 18:03:38,407 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.storage.LocalStoragePoolListener
      2013-12-31 18:03:38,408 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: com.cloud.agent.manager.AgentManager

      The KVM host will be in Alert state:

      The log snippet from agent log:

      2013-12-31 13:06:17,978 INFO [cloud.agent.Agent] (Agent-Handler-1:null) Reconnecting...
      2013-12-31 13:06:17,979 INFO [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.147.38.164:8250
      2013-12-31 13:06:18,070 INFO [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
      2013-12-31 13:06:18,070 INFO [utils.nio.NioClient] (Agent-Selector:null) Connected to 10.147.38.164:8250
      2013-12-31 13:06:18,124 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Proccess agent startup answer, agent id = 0
      2013-12-31 13:06:18,125 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Set agent id 0
      2013-12-31 13:06:18,125 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Startup Response Received: agent id = 0
      2013-12-31 13:06:23,070 INFO [cloud.agent.Agent] (Agent-Handler-1:null) Connected to the server
      2013-12-31 13:06:23,378 INFO [cloud.agent.Agent] (Agent-Handler-1:null) Lost connection to the server. Dealing with the remaining commands...
      2013-12-31 13:06:28,379 INFO [cloud.agent.Agent] (Agent-Handler-1:null) Reconnecting...
      2013-12-31 13:06:28,380 INFO [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.147.38.164:8250
      2013-12-31 13:06:28,473 INFO [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
      2013-12-31 13:06:28,473 INFO [utils.nio.NioClient] (Agent-Selector:null) Connected to 10.147.38.164:8250
      2013-12-31 13:06:28,528 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Proccess agent startup answer, agent id = 0
      2013-12-31 13:06:28,528 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Set agent id 0
      2013-12-31 13:06:28,529 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Startup Response Received: agent id = 0

      Attaching the MS and agent logs.
      Dumps before and after upgrade.

      Attachments

        1. agent.log
          1.43 MB
          manasaveloori
        2. management-server.rar
          1.04 MB
          manasaveloori
        3. mysqldump421.dmp
          858 kB
          manasaveloori
        4. mysqldump43.dmp
          914 kB
          manasaveloori

        Activity

          People

            kelveny Kelven Yang
            t_manasave manasaveloori
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: