  1. Ambari
  2. AMBARI-25613

Concurrent Host Modification exception while sending INSTALL/START Host request

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.6
    • Fix Version/s: None
    • Component/s: ambari-server
    • Labels:
      None

      Description

       

      java.lang.RuntimeException: START Host request submission failed: java.lang.RuntimeException: Update Host request submission failed: java.util.ConcurrentModificationException
      at org.apache.ambari.server.topology.AmbariContext.startHost(AmbariContext.java:497)
      at org.apache.ambari.server.topology.ClusterTopologyImpl.startHost(ClusterTopologyImpl.java:268)
      at org.apache.ambari.server.topology.tasks.StartHostTask.runTask(StartHostTask.java:51)
      at org.apache.ambari.server.topology.tasks.TopologyHostTask.run(TopologyHostTask.java:55)
      at org.apache.ambari.server.topology.HostOfferResponse$1.run(HostOfferResponse.java:85)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.RuntimeException: Update Host request submission failed: java.util.ConcurrentModificationException
      at org.apache.ambari.server.controller.internal.HostComponentResourceProvider$4.invoke(HostComponentResourceProvider.java:865)
      at org.apache.ambari.server.controller.internal.HostComponentResourceProvider$4.invoke(HostComponentResourceProvider.java:852)
      at org.apache.ambari.server.controller.internal.AbstractResourceProvider.invokeWithRetry(AbstractResourceProvider.java:465)
      at org.apache.ambari.server.controller.internal.AbstractResourceProvider.modifyResources(AbstractResourceProvider.java:346)
      at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.doUpdateResources(HostComponentResourceProvider.java:852)
      at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.start(HostComponentResourceProvider.java:492)
      at org.apache.ambari.server.topology.AmbariContext.startHost(AmbariContext.java:494)
      at org.apache.ambari.server.topology.ClusterTopologyImpl.startHost(ClusterTopologyImpl.java:268)
      at org.apache.ambari.server.topology.tasks.StartHostTask.runTask(StartHostTask.java:51)
      at org.apache.ambari.server.topology.tasks.TopologyHostTask.run(TopologyHostTask.java:55)
      at org.apache.ambari.server.topology.HostOfferResponse$1.run(HostOfferResponse.java:85)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      Caused by: java.util.ConcurrentModificationException: NA
      at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
      at java.util.HashMap$EntryIterator.next(HashMap.java:1479)
      at java.util.HashMap$EntryIterator.next(HashMap.java:1477)
      at java.util.HashMap.putMapEntries(HashMap.java:512)
      at java.util.HashMap.<init>(HashMap.java:490)
      at org.apache.ambari.server.topology.HostRequest.getPhysicalTaskMapping(HostRequest.java:458)
      at org.apache.ambari.server.topology.LogicalRequest.getStageSummaries(LogicalRequest.java:286)
      at org.apache.ambari.server.topology.TopologyManager.getPendingHostComponents(TopologyManager.java:823)
      at org.apache.ambari.server.utils.StageUtils.getClusterHostInfo(StageUtils.java:306)
      at org.apache.ambari.server.controller.AmbariManagementControllerImpl.doStageCreation(AmbariManagementControllerImpl.java:2788)
      at org.apache.ambari.server.controller.AmbariManagementControllerImpl.addStages(AmbariManagementControllerImpl.java:3513)
      at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.updateHostComponents(HostComponentResourceProvider.java:707)
      at org.apache.ambari.server.controller.internal.HostComponentResourceProvider$4.invoke(HostComponentResourceProvider.java:857)
      at org.apache.ambari.server.controller.internal.HostComponentResourceProvider$4.invoke(HostComponentResourceProvider.java:852)
      at org.apache.ambari.server.controller.internal.AbstractResourceProvider.invokeWithRetry(AbstractResourceProvider.java:465)
      at org.apache.ambari.server.controller.internal.AbstractResourceProvider.modifyResources(AbstractResourceProvider.java:346)
      at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.doUpdateResources(HostComponentResourceProvider.java:852)
      at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.start(HostComponentResourceProvider.java:492)
      at org.apache.ambari.server.topology.AmbariContext.startHost(AmbariContext.java:494)
      at org.apache.ambari.server.topology.ClusterTopologyImpl.startHost(ClusterTopologyImpl.java:268)
      at org.apache.ambari.server.topology.tasks.StartHostTask.runTask(StartHostTask.java:51)
      at org.apache.ambari.server.topology.tasks.TopologyHostTask.run(TopologyHostTask.java:55)
      at org.apache.ambari.server.topology.HostOfferResponse$1.run(HostOfferResponse.java:85)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      

       

       

       

      My teammate Ramkrishna S. Vasudevan analyzed this by adding logs and latches. He found that although installation and registration run in parallel, each thread asks for the entire cluster's view of the current physical tasks. As a result, while one thread is registering a host, another thread can call getPhysicalTaskMapping(), which copies a HashMap that is being modified at the same time and throws a ConcurrentModificationException (CME).
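
      The race described above can be sketched in isolation. The snippet below is only an illustration under assumptions: HostTasks, registerPhysicalTask and exercise are hypothetical names, not Ambari classes or methods, and backing the map with a ConcurrentHashMap is just one possible mitigation, not necessarily the change that resolved this issue. One thread registers tasks while another repeatedly takes a snapshot copy, which is the same pattern seen in the stack trace (HashMap.<init> called from getPhysicalTaskMapping).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class TaskMappingSketch {

    // Hypothetical stand-in for the per-host task map; not Ambari's HostRequest.
    static final class HostTasks {
        // A plain HashMap here would allow a CME while another thread copies it.
        private final Map<Long, Long> physicalTasks = new ConcurrentHashMap<>();

        void registerPhysicalTask(long logicalTaskId, long physicalTaskId) {
            physicalTasks.put(logicalTaskId, physicalTaskId);
        }

        // Snapshot copy. ConcurrentHashMap iterators are weakly consistent,
        // so this copy does not throw ConcurrentModificationException.
        Map<Long, Long> getPhysicalTaskMapping() {
            return new HashMap<>(physicalTasks);
        }
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs one writer and one reader at the same time and returns the first
    // failure seen by the reader, or null if every snapshot succeeded.
    static Throwable exercise(HostTasks tasks, int writes) {
        CountDownLatch start = new CountDownLatch(1);
        AtomicReference<Throwable> failure = new AtomicReference<>();

        Thread writer = new Thread(() -> {
            await(start);
            for (long i = 0; i < writes; i++) {
                tasks.registerPhysicalTask(i, i + 1000);
            }
        });
        Thread reader = new Thread(() -> {
            await(start);
            try {
                for (int i = 0; i < 1000; i++) {
                    tasks.getPhysicalTaskMapping();
                }
            } catch (Throwable t) {
                failure.compareAndSet(null, t);
            }
        });

        writer.start();
        reader.start();
        start.countDown(); // release both threads together, like the latches used in the analysis
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            failure.compareAndSet(null, e);
        }
        return failure.get();
    }

    public static void main(String[] args) {
        HostTasks tasks = new HostTasks();
        Throwable failure = exercise(tasks, 5000);
        System.out.println(failure == null ? "no exception during snapshots" : "failed: " + failure);
    }
}
```

      Swapping the ConcurrentHashMap for a plain HashMap in this sketch reintroduces the race, although whether a CME actually fires on a given run depends on thread timing. Synchronizing both the write and the copy on a common lock would be another option if callers need a fully consistent snapshot.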

       

       
       
       

              People

              • Assignee: Unassigned
              • Reporter: Suraj Naik
              • Votes: 0
              • Watchers: 1

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 40m