  1. Ambari
  2. AMBARI-15141

Start all services request aborts in the middle and hosts go into heartbeat-lost state

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.2
    • Component/s: ambari-server
    • Labels: None

    Description

      On a 1600-node cluster, I attempted Stop-All and Start-All actions. During both actions, the request would abort itself, sometimes even when there were no failures or timeouts. During this time, hosts would also go into the heartbeat-lost state and the cluster would look like it was broken. After about 5-10 minutes, the hosts slowly come back online and the correct host status is reported. But because the request was aborted, some components end up stopped or started when they should not be.

      Attachments

        1. AMBARI-15141_branch-2.2.patch
          215 kB
          Papirkovskyy Myroslav
        2. AMBARI-15141.patch
          216 kB
          Papirkovskyy Myroslav

            People

              Assignee: Papirkovskyy Myroslav (mpapirkovskyy)
              Reporter: Papirkovskyy Myroslav (mpapirkovskyy)
