Ambari / AMBARI-20593

EU/RU Auto-Retry does not reschedule task when host is not heartbeating before task is scheduled and doesn't have a start time


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • 2.5.0
    • 2.5.1
    • ambari-server
    • rolling upgrade

    Description

      Steps to reproduce (STR):
      1) Install Ambari 2.5.0.1.
      In the ambari.properties file, set
      stack.upgrade.auto.retry.timeout.mins=6
      stack.upgrade.auto.retry.check.interval.secs=30

      2) Install HDP with any set of services.
      3) Add NameNode HA.
      4) Register and install a new HDP stack version.
      5) Start a rolling upgrade (RU).
      6) Break one step in the Core Masters group (e.g., stop ambari-agent on a node while the command is running).
      Ambari will restart the "Restarting NN Batch 1" task.
      7) Repair the broken step (e.g., start ambari-agent again).
      8) Break another step before its command is scheduled (e.g., stop ambari-agent on a node).
      9) Repair the broken step (e.g., start ambari-agent again).
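
      For orientation, the two ambari.properties settings from step 1 can be read as a retry window and a polling period. The sketch below is only an illustration: the parser, the SAMPLE text, and the interpretation of the settings (inferred from their names) are assumptions, not Ambari code.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


# The two lines from step 1 of the reproduction steps.
SAMPLE = """
stack.upgrade.auto.retry.timeout.mins=6
stack.upgrade.auto.retry.check.interval.secs=30
"""

props = parse_properties(SAMPLE)

# Assumed meaning: retries are attempted for up to `timeout` minutes,
# and the server looks for retryable tasks every `interval` seconds.
timeout_secs = int(props["stack.upgrade.auto.retry.timeout.mins"]) * 60
interval_secs = int(props["stack.upgrade.auto.retry.check.interval.secs"])

# Number of polling checks that fit inside one retry window.
checks_in_window = timeout_secs // interval_secs
print(checks_in_window)
```

      Under that reading, the values above give a 360-second window polled every 30 seconds, which keeps the reproduction short enough to run by hand.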

      Expected: Ambari Server schedules the command on the second node. Actual: because the command was never assigned an original_start_time or a start_time, the RetryUpgradeActionService could not retry it; it had no timestamps to compare against.
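
      The failure mode described here can be modeled in a few lines. This is a minimal sketch, not the RetryUpgradeActionService implementation: the Task class, both function names, the use of None for an unset timestamp, and the "lenient" remedy are all hypothetical and only illustrate why a task with no timestamps falls outside a time-window check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    # Epoch milliseconds; None models "never set" (an assumption of this sketch).
    original_start_time: Optional[int]
    start_time: Optional[int]


def can_retry_strict(task: Task, now_ms: int, timeout_ms: int) -> bool:
    """Models the reported behavior: without a timestamp there is
    nothing to compare against, so the task is never retried."""
    if task.original_start_time is None:
        return False
    return now_ms - task.original_start_time <= timeout_ms


def can_retry_lenient(task: Task, now_ms: int, timeout_ms: int) -> bool:
    """One possible remedy (hypothetical): stamp a never-started task
    with the current time so the retry window starts counting now."""
    if task.original_start_time is None:
        task.original_start_time = now_ms
    return now_ms - task.original_start_time <= timeout_ms


# A task that was never scheduled because the host was not heartbeating.
never_started = Task(original_start_time=None, start_time=None)
timeout_ms = 6 * 60 * 1000  # the 6-minute window from the reproduction steps

print(can_retry_strict(never_started, now_ms=1_000, timeout_ms=timeout_ms))
print(can_retry_lenient(never_started, now_ms=1_000, timeout_ms=timeout_ms))
```

      The strict check rejects the never-started task regardless of how recently the upgrade began, which matches the symptom in step 8; the lenient variant shows one way such a task could be brought back into the retry window.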


          People

            Assignee: Alejandro Fernandez (afernandez)
            Reporter: Sviatoslav Tereshchenko (stereshchenko)