Ambari / AMBARI-18728

During cluster install, Components get timed out icon while starting

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.5.0
    • None
    • None

    Description

      This was caused by a very tricky race condition in the way Python's multiprocessing queues work, resulting in a deadlock in the ambari_agent.ActionQueue thread.
      The problem occurs when all three of the following happen at the same time (a very rare occasion):
      1. Process1 executes queue.get(False)
      2. Process2 executes queue.put(largeObjectWhichTakesLongTimeToPut)
      3. Someone kills Process2.

      This results in a deadlock in Process1's get, because the queue's locks/semaphores are not released when Process2 is killed in the middle of its put.

      I wrote a script, test_race_condition.py, to emulate this behaviour. With it I could reproduce the problem and test the fix.
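
The flow above can be emulated roughly as follows. This is only an illustrative sketch, not the attached test_race_condition.py (which is not reproduced here), and it does not show the actual fix. The names `put_large_object`, `poll_until_item`, and `emulate`, the payload size, and the delays are all assumptions made for the example. It also assumes a POSIX host and Python 3.7+, since it uses the "fork" start method and `Process.kill()`. The outcome depends on timing, so the function reports what happened instead of promising a hang.

```python
import multiprocessing
import queue
import threading
import time

# Assumption: POSIX host. "fork" keeps the sketch self-contained.
ctx = multiprocessing.get_context("fork")


def put_large_object(q):
    # Process2's role: a payload far larger than the pipe buffer,
    # so the put takes a noticeable amount of time.
    q.put(b"x" * (32 * 1024 * 1024))


def poll_until_item(q, stop, result):
    # Process1's role: repeated non-blocking reads, queue.get(False).
    while not stop.is_set():
        try:
            q.get(False)
            result.append("got item")
            return
        except queue.Empty:
            time.sleep(0.001)
    result.append("empty")


def emulate(kill_delay=0.02, wait=1.0):
    q = ctx.Queue()
    stop = threading.Event()
    result = []

    writer = ctx.Process(target=put_large_object, args=(q,))
    writer.start()  # fork before starting the reader thread

    # Daemon thread, so a stuck get() cannot block interpreter exit.
    reader = threading.Thread(
        target=poll_until_item, args=(q, stop, result), daemon=True
    )
    reader.start()

    time.sleep(kill_delay)  # let the put get under way
    writer.kill()           # step 3: kill Process2, possibly mid-put
    writer.join()

    reader.join(wait)       # give the reader time to finish normally
    stop.set()              # ask it to stop if it is merely polling
    reader.join(wait)

    # No result at all means the reader is still stuck inside get().
    return result[0] if result else "hung"


if __name__ == "__main__":
    print(emulate())
```

A result of "hung" means the reader never returned from `queue.get(False)` after the writer was killed, which is the situation described above. "got item" or "empty" means the kill landed after or before the write, so `kill_delay` may need adjusting on a given machine.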

          People

            aonishuk Andrew Onischuk
            vrathod Vivek Rathod