  1. Ambari
  2. AMBARI-18728

During cluster install, Components get timed out icon while starting

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5.0
    • Component/s: None
    • Labels: None

    Description

      This was caused by a very tricky race condition in the way Python's multiprocessing module works, which results in a deadlock in the ambari_agent.ActionQueue thread.
      The problem arises when all three of the following happen at the same time (a very rare occurrence):
      1. Process1 executes queue.get(False)
      2. Process2 executes queue.put(largeObjectWhichTakesLongTimeToPut)
      3. Someone kills Process2.

      This results in a deadlock in Process1's get, because the queue's locks/semaphores are not released when Process2 is killed in the middle of its put.

      I wrote a script, test_race_condition.py, to emulate this behaviour; with it I could reproduce the deadlock and test the fix.
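
      For illustration only, the three-step flow can be sketched roughly as below. This is not the attached test_race_condition.py or the Ambari fix; the names producer, consumer and try_once, the payload size and the delays are all hypothetical, and the sketch assumes a POSIX system where the "fork" start method is available. The get runs in a child process guarded by a watchdog, so a hang cannot block the caller. Whether a given run actually gets stuck depends on timing.

```python
import multiprocessing as mp
import queue as queue_errors  # only needed for the Empty exception
import time

# Assumption: POSIX host; "fork" keeps the sketch free of import-time concerns.
ctx = mp.get_context("fork")


def producer(q, payload_size):
    # Step 2: put an object large enough that the put takes a while.
    q.put(b"x" * payload_size)


def consumer(q, poll_for_s):
    # Step 1: poll the queue with non-blocking gets for a short period.
    end = time.time() + poll_for_s
    while time.time() < end:
        try:
            q.get(False)
            return
        except queue_errors.Empty:
            time.sleep(0.001)


def try_once(payload_size=50_000_000, kill_delay=0.01, wait_s=3.0):
    """Run the flow once and report whether the consumer got stuck."""
    q = ctx.Queue()
    p1 = ctx.Process(target=consumer, args=(q, 0.5))
    p2 = ctx.Process(target=producer, args=(q, payload_size))
    p1.start()
    p2.start()

    time.sleep(kill_delay)
    p2.terminate()  # Step 3: kill the producer, possibly in the middle of its put
    p2.join()

    # Watchdog: a healthy consumer exits on its own well within wait_s.
    p1.join(wait_s)
    stuck = p1.is_alive()
    if stuck:
        p1.terminate()
        p1.join()
    return "stuck" if stuck else "finished"


if __name__ == "__main__":
    for attempt in range(5):
        print(attempt, try_once())
```

      A "stuck" result means the consumer was still blocked inside get after the producer had been killed, which is the symptom described above; "finished" means that run did not hit the race.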

      Attachments

        1. AMBARI-18728.patch
          2 kB
          Andrew Onischuk

            People

              aonishuk Andrew Onischuk
              vrathod Vivek Rathod