Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
This was caused by a very tricky race condition in the way Python multiprocessing.thread works, resulting in a deadlock in the ambari_agent.ActionQueue thread.
The problem occurs when all three of the following steps are executed at the same time (a very rare occasion):
1. Process1 executes queue.get(False)
2. Process2 executes queue.put(largeObjectWhichTakesLongTimeToPut)
3. Someone kills Process2.
This results in a deadlock in Process1's get, which is caused by the queue locks/semaphores not being released during Process2's put.
I have written a script, test_race_condition.py, to emulate this behaviour. With it I could indeed reproduce the deadlock and test the fix for it.
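
For readers without access to test_race_condition.py, the three steps above can be approximated with a short script. This is only a hypothetical sketch, not the attached script: it assumes Python 3 on a Unix-like system with the "fork" start method, and the names producer, consumer and run_once, as well as the payload size and timings, are invented for illustration. Whether the kill lands in the middle of the put depends on timing, so a single attempt may well not hit the race.

```python
import multiprocessing as mp
import queue
import time

# Assumption: a Unix-like system where the "fork" start method is available.
ctx = mp.get_context("fork")


def producer(q, payload_size):
    # Step 2: put a large object; pushing it through the queue takes a while.
    q.put(b"x" * payload_size)


def consumer(q, deadline_s):
    # Step 1: poll with non-blocking gets until an item arrives
    # or the deadline passes.
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        try:
            q.get(False)
            return
        except queue.Empty:
            time.sleep(0.001)


def run_once(payload_size=50_000_000, kill_after=0.05,
             deadline_s=1.0, margin_s=1.0):
    """Return True if the consumer got stuck inside q.get(False)."""
    q = ctx.Queue()
    c = ctx.Process(target=consumer, args=(q, deadline_s))
    p = ctx.Process(target=producer, args=(q, payload_size))
    c.start()
    p.start()
    time.sleep(kill_after)
    p.terminate()  # Step 3: kill the producer, ideally in the middle of its put.
    p.join()
    # A healthy consumer exits by its own deadline. One that is still alive
    # after the extra margin is blocked inside get() and never re-checks the clock.
    c.join(deadline_s + margin_s)
    stuck = c.is_alive()
    if stuck:
        c.terminate()  # Clean up the hung consumer so the script can exit.
        c.join()
    return stuck
```

Calling run_once() repeatedly (for example 20 times) and counting the True results gives a rough idea of how often the race is hit on a given machine. The payload_size and kill_after values will likely need tuning.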
Attachments