Wangda Tan, the test failures seem unrelated. I have submitted the patch again. Let's see what that gives.
Now my understanding of this patch is, preemption events still go to rmDispatcher, and rmDispatcher sends events to schedulerDispatcher.
Yes, you are correct.
The issue in the JIRA is that, on picking up the ContainerPreempt event from the RM Dispatcher queue, the RM Dispatcher thread processes the event itself instead of putting it in another dispatcher's queue. Processing it inline blocked the RM Dispatcher thread, because it had to wait for the scheduler lock. As a result, other events in the queue were delayed.
In the current design, different dispatchers (one per event type) are registered with the main RM Dispatcher, and the RM Dispatcher then forwards each event to the appropriate registered dispatcher. Although this is an additional step (and may cost some extra processing time), I think it was done to have a single point of contact for dispatching events across the RM.
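To illustrate the forwarding design, here is a minimal, self-contained sketch. It does not use the Hadoop classes: `QueueDispatcher`, `Event`, the event type strings and `schedulerLock` are all hypothetical stand-ins. The main dispatcher only forwards the preemption event to a second dispatcher's queue, so it stays free to handle other events while the scheduler lock is held elsewhere.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class TwoLevelDispatchSketch {

    /** Hypothetical event: a type tag plus a payload. */
    static final class Event {
        final String type;
        final String payload;
        Event(String type, String payload) { this.type = type; this.payload = payload; }
    }

    /** Hypothetical dispatcher: one queue drained by one daemon thread. */
    static final class QueueDispatcher {
        private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
        private final Map<String, Consumer<Event>> handlers = new ConcurrentHashMap<>();

        QueueDispatcher(String name) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        Event e = queue.take();
                        Consumer<Event> h = handlers.get(e.type);
                        if (h != null) h.accept(e);
                    }
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            }, name);
            t.setDaemon(true);
            t.start();
        }

        void register(String type, Consumer<Event> handler) { handlers.put(type, handler); }
        void post(Event e) { queue.add(e); }
    }

    /** Runs the scenario and returns the order in which events were handled. */
    static List<String> run() {
        List<String> handled = new CopyOnWriteArrayList<>();
        Object schedulerLock = new Object(); // stand-in for the scheduler lock
        CountDownLatch nodeHandled = new CountDownLatch(1);
        CountDownLatch preemptHandled = new CountDownLatch(1);

        QueueDispatcher rmDispatcher = new QueueDispatcher("rm-dispatcher");
        QueueDispatcher schedulerDispatcher = new QueueDispatcher("scheduler-dispatcher");

        // The scheduler dispatcher does the real work under the scheduler lock.
        schedulerDispatcher.register("CONTAINER_PREEMPT", e -> {
            synchronized (schedulerLock) { handled.add("preempt:" + e.payload); }
            preemptHandled.countDown();
        });
        // The main dispatcher only forwards, so it never waits for the lock.
        rmDispatcher.register("CONTAINER_PREEMPT", schedulerDispatcher::post);
        rmDispatcher.register("NODE_UPDATE", e -> {
            handled.add("node:" + e.payload);
            nodeHandled.countDown();
        });

        try {
            synchronized (schedulerLock) { // simulate a busy scheduler
                rmDispatcher.post(new Event("CONTAINER_PREEMPT", "c1"));
                rmDispatcher.post(new Event("NODE_UPDATE", "n1"));
                // The later event completes even though the lock is still held.
                nodeHandled.await(5, TimeUnit.SECONDS);
            }
            preemptHandled.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ie);
        }
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch the node update is handled before the preemption event even though it was posted second, because only the scheduler dispatcher thread waits for the lock. If the main dispatcher handled the preemption event inline, the node update would be stuck behind it until the lock was released.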
We could also open an interface for posting events directly to the scheduler dispatcher. That would of course be faster, but then the change should be made for all scheduler events, not only container preemption events. Thoughts?