While working on
DRILL-2245, I ran into problems because Foreman.moveToState() can be called recursively. Being synchronized doesn't prevent the same thread from calling back into this. When this happens, state transitions are either dropped or broken depending on when this happens. This event really needs to be handled atomically, but when queryContext.cancelExecutingFragments() is called, the completion of the cancellations results in QueryManager.fragmentDone() being called, and it indiscriminately sends the COMPLETED state, even though we may be in the middle of handling a move to FAILED or CANCELLATION_REQUESTED. Unless the target state is first recorded before any possible recursive call happens, the recursive transition is recorded, and the second transition doesn't result in any message being sent to the user – the client hangs.
It's difficult to reason about when its safe for this kind of recursive call to happen, and even more difficult to avoid inadvertently introducing these. Problems happen when the delivery of an event to a listener causes cascaded events that eventually deliver to the same listener.
In order to avoid this, we should create (and use) a generic class that can receive such events. If nothing is happening when one is received, it is processed immediately, in line. If an event is received when we're in the middle of processing another event, the newly received one is queued. When the current event processing is completed, we then go through any queued events, and process them one by one. We continue in this way until there are no more events left, then return. This mechanism would improve code safety by preventing the delivery of events while in the middle of processing other events by the same handler.