There is a race condition in REEF-Local-Runtime, and it can happen as follows:
- The Evaluator sends the DONE message and exits its process.
- The RM discovers that the Evaluator has ended and sends a DONE message to the Driver.
- The Driver receives the DONE message from the RM before it reads the Evaluator's DONE message, which is still waiting in its network queue.
- The Driver calls FailedEvaluatorHandler, even though the Evaluator shut down properly.
This can be fixed by requiring an ACK from the Driver prior to letting the Evaluator exit its process.
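
The sequence above, and the effect of the proposed ACK, can be illustrated with a small single-threaded simulation. This is only a sketch and not REEF code: `simulate_shutdown`, the two queues, and the "CompletedEvaluatorHandler" name are hypothetical, and the Driver is assumed to always handle RM notifications first, which is the worst-case ordering for this race.

```python
from collections import deque


def simulate_shutdown(require_ack: bool) -> str:
    """Return the name of the handler the Driver ends up calling.

    Hypothetical model: the Driver always checks RM notifications before
    its network queue, which is the worst-case ordering for this race.
    """
    network_queue = deque()     # Evaluator -> Driver messages not yet read
    rm_notifications = deque()  # RM -> Driver messages not yet read
    driver_view = "RUNNING"     # what the Driver believes about the Evaluator
    rm_notified = False

    # The Evaluator sends DONE. Without the ACK requirement it exits at once;
    # with it, the process stays alive until the Driver acknowledges.
    network_queue.append("DONE")
    evaluator_exited = not require_ack

    while True:
        # The RM reports the Evaluator only after its process has exited.
        if evaluator_exited and not rm_notified:
            rm_notifications.append("DONE")
            rm_notified = True

        # Driver step: RM notifications take priority (assumed worst case).
        if rm_notifications:
            rm_notifications.popleft()
            if driver_view == "DONE":
                return "CompletedEvaluatorHandler"  # hypothetical name
            return "FailedEvaluatorHandler"

        if network_queue:
            network_queue.popleft()
            driver_view = "DONE"
            if require_ack:
                # The Driver sends the ACK; only now may the Evaluator exit.
                evaluator_exited = True


if __name__ == "__main__":
    print("without ACK:", simulate_shutdown(require_ack=False))
    print("with ACK:   ", simulate_shutdown(require_ack=True))
```

In this model, the RM cannot report the exit until the Driver has already read the Evaluator's DONE message, so the Driver never sees the RM's message first.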