Description
Recent stress testing for IMRU FT shows that when there are hundreds of nodes with many failed and re-requested evaluators, the default TCP connection retry time is not long enough for evaluators to connect to the driver. The driver may be busy handling a long event queue, and each event handler in the IMRU driver is locked.
We need to set proper configuration values for the IMRU example that is used for stress testing.
In some of the IMRU tests, this configuration is set through the task function configuration, which is incorrect. It must be set through a configuration provider to take effect.
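To illustrate why the retry settings matter, here is a minimal sketch of a connect-with-retry loop in Python. It is not the REEF implementation: `connect_with_retry`, `retry_window_seconds`, and their parameters are hypothetical names chosen for illustration. The point is only that the retry count and the sleep interval together bound how long an evaluator keeps trying while the driver is busy.

```python
import socket
import time


def connect_with_retry(host, port, max_retries, sleep_seconds,
                       connect=socket.create_connection, sleep=time.sleep):
    """Try to connect up to max_retries times, sleeping between attempts.

    Hypothetical helper for illustration only; not part of REEF.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return connect((host, port))
        except OSError as err:
            # The remote side (here, a busy driver) did not accept in time.
            last_error = err
            if attempt < max_retries:
                sleep(sleep_seconds)
    raise ConnectionError(f"gave up after {max_retries} attempts") from last_error


def retry_window_seconds(max_retries, sleep_seconds):
    """Upper bound on time spent sleeping between attempts.

    Ignores the time each individual connect attempt takes.
    """
    return max(max_retries - 1, 0) * sleep_seconds
```

Under these assumptions, raising either value widens the window in which a slow driver can still accept the connection, at the cost of detecting a truly unreachable driver later.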
Issue Links
- Is contained by: REEF-1223 IMRU Fault Tolerance - restart failed evaluators (Resolved)