Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
The Job Driver's built-in error handler is not readily programmable.
A desired feature is the ability to specify, per job, how many times each work item may be retried after a time-out. Presently, a timed-out work item is never retried (the retry count is 0). The default remains 0.
Allow a user's job submission to specify -DJobDriverErrorHandlerMaximumNumberOfTimeoutRetrysPerWorkItem=N in driver_jvm_args, where N is an integer. Each time a work item times out, the time-out is counted. If the number of retries for that work item does not exceed N, the work item is retried; otherwise it is treated as a work item error.
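
The retry rule described above could be sketched roughly as follows. This is only an illustration, not the actual Job Driver error handler: the class TimeoutRetryPolicy, the Action enum, and the method names are hypothetical, and the sketch assumes that "does not exceed N" means the Nth time-out of a work item is still retried while the (N+1)th becomes an error. Only the system property name comes from this issue.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-work-item time-out retry counting. */
final class TimeoutRetryPolicy {

    // Property name as given in this issue; passed to the driver JVM with -D.
    static final String PROPERTY =
            "JobDriverErrorHandlerMaximumNumberOfTimeoutRetrysPerWorkItem";

    enum Action { RETRY, FAIL }

    private final int maxRetries;
    // Number of time-outs seen so far, keyed by a (hypothetical) work item id.
    private final Map<String, Integer> timeouts = new HashMap<>();

    TimeoutRetryPolicy(int maxRetries) {
        // Negative values are treated as 0, i.e. no retries (an assumption).
        this.maxRetries = Math.max(0, maxRetries);
    }

    /** Reads N from the JVM system property; missing or non-numeric means 0. */
    static TimeoutRetryPolicy fromSystemProperties() {
        return new TimeoutRetryPolicy(Integer.getInteger(PROPERTY, 0));
    }

    /** Counts one time-out for the work item and decides what happens next. */
    Action onTimeout(String workItemId) {
        int count = timeouts.merge(workItemId, 1, Integer::sum);
        return count <= maxRetries ? Action.RETRY : Action.FAIL;
    }
}
```

Under these assumptions, with the property set to 2 a work item is retried after its first and second time-outs and reported as a work item error after the third. With the property unset, the default of 0 applies and the first time-out is already an error, which matches the present behavior.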