Affects Version/s: 0.21.0
Fix Version/s: 0.21.0
We had one job hang even though speculative execution was enabled.
Four reduce tasks were stuck at 95% completion because of a bad disk.
Devaraj pointed out:
bq. One of the conditions that must be met for launching a speculative instance of a task is that it must be at least 20% behind the average progress, and this is not true here.
It would be nice if speculative execution also kicked in when tasks stop making progress.
Maybe we should add a condition based on the average task completion time to the speculative execution check.
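
To make the proposal concrete, here is a minimal sketch of a speculation check that keeps the existing 20% lag condition and adds a stall condition. It is not the actual JobTracker/TaskInProgress code: the class name, method names, and the `stallTimeoutMs` parameter are all hypothetical and chosen only for illustration.

```java
/** Hypothetical sketch of the proposed check; not actual Hadoop code. */
final class SpeculationCheck {
    // The 20% lag threshold quoted in the description above.
    static final double LAG_THRESHOLD = 0.2;

    private SpeculationCheck() {}

    /** Existing-style condition: the task is at least 20% behind average progress. */
    static boolean isLagging(double taskProgress, double avgProgress) {
        return avgProgress - taskProgress >= LAG_THRESHOLD;
    }

    /**
     * Proposed extra condition: the task has reported no progress for longer
     * than stallTimeoutMs (an assumed, configurable timeout).
     */
    static boolean isStalled(long msSinceLastProgress, long stallTimeoutMs) {
        return msSinceLastProgress > stallTimeoutMs;
    }

    /** Speculate if the task is unfinished and is either lagging or stalled. */
    static boolean shouldSpeculate(double taskProgress, double avgProgress,
                                   long msSinceLastProgress, long stallTimeoutMs) {
        if (taskProgress >= 1.0) {
            return false; // already complete, nothing to speculate on
        }
        return isLagging(taskProgress, avgProgress)
            || isStalled(msSinceLastProgress, stallTimeoutMs);
    }
}
```

With this shape, a reduce task stuck at 95% while the average is 96% fails the lag test but would still get a speculative instance once it has been silent for longer than the timeout. A condition based on average completion time could be added as a third predicate in the same way.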