Devaraj, addressing your points:
1) Why do you think that progress rate would be better than expected time to
completion? We thought about this a bit in the OSDI paper; here is what we said:
"The primary insight behind our scheduler is the following: We always
speculatively execute the task that we think will finish farthest into the
future, because this task provides the greatest opportunity for a speculative
task to overtake the original and save a significant amount of time."
If speculative tasks didn't have to redo (potentially a lot of) work that the
original task attempt had already completed, I would agree that using progress
rate would be enough.
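For reference, here is a minimal Java sketch of how the two quantities relate. It assumes, as in the LATE paper, that a task's progress score is in [0, 1] and that its progress rate is the progress score divided by how long the attempt has been running. The class and method names are hypothetical, not Hadoop's API, and the sample numbers are made up for illustration.

```java
// Hypothetical helper, not Hadoop's API. Inputs: a progress score in [0, 1]
// and the number of seconds the attempt has been running.
public class TimeLeftEstimate {

    // Fraction of the task completed per second so far.
    static double progressRate(double progress, double elapsedSeconds) {
        return progress / elapsedSeconds;
    }

    // Expected seconds until the attempt finishes if it keeps its current rate.
    // A task with no progress yet gets an infinite estimate.
    static double timeLeft(double progress, double elapsedSeconds) {
        if (progress <= 0.0) {
            return Double.POSITIVE_INFINITY;
        }
        return (1.0 - progress) / progressRate(progress, elapsedSeconds);
    }

    public static void main(String[] args) {
        // X: 80% done after 80 s (1% per second). Y: 20% done after 10 s (2% per second).
        // X has the lower progress rate, yet it has less time left than Y.
        System.out.printf("X: %.3f per s, %.1f s left%n", progressRate(0.8, 80), timeLeft(0.8, 80));
        System.out.printf("Y: %.3f per s, %.1f s left%n", progressRate(0.2, 10), timeLeft(0.2, 10));
    }
}
```

Ranking by the first number and ranking by the second can disagree, which is the crux of the argument that follows.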
In a perfect world, we would speculate on a task only after (a) calculating its
expected finish time on the node it is currently running on, based on the
fraction of the task remaining and the speed of that node, and (b) calculating
its expected finish time on the node we are considering launching it on,
assuming it runs from the beginning at the speed of the new node. We would then
launch the speculative copy only if (b) is earlier than (a).
With that approach, we would only ever launch speculative tasks on nodes that are
faster than the node the original attempt is running on, which makes it seem
like progress rate would be enough. However, being faster isn't sufficient: it
doesn't guarantee that the speculative task will finish before the original,
because the speculative task has to redo the work that the current task attempt
has already completed.
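The idealized check described above might look like the following Java sketch. It assumes we already know how fast the candidate node would run this task, which is exactly the "perfect world" assumption; the class and method names are hypothetical and not part of Hadoop, and the numbers in main are made up.

```java
// Hypothetical sketch of the "perfect world" speculation test.
// Rates are fractions of a task completed per second.
public class SpeculationCheck {

    // (a) Seconds the current attempt still needs at its observed rate.
    static double remainingOnCurrentNode(double progress, double currentRate) {
        return (1.0 - progress) / currentRate;
    }

    // (b) Seconds a speculative copy needs: it must redo the whole task from 0%.
    static double fullRunOnCandidateNode(double candidateRate) {
        return 1.0 / candidateRate;
    }

    // Speculate only if the fresh copy would finish before the original attempt.
    static boolean shouldSpeculate(double progress, double currentRate, double candidateRate) {
        return fullRunOnCandidateNode(candidateRate) < remainingOnCurrentNode(progress, currentRate);
    }

    public static void main(String[] args) {
        // Candidate node is 4x faster (4% vs 1% per second) in both cases.
        System.out.println(shouldSpeculate(0.8, 0.01, 0.04)); // 25 s rerun vs 20 s left
        System.out.println(shouldSpeculate(0.2, 0.01, 0.04)); // 25 s rerun vs 80 s left
    }
}
```

Even a node four times faster is rejected for the 80%-done task, since a full rerun (25 s) takes longer than the 20 s the original still needs; the same node is accepted for the 20%-done task.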
For example, say TT1 comes in asking for a speculative task, and TT1 can
progress at a rate of 3% of a task per second. We look at the tasks running on
TT2 and TT3 to decide which one to speculate.
TT2 is 90% done with task A and is progressing at 1% per second. TT3 is 10% done
with task B and is progressing at 2% per second.
In this case, sorting by progress rate would mean we speculate task A, because
progress is being made more slowly on that task. However, if you think about it,
TT2 would finish task A in another 10 seconds (10% remaining at 1% per second).
If we speculate it, TT1 will get only 30% of the way through the task (10 seconds
at 3% per second) before it is killed because TT2 finished first.
On the other hand, TT3 would need another 45 seconds to finish task B (90%
remaining at 2% per second). If we speculate task B on TT1, TT1 will finish the
whole task in about 33 seconds (100% at 3% per second), roughly 12 seconds
sooner than TT3 could have.
In this case, using expected time to completion would be the right thing to do.
This was the motivation for LATE (as we described in the OSDI paper).
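To make the comparison concrete, the following Java 16+ sketch encodes the TT1/TT2/TT3 example and applies both ranking policies. The record and method names are hypothetical illustrations, not the scheduler's actual code, and each rate is taken as given rather than measured.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the TT1/TT2/TT3 example (records need Java 16+).
public class RankingExample {

    // A running attempt: progress in [0, 1], rate in fraction of a task per second.
    record RunningTask(String name, String tracker, double progress, double rate) {
        double timeLeft() {
            return (1.0 - progress) / rate;
        }
    }

    // Progress-rate ranking: pick the task that is progressing most slowly.
    static RunningTask pickByRate(List<RunningTask> tasks) {
        return tasks.stream().min(Comparator.comparingDouble(RunningTask::rate)).orElseThrow();
    }

    // LATE ranking: pick the task expected to finish farthest into the future.
    static RunningTask pickByTimeLeft(List<RunningTask> tasks) {
        return tasks.stream().max(Comparator.comparingDouble(RunningTask::timeLeft)).orElseThrow();
    }

    public static void main(String[] args) {
        List<RunningTask> tasks = List.of(
                new RunningTask("A", "TT2", 0.90, 0.01),
                new RunningTask("B", "TT3", 0.10, 0.02));
        double tt1Rate = 0.03;            // TT1 runs a task at 3% per second
        double copyTime = 1.0 / tt1Rate;  // a speculative copy starts from 0%

        for (String policy : List.of("progress rate", "time left")) {
            RunningTask pick = policy.equals("progress rate") ? pickByRate(tasks) : pickByTimeLeft(tasks);
            System.out.printf("%s picks %s: original %.1f s, copy on TT1 %.1f s, copy %s%n",
                    policy, pick.name(), pick.timeLeft(), copyTime,
                    copyTime < pick.timeLeft() ? "wins" : "gets killed");
        }
    }
}
```

Running main should report that the progress-rate policy picks task A, whose copy gets killed, while the time-left policy picks task B, whose copy on TT1 finishes first.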
2) I will work on this soon (this weekend).
3) I will remove that code.
4) Using tip.getDispatchTime() instead of tip.getExecStartTime() should do the
trick.
I will work on all of this ASAP (hopefully over the weekend).