I am trying to hunt down slow executions that consistently occur about 1/4 of the time and run about 7x slower than normal.
In the summaries of normal and slow queries, the 'Max Time' column shows about the same values for all fragments. Furthermore, in the timeline of the slow query, almost all of the runtime falls between 'ready to start fragments' and 'all fragments started'. Without per-fragment start timestamps there is no way to know which fragments on which nodes started late, and therefore no way to pinpoint a problematic node.
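To put a number on that gap when comparing a normal and a slow run, here is a minimal sketch. It is not an existing tool: it assumes the timeline events have been copied by hand into (label, elapsed milliseconds) pairs, and the function name `startup_share` and all numbers are hypothetical, made up for illustration only.

```python
def startup_share(timeline, start_label, end_label):
    """Return (gap_ms, fraction_of_total) between two timeline events.

    `timeline` is a list of (label, elapsed_ms) pairs, where elapsed_ms is
    the time since the query started. This input shape is an assumption,
    not the profile's native format.
    """
    elapsed = dict(timeline)
    gap = elapsed[end_label] - elapsed[start_label]
    total = max(elapsed.values())  # last event marks the end of the query
    return gap, gap / total


# Made-up example timelines (milliseconds), NOT measured values.
normal = [
    ("ready to start fragments", 100),
    ("all fragments started", 150),
    ("finished", 1000),
]
slow = [
    ("ready to start fragments", 100),
    ("all fragments started", 6150),
    ("finished", 7000),
]

for name, timeline in (("normal", normal), ("slow", slow)):
    gap, share = startup_share(
        timeline, "ready to start fragments", "all fragments started"
    )
    print(f"{name}: {gap} ms spent starting fragments ({share:.0%} of runtime)")
```

This only confirms how much of the runtime is spent starting fragments; it still cannot say which fragment or node was late, which is what per-fragment timestamps would provide.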
For more details, see this discussion.