> I'm concerned that this might blow up different schedulers in different ways.
I don't think that's a problem, since the code change only affects job submission, which happens before any scheduling code runs.
> Maybe we need to do an 'if' check during recovery and not throw an IOException?
I had another look at this and came up with a new patch. Does it look better?
The Hadoop 2 change sounds like the right approach. At first I thought we didn't need the property in Hadoop 2 because of
MAPREDUCE-2702, but it would actually let users mark a job as non-recoverable on a per-instance basis. That change would build on YARN-128.