Details
Bug
Status: Closed
Minor
Resolution: Fixed
1.1.0-Ducc
None
Description
If a node running a process for a job of type 'fixed' crashes, RM will 'purge' the node, forcing the rest of DUCC to clear its records of the process.
If you then 'bounce' RM, it will think the 'fixed' job is missing an allocation. Technically it is, but 'fixed' is defined such that processes that go away are not replaced.
RM needs logic so that, while recovering non-preemptable jobs that have any allocation, it marks them 'allocation complete'. The reasoning is that no allocation would have been given unless the job's allocation was complete at some time in the past.
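
The proposed recovery rule can be illustrated with a minimal sketch. This is not DUCC code: the `RecoveredJob` class, its fields, and the `recover` function are hypothetical names invented for illustration, and the count-based check for preemptable jobs is an assumption rather than something stated in this issue.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveredJob:
    """Hypothetical stand-in for a job record rebuilt after RM restarts."""
    job_id: str
    preemptable: bool   # False models a 'fixed'-style job in this sketch
    requested: int      # number of processes originally requested
    allocations: list = field(default_factory=list)  # allocations that survived
    allocation_complete: bool = False


def recover(job: RecoveredJob) -> RecoveredJob:
    """Decide during recovery whether a job still needs allocations."""
    if not job.preemptable and job.allocations:
        # Rule from this issue: a non-preemptable job with any surviving
        # allocation must have been fully allocated at some point, so mark
        # it complete and do not replace processes that went away.
        job.allocation_complete = True
    else:
        # Assumption for illustration: every other job is complete only
        # when it holds at least as many allocations as it requested.
        job.allocation_complete = len(job.allocations) >= job.requested
    return job


# A 'fixed' job that requested 3 processes and lost one to a node purge.
fixed = recover(RecoveredJob("job-1", preemptable=False, requested=3,
                             allocations=["node-a", "node-b"]))
print(fixed.allocation_complete)  # prints True
```

With this rule the restarted RM no longer treats the 'fixed' job as short one allocation, while a non-preemptable job that never received an allocation is still considered incomplete.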