Affects Version/s: 1.10.0
Fix Version/s: 1.10.0
Component/s: Runtime / Coordination
Release Note: Serialized `JobGraphs` which set the `ResourceSpec` created by Flink versions < 1.10 are no longer compatible with Flink >= 1.10. To migrate these jobs to Flink 1.10.0, stop the job with a savepoint and then resume it from this savepoint on the Flink 1.10.0 cluster.
Some resources are represented as double values, such as cpuCores in ResourceSpec/ResourceProfile and all extended resources. These values can be produced by merge or subtract operations, so they may carry small floating-point deltas.
Currently, resource matching compares these values without accounting for the deltas, which can cause the following issues:
1. A shared slot cannot fulfill a slot request even though it should be able to (because (d1 + d2) - d1 < d2 is possible for double values).
2. If a shared slot is used up, an unexpected error may occur when calculating its remaining resources in SlotSharingManager#listResolvedRootSlotInfo -> ResourceProfile#subtract.
3. An unexpected error may occur when releasing a single task slot from a shared slot (in ResourceProfile#subtract).
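The first issue can be reproduced with plain double arithmetic. The following is a minimal sketch, not Flink code: the class `DoubleDeltaDemo` and its helpers `remainingAfterRelease` and `canFulfill` are hypothetical names used only to illustrate the merge-then-subtract round trip.

```java
public class DoubleDeltaDemo {

    // Hypothetical helper: merge two cpu core values, then subtract the first again.
    // Mathematically this should return exactly d2.
    static double remainingAfterRelease(double d1, double d2) {
        double total = d1 + d2; // merge: the sum is rounded to the nearest double
        return total - d1;      // subtract: the rounding error from the merge remains
    }

    // Hypothetical matching check that ignores floating-point deltas.
    static boolean canFulfill(double available, double requested) {
        return available >= requested;
    }

    public static void main(String[] args) {
        double d1 = 1.0;
        double d2 = 0.2;

        double remaining = remainingAfterRelease(d1, d2);

        // 1.0 + 0.2 rounds down to the double nearest 1.2, so the remainder
        // ends up slightly below 0.2.
        System.out.println(remaining);                 // prints 0.19999999999999996
        System.out.println(canFulfill(remaining, d2)); // prints false
    }
}
```

With d1 = 1.0 and d2 = 0.2 the remainder is smaller than d2, so a request for 0.2 cores is rejected even though exactly that amount was freed.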
To solve this issue, I'd propose to:
1. Change Resource to use BigDecimal to manage double values. This allows the values to be compared strictly and to be merged/subtracted additively with no precision loss. With this change, extended resources work correctly with double values.
2. Introduce CPUResource to represent cpu cores. It is based on Resource.
3. Change ResourceSpec/ResourceProfile to use CPUResource for cpu cores
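The proposal above could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual Flink implementation: the nested `Resource` and `CpuResource` classes, their constructors, and the method names `merge`, `subtract`, and `getValue` are hypothetical stand-ins for whatever the final classes expose.

```java
import java.math.BigDecimal;

public class DecimalResourceSketch {

    /** Hypothetical stand-in for the proposed BigDecimal-backed Resource. */
    public static class Resource {
        private final String name;
        private final BigDecimal value;

        public Resource(String name, double value) {
            // BigDecimal.valueOf goes through Double.toString, so 0.2 becomes
            // the decimal 0.2 rather than the binary approximation of 0.2.
            this(name, BigDecimal.valueOf(value));
        }

        public Resource(String name, BigDecimal value) {
            if (value.signum() < 0) {
                throw new IllegalArgumentException("Resource value must not be negative");
            }
            this.name = name;
            this.value = value;
        }

        public BigDecimal getValue() {
            return value;
        }

        // Exact decimal addition; no rounding is applied.
        public Resource merge(Resource other) {
            checkSameName(other);
            return new Resource(name, value.add(other.value));
        }

        // Exact decimal subtraction; fails only if the result is really negative.
        public Resource subtract(Resource other) {
            checkSameName(other);
            return new Resource(name, value.subtract(other.value));
        }

        // compareTo ignores scale, so 0.2 and 0.20 are treated as equal.
        public boolean canFulfill(Resource requested) {
            checkSameName(requested);
            return value.compareTo(requested.value) >= 0;
        }

        private void checkSameName(Resource other) {
            if (!name.equals(other.name)) {
                throw new IllegalArgumentException("Cannot combine " + name + " with " + other.name);
            }
        }
    }

    /** Hypothetical CPU resource: a Resource with a fixed name. */
    public static class CpuResource extends Resource {
        public CpuResource(double cores) {
            super("CPU", cores);
        }
    }

    public static void main(String[] args) {
        Resource d1 = new CpuResource(1.0);
        Resource d2 = new CpuResource(0.2);

        Resource remaining = d1.merge(d2).subtract(d1);

        System.out.println(remaining.getValue());     // prints 0.2
        System.out.println(remaining.canFulfill(d2)); // prints true
    }
}
```

The sketch compares values with `compareTo` rather than `equals`, because `BigDecimal.equals` also compares scale and would treat 0.2 and 0.20 as different.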