Details
- Sub-task
- Status: Closed
- Major
- Resolution: Fixed
- 1.10.0
Description
Some resources carry double-typed values, such as cpuCores in ResourceSpec/ResourceProfile and all extended resources. These values can be produced by a merge or a subtract, so they may contain small floating-point deltas.
Currently, resource matching compares these values without accounting for the deltas, which can cause the following issues:
1. A shared slot cannot fulfill a slot request even though it should be able to (because (d1 + d2) - d1 < d2 is possible for double values).
2. If a shared slot is used up, an unexpected error may occur when calculating its remaining resources in SlotSharingManager#listResolvedRootSlotInfo -> ResourceProfile#subtract.
3. An unexpected error may occur when releasing a single task slot from a shared slot (in ResourceProfile#subtract).
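To make issue 1 concrete, here is a minimal sketch using plain doubles. The class and method names (DoubleDeltaDemo, mergeThenSubtract, canFulfill) are hypothetical and are not part of Flink; the values 0.7 and 0.1 are just one pair for which the rounding goes the wrong way.

```java
final class DoubleDeltaDemo {

    // Merges the request d2 into a slot already holding d1, then subtracts d1 again.
    static double mergeThenSubtract(double d1, double d2) {
        return (d1 + d2) - d1;
    }

    // Naive matching: the remaining amount must be at least the requested amount.
    static boolean canFulfill(double remaining, double requested) {
        return remaining >= requested;
    }

    public static void main(String[] args) {
        double d1 = 0.7;
        double d2 = 0.1;
        double remaining = mergeThenSubtract(d1, d2);
        System.out.println(remaining < d2);            // prints "true"
        System.out.println(canFulfill(remaining, d2)); // prints "false"
    }
}
```

With these inputs the remainder ends up slightly below 0.1, so a naive comparison rejects a request that should fit.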
To solve this issue, I propose to:
1. Change Resource to use BigDecimal to manage double values. This allows the values to be compared strictly and to be merged/subtracted additively with no precision loss. With this change, extended resources work correctly with double values.
2. Introduce CPUResource to represent CPU cores. It is based on Resource.
3. Change ResourceSpec/ResourceProfile to use CPUResource for CPU cores.
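The proposal can be sketched roughly as follows. This is a simplified illustration, not Flink's actual Resource or CPUResource: the names SketchResource, SketchCpuResource, and canFulfill are assumptions made for the example, and the real classes may differ in structure and validation.

```java
import java.math.BigDecimal;

// Hypothetical stand-in for a BigDecimal-backed resource (not Flink's class).
class SketchResource {
    private final String name;
    private final BigDecimal value;

    SketchResource(String name, double value) {
        // BigDecimal.valueOf uses the canonical decimal string of the double, e.g. "0.7".
        this(name, BigDecimal.valueOf(value));
    }

    SketchResource(String name, BigDecimal value) {
        if (value.signum() < 0) {
            throw new IllegalArgumentException("Resource value must not be negative");
        }
        this.name = name;
        this.value = value;
    }

    BigDecimal getValue() {
        return value;
    }

    // Exact decimal addition, so no delta is introduced by merging.
    SketchResource merge(SketchResource other) {
        checkSameName(other);
        return new SketchResource(name, value.add(other.value));
    }

    // Exact decimal subtraction; fails only if 'other' is genuinely larger.
    SketchResource subtract(SketchResource other) {
        checkSameName(other);
        if (value.compareTo(other.value) < 0) {
            throw new IllegalArgumentException("Cannot subtract a larger resource");
        }
        return new SketchResource(name, value.subtract(other.value));
    }

    // Strict comparison that ignores scale differences (0.10 equals 0.1).
    boolean canFulfill(SketchResource required) {
        checkSameName(required);
        return value.compareTo(required.value) >= 0;
    }

    private void checkSameName(SketchResource other) {
        if (!name.equals(other.name)) {
            throw new IllegalArgumentException("Resource names differ");
        }
    }
}

// Hypothetical CPU resource built on the generic resource above.
final class SketchCpuResource extends SketchResource {
    SketchCpuResource(double cores) {
        super("CPU", cores);
    }
}
```

Under these assumptions, merging 0.7 and 0.1 cores and then subtracting 0.7 leaves exactly 0.1, so a request for 0.1 cores is fulfilled and a later subtract of 0.1 does not fail.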
Issue Links
- blocks
  - FLINK-14314 Allocate shared slot resources respecting the resources of all vertices in the group (Closed)
  - FLINK-14734 Add a ResourceSpec in SlotSharingGroup to describe its overall resources (Closed)
- links to