  1. Flink
  2. FLINK-14058 FLIP-53 Fine-grained Operator Resource Management
  3. FLINK-14594

Fix matching logics of ResourceSpec/ResourceProfile/Resource considering double values

    Details

    • Release Note:
      Serialized `JobGraphs` which set the `ResourceSpec` created by Flink versions < 1.10 are no longer compatible with Flink >= 1.10. If you want to migrate these jobs to Flink 1.10.0 you will have to stop the job with a savepoint and then resume it from this savepoint on the Flink 1.10.0 cluster.

      Description

      Some resources hold values of type double, such as cpuCores in ResourceSpec/ResourceProfile and all extended resources. These values can be produced by merge or subtract operations, which can introduce small floating-point deltas.

      Currently, resource matching compares these values without accounting for the deltas, which can cause the following issues:
      1. A shared slot cannot fulfill a slot request even though it should be able to (because (d1 + d2) - d1 < d2 is possible for double values).
      2. If a shared slot is used up, an unexpected error may occur when calculating its remaining resources in SlotSharingManager#listResolvedRootSlotInfo -> ResourceProfile#subtract.
      3. An unexpected error may occur when releasing a single task slot from a shared slot (in ResourceProfile#subtract).
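
The precision problem behind issue 1 can be reproduced outside Flink. The following is a minimal sketch, not Flink code: the class `DoubleDeltaDemo`, its method names, and the sample values 0.7 and 0.1 are all chosen for illustration only.

```java
import java.math.BigDecimal;

// Illustrative only: none of these names exist in Flink.
public class DoubleDeltaDemo {

    // Merge d2 into d1, then take d1 back out, using plain double arithmetic.
    static double remainingAsDouble(double d1, double d2) {
        return (d1 + d2) - d1;
    }

    // The same round trip with exact decimal arithmetic.
    // BigDecimal.valueOf(double) goes through Double.toString, so 0.7 becomes exactly "0.7".
    static BigDecimal remainingAsDecimal(double d1, double d2) {
        BigDecimal a = BigDecimal.valueOf(d1);
        BigDecimal b = BigDecimal.valueOf(d2);
        return a.add(b).subtract(a);
    }

    public static void main(String[] args) {
        double d1 = 0.7;
        double d2 = 0.1;

        // With doubles the round trip loses a little: the remainder is smaller than d2.
        System.out.println(remainingAsDouble(d1, d2) < d2);

        // With BigDecimal the remainder is numerically equal to d2.
        System.out.println(remainingAsDecimal(d1, d2).compareTo(BigDecimal.valueOf(d2)) == 0);
    }
}
```

Both lines print `true`: the double round trip ends up slightly below 0.1, so a request for 0.1 would not match, while the decimal round trip returns exactly 0.1.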

      To solve this issue, I'd propose to:
      1. Change Resource to use BigDecimal to manage double values. This allows the values to be compared strictly and to be merged/subtracted additively with no precision loss. With this change, extended resources work correctly with double values.
      2. Introduce CPUResource to represent cpu cores. It is based on Resource.
      3. Change ResourceSpec/ResourceProfile to use CPUResource for cpu cores
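
As a rough sketch of what this proposal could look like, the classes below model a BigDecimal-backed resource with a CPU subclass. This is an assumption-laden illustration, not the actual Flink implementation: `ResourceSketch`, `Res`, `CpuCores`, and every method signature here are hypothetical.

```java
import java.math.BigDecimal;

// Hypothetical sketch; the real Resource/CPUResource classes in Flink may differ.
public class ResourceSketch {

    // Base resource holding an exact decimal value.
    abstract static class Res {
        private final BigDecimal value;

        protected Res(BigDecimal value) {
            if (value.signum() < 0) {
                throw new IllegalArgumentException("Resource value must not be negative");
            }
            this.value = value;
        }

        // Subclasses create a new instance of their own type.
        protected abstract Res create(BigDecimal value);

        BigDecimal getValue() {
            return value;
        }

        Res merge(Res other) {
            checkSameType(other);
            return create(value.add(other.value));
        }

        Res subtract(Res other) {
            checkSameType(other);
            // Exact comparison: no spurious failure from floating-point deltas.
            if (other.value.compareTo(value) > 0) {
                throw new IllegalArgumentException("Cannot subtract a larger resource");
            }
            return create(value.subtract(other.value));
        }

        private void checkSameType(Res other) {
            if (getClass() != other.getClass()) {
                throw new IllegalArgumentException("Resource types do not match");
            }
        }

        @Override
        public boolean equals(Object o) {
            // compareTo ignores scale, so 0.10 and 0.1 are treated as equal.
            return o != null
                    && getClass() == o.getClass()
                    && value.compareTo(((Res) o).value) == 0;
        }

        @Override
        public int hashCode() {
            // Strip trailing zeros so values equal under compareTo hash alike.
            return value.stripTrailingZeros().hashCode();
        }
    }

    // CPU cores expressed as a resource built on the base class.
    static final class CpuCores extends Res {
        CpuCores(double cores) {
            this(BigDecimal.valueOf(cores));
        }

        private CpuCores(BigDecimal cores) {
            super(cores);
        }

        @Override
        protected Res create(BigDecimal value) {
            return new CpuCores(value);
        }
    }
}
```

In this sketch, merging 0.7 and 0.1 cores and then subtracting 0.7 yields a value equal to 0.1 cores, and subtracting a resource from itself yields zero rather than throwing.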

              People

              • Assignee:
                zhuzh Zhu Zhu
                Reporter:
                zhuzh Zhu Zhu
              • Votes:
                0
                Watchers:
                5

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  20m