  1. Flink
  2. FLINK-31976

Once a scale-up is marked as inefficient, further scaling may never happen


Details

    Description

      Whether a scale-up is inefficient is determined as follows:

      // Rates recorded with the last scaling decision: the observed rate and the rate it was expected to reach
      double lastProcRate = lastSummary.getMetrics().get(TRUE_PROCESSING_RATE).getAverage();
      double lastExpectedProcRate =
              lastSummary.getMetrics().get(EXPECTED_PROCESSING_RATE).getCurrent();
      // Rate observed now
      var currentProcRate = evaluatedMetrics.get(TRUE_PROCESSING_RATE).getAverage();
      double expectedIncrease = lastExpectedProcRate - lastProcRate;
      double actualIncrease = currentProcRate - lastProcRate;

      // The scale-up counts as effective only if enough of the expected gain was achieved
      boolean withinEffectiveThreshold =
              (actualIncrease / expectedIncrease)
                      >= conf.get(AutoScalerOptions.SCALING_EFFECTIVENESS_THRESHOLD);

      Because expectedIncrease is derived from the last scaling history entry, it does not change unless another scale-up happens; only actualIncrease changes.
      actualIncrease depends on currentProcRate (the average of TRUE_PROCESSING_RATE).
      TRUE_PROCESSING_RATE is calculated as follows:
      trueProcessingRate = busyTimeMultiplier * numRecordsInPerSecond.getSum()
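
The check above can be reproduced on plain numbers. The following is a minimal sketch, not the autoscaler's actual code: the class name, the method signature and all rates are hypothetical, and only the arithmetic mirrors the snippet in this description.

```java
public class EffectivenessCheck {

    // Same comparison as in the description, with the metric lookups
    // replaced by plain double arguments (an assumption for illustration).
    static boolean withinEffectiveThreshold(
            double lastProcRate,
            double lastExpectedProcRate,
            double currentProcRate,
            double threshold) {
        double expectedIncrease = lastExpectedProcRate - lastProcRate;
        double actualIncrease = currentProcRate - lastProcRate;
        return (actualIncrease / expectedIncrease) >= threshold;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 1000 rec/s at the last scale-up,
        // 2000 rec/s expected afterwards, threshold 0.1.
        // 1050 rec/s now: 50 / 1000 = 0.05, below the threshold.
        System.out.println(withinEffectiveThreshold(1000, 2000, 1050, 0.1)); // false
        // 1500 rec/s now: 500 / 1000 = 0.5, above the threshold.
        System.out.println(withinEffectiveThreshold(1000, 2000, 1500, 0.1)); // true
    }
}
```

With lastProcRate and lastExpectedProcRate fixed, the result depends only on currentProcRate.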

      For example, suppose a scale-up has been marked as inefficient, but the lag continues to build up.
      Another scale-up is needed to eliminate the growing lag, but it will not happen because of the inefficient mark.
      To clear the mark, the following condition must be met: actualIncrease / expectedIncrease >= SCALING_EFFECTIVENESS_THRESHOLD (default 0.1)

      Here, expectedIncrease is fixed by lastSummary, so the condition can only be met if actualIncrease grows.
      However, actualIncrease is proportional to busyTimeMultiplier and numRecordsInPerSecond, and both converge to a steady value when no scaling occurs.
      Therefore, actualIncrease also converges.
      If the converged value stays below the threshold, no further scale-up is possible, even if the lag continues to build up.
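
The stuck state described above can be illustrated with a small simulation. This is a sketch under stated assumptions: the geometric convergence model, the class and method names, and every number are hypothetical and are not taken from the autoscaler. It only shows that when the processing rate settles at a plateau below the threshold, the check never passes while the lag keeps growing.

```java
public class StuckScaleUpSketch {

    // Same comparison as in the description, on plain doubles.
    static boolean withinEffectiveThreshold(
            double lastProcRate, double lastExpectedProcRate,
            double currentProcRate, double threshold) {
        double expectedIncrease = lastExpectedProcRate - lastProcRate;
        double actualIncrease = currentProcRate - lastProcRate;
        return (actualIncrease / expectedIncrease) >= threshold;
    }

    // Assumed model: the rate starts at `start` and halves its distance
    // to `plateau` every interval, so it converges without scaling.
    static double rateAt(int interval, double start, double plateau) {
        return plateau - (plateau - start) * Math.pow(0.5, interval);
    }

    // First interval at which the check passes, or -1 if it never does.
    static int firstEffectiveInterval(
            double lastProcRate, double lastExpectedProcRate,
            double plateau, double threshold, int intervals) {
        for (int i = 0; i < intervals; i++) {
            double current = rateAt(i, lastProcRate, plateau);
            if (withinEffectiveThreshold(
                    lastProcRate, lastExpectedProcRate, current, threshold)) {
                return i;
            }
        }
        return -1;
    }

    // Lag accumulated over the intervals when input exceeds the processing rate.
    static double lagAfter(int intervals, double inputRate, double start, double plateau) {
        double lag = 0;
        for (int i = 0; i < intervals; i++) {
            lag += Math.max(0, inputRate - rateAt(i, start, plateau));
        }
        return lag;
    }

    public static void main(String[] args) {
        // Plateau of 1080 rec/s gives a ratio of at most 80 / 1000 = 0.08 < 0.1.
        System.out.println(firstEffectiveInterval(1000, 2000, 1080, 0.1, 100));
        // Meanwhile, 1500 rec/s of input keeps adding to the lag.
        System.out.println(lagAfter(100, 1500, 1000, 1080));
    }
}
```

In this model a plateau of 1080 rec/s never clears the mark, whereas a higher plateau such as 1500 rec/s does after a few intervals.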

      Attachments

        1. image-2023-05-01-22-41-57-208.png
          148 kB
          Tan Kim
        2. image-2023-05-01-23-54-06-383.png
          101 kB
          Tan Kim
        3. image-2023-05-01-23-55-08-254.png
          146 kB
          Tan Kim
        4. image-2023-05-02-02-08-25-920.png
          134 kB
          Tan Kim

            People

              Unassigned Unassigned
              tanee.kim Tan Kim