FLINK-30469

FLIP-266: Simplify network memory configurations for TaskManager


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 1.17.0
    • Fix Version/s: 1.17.0
    • Component/s: Runtime / Network
    • Labels: None
    • Release Note:
      The default value of `taskmanager.memory.network.max` has changed from `1g` to `Long#MAX_VALUE`, reducing the number of config options users need to tune when increasing the network memory size. When this option is not explicitly configured, performance may change, because the network memory size can differ and, with a fixed total memory size, so can the heap and managed memory sizes. To restore the previous behavior, explicitly set this option to the previous default value `1g`.

      A threshold is introduced to control how many of the buffers needed for reading data from upstream tasks are required buffers. Reducing the number of required buffers helps reduce the chance of failures due to insufficient network buffers, at the price of a potential performance impact. By default, the number of required buffers is reduced only for batch workloads and stays unchanged for streaming workloads. This can be tuned via `taskmanager.network.memory.read-buffer.required-per-gate.max`. See the description of the config option for more details.
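
      As a minimal sketch of the settings this note refers to, a `flink-conf.yaml` fragment could look like the following. The value `1g` is the previous default stated above; the value `500` is purely illustrative and is neither a default nor a recommendation.

      ```yaml
      # Restore the pre-1.17 cap on network memory (the previous default).
      taskmanager.memory.network.max: 1g

      # Optional: cap the number of required read buffers per gate.
      # 500 is an illustrative placeholder; check the option's description
      # before choosing a value.
      taskmanager.network.memory.read-buffer.required-per-gate.max: 500
      ```

      Leaving both options unset keeps the new 1.17 defaults.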

    Description

      When using Flink, users may encounter the following issues that affect usability.
      1. The job may fail with an "Insufficient number of network buffers" exception.
      2. Flink network memory size adjustment is complex.

      When encountering these issues, users can solve some of them by adding or adjusting parameters. However, this means changing multiple memory config options, and adjusting them requires understanding the detailed internal implementation, which is impractical for most users.
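
      For illustration, with the pre-1.17 defaults, growing network memory past `1g` typically meant touching more than one option, because the size is derived as a fraction of total Flink memory and then clamped between a minimum and a maximum. A sketch of such a `flink-conf.yaml` fragment, with values that are assumptions chosen for the example rather than recommendations:

      ```yaml
      # Share of total Flink memory used for network buffers.
      taskmanager.memory.network.fraction: 0.2
      # Lower bound on the derived size.
      taskmanager.memory.network.min: 64mb
      # Upper bound; with the old 1g default, raising the fraction alone
      # could not push network memory past 1g.
      taskmanager.memory.network.max: 4g
      ```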

      To resolve these issues, we propose several improvements. For more details, see FLIP-266.

      This is the umbrella ticket to track all the changes of this feature.

            People

              Assignee: Yuxin Tan (tanyuxin)
              Reporter: Yuxin Tan (tanyuxin)
              Votes: 0
              Watchers: 8
