Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-28374 Some further improvements of blocking shuffle
  3. FLINK-28512

Select HashBasedDataBuffer and SortBasedDataBuffer dynamically based on the number of network buffers can be allocated for SortMergeResultPartition

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.16.0
    • None

    Description

      Currently, the SortMergeResultPartition select to use HashBasedDataBuffer and SortBasedDataBuffer based on the number of required buffers per result partition decided by 'taskmanager.network.sort-shuffle.min-buffers'. If the configured value is large enough, HashBasedDataBuffer will be used, otherwise, SortBasedDataBuffer will be used. Usually, the HashBasedDataBuffer has better performance. However, it is not easy to tune this value, because if a user tries to increase it for better performance, he/she is easy to encounter the 'Insufficient number of network buffers' error. This patch improves this case by selecting HashBasedDataBuffer and SortBasedDataBuffer dynamically based on the number of network buffers can be allocated. More specifically, if there is enough buffers at runtime, HashBasedDataBuffer will be used, otherwise, SortBasedDataBuffer will be used. To achieve better performance, the user only need to increase total amount of network memory per task manager.

      Attachments

        Issue Links

          Activity

            People

              tanyuxin Yuxin Tan
              kevin.cyj Yingjie Cao
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: