Flink / FLINK-28374

Some further improvements of blocking shuffle

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.16.0
    • Component/s: Runtime / Network
    • Labels: None

    Description

      This is an umbrella issue for sort-shuffle improvements.

      Sub-Tasks

          1. Remove data flush in SortMergeResultPartition (Sub-task, Resolved, Yuxin Tan)
          2. Avoid notifying too frequently when recycling buffers for BatchShuffleReadBufferPool (Sub-task, Resolved, Yuxin Tan)
          3. Restrict the number of threads for sort-shuffle data read (Sub-task, Resolved, Yuxin Tan)
          4. Read a full buffer of data per file IO read request for sort-shuffle (Sub-task, Resolved, Yuxin Tan)
          5. Decrease the memory size per request for sort-shuffle data read from 8M to 4M (Sub-task, Resolved, Yuxin Tan)
          6. Produce one intermediate dataset for multiple consumers consuming the same data (Sub-task, Closed, Yingjie Cao)
          7. Introduce new compression algorithms of higher compression ratio (Sub-task, Resolved, Weijie Guo)
          8. Select HashBasedDataBuffer and SortBasedDataBuffer dynamically based on the number of network buffers can be allocated for SortMergeResultPartition (Sub-task, Resolved, Yuxin Tan)
          9. Fix the bug that SortMergeResultPartitionReadScheduler may not read data sequentially (Sub-task, Resolved, Yuxin Tan)
          10. Store the number of bytes instead of the number of buffers in index entry for sort-shuffle (Sub-task, Resolved, Yuxin Tan)
          11. Remove the unused field in SortMergeSubpartitionReader (Sub-task, Resolved, Yuxin Tan)
          12. Extract header fields of Buffer into a BufferHeader class for blocking shuffle file IO (Sub-task, Resolved, Yuxin Tan)
          13. Allow multiple downstream consumer job vertices sharing the same intermediate dataset at scheduler side (Sub-task, Resolved, Yingjie Cao)
          14. Enlarge the max requested buffers for SortMergeResultPartitionReadScheduler (Sub-task, Resolved, Yuxin Tan)
          15. Sorting all unfinished readers in batches at one time in SortMergeResultPartitionReadScheduler (Sub-task, Resolved, Yuxin Tan)

            People

              Assignee: Unassigned
              Reporter: Yingjie Cao (kevin.cyj)
              Votes: 0
              Watchers: 5