Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-28374 Some further improvements of blocking shuffle
  3. FLINK-28663

Allow multiple downstream consumer job vertices sharing the same intermediate dataset at scheduler side

    XMLWordPrintableJSON

Details

    Description

      Currently, one intermediate dataset can only be consumed by one downstream consumer vertex. If there are multiple consumer vertices consuming the same output of the same upstream vertex, multiple intermediate datasets will be produced. We can optimize this behavior to produce only one intermediate dataset which can be shared by multiple consumer vertices. As the first step, we should allow multiple downstream consumer job vertices sharing the same intermediate dataset at scheduler side. (Note that this optimization only works for blocking shuffle because pipelined shuffle result partition can not be consumed multiple times)

      Attachments

        Issue Links

          Activity

            People

              kevin.cyj Yingjie Cao
              kevin.cyj Yingjie Cao
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: