FLINK-22910

Refine ShuffleMaster lifecycle management for pluggable shuffle service framework

Details

    We improved the ShuffleMaster interface by adding lifecycle methods: open, close, registerJob and unregisterJob. In addition, the ShuffleMaster is now a cluster-level service that can be shared by multiple jobs. This is a breaking change to the pluggable shuffle service framework, and customized shuffle plugins need to adapt to the new interface accordingly.
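    To make the lifecycle concrete, the following is a minimal, self-contained sketch of a cluster-level shuffle master shared by several jobs. It is not Flink's actual interface: ClusterShuffleMaster, InMemoryShuffleMaster, trackPartition and the String job and partition IDs are hypothetical names chosen for illustration, and only the four method names open, close, registerJob and unregisterJob come from the note above. Real signatures should be taken from the Flink sources.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShuffleLifecycleSketch {

    /** Hypothetical stand-in for the refined interface; NOT Flink's real signatures. */
    interface ClusterShuffleMaster {
        void open();                       // called once when the cluster-level service starts
        void registerJob(String jobId);    // called when a job starts using the service
        void unregisterJob(String jobId);  // called when a job reaches a terminal state
        void close();                      // called once when the service shuts down
    }

    /** Illustrative implementation that keeps per-job state in memory. */
    static class InMemoryShuffleMaster implements ClusterShuffleMaster {
        private boolean opened;
        private final Map<String, List<String>> partitionsByJob = new HashMap<>();

        @Override
        public void open() {
            opened = true;
        }

        @Override
        public void registerJob(String jobId) {
            if (!opened) {
                throw new IllegalStateException("open() must be called first");
            }
            partitionsByJob.putIfAbsent(jobId, new ArrayList<>());
        }

        /** Hypothetical helper: record a partition that belongs to a registered job. */
        void trackPartition(String jobId, String partitionId) {
            List<String> partitions = partitionsByJob.get(jobId);
            if (partitions == null) {
                throw new IllegalStateException("job not registered: " + jobId);
            }
            partitions.add(partitionId);
        }

        @Override
        public void unregisterJob(String jobId) {
            // This is the hook where external resources of a finished job would be released.
            partitionsByJob.remove(jobId);
        }

        @Override
        public void close() {
            partitionsByJob.clear();
            opened = false;
        }

        int registeredJobCount() {
            return partitionsByJob.size();
        }
    }

    public static void main(String[] args) {
        InMemoryShuffleMaster master = new InMemoryShuffleMaster();
        master.open();
        master.registerJob("job-a");
        master.registerJob("job-b");
        master.trackPartition("job-a", "partition-0");
        master.unregisterJob("job-a"); // job-a finished; its state is dropped
        System.out.println(master.registeredJobCount()); // prints 1
        master.close();
        System.out.println(master.registeredJobCount()); // prints 0
    }
}
```

    In this sketch one instance serves both jobs, and per-job cleanup happens in unregisterJob rather than in close, which mirrors the split between cluster-level and job-level lifecycle described in the note.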

    Description

      The current ShuffleMaster has an unclear lifecycle that is inconsistent with the ShuffleEnvironment on the TM side. It is also hard to implement some important capabilities for a remote shuffle service, for example: 1) releasing external resources when a job finishes; 2) stopping or starting the tracking of some partitions depending on the status of the external service or system.

      We drafted a document [1] that proposes some simple changes to solve these issues. The document is not yet complete. We will start a discussion once it is finished.

      [1] https://docs.google.com/document/d/1_cHoapNbx_fJ7ZNraSqw4ZK1hMRiWWJDITuSZrdMDDs/edit?usp=sharing

            People

              kevin.cyj Yingjie Cao