Flink / FLINK-19343

FLIP-36: Support Interactive Programming in Flink

        Activity

          Jin Xing added a comment - edited

          My team is working on the design of a "Remote Shuffle Service". Regarding this "Interactive" FLIP, may I raise the following points?

          1. What is the role of the ResourceManager? My understanding has been that it is mainly about managing compute resources. But I guess "Interactive" will rely on ResourceManager#clusterPartitionTracker for shuffle data lifecycle management, which seems odd. A possible answer is that the ResourceManager is a component whose lifecycle spans jobs and matches that of the ApplicationMaster, so it was chosen to manage data across jobs. But I don't see a strong causal link. Should we have a separate component to manage the lifecycle of shuffle data across jobs?
          2. Additionally, the current "Interactive" design shares shuffle data across jobs by saving ShuffleDescriptors into the table catalog, which bypasses the shuffle service and makes "Interactive" a pure Table API level feature. What if we want this interactive feature at the DataStream or DataSet API level in the future? My question is how to reuse it. Should shuffle metadata be managed within the scope of the Flink runtime rather than spread into the SQL layer? Going deeper into this question, the lifecycle of the ShuffleMaster is the same as that of the JobManager in the current design, which makes the ShuffleMaster unqualified to manage data sharing across jobs. This relates to my point 1: should we have a separate component to manage the lifecycle of shuffle data across jobs?
          3. In the "Remote Shuffle Service" scenario, the lifecycle of the TM is decoupled from the shuffle data. It is not appropriate to ask the TM#partitionTracker to release the data; release should instead rely on communication between the ShuffleMaster and the ShuffleService.
          Xuannan Su added a comment - edited

          Hi, Jin Xing. Thanks for your comments.
          1. I agree that it is indeed odd for "Interactive" to rely on the clusterPartitionTracker to manage the lifecycle of cluster partitions, which are also a form of shuffle data. To support caching across jobs, we need a component whose lifecycle outlives the job to manage the shuffle data.
          2. TBH, I don't like the idea of passing the ShuffleDescriptor back to the client side either. But at the time, as you say, we did not have a separate component in the runtime to manage the lifecycle of shuffle data across jobs. Therefore, that decision was made in order to support sharing shuffle data across jobs. To support caching in DataStream, it is a cleaner design to have a runtime-scope component that manages shuffle data across jobs.
          3. I also don't think remote shuffle data should be managed by the PartitionTracker. Instead, I think a ClusterPartition is just a kind of shuffle data and should therefore be managed by the ShuffleService.
          I am pulling in chesnay. He may have more insight from the ClusterPartition perspective.
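
To make the idea discussed in this thread concrete, the following is a minimal sketch of a runtime-scope component that tracks shuffle data across jobs and delegates release to the shuffle layer. It is an illustration only, not part of the FLIP: `CrossJobShuffleDataRegistry`, `PartitionReleaser`, and the string-based descriptors are hypothetical names introduced here and are not Flink APIs. The registry is assumed to live outside any single job (and outside the TM), and the releaser stands in for whatever communication path (for example ShuffleMaster to ShuffleService) actually frees the data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch; none of these types exist in Flink. */
final class CrossJobShuffleDataRegistry {

    /** Stand-in for the call that actually frees data, e.g. a request to a shuffle service. */
    interface PartitionReleaser {
        void release(String partitionDescriptor);
    }

    // Data set id -> descriptors of the partitions that make up the cached data set.
    private final Map<String, List<String>> partitionsByDataSet = new ConcurrentHashMap<>();
    private final PartitionReleaser releaser;

    CrossJobShuffleDataRegistry(PartitionReleaser releaser) {
        this.releaser = releaser;
    }

    /** Records a partition that should outlive the job that produced it. */
    void register(String dataSetId, String partitionDescriptor) {
        partitionsByDataSet
                .computeIfAbsent(dataSetId, id -> new CopyOnWriteArrayList<>())
                .add(partitionDescriptor);
    }

    /** Lets a later job look up the partitions of a cached data set (read-only snapshot). */
    List<String> lookup(String dataSetId) {
        List<String> partitions = partitionsByDataSet.get(dataSetId);
        return partitions == null
                ? Collections.emptyList()
                : Collections.unmodifiableList(new ArrayList<>(partitions));
    }

    /** Forgets the data set and asks the releaser to free each partition; returns the count. */
    int release(String dataSetId) {
        List<String> partitions = partitionsByDataSet.remove(dataSetId);
        if (partitions == null) {
            return 0;
        }
        partitions.forEach(releaser::release);
        return partitions.size();
    }
}
```

In this sketch the first job would call `register` for each partition it produces, a later job would call `lookup` instead of reading descriptors from the table catalog, and `release` would be triggered when the cache is invalidated. Because release goes through the `PartitionReleaser`, the same registry could in principle sit in front of either TM-local partitions or a remote shuffle service.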

          Flink Jira Bot added a comment

          I am the Flink Jira Bot and I help the community manage its development. I see this issue has been marked as Major but is unassigned, and neither it nor its Sub-Tasks have been updated for 30 days. I have gone ahead and added a "stale-major" label to the issue. If this ticket is a Major, please either assign yourself or give an update. Afterwards, please remove the label, or in 7 days the issue will be deprioritized.

          Flink Jira Bot added a comment

          This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion.

          Flink Jira Bot added a comment

          I am the Flink Jira Bot and I help the community manage its development. I see this issue has been marked as Minor but is unassigned, and neither it nor its Sub-Tasks have been updated for 180 days. I have gone ahead and marked it "stale-minor". If this ticket is still Minor, please either assign yourself or give an update. Afterwards, please remove the label, or in 7 days the issue will be deprioritized.

          Flink Jira Bot added a comment

          This issue was labeled "stale-minor" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Minor, please raise the priority and ask a committer to assign you the issue or revive the public discussion.

          Akshat Khandelwal added a comment - edited

          Hi Xuannan,
          we have a use case for which we would like to explore the feature you were working on.
          Has the development for this Jira issue been completed?


          People

            Unassigned
            Xuannan Su
            Votes: 0
            Watchers: 10
