Details
Type: Sub-task
Status: Closed
Priority: Blocker
Resolution: Fixed
Description
Follow-up test for https://issues.apache.org/jira/browse/FLINK-35533
In Flink 1.20, we proposed integrating Flink's Hybrid Shuffle with Apache Celeborn through a pluggable remote tier interface. To verify this feature, follow these two main steps.
1. Implement Celeborn tier.
- Implement a new tier factory and tier for Celeborn, covering these APIs: TierFactory, TierMasterAgent, TierProducerAgent, and TierConsumerAgent.
- The implementations should support granular data management at the Segment level for both client and server sides.
2. Use the implemented tier to shuffle data.
- Compile Flink and Celeborn.
- Deploy the Celeborn service.
- Deploy a new Celeborn service with the newly compiled packages. To deploy the cluster, refer to the documentation (https://celeborn.apache.org/docs/latest/).
- Add the compiled Flink plugin JAR (celeborn-client-flink-xxx.jar) to the Flink classpath.
- Configure the options to enable the feature.
- Set the option taskmanager.network.hybrid-shuffle.external-remote-tier-factory.class to the new Celeborn tier factory class. In addition to this option, the following options should also be set:
execution.batch-shuffle-mode: ALL_EXCHANGES_HYBRID_FULL
celeborn.master.endpoints: <the celeborn endpoint address>
celeborn.client.shuffle.partition.type: MAP
- Run some test examples (e.g., WordCount) to verify the feature.
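
Before running the examples, it may help to confirm that all four options from step 2 are actually set. The following is a minimal sketch of such a check, not part of Flink or Celeborn. It assumes the configuration is written as flat `key: value` lines; the class name `com.example.CelebornTierFactory` and the endpoint `localhost:9097` are placeholders, and the helper names are hypothetical.

```python
# Options named in step 2. A value of None means "must be set, any value";
# otherwise the value is the one given in the description.
REQUIRED_OPTIONS = {
    "taskmanager.network.hybrid-shuffle.external-remote-tier-factory.class": None,
    "execution.batch-shuffle-mode": "ALL_EXCHANGES_HYBRID_FULL",
    "celeborn.master.endpoints": None,
    "celeborn.client.shuffle.partition.type": "MAP",
}


def parse_flat_config(text):
    """Parse flat 'key: value' lines, skipping blank lines and '#' comments."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values such as host:port stay intact.
        key, sep, value = line.partition(":")
        if sep:
            options[key.strip()] = value.strip()
    return options


def find_problems(options):
    """Return a list of problems; an empty list means every option is set as expected."""
    problems = []
    for key, expected in REQUIRED_OPTIONS.items():
        actual = options.get(key)
        if not actual:
            problems.append(f"missing option: {key}")
        elif expected is not None and actual != expected:
            problems.append(f"{key} is {actual!r}, expected {expected!r}")
    return problems


# Placeholder values for illustration only (assumed, not from the issue).
SAMPLE = """\
# com.example.CelebornTierFactory and localhost:9097 are placeholders.
taskmanager.network.hybrid-shuffle.external-remote-tier-factory.class: com.example.CelebornTierFactory
execution.batch-shuffle-mode: ALL_EXCHANGES_HYBRID_FULL
celeborn.master.endpoints: localhost:9097
celeborn.client.shuffle.partition.type: MAP
"""

if __name__ == "__main__":
    for problem in find_problems(parse_flat_config(SAMPLE)):
        print(problem)
```

The check only confirms that the options are present. It does not verify that the tier factory class can be loaded or that the Celeborn master is reachable, so running a job such as WordCount is still needed.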