[FLINK-35603] Release Testing Instructions: Verify FLINK-35533(FLIP-459): Support Flink hybrid shuffle integration with Apache Celeborn - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.20.0
Component/s: Runtime / Network
Labels:
- release-testing

Description

Follow-up testing for https://issues.apache.org/jira/browse/FLINK-35533

In Flink 1.20, we proposed integrating Flink's Hybrid Shuffle with Apache Celeborn through a pluggable remote tier interface. To verify this feature, follow these two main steps.

1. Implement the Celeborn tier.

Implement a new tier factory and tier for Celeborn, covering the TierFactory, TierMasterAgent, TierProducerAgent, and TierConsumerAgent APIs.
The implementations should support granular data management at the Segment level on both the client and server sides.
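
As a rough illustration of what segment-level data management means for the producer, consumer, and server sides, here is a minimal, self-contained sketch. Everything in it is hypothetical: the classes RemoteSegmentStore, ProducerAgent, and ConsumerAgent are simplified stand-ins and do not mirror the real TierProducerAgent/TierConsumerAgent signatures in Flink or any Celeborn class. It also assumes a segment is closed after a fixed number of buffers, which is a simplification chosen only for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: simplified stand-ins, NOT the real Flink or Celeborn interfaces. */
final class SegmentTierSketch {
    private SegmentTierSketch() {}

    /** Hypothetical server side: keeps buffers keyed by (subpartition, segment). */
    static final class RemoteSegmentStore {
        private final Map<String, List<byte[]>> buffers = new HashMap<>();
        private final Set<String> finished = new HashSet<>();

        private static String key(int subpartitionId, int segmentId) {
            return subpartitionId + "/" + segmentId;
        }

        void append(int subpartitionId, int segmentId, byte[] buffer) {
            buffers.computeIfAbsent(key(subpartitionId, segmentId), k -> new ArrayList<>())
                    .add(buffer);
        }

        void finishSegment(int subpartitionId, int segmentId) {
            finished.add(key(subpartitionId, segmentId));
        }

        boolean isFinished(int subpartitionId, int segmentId) {
            return finished.contains(key(subpartitionId, segmentId));
        }

        List<byte[]> read(int subpartitionId, int segmentId) {
            return buffers.getOrDefault(key(subpartitionId, segmentId), Collections.emptyList());
        }
    }

    /** Hypothetical producer side: rolls to a new segment every N buffers (an assumption). */
    static final class ProducerAgent {
        private final RemoteSegmentStore store;
        private final int buffersPerSegment;
        private final Map<Integer, Integer> currentSegment = new HashMap<>();
        private final Map<Integer, Integer> buffersInSegment = new HashMap<>();

        ProducerAgent(RemoteSegmentStore store, int buffersPerSegment) {
            if (buffersPerSegment <= 0) {
                throw new IllegalArgumentException("buffersPerSegment must be positive");
            }
            this.store = store;
            this.buffersPerSegment = buffersPerSegment;
        }

        void write(int subpartitionId, byte[] buffer) {
            int segmentId = currentSegment.getOrDefault(subpartitionId, 0);
            store.append(subpartitionId, segmentId, buffer);
            int count = buffersInSegment.merge(subpartitionId, 1, Integer::sum);
            if (count == buffersPerSegment) {
                // Segment is full: mark it finished and start the next one.
                store.finishSegment(subpartitionId, segmentId);
                currentSegment.put(subpartitionId, segmentId + 1);
                buffersInSegment.put(subpartitionId, 0);
            }
        }

        /** Finishes any partially filled segments so consumers can read them. */
        void close() {
            for (Map.Entry<Integer, Integer> entry : buffersInSegment.entrySet()) {
                if (entry.getValue() > 0) {
                    int subpartitionId = entry.getKey();
                    int segmentId = currentSegment.getOrDefault(subpartitionId, 0);
                    store.finishSegment(subpartitionId, segmentId);
                    currentSegment.put(subpartitionId, segmentId + 1);
                    entry.setValue(0);
                }
            }
        }
    }

    /** Hypothetical consumer side: only finished segments are visible. */
    static final class ConsumerAgent {
        private final RemoteSegmentStore store;

        ConsumerAgent(RemoteSegmentStore store) {
            this.store = store;
        }

        /** Returns the buffers of a finished segment, or an empty list if it is not finished. */
        List<byte[]> readSegment(int subpartitionId, int segmentId) {
            if (!store.isFinished(subpartitionId, segmentId)) {
                return Collections.emptyList();
            }
            return store.read(subpartitionId, segmentId);
        }
    }
}
```

The point of the sketch is only that reads and writes are addressed by (subpartition, segment) and that a segment becomes readable once it is finished. A real Celeborn tier would need to implement the actual Flink tier interfaces and talk to the Celeborn service instead of an in-memory map.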

2. Use the implemented tier to shuffle data.

Compile Flink and Celeborn.
Deploy the Celeborn service.
- Deploy a new Celeborn service with the newly compiled packages. See the documentation (https://celeborn.apache.org/docs/latest/) for how to deploy the cluster.
Add the compiled Flink plugin jar (celeborn-client-flink-xxx.jar) to the Flink classpath.
Configure the options to enable the feature.
- Set the option taskmanager.network.hybrid-shuffle.external-remote-tier-factory.class to the new Celeborn tier factory class. In addition to this option, the following options should also be set.

execution.batch-shuffle-mode: ALL_EXCHANGES_HYBRID_FULL 
celeborn.master.endpoints: <the celeborn endpoint address>
celeborn.client.shuffle.partition.type: MAP

Run some test examples (e.g., WordCount) to verify the feature.
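
Before running a test job, it may help to sanity-check that the four options from step 2 are actually present in the Flink configuration. The following is a small sketch of such a check. The class HybridShuffleConfigCheck and its methods are hypothetical helpers, not part of Flink or Celeborn, and the parser assumes flat "key: value" lines only (it does not handle nested YAML).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-flight check for the options listed in this issue. */
final class HybridShuffleConfigCheck {
    static final String TIER_FACTORY_KEY =
            "taskmanager.network.hybrid-shuffle.external-remote-tier-factory.class";
    static final String SHUFFLE_MODE_KEY = "execution.batch-shuffle-mode";
    static final String ENDPOINTS_KEY = "celeborn.master.endpoints";
    static final String PARTITION_TYPE_KEY = "celeborn.client.shuffle.partition.type";

    private HybridShuffleConfigCheck() {}

    /** Parses flat "key: value" lines, skipping blank lines and '#' comments. */
    static Map<String, String> parse(String text) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // Split on the first ": " so values such as host:port stay intact.
            int idx = trimmed.indexOf(": ");
            if (idx < 0) {
                continue;
            }
            conf.put(trimmed.substring(0, idx).trim(), trimmed.substring(idx + 2).trim());
        }
        return conf;
    }

    /** Returns a human-readable problem per missing or unexpected option. */
    static List<String> findProblems(Map<String, String> conf) {
        List<String> problems = new ArrayList<>();
        requirePresent(conf, TIER_FACTORY_KEY, problems);
        requirePresent(conf, ENDPOINTS_KEY, problems);
        requireValue(conf, SHUFFLE_MODE_KEY, "ALL_EXCHANGES_HYBRID_FULL", problems);
        requireValue(conf, PARTITION_TYPE_KEY, "MAP", problems);
        return problems;
    }

    private static void requirePresent(Map<String, String> conf, String key, List<String> out) {
        String value = conf.get(key);
        if (value == null || value.isEmpty()) {
            out.add("missing option: " + key);
        }
    }

    private static void requireValue(
            Map<String, String> conf, String key, String expected, List<String> out) {
        String value = conf.get(key);
        if (value == null || value.isEmpty()) {
            out.add("missing option: " + key);
        } else if (!expected.equals(value)) {
            out.add("option " + key + " should be " + expected + " but is " + value);
        }
    }
}
```

Calling HybridShuffleConfigCheck.findProblems(HybridShuffleConfigCheck.parse(configText)) returns an empty list when all four options are set as described above. The check only validates the configuration text and says nothing about whether the Celeborn cluster is reachable.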

People

Assignee: Yuxin Tan

Reporter: Rui Fan

Votes: 0

Watchers: 2

Dates

Created: 14/Jun/24 08:42

Updated: 25/Jun/24 06:18

Resolved: 25/Jun/24 06:18