[FLINK-23402] Expose a consistent GlobalDataExchangeMode - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.14.0
Component/s: API / DataStream
Labels:
- pull-request-available

Release Note:
The default DataStream API shuffle mode for batch executions has been changed to blocking exchanges for all edges of the stream graph. A new option `execution.batch-shuffle-mode` allows to change it to pipelined behavior if necessary.

Description

The Table API makes the GlobalDataExchangeMode configurable via table.exec.shuffle-mode.

In Table API batch mode the StreamGraph is configured with ALL_EDGES_BLOCKING and in DataStream API batch mode FORWARD_EDGES_PIPELINED.

I would vote for unifying the exchange mode of both APIs so that complex SQL pipelines behave identical in StreamTableEnvironment and TableEnvironment. Also the feedback a got so far would make ALL_EDGES_BLOCKING a safer option to run pipelines successfully with limited resources.

lzljs3620320

The previous history was like this:

The default value is pipeline, and we find that many times due to insufficient resources, the deployment will hang. And the typical use of batch jobs is small resources running large parallelisms, because in batch jobs, the granularity of failover is related to the amount of data processed by a single task. The smaller the amount of data, the faster the fault tolerance. So most of the scenarios are run with small resources and large parallelisms, little by little slowly running.

Later, we switched the default value to blocking. We found that the better blocking shuffle implementation would not slow down the running speed much. We tested tpc-ds and it took almost the same time.

dwysakowicz

I don't see a problem with changing the default value for DataStream batch mode if you think ALL_EDGES_BLOCKING is the better default option.

In any case, we should make this configurable for DataStream API users and make the specific Table API option obsolete.

Attachments

Issue Links

causes

FLINK-23464 Benchmarks aren't compiling

Closed

links to

GitHub Pull Request #16536

GitHub Pull Request #16578

GitHub Pull Request #16679

Activity

People

Assignee:: Timo Walther

Reporter:: Timo Walther

Votes:: 0 Vote for this issue

Watchers:: 6 Start watching this issue

Dates

Created:: 15/Jul/21 14:24

Updated:: 23/Sep/21 18:08

Resolved:: 04/Aug/21 14:32