[FLINK-14826] Enable 'Streaming bucketing end-to-end test' to pass with new DefaultScheduler - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.10.0
Fix Version/s: 1.10.0
Component/s: Tests
Labels:
- pull-request-available

Description

The test fails because we exhaust the number of allowed restarts (3). The reason is that the new scheduler may re-schedule tasks faster: it starts counting down the restart back-off time as soon as task cancellation has been triggered, whereas the legacy scheduler only starts counting down after task cancellation has finished. Thus, re-scheduled tasks may be deployed to a TaskManager (TM) that was killed but whose loss has not yet been detected, and each such failed deployment consumes another restart. How quickly TM loss is detected depends on heartbeat.interval and heartbeat.timeout, which default to 10s and 50s, respectively. The problem can even be reproduced with the legacy scheduler on the current master by setting heartbeat.timeout to a high value, such as 180000 (milliseconds).
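The race can be illustrated with a deliberately simplified timeline model. This is a hypothetical sketch, not Flink code: the names `Scenario` and `simulate` are made up, and it assumes a fixed restart back-off (`restart_delay_ms`) and a fixed time for cancellation to finish (`cancellation_ms`). The 10s values used below are illustrative assumptions, not the test's actual settings. It also assumes that the killed TM's slots remain schedulable until `heartbeat.timeout` has elapsed, so any redeployment before that point fails and consumes another restart.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    heartbeat_timeout_ms: int  # how long the killed TM's slots stay schedulable
    restart_delay_ms: int      # hypothetical fixed restart back-off
    cancellation_ms: int       # hypothetical time until cancellation finishes
    max_restarts: int = 3      # restart attempts allowed before the job fails


def simulate(s: Scenario, backoff_after_cancellation: bool) -> tuple[int, bool]:
    """Return (restarts used, job survived) after a TM is killed at t=0.

    Hypothetical model: the first failure happens at t=0, and any
    redeployment before t=heartbeat_timeout_ms lands on the dead TM and
    fails immediately, consuming another restart attempt.
    """
    failure_at = 0
    restarts = 0
    while restarts < s.max_restarts:
        restarts += 1
        if backoff_after_cancellation:
            # Legacy scheduler: back-off starts after cancellation has finished.
            backoff_start = failure_at + s.cancellation_ms
        else:
            # DefaultScheduler: back-off starts as soon as cancellation is triggered.
            backoff_start = failure_at
        redeploy_at = backoff_start + s.restart_delay_ms
        if redeploy_at >= s.heartbeat_timeout_ms:
            return restarts, True  # TM loss already detected; redeploy succeeds
        failure_at = redeploy_at  # deployed into the dead TM; fails again
    return restarts, False  # failed again, but no restart attempts are left


defaults = Scenario(heartbeat_timeout_ms=50_000, restart_delay_ms=10_000, cancellation_ms=10_000)
high_timeout = Scenario(heartbeat_timeout_ms=180_000, restart_delay_ms=10_000, cancellation_ms=10_000)
print(simulate(defaults, backoff_after_cancellation=True))       # legacy: (3, True)
print(simulate(defaults, backoff_after_cancellation=False))      # DefaultScheduler: (3, False)
print(simulate(high_timeout, backoff_after_cancellation=True))   # legacy, 180000 ms: (3, False)
```

Under these assumptions, the legacy scheduler's extra wait for cancellation pushes the third redeployment past the 50s heartbeat timeout, while the new scheduler runs out of restarts first. Raising heartbeat.timeout to 180000 makes the legacy scheduler fail the same way.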

Issue Links

is duplicated by

FLINK-14823 Enable 'Streaming bucketing end-to-end test' to pass with new DefaultScheduler (Closed)

links to

GitHub Pull Request #10266

People

Assignee: Gary Yao
Reporter: Gary Yao
Votes: 0
Watchers: 1

Dates

Created: 15/Nov/19 16:50
Updated: 20/Nov/19 14:39
Resolved: 20/Nov/19 14:39

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 20m