[FLINK-7231] SlotSharingGroups are not always released in time for new restarts - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 1.3.1
Fix Version/s: 1.3.2, 1.4.0
Component/s: Runtime / Coordination
Labels: None

Description

When there are not enough resources to schedule the streaming program, a race condition can lead to a sequence of errors such as the following:

java.lang.IllegalStateException: SlotSharingGroup cannot clear task assignment, group still has allocated resources.

The job eventually recovers, but it may go through many rapid restart attempts before doing so.

The root cause is that slots are not always cleared before the next restart attempt begins.
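The issue does not include the scheduler code. As a hedged illustration only, the following Java sketch models the invariant behind the exception and the ordering the root cause implies: clear the group's task assignment only after every asynchronous slot release has completed. `SlotReleaseSketch`, `SharingGroup`, and `restartAfterSlotsReleased` are hypothetical names. They are not Flink's actual `SlotSharingGroup` class or the code from the linked pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical model of the restart ordering; not Flink's classes or the PR #4370 code. */
public final class SlotReleaseSketch {

    /** Simplified stand-in for a slot sharing group that tracks how many slots it still holds. */
    static final class SharingGroup {
        private int allocatedSlots;
        private boolean tasksAssigned;

        synchronized void allocateSlot() {
            allocatedSlots++;
            tasksAssigned = true;
        }

        synchronized void releaseSlot() {
            if (allocatedSlots == 0) {
                throw new IllegalStateException("No allocated slot to release.");
            }
            allocatedSlots--;
        }

        /** Enforces the invariant from the error message: all slots must be released first. */
        synchronized void clearTaskAssignment() {
            if (allocatedSlots > 0) {
                throw new IllegalStateException(
                        "SlotSharingGroup cannot clear task assignment, group still has allocated resources.");
            }
            tasksAssigned = false;
        }

        synchronized int allocatedSlots() {
            return allocatedSlots;
        }

        synchronized boolean tasksAssigned() {
            return tasksAssigned;
        }
    }

    /**
     * Fixed ordering (assumed): each pending release decrements the slot count when it completes,
     * and the task assignment is cleared only after all releases have finished.
     */
    static CompletableFuture<Void> restartAfterSlotsReleased(
            SharingGroup group, List<CompletableFuture<Void>> pendingReleases) {
        List<CompletableFuture<Void>> released = new ArrayList<>();
        for (CompletableFuture<Void> pending : pendingReleases) {
            released.add(pending.thenRun(group::releaseSlot));
        }
        return CompletableFuture.allOf(released.toArray(new CompletableFuture<?>[0]))
                .thenRun(group::clearTaskAssignment);
    }

    public static void main(String[] args) {
        // Racy ordering: clearing while a slot is still allocated reproduces the exception.
        SharingGroup racy = new SharingGroup();
        racy.allocateSlot();
        try {
            racy.clearTaskAssignment();
        } catch (IllegalStateException e) {
            System.out.println("racy restart failed: " + e.getMessage());
        }

        // Fixed ordering: the clear waits for both simulated asynchronous releases.
        SharingGroup group = new SharingGroup();
        group.allocateSlot();
        group.allocateSlot();
        CompletableFuture<Void> first = new CompletableFuture<>();
        CompletableFuture<Void> second = new CompletableFuture<>();
        CompletableFuture<Void> restart = restartAfterSlotsReleased(group, Arrays.asList(first, second));
        first.complete(null);
        System.out.println("after first release, restart done: " + restart.isDone());
        second.complete(null);
        System.out.println("after second release, restart done: " + restart.isDone()
                + ", tasks assigned: " + group.tasksAssigned());
    }
}
```

Completing only `first` leaves `restart` pending; completing `second` as well runs `clearTaskAssignment()` with no slots left, so the exception from the description cannot occur in this model. The sketch chains on `CompletableFuture.allOf` because it expresses "after every release" without blocking a thread; the issue does not say whether the actual fix uses this mechanism.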

Issue Links

links to

GitHub Pull Request #4370

People

Assignee: Stephan Ewen

Reporter: Stephan Ewen

Votes: 0

Watchers: 3

Dates

Created: 19/Jul/17 16:15

Updated: 02/Oct/19 17:44

Resolved: 08/Nov/17 13:29