[FLINK-28024] KeyedStateCheckpointingITCase.KeyedStateCheckpointingITCase ends up in infinite failover loop - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Duplicate
Affects Version/s: 1.16.0
Fix Version/s: None
Component/s: Build System / Azure Pipelines
Labels:
- test-stability

Description

We have already observed several cases where log files grew beyond 120G. This drove the worker's disk usage to 100%, causing the worker machine to go "offline", i.e. it was no longer available to pick up new tasks.

The excessive log output observed initially is caused by a TaskManager failing fatally. As a result, the requested number of slots never becomes available, and the test job ends up in an infinite failover/restart loop. See the comment section for further details.
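
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Flink's scheduler or its restart strategy API: the function `simulate_restarts`, the `RestartResult` type, and the `bytes_per_attempt` figure are all hypothetical names and values introduced for illustration. It shows why a job that can never obtain its required slots keeps failing in the same way, and how log volume grows with the number of restart attempts.

```python
from dataclasses import dataclass


@dataclass
class RestartResult:
    attempts: int   # number of scheduling attempts made
    log_bytes: int  # bytes of log output written by failed attempts
    running: bool   # True if the job obtained all required slots


def simulate_restarts(required_slots, available_slots, max_attempts,
                      bytes_per_attempt=2048):
    """Toy model of a failover loop (hypothetical, not Flink code)."""
    attempts = 0
    log_bytes = 0
    while attempts < max_attempts:
        attempts += 1
        if available_slots >= required_slots:
            # Enough slots: the job is scheduled and the loop ends.
            return RestartResult(attempts, log_bytes, True)
        # Slots are still missing because the lost TaskManager never
        # returns, so this attempt fails and writes a failure trace.
        log_bytes += bytes_per_attempt
    return RestartResult(attempts, log_bytes, False)


if __name__ == "__main__":
    # One slot short: every attempt fails identically.
    print(simulate_restarts(required_slots=4, available_slots=3,
                            max_attempts=1000))
```

In this model, log output is bounded by `max_attempts * bytes_per_attempt`. Without any cap on attempts, the same loop would keep writing until the disk is full, which matches the symptom reported here.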

Attachments

- testWithRocksDbBackendIncremental.log.gz (21 kB, 13/Jun/22 15:13, Matthias Pohl)

Issue Links

duplicates
- FLINK-28077 Tasks get stuck during cancellation in ChannelStateWriteRequestExecutorImpl (Closed)

is related to
- FLINK-24433 "No space left on device" in Azure e2e tests (Closed)
- FLINK-25374 Azure pipeline get stalled on scanning project (Closed)

People

Assignee: Matthias Pohl
Reporter: Matthias Pohl
Votes: 0
Watchers: 2

Dates

Created: 13/Jun/22 13:16
Updated: 21/Jun/22 10:25
Resolved: 21/Jun/22 10:25