[FLINK-23381] Provide backpressure (currently job fails if a limit is hit) - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.15.0
Component/s: Runtime / State Backends
Labels:
- pull-request-available

Description

With the current approach, the job fails if the dstl.dfs.upload.max-in-flight limit (in bytes) is reached.

Unsetting the limit roughly matches the current behaviour of the other backends, whose async phase does not backpressure (state.backend.rocksdb.checkpoint.transfer.thread.num only sets the size of the upload thread pool, which uses an unbounded queue).
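To illustrate why a fixed-size pool with an unbounded queue exerts no backpressure, here is a minimal sketch using plain java.util.concurrent. It is not Flink code; the class and method names are hypothetical. The pool is built the same way Executors.newFixedThreadPool builds one (a LinkedBlockingQueue with no capacity bound), so execute() never blocks or rejects while the single worker is busy, and pending work simply accumulates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo, not part of Flink. */
final class UnboundedQueueDemo {

    /** Submits tasks while the only worker is blocked and returns the queue length. */
    static int queuedAfterSubmitting(int tasks) {
        // Same shape as Executors.newFixedThreadPool(1): one thread, unbounded queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch gate = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                // Each task stands in for a slow upload; execute() returns immediately.
                pool.execute(() -> {
                    try {
                        gate.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The first task is handed straight to the worker; the rest wait in the queue.
            return pool.getQueue().size();
        } finally {
            gate.countDown(); // unblock the worker so the pool can drain
            pool.shutdown();
        }
    }
}
```

The submitting thread is never slowed down, so the thread-count setting limits parallelism but not the amount of pending work.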

Note that blocking the caller in DfsWriter.persistInternal() will also block regular stream processing (because of pre-emptive writes). This may or may not be the desired behaviour.

Blocking the sync phase of a snapshot can also cause problems (e.g. not being able to abort the checkpoint), which should be considered.
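One way to picture the trade-off is the sketch below. It assumes a hypothetical InFlightBytesLimiter (not the class used in Flink or in the linked pull request) that makes the caller wait, instead of failing the job, once the in-flight byte limit is reached. The wait is bounded by a timeout and also ends on interruption, so a blocked caller can still give up, for example when a checkpoint is aborted.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper, not part of Flink: bounds the number of bytes in flight. */
final class InFlightBytesLimiter {
    private final long maxInFlightBytes;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition capacityFreed = lock.newCondition();
    private long inFlightBytes;

    InFlightBytesLimiter(long maxInFlightBytes) {
        if (maxInFlightBytes <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.maxInFlightBytes = maxInFlightBytes;
    }

    /**
     * Waits until the request fits under the limit.
     * Returns false if the timeout expires or the thread is interrupted.
     */
    boolean acquire(long bytes, long timeoutMillis) {
        long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            // An oversized request is admitted when nothing is in flight,
            // otherwise it could never make progress.
            while (inFlightBytes > 0 && inFlightBytes + bytes > maxInFlightBytes) {
                if (remainingNanos <= 0) {
                    return false; // caller decides: retry, fail, or abort
                }
                remainingNanos = capacityFreed.awaitNanos(remainingNanos);
            }
            inFlightBytes += bytes;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** Called when an upload completes; wakes up waiting writers. */
    void release(long bytes) {
        lock.lock();
        try {
            inFlightBytes -= bytes;
            capacityFreed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    long inFlightBytes() {
        lock.lock();
        try {
            return inFlightBytes;
        } finally {
            lock.unlock();
        }
    }
}
```

A writer would call acquire() before scheduling an upload and release() from the upload's completion callback. Whether the task thread should wait at all, and for how long, is the open question raised above.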

Issue Links

blocks

FLINK-24402 Add a metric for back-pressure from the ChangelogStateBackend

Closed

links to

GitHub Pull Request #17229

People

Assignee: Roman Khachatryan

Reporter: Roman Khachatryan

Votes: 0

Watchers: 2

Dates

Created: 14/Jul/21 11:17

Updated: 23/Aug/22 10:17

Resolved: 19/Nov/21 09:41