[FLINK-27155] Reduce multiple reads to the same Changelog file in the same taskmanager during restore - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.16.0
Component/s: Runtime / State Backends
Labels:
- pull-request-available

Description

Background

In the current implementation, state changes of different operators in the same TaskManager may be written to the same changelog file, which effectively reduces the number of files and the number of requests to DFS.

On the other hand, the current implementation reads the same changelog file multiple times during recovery. More specifically, the number of times a changelog file is accessed depends on the number of ChangeSets it contains. And since each read has to skip the bytes that precede its ChangeSet, the network traffic spent on those skipped bytes is wasted as well.

The result is a lot of unnecessary requests to DFS when there are multiple slots and keyed state in the same TaskManager.
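
To make the cost concrete, here is a rough back-of-the-envelope sketch in Java. It assumes, as described above, that every ChangeSet read opens the file from the beginning and that the skipped bytes still travel over the network. The class and method names are made up for illustration and are not Flink APIs, and the ChangeSet sizes are arbitrary example values.

```java
public class RepeatedReadCost {

    /**
     * Bytes pulled from DFS if each ChangeSet is read by opening the file
     * from the start and skipping to the ChangeSet's offset (assumed model).
     */
    public static long bytesWithSkippingReads(long[] changeSetSizes) {
        long total = 0;
        long offset = 0;
        for (long size : changeSetSizes) {
            total += offset + size; // skipped prefix plus the ChangeSet itself
            offset += size;
        }
        return total;
    }

    /** Bytes pulled from DFS if the whole file is read exactly once. */
    public static long bytesWithSingleRead(long[] changeSetSizes) {
        long total = 0;
        for (long size : changeSetSizes) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical file holding four ChangeSets of 1000 bytes each.
        long[] sizes = {1000, 1000, 1000, 1000};
        System.out.println(bytesWithSkippingReads(sizes)); // prints 10000
        System.out.println(bytesWithSingleRead(sizes));    // prints 4000
    }
}
```

Under this model the transferred volume grows roughly quadratically with the number of ChangeSets per file, while a single full read stays linear.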

Proposal

We can reduce multiple reads to the same changelog file in the same TaskManager during restore.

One possible approach is to read each changelog file in full once and cache it, in memory or in a local file, for a period of time, so that later reads of the same file are served from the cache.
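
The caching idea could look roughly like the following Java sketch. It is not the actual Flink implementation: `ChangelogFileCacheSketch`, `RemoteStore` and `readChangeSet` are hypothetical names, the "DFS" is simulated with an in-memory byte array, and cache expiry ("for a period of time") is left out to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ChangelogFileCacheSketch {

    /** Hypothetical stand-in for a DFS client. */
    public interface RemoteStore {
        InputStream open(String remotePath) throws IOException;
    }

    private final RemoteStore store;
    private final Path cacheDir;
    // Remote path -> local copy, shared by all readers in this process.
    private final Map<String, Path> localCopies = new ConcurrentHashMap<>();

    public ChangelogFileCacheSketch(RemoteStore store, Path cacheDir) {
        this.store = store;
        this.cacheDir = cacheDir;
    }

    /** Reads one ChangeSet; the remote file is fetched on first access only. */
    public byte[] readChangeSet(String remotePath, long offset, int length) throws IOException {
        Path local = localCopies.computeIfAbsent(remotePath, this::download);
        try (SeekableByteChannel channel = Files.newByteChannel(local)) {
            channel.position(offset); // local seek, no remote bytes are skipped
            byte[] buffer = new byte[length];
            new DataInputStream(Channels.newInputStream(channel)).readFully(buffer);
            return buffer;
        }
    }

    /** Copies the whole remote file into the local cache directory. */
    private Path download(String remotePath) {
        try (InputStream in = store.open(remotePath)) {
            Path local = Files.createTempFile(cacheDir, "changelog-", ".cache");
            Files.copy(in, local, StandardCopyOption.REPLACE_EXISTING);
            return local;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulated changelog file with three 4-byte ChangeSets.
        byte[] file = "AAAABBBBCCCC".getBytes(StandardCharsets.US_ASCII);
        AtomicInteger remoteOpens = new AtomicInteger();
        RemoteStore store = path -> {
            remoteOpens.incrementAndGet();
            return new ByteArrayInputStream(file);
        };
        ChangelogFileCacheSketch cache =
                new ChangelogFileCacheSketch(store, Files.createTempDirectory("changelog-cache"));
        for (long offset = 0; offset < file.length; offset += 4) {
            byte[] changeSet = cache.readChangeSet("dfs://changelog-1", offset, 4);
            System.out.println(new String(changeSet, StandardCharsets.US_ASCII));
        }
        System.out.println("remote opens: " + remoteOpens.get());
    }
}
```

In this sketch the three ChangeSet reads trigger only one remote open. A real version would also need eviction of cached files after some time and cleanup of the local copies, which the sketch does not attempt.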

I think this could be a subtask of v2 FLIP-158: Generalized incremental checkpoints.

Hi ym, roman, what do you think about this?

Issue Links

relates to

FLINK-28286 move “enablechangelog” constant out of flink-streaming-java module (Closed)

links to

GitHub Pull Request #20152

Reduce multiple reads to the same Changelog file in the same TaskManager during restore

People

Assignee: Feifan Wang

Reporter: Feifan Wang

Votes: 0

Watchers: 5

Dates

Created: 09/Apr/22 16:52

Updated: 09/Aug/22 06:33

Resolved: 09/Aug/22 06:33