[FLINK-31963] java.lang.ArrayIndexOutOfBoundsException when scaling down with unaligned checkpoints - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.17.0, 1.16.1, 1.15.4, 1.18.0
Fix Version/s: 1.16.2, 1.18.0, 1.17.1
Component/s: Runtime / Checkpointing
Labels:
- stability
Environment:

Flink: 1.17.0
FKO: 1.4.0
StateBackend: RocksDB (Generic Incremental Checkpoint & Unaligned Checkpoint enabled)
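
For anyone trying to reproduce this, the environment above roughly corresponds to Flink 1.17 settings like the ones sketched below. This is an assumption, not the reporter's actual configuration: the option keys are standard Flink 1.17 configuration keys, while the checkpoint interval and the storage paths are placeholders. With the Kubernetes Operator, entries like these would typically go under spec.flinkConfiguration of the FlinkDeployment.

```yaml
# Assumed reproduction settings (Flink 1.17 key names); not taken from the ticket.
state.backend: rocksdb
state.backend.incremental: true

# Generic incremental checkpoints are provided by the changelog state backend.
state.backend.changelog.enabled: true
state.backend.changelog.storage: filesystem
dstl.dfs.base-path: s3://<bucket>/changelog      # placeholder path

# Unaligned checkpoints, the feature suspected in this ticket.
execution.checkpointing.interval: 60s            # placeholder interval
execution.checkpointing.unaligned: true
state.checkpoints.dir: s3://<bucket>/checkpoints # placeholder path
```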

Description

I'm testing the autoscaler through the Flink Kubernetes Operator and have run into the following issue.

As you know, when a job is scaled down through the autoscaler, the job manager and task manager go down and then come back up.

When this happens, a java.lang.ArrayIndexOutOfBoundsException is thrown and the state is not restored from the checkpoint.

Gyula (gyfora) told me in the Flink Slack troubleshooting channel that this is likely an issue with unaligned checkpoints rather than with the autoscaler, so I'm opening this ticket for further clarification.

Please see the attached JM and TM error logs.
Thank you.

Attachments


- image-2023-04-29-02-49-05-607.png (28/Apr/23 17:49, 32 kB, Tan Kim)
- jobmanager_error.txt (28/Apr/23 01:51, 3 kB, Tan Kim)
- taskmanager_error.txt (28/Apr/23 01:51, 3 kB, Tan Kim)

Issue Links

is related to: FLINK-27031 ChangelogRescalingITCase.test failed due to IllegalStateException (Closed)

People

Assignee: Stefan Richter
Reporter: Tan Kim
Votes: 0
Watchers: 11

Dates

Created: 28/Apr/23 02:01
Updated: 19/May/23 04:55
Resolved: 19/May/23 04:53