Flink / FLINK-31963

java.lang.ArrayIndexOutOfBoundsException when scaling down with unaligned checkpoints


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.17.0, 1.16.1, 1.15.4, 1.18.0
    • Fix Version/s: 1.16.2, 1.18.0, 1.17.1
    • Environment:
      Flink: 1.17.0
      Flink Kubernetes Operator (FKO): 1.4.0
      State backend: RocksDB (Generic Incremental Checkpoint and Unaligned Checkpoint enabled)
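To reproduce the report, the job needs the combination of settings listed under Environment. The following is a minimal sketch, not taken from the report, that collects those settings and renders them as flink-conf.yaml entries. Several points are assumptions: the key names reflect our reading of the Flink 1.17 configuration reference and should be checked against it, "Generic Incremental Checkpoint" is read as the changelog state backend, and the checkpoint interval and changelog base path are hypothetical placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReproConfig {

    // Settings mirroring the reported environment. Key names are assumed from the
    // Flink 1.17 configuration reference; verify them for your exact version.
    static Map<String, String> reproSettings(String changelogBasePath) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("state.backend.type", "rocksdb");
        conf.put("state.backend.incremental", "true");
        // Assumption: "Generic Incremental Checkpoint" means the changelog state backend.
        conf.put("state.backend.changelog.enabled", "true");
        conf.put("state.backend.changelog.storage", "filesystem");
        conf.put("dstl.dfs.base-path", changelogBasePath);
        // Hypothetical interval; the report does not state one.
        conf.put("execution.checkpointing.interval", "10s");
        conf.put("execution.checkpointing.unaligned", "true");
        return conf;
    }

    // Renders the settings as flink-conf.yaml lines of the form "key: value".
    static String toYaml(Map<String, String> conf) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical path; use durable storage reachable by every TaskManager.
        System.out.print(toYaml(reproSettings("s3://my-bucket/changelog")));
    }
}
```

With these settings in place, the scale-down can be triggered by the autoscaler, as in the report, or by redeploying the job from its latest checkpoint with a lower parallelism.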

    Description

      I'm testing the autoscaler through the Flink Kubernetes Operator and am running into the following issue.

      When the autoscaler scales a job down, the JobManager and TaskManagers are shut down and then started again with the new parallelism.

      When this happens, a java.lang.ArrayIndexOutOfBoundsException is thrown and the job's state is not restored from the checkpoint.

      Gyula Fora told me in the Flink Slack troubleshooting channel that this is most likely a problem with unaligned checkpoints rather than with the autoscaler, and I'm opening this ticket, together with Gyula, for further clarification.

      Please see the attached JobManager and TaskManager error logs.
      Thank you.


          People

            Assignee: Stefan Richter (srichter)
            Reporter: Tan Kim (tanee.kim)
            Votes: 0
            Watchers: 11
