Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-32658

State should not be silently removed when ignore-unclaimed-state is false

Attach filesAttach ScreenshotAdd voteVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      When ignore-unclaimed-state is false and the old state is removed, flink should throw exception. It's similar to removing a stateful operator.

      This case occurs not only when the user removes state, but also when the operator is replaced.

      For example: upgrade FlinkKafkaConsumer to KafkaSource. All logical are not changed, so the operator id isn't changed. The KafkaSource cannot resume from the state of FlinkKafkaConsumer. However, the new flink job can start, and the state is silently removed in the new job.(The old state is not physically discarded, it is still stored in the state backend, but the new code will never use it.)

      It also brings an additional problem: the KafkaSource will snapshot 2 states, it includes the new state of KafkaSource, and the union list state of FlinkKafkaConsumer. Whenever a job resumes from checkpoint, the union List state is inflated. Eventually the state size of kafka offset exceeded 200MB.

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            fanrui Rui Fan
            fanrui Rui Fan

            Dates

              Created:
              Updated:

              Slack

                Issue deployment