FLINK-10353

Restoring a KafkaProducer with Semantic.EXACTLY_ONCE from a savepoint written with Semantic.AT_LEAST_ONCE fails with NPE

Details

    Description

      If a KafkaProducer with Semantic.EXACTLY_ONCE is restored from a savepoint written with Semantic.AT_LEAST_ONCE, the job fails on restore with the NPE below. This makes it impossible to upgrade an AT_LEAST_ONCE pipeline to an EXACTLY_ONCE pipeline while keeping its state.

      java.lang.NullPointerException
      at java.util.Hashtable.put(Hashtable.java:460)
      at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.initTransactionalProducer(FlinkKafkaProducer011.java:955)
      at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.recoverAndCommit(FlinkKafkaProducer011.java:733)
      at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.recoverAndCommit(FlinkKafkaProducer011.java:93)
      at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.recoverAndCommitInternal(TwoPhaseCommitSinkFunction.java:373)
      at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.initializeState(TwoPhaseCommitSinkFunction.java:333)
      at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.initializeState(FlinkKafkaProducer011.java:867)
      at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
      at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
      at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
      at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:254)
      at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
      at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
      at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
      at java.lang.Thread.run(Thread.java:748)

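      The reason is that for Semantic.AT_LEAST_ONCE the snapshotted state of the TwoPhaseCommitSinkFunction is of the form "TransactionHolder{handle=KafkaTransactionState [transactionalId=null, producerId=-1, epoch=-1], transactionStartTime=1537175471175}". On restore with Semantic.EXACTLY_ONCE, recoverAndCommit apparently hands this null transactionalId to initTransactionalProducer, where the Hashtable.put call at the top of the stack trace rejects the null value.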
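
      The failing call at the top of the stack trace can be reproduced in isolation. The following is a minimal sketch, not the Flink code itself: it assumes that initTransactionalProducer stores the restored transactionalId in a java.util.Properties object (which extends Hashtable), as the Hashtable.put frame suggests. The class name NullTransactionalIdDemo and the method putThrowsNpe are hypothetical and exist only for this illustration; "transactional.id" is the Kafka producer configuration key.

```java
import java.util.Properties;

public class NullTransactionalIdDemo {

    /**
     * Returns true if storing the given transactional id in a Properties
     * object throws a NullPointerException. Hypothetical helper, used only
     * to illustrate the failure mode seen in the stack trace.
     */
    static boolean putThrowsNpe(String transactionalId) {
        Properties producerConfig = new Properties();
        try {
            // Properties extends Hashtable, which rejects null keys and null values.
            producerConfig.put("transactional.id", transactionalId);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // State written with AT_LEAST_ONCE carries transactionalId=null.
        System.out.println(putThrowsNpe(null));
        // State written with EXACTLY_ONCE would carry a real id.
        System.out.println(putThrowsNpe("some-transactional-id"));
    }
}
```

      Running main prints true for the null id and false for the non-null one, which matches the observation that only state snapshotted under AT_LEAST_ONCE triggers the NPE on restore.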

            People

              srichter Stefan Richter
              knaufk Konstantin Knauf