Flink / FLINK-8086

FlinkKafkaProducer011 can permanently fail in recovery through ProducerFencedException

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.4.0
    • Component/s: Connectors / Kafka
    • Labels: None

    Description

      A chaos monkey test in a cluster environment can permanently bring down our FlinkKafkaProducer011.

      Typically, after a small number of TaskManagers (TMs) have been randomly killed, the data generator job can no longer recover from a checkpoint and fails with the following exception:

      org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.

      The problem is reproducible: it happened for me in every run after the chaos monkey had killed a couple of TMs.
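
      To illustrate the first cause named in the exception message (a newer producer with the same transactionalId), here is a minimal, self-contained sketch of epoch-based fencing. It is a toy model only: it does not use the Kafka client, it does not model transaction expiry, and it is not how FlinkKafkaProducer011 or the broker is implemented. The names FencingSketch, ToyCoordinator, FencedException, and the id "generator-0" are hypothetical and chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class FencingSketch {

    /** Hypothetical stand-in for Kafka's ProducerFencedException. */
    static class FencedException extends RuntimeException {
        FencedException(String message) {
            super(message);
        }
    }

    /** Toy broker-side model: remembers the latest epoch per transactionalId. */
    static class ToyCoordinator {
        private final Map<String, Integer> latestEpoch = new HashMap<>();

        /** Registers a producer; each registration bumps the epoch for that id. */
        int register(String transactionalId) {
            return latestEpoch.merge(transactionalId, 1, Integer::sum);
        }

        /** Rejects any operation that carries an epoch older than the latest one. */
        void commit(String transactionalId, int epoch) {
            Integer latest = latestEpoch.get(transactionalId);
            if (latest == null || epoch < latest) {
                throw new FencedException(
                        "old epoch " + epoch + " for transactionalId " + transactionalId);
            }
        }
    }

    public static void main(String[] args) {
        ToyCoordinator coordinator = new ToyCoordinator();

        // Producer instance that was running before the simulated failure.
        int epochBeforeFailure = coordinator.register("generator-0");

        // Restarted instance reuses the same transactionalId and gets a newer epoch.
        int epochAfterRestart = coordinator.register("generator-0");

        // The restarted instance may commit.
        coordinator.commit("generator-0", epochAfterRestart);

        // Anything still acting with the old epoch is fenced.
        try {
            coordinator.commit("generator-0", epochBeforeFailure);
        } catch (FencedException e) {
            System.out.println("fenced: " + e.getMessage());
        }
    }
}
```

      In this model, fencing is the intended outcome for a stale instance. The sketch does not show why recovery fails permanently in the scenario reported here.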

            People

              Assignee: Piotr Nowojski (pnowojski)
              Reporter: Stefan Richter (srichter)
              Votes: 0
              Watchers: 3
