Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-17355

Exactly once kafka checkpointing sensitive to single node failures

    XMLWordPrintableJSON

Details

    Description

      With exactly one semantics, when checkpointing, FlinkKafkaProducer creates a new KafkaProducer for each checkpoint. KafkaProducer#initTransactions can timeout if a kafka node becomes unavailable, even in the case of multiple brokers and in-sync replicas (see https://stackoverflow.com/questions/55955379/enabling-exactly-once-causes-streams-shutdown-due-to-timeout-while-initializing).

      In non-flink cases, this might be fine since I imagine a KafkaProducer is not created very often. With flink however, this is called per checkpoint which means practically an HA kafka cluster isn't actually HA. This makes rolling a kafka node particularly painful even in intentional cases such as config changes or upgrades.

       

      In our specific setup, these are our settings:

      5 kafka nodes

      Per topic, we have a replication factor = 3 and in-sync replicas = 2 and partitions = 3

      Attachments

        Activity

          People

            Unassigned Unassigned
            tliao Teng Fei Liao
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated: