Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-26018

Unnecessary late events when using the new KafkaSource

    XMLWordPrintableJSON

Details

    Description

      There is an issue with the new KafkaSource connector in Flink 1.14: when one task consumes messages from multiple topic partitions (statically created, timestamp are in order), it may start with one partition and advances watermarks before the data from other partitions come. In this case, the early messages in other partitions may unnecessarily be considered  as late ones.

      I discussed with renqs, it seems that the new KafkaSource only adds a partition into WatermarkMultiplexer when it receives data from that partition. In contrast, FlinkKafkaConsumer adds all known partition before it fetch any data. 

      Attached two files: the messages in Kafka and the corresponding TM logs.

      Attachments

        1. taskmanager_10.28.0.131_33249-b3370c_log
          377 kB
          Jun Qin
        2. message in kafka.txt
          48 kB
          Jun Qin

        Issue Links

          Activity

            People

              Unassigned Unassigned
              qinjunjerry Jun Qin
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: