Kafka / KAFKA-12608

Simple identity pipeline sometimes loses data


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.7.0
    • Fix Version/s: None
    • Component/s: streams
    • Labels: None

    Description

      I'm running a very simple Kafka Streams program that reads records from one topic into a table and then writes that table's stream out to another topic. In about 1 in 5 runs, some of the output records are missing. The missing records tend to form a single contiguous range, as if a single batch were dropped somewhere.

      https://github.com/jamii/streaming-consistency/blob/main/kafka-streams/src/main/java/Demo.java#L49-L52

      $ wc -l tmp/*transactions
       999514 tmp/accepted_transactions
       1000000 tmp/transactions
       1999514 total
      
      $ cat tmp/transactions | cut -d',' -f 1 | cut -d' ' -f 2 > in
      
      $ cat tmp/accepted_transactions | cut -d',' -f 1 | cut -d':' -f 2 > out
      
      $ diff in out | wc -l
       487
      
      $ diff in out | head
       25313,25798d25312
       < 25312
       < 25313
       < 25314
       < 25315
       < 25316
       < 25317
       < 25318
       < 25319
       < 25320
       
      $ diff in out | tail
       < 25788
       < 25789
       < 25790
       < 25791
       < 25792
       < 25793
       < 25794
       < 25795
       < 25796
       < 25797
      

      I've rerun the consumer multiple times to confirm that the records are actually missing from the topic and that this wasn't just a hiccup in the consumer.
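
The comparison in the transcript above can also be sketched in Python, which makes the "single contiguous range" observation explicit instead of reading it off the diff output. This is only an illustrative sketch: the function names `missing_ranges` and `read_ids` are hypothetical, and it assumes that the `in` and `out` files hold one integer record id per line, which the transcript suggests but does not state.

```python
from typing import Iterable, List, Tuple


def missing_ranges(expected: Iterable[int], actual: Iterable[int]) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) ranges of ids present in `expected` but not in `actual`."""
    seen = set(actual)
    ranges: List[Tuple[int, int]] = []
    for record_id in sorted(set(expected) - seen):
        if ranges and record_id == ranges[-1][1] + 1:
            # Extends the current run of consecutive missing ids.
            ranges[-1] = (ranges[-1][0], record_id)
        else:
            # Starts a new run.
            ranges.append((record_id, record_id))
    return ranges


def read_ids(path: str) -> List[int]:
    """Read one integer id per line, skipping blank lines (assumed file layout)."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]
```

Calling `missing_ranges(read_ids("in"), read_ids("out"))` on the files produced above would return one tuple per gap, so a single dropped batch should show up as a single entry. Note that the counts in the transcript agree with each other: 1000000 - 999514 = 486 missing records, and the 487 lines of diff output are those 486 lines plus one hunk header.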

      The repo linked above has instructions in the README for reproducing the run with the exact versions used.

          People

            Assignee: Unassigned
            Reporter: Jamie Brandon (jamii)