  1. Kafka
  2. KAFKA-9572

Sum Computation with Exactly-Once Enabled and Injected Failures Misses Some Records



    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.6.0
    • Component/s: streams
    • Labels:


      System test StreamsEosTest.test_failure_and_recovery failed because of an incorrectly computed aggregation under exactly-once semantics (EOS). The specific error is:

      Exception in thread "main" java.lang.RuntimeException: Result verification failed for ConsumerRecord(topic = sum, partition = 1, leaderEpoch = 0, offset = 2805, CreateTime = 1580719595164, serialized key size = 4, serialized value size = 8, headers = RecordHeaders(headers = [], isReadOnly = false), key = [B@6c779568, value = [B@f381794) expected <6069,17269> but was <6069,10698>
      	at org.apache.kafka.streams.tests.EosTestDriver.verifySum(EosTestDriver.java:444)
      	at org.apache.kafka.streams.tests.EosTestDriver.verify(EosTestDriver.java:196)
      	at org.apache.kafka.streams.tests.StreamsEosTest.main(StreamsEosTest.java:69)

      This means that the sum computed by the Streams app seems to be wrong for key 6069. I checked the dumps of the log segments of the input topic partition (attached: data-1.txt), and indeed two input records were not included in the sum. With those two records included, the sum would be correct. More concretely, the input values for key 6069 are:

      1. 147
      2. 9250
      3. 5340
      4. 1231
      5. 1301

      The sum of these values is 17269, which is the expected sum stated in the exception above. If you subtract values 3 and 4, i.e., 5340 and 1231, from 17269, you get 10698, which is the actual sum in the exception above. Somehow those two values are missing.

      In the log dump of the output topic partition (attached: sum-1.txt), the sum is correct up to and including the 4th value, 1231, i.e., 15968; it is then overwritten with 10698.
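      The arithmetic above can be re-checked with a small standalone sketch. It is not part of EosTestDriver or any Kafka code; the class name SumCheck and its helper methods are hypothetical and only replay the five input values listed for key 6069.

```java
import java.util.Arrays;

// Hypothetical helper for re-checking the numbers in this report;
// not part of the Kafka code base.
final class SumCheck {
    // Input values for key 6069, in the order listed in the report.
    static final long[] VALUES = {147L, 9250L, 5340L, 1231L, 1301L};

    /** Returns the running sum after each input value. */
    static long[] runningSums(long[] values) {
        long[] sums = new long[values.length];
        long acc = 0L;
        for (int i = 0; i < values.length; i++) {
            acc += values[i];
            sums[i] = acc;
        }
        return sums;
    }

    /** Sums all values except those at the given zero-based indices. */
    static long sumSkipping(long[] values, int... skipped) {
        long acc = 0L;
        for (int i = 0; i < values.length; i++) {
            final int idx = i;
            boolean isSkipped = Arrays.stream(skipped).anyMatch(s -> s == idx);
            if (!isSkipped) {
                acc += values[i];
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        // Running sums: the 4th entry is the last correct value seen in sum-1.txt.
        System.out.println(Arrays.toString(runningSums(VALUES)));
        // Sum without values 3 and 4 (zero-based indices 2 and 3).
        System.out.println(sumSkipping(VALUES, 2, 3));
    }
}
```

      The running sum after the 4th value is 15968 and the full sum is 17269, while skipping values 3 and 4 gives 10698, matching the expected and actual sums in the exception.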

      In the log dump of the changelog topic of the state store that stores the sum (attached: 7-changelog-1.txt), the sum is overwritten in the same way as in the output topic.

      I attached the logs of the three Streams instances involved.


        1. 7-changelog-1.txt
          493 kB
          Bruno Cadonna
        2. data-1.txt
          1.27 MB
          Bruno Cadonna
        3. streams22.log
          9.38 MB
          Bruno Cadonna
        4. streams23.log
          6.45 MB
          Bruno Cadonna
        5. streams30.log
          7.19 MB
          Bruno Cadonna
        6. sum-1.txt
          493 kB
          Bruno Cadonna



            • Assignee:
              guozhang Guozhang Wang
              cadonna Bruno Cadonna
            • Votes:
              0
            • Watchers:
              5


              • Created: