KAFKA-13073

Simulation test fails due to inconsistency in MockLog's implementation




      We are getting the following error on trunk:

      RaftEventSimulationTest > canRecoverAfterAllNodesKilled STANDARD_OUT
          timestamp = 2021-07-12T16:26:55.663, RaftEventSimulationTest:canRecoverAfterAllNodesKilled =
              Uncaught exception during poll of node 1
                                        |-------------------jqwik-------------------
          tries = 25                    | # of calls to property
          checks = 25                   | # of not rejected calls
          generation = RANDOMIZED       | parameters are randomly generated
          after-failure = PREVIOUS_SEED | use the previous seed
          when-fixed-seed = ALLOW       | fixing the random seed is allowed
          edge-cases#mode = MIXIN       | edge cases are mixed in
          edge-cases#total = 108        | # of all combined edge cases
          edge-cases#tried = 4          | # of edge cases tried in current run
          seed = 8079861963960994566    | random seed to reproduce generated values

          Sample
            arg0: 4002
            arg1: 2
            arg2: 4

      I think there are a few issues here:

      1. The ListenerContext for KafkaRaftClient uses the value returned by ReplicatedLog::startOffset() to determine the log start and when to load a snapshot, while the MockLog implementation uses logStartOffset, which could be a different value.
      2. MockLog doesn't implement ReplicatedLog::maybeClean so the log start offset is always 0.
      3. The snapshot id validation for MockLog and KafkaMetadataLog's createNewSnapshot throws an exception when the snapshot id is less than the log start offset.


      To fix the error quoted above we only need to fix item 3, but I think we should fix all of the issues enumerated in this Jira.

      For 1. we should change the MockLog implementation so that it uses startOffset both externally and internally.
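
      The idea behind the fix for item 1 can be sketched as follows. This is a minimal illustration, not the actual MockLog code: the class MockLogStartSketch and its methods advanceStartOffset and isReadable are hypothetical names invented for this example. The point is only that every internal check reads the log start through the same startOffset() accessor that external callers see, so the two views cannot diverge.

```java
// Hypothetical, simplified stand-in for a mock log; not the real MockLog.
final class MockLogStartSketch {
    // The only field that records where the log starts.
    private long startOffset = 0L;

    // Externally visible log start; internal code calls this too.
    long startOffset() {
        return startOffset;
    }

    // Hypothetical helper: move the log start forward, never backward.
    void advanceStartOffset(long newStartOffset) {
        if (newStartOffset < startOffset()) {
            throw new IllegalArgumentException(
                "New start offset " + newStartOffset +
                " is less than the current start offset " + startOffset());
        }
        startOffset = newStartOffset;
    }

    // Hypothetical internal check that goes through the same accessor.
    boolean isReadable(long offset) {
        return offset >= startOffset();
    }
}
```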

      For 2. I will file another issue to track this implementation.

      For 3. I think this validation is too strict. I think it is safe to simply ignore any attempt by the state machine to create a snapshot with an id less than the log start offset. We should return an {{Optional.empty()}} when the snapshot id is less than the log start offset. This tells the user that it doesn't need to generate a snapshot for that offset.
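
      A rough sketch of the behaviour proposed for item 3, under stated assumptions: SnapshotIdSketch, SnapshotWriterSketch and LogSketch are hypothetical, simplified types written for this example, and the snapshot id is reduced to an end offset and an epoch. It does not reproduce the real createNewSnapshot signatures of MockLog or KafkaMetadataLog.

```java
import java.util.Optional;

// Hypothetical stand-in for a snapshot id: an end offset plus an epoch.
final class SnapshotIdSketch {
    final long endOffset;
    final int epoch;

    SnapshotIdSketch(long endOffset, int epoch) {
        this.endOffset = endOffset;
        this.epoch = epoch;
    }
}

// Hypothetical stand-in for a snapshot writer.
final class SnapshotWriterSketch {
    final SnapshotIdSketch id;

    SnapshotWriterSketch(SnapshotIdSketch id) {
        this.id = id;
    }
}

// Hypothetical, simplified log that only tracks its start offset.
final class LogSketch {
    private final long startOffset;

    LogSketch(long startOffset) {
        this.startOffset = startOffset;
    }

    long startOffset() {
        return startOffset;
    }

    Optional<SnapshotWriterSketch> createNewSnapshot(SnapshotIdSketch id) {
        if (id.endOffset < startOffset()) {
            // Proposed behaviour: do not throw. An empty result tells the
            // caller that no snapshot is needed for this offset.
            return Optional.empty();
        }
        return Optional.of(new SnapshotWriterSketch(id));
    }
}
```

      With this shape, a caller such as the state machine would check whether the returned Optional is present and skip snapshot generation when it is empty, instead of handling an exception.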





              jagsancio Jose Armando Garcia Sancio