KAFKA-8688

Upgrade system tests fail due to data loss with older message format

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.0
    • Component/s: system tests
    • Labels:
      None

      Description

      System test failure for TestUpgrade/test_upgrade: from_kafka_version=0.9.0.1, to_message_format_version=0.9.0.1, compression_types=.lz4

      3 acked message did not make it to the Consumer. They are: [33906, 33900, 33903]. The first 3 missing messages were validated to ensure they are in Kafka's data files. 3 were missing. This suggests data loss. Here are some of the messages not found in the data files: [33906, 33900, 33903]
      
      Traceback (most recent call last):
        File "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py", line 132, in run
          data = self.run_test()
        File "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py", line 189, in run_test
          return self.test_context.function(self.test)
        File "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/mark/_mark.py", line 428, in wrapper
          return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
        File "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/core/upgrade_test.py", line 136, in test_upgrade
          self.run_produce_consume_validate(core_test_action=lambda: self.perform_upgrade(from_kafka_version,
        File "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 112, in run_produce_consume_validate
          self.validate()
        File "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 135, in validate
          assert succeeded, error_msg
      AssertionError: 3 acked message did not make it to the Consumer. They are: [33906, 33900, 33903]. The first 3 missing messages were validated to ensure they are in Kafka's data files. 3 were missing. This suggests data loss. Here are some of the messages not found in the data files: [33906, 33900, 33903]
      

      Logs show:

      1. Broker 1 is leader of partition
      2. Broker 2 successfully fetches from offset 10947 and processes request
      3. Broker 2 sends fetch request to broker 1 for offset 10950
      4. Broker 1 sets its HW to 10950 and acknowledges produce requests up to the HW
      5. Broker 2 is elected leader
      6. Broker 2 truncates to its local HW of 10947, so 3 messages are lost
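
The sequence above can be replayed with a toy model. This is only a sketch under stated assumptions: `Replica` and `simulate` are hypothetical names, not broker code; offsets stand in for message contents; and both brokers are assumed to start with a HW of 10947.

```python
class Replica(object):
    """Toy replica (hypothetical): a log of offsets plus a high watermark (HW)."""

    def __init__(self, name):
        self.name = name
        self.log = []
        self.hw = 0

    @property
    def log_end_offset(self):
        return len(self.log)

    def truncate_to_hw(self):
        """Drop every entry at or above the local HW and return what was dropped."""
        lost = self.log[self.hw:]
        del self.log[self.hw:]
        return lost


def simulate():
    # Step 1: broker 1 is leader. Assumption: both logs hold offsets 0..10946.
    leader, follower = Replica("broker1"), Replica("broker2")
    leader.log = list(range(10947))
    follower.log = list(range(10947))
    leader.hw = follower.hw = 10947

    # The leader appends three new messages at offsets 10947..10949.
    leader.log.extend([10947, 10948, 10949])

    # Step 2: the follower fetches from 10947 and appends the data.
    # The HW it learns from this response is still 10947.
    follower.log.extend(leader.log[10947:10950])

    # Step 3: the follower's next fetch asks for offset 10950.
    fetch_offset = follower.log_end_offset

    # Step 4: the leader advances its HW and acks produce requests up to it.
    leader.hw = min(leader.log_end_offset, fetch_offset)
    acked = leader.log[10947:leader.hw]

    # Steps 5 and 6: the follower becomes leader before it learns the new HW,
    # then truncates to its stale local HW.
    lost = follower.truncate_to_hw()
    return acked, lost


if __name__ == "__main__":
    acked, lost = simulate()
    print("acked:", acked)
    print("lost: ", lost)
```

In this model every offset that broker 1 acknowledged is also in the list that broker 2 discards, which matches the "acked but missing from the data files" symptom in the test failure.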

      This data loss is a known issue that was fixed under KIP-101. But since it can still happen with older message formats, we should update the upgrade tests to cope with some data loss.
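
One way a test could cope with this is sketched below. It is purely illustrative and not the change made for this issue: `validate_with_tolerance`, `may_lose_data` and `max_lost` are hypothetical names, and the idea that a bounded number of lost acked messages is acceptable for older message formats is an assumption.

```python
def validate_with_tolerance(acked, consumed, may_lose_data=False, max_lost=0):
    """Return (succeeded, error_msg) for a produce/consume check.

    Hypothetical helper: when may_lose_data is True, up to max_lost acked
    messages may be missing from the consumed set without failing.
    """
    missing = sorted(set(acked) - set(consumed))
    allowed = max_lost if may_lose_data else 0
    if len(missing) <= allowed:
        return True, ""
    msg = "%d acked messages did not make it to the consumer (allowed: %d): %s" % (
        len(missing), allowed, missing[:20])
    return False, msg


if __name__ == "__main__":
    acked = list(range(10))
    consumed = [m for m in acked if m not in (3, 6, 9)]
    # Strict check, as used for message formats that include the KIP-101 fix.
    print(validate_with_tolerance(acked, consumed))
    # Relaxed check for an older message format.
    print(validate_with_tolerance(acked, consumed, may_lose_data=True, max_lost=5))
```

Keeping the strict path as the default means only test cases that explicitly opt in for an older message format would tolerate missing messages.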
       

            People

            • Assignee:
              rsivaram Rajini Sivaram
              Reporter:
              rsivaram Rajini Sivaram
              Reviewer:
              Ismael Juma