  2. KAFKA-9171

DelayedFetch completion may throw exception, causing successful produce to be failed


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: core
    • Labels: None

    Description

      I was looking at the logs of the system test failure of ReassignPartitionsTest.

      The producer log shows a ReplicaNotAvailableException produce error for two records, yet the data logs of all the brokers contain those records. The offsets of these records were then returned as successful produce responses for two subsequent records, which do not appear in the logs, and hence the test failed.

      Broker logs of the leader at the time of the reassignment and leader change show:

       

      [2019-11-11 07:23:17,727] ERROR [ReplicaManager broker=3] Error processing append operation on partition test_topic-17 (kafka.server.ReplicaManager)
      org.apache.kafka.common.errors.ReplicaNotAvailableException: Partition test_topic-5 is not available

      This fails the append operation on `test_topic-17` because a different partition, `test_topic-5`, was unavailable for fetch. I believe the exception came from a fetch, since a produce would have thrown NotLeaderForPartitionException rather than ReplicaNotAvailableException.

      We don't expect DelayedFetch to throw exceptions and it looks like we are not handling `ReplicaNotAvailableException`.
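
      The failure mode described above can be illustrated with a minimal, self-contained Java sketch. It is not Kafka's code: `DelayedOp`, `ReplicaNotAvailable`, `appendUnsafe` and `appendGuarded` are hypothetical names invented for illustration, and the guarded variant shows only one possible way to keep a fetch-side exception from failing an append, not necessarily the fix that was applied.

```java
import java.util.ArrayList;
import java.util.List;

public class DelayedFetchSketch {
    /** Hypothetical stand-in for a delayed operation whose completion check may throw. */
    interface DelayedOp {
        boolean tryComplete();
    }

    /** Hypothetical stand-in for the broker's ReplicaNotAvailableException. */
    static class ReplicaNotAvailable extends RuntimeException {
        ReplicaNotAvailable(String message) {
            super(message);
        }
    }

    /** Unguarded: an exception from any pending fetch escapes into the append path. */
    static String appendUnsafe(String partition, List<DelayedOp> pendingFetches) {
        // Assume the record has already been written to the log at this point.
        for (DelayedOp op : pendingFetches) {
            op.tryComplete(); // may throw for an unrelated partition
        }
        return "OK " + partition;
    }

    /** Guarded: a failure while completing a fetch is contained and the append still succeeds. */
    static String appendGuarded(String partition, List<DelayedOp> pendingFetches) {
        for (DelayedOp op : pendingFetches) {
            try {
                op.tryComplete();
            } catch (RuntimeException e) {
                // The fetch problem belongs to the fetch; do not fail the append.
            }
        }
        return "OK " + partition;
    }

    public static void main(String[] args) {
        List<DelayedOp> pending = new ArrayList<>();
        pending.add(() -> {
            throw new ReplicaNotAvailable("Partition test_topic-5 is not available");
        });

        try {
            System.out.println(appendUnsafe("test_topic-17", pending));
        } catch (ReplicaNotAvailable e) {
            System.out.println("append failed: " + e.getMessage());
        }
        System.out.println(appendGuarded("test_topic-17", pending));
    }
}
```

      In the unguarded variant the append on `test_topic-17` is reported as failed even though the record was already written, which matches the symptom in the broker log.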

      I am not sure if this fixes the issues with ReassignPartitionsTest, but this seems to be a scenario that we should fix.

            People

              Assignee: Rajini Sivaram
              Reporter: Rajini Sivaram
              Reviewer: Ismael Juma