  1. Kafka
  2. KAFKA-6916

AdminClient does not refresh metadata on broker failure

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.1, 1.1.0
    • Fix Version/s: 2.0.0
    • Component/s: admin
    • Labels: None

    Description

      DynamicBrokerReconfigurationTest fails intermittently when brokers are restarted. The test uses ephemeral ports, so a broker's port after a restart is not the same as its port before the restart. The tests rely on producers, consumers and admin clients refreshing metadata to obtain the new server ports when connections fail. This works for producers and consumers, but fails intermittently with the admin client because no refresh is triggered.
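To illustrate why the ports change, here is a minimal Java sketch of ephemeral port assignment. It is not taken from the test itself; the class and method names (EphemeralPortSketch, bindAndRelease) are made up for illustration. Binding to port 0 asks the operating system to pick a free port, so two successive binds, like a broker before and after a restart, will usually (though not necessarily) get different ports.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical illustration only; this is not code from DynamicBrokerReconfigurationTest.
final class EphemeralPortSketch {

    /** Binds to port 0 on loopback, returns the OS-assigned port, then closes the socket. */
    static int bindAndRelease() {
        try (ServerSocket socket = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int before = bindAndRelease(); // stands in for the port before a broker restart
        int after = bindAndRelease();  // stands in for the port after the restart
        System.out.println("before=" + before + " after=" + after);
    }
}
```

A client that cached the first port has no way to learn the second one except through a metadata refresh.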

      There are a couple of issues in AdminClient:

      1. Unlike producers and consumers, AdminClient does not request a metadata update when a connection to a broker fails. This is particularly bad if the controller goes down. The controller is used for various requests such as createTopics and describeTopics. If the controller goes down and adminClient.describeTopics() is invoked, AdminClient sends the request to the old controller. If the connection fails, it keeps retrying with the same address, and a metadata refresh is never triggered. The request times out after 2 minutes by default, while metadata is not refreshed for 5 minutes by default. We should refresh metadata whenever a connection to a broker fails.
      2. Admin client requests are always retried on the same node. In the example above, if the controller goes down and a new controller is elected, the retried request should be sent to the new controller. Otherwise the call just blocks for 2 minutes, with many retries that can never succeed.
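The behaviour requested in the two points above can be sketched as a small, self-contained Java simulation. This is only an illustration under stated assumptions, not the actual fix and not AdminClient's internals: the names RetryRoutingSketch, Metadata, requestUpdate, maybeRefresh and send are all hypothetical, and brokers are modelled as plain address strings. The sketch shows a connection failure flagging a metadata update (point 1) and each retry re-resolving the controller instead of reusing the failed node (point 2).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names are Kafka's AdminClient internals.
final class RetryRoutingSketch {

    /** Stand-in for cluster metadata: it only tracks which address is the controller. */
    static final class Metadata {
        private final Supplier<String> source; // where a refresh reads the current controller from
        private String controller;
        private boolean updateRequested = false;
        int refreshCount = 0;

        Metadata(String initialController, Supplier<String> source) {
            this.controller = initialController;
            this.source = source;
        }

        String controller() { return controller; }

        /** Point 1: called when a connection to a broker fails. */
        void requestUpdate() { updateRequested = true; }

        /** Refreshes only if an update was requested, rather than waiting for the max age. */
        void maybeRefresh() {
            if (updateRequested) {
                controller = source.get();
                updateRequested = false;
                refreshCount++;
            }
        }
    }

    /** Sends one request with retries and returns the addresses tried, in order. */
    static List<String> send(Metadata metadata, Set<String> reachable, int maxAttempts) {
        List<String> attempted = new ArrayList<>();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            metadata.maybeRefresh();
            String target = metadata.controller(); // point 2: re-resolve the node on every attempt
            attempted.add(target);
            if (reachable.contains(target)) {
                return attempted;                  // request succeeded
            }
            metadata.requestUpdate();              // connection failed: ask for fresh metadata
        }
        return attempted;                          // gave up after maxAttempts
    }

    public static void main(String[] args) {
        Set<String> reachable = new HashSet<>();
        reachable.add("broker-2:9092");            // the old controller, broker-1, is down
        Metadata metadata = new Metadata("broker-1:9092", () -> "broker-2:9092");
        System.out.println(send(metadata, reachable, 5));
    }
}
```

In this simulation the first attempt goes to the old controller and fails, the failure requests a refresh, and the second attempt reaches the new controller. Without the requestUpdate() call, every attempt would target broker-1:9092 until the retries ran out, which mirrors the behaviour described in this issue.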

            People

              Assignee: Rajini Sivaram (rsivaram)
              Reporter: Rajini Sivaram (rsivaram)