[KAFKA-6916] AdminClient does not refresh metadata on broker failure - ASF JIRA

Details

Type: Task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.0.1, 1.1.0
Fix Version/s: 2.0.0
Component/s: admin
Labels: None

Description

There are intermittent test failures in DynamicBrokerReconfigurationTest when brokers are restarted. The test uses ephemeral ports, so a broker's port after a restart is not the same as its port before the restart. The test relies on metadata refresh in producers, consumers and admin clients to obtain the new server ports when connections fail. This works for producers and consumers, but the admin client fails intermittently because no refresh is triggered.
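
Why a cached broker address goes stale can be seen outside Kafka. The following is a minimal Java sketch, not taken from DynamicBrokerReconfigurationTest: binding a ServerSocket to port 0 asks the OS for a free ephemeral port, so a server that is stopped and started again will usually, though not necessarily, come back on a different port. EphemeralPortDemo and startAndStopServer are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

public class EphemeralPortDemo {
    // Hypothetical helper: binds to port 0 so the OS assigns a free ephemeral port,
    // then closes the socket (simulating a broker shutdown) and returns that port.
    static int startAndStopServer() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int before = startAndStopServer(); // "broker" port before the restart
        int after = startAndStopServer();  // port after the restart, usually different
        // A client that cached 'before' cannot reconnect until it refreshes metadata.
        System.out.println("before=" + before + ", after=" + after);
    }
}
```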

There are two issues in AdminClient:

1. Unlike producers and consumers, AdminClient does not request a metadata update when the connection to a broker fails. This is particularly bad if the controller goes down, because the controller handles various requests such as createTopics and describeTopics. If the controller goes down and adminClient.describeTopics() is invoked, the admin client sends the request to the old controller. If the connection fails, it keeps retrying the same address and never triggers a metadata refresh. With the default settings, the request times out after 2 minutes, whereas metadata is not refreshed for 5 minutes. We should refresh metadata whenever a connection to a broker fails.
2. AdminClient requests are always retried on the same node. In the example above, if the controller goes down and a new controller is elected, the retried request should be sent to the new controller. Otherwise the call is blocked for 2 minutes by retries that can never succeed.
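
The two proposed changes can be illustrated with a small, self-contained Java simulation. This is a sketch under stated assumptions, not KafkaAdminClient code and not the change in GitHub Pull Request #5050: MetadataCache, callController and refreshOnFailure are hypothetical names, a Predicate stands in for opening a connection, a Supplier stands in for a metadata request, and a fixed number of attempts stands in for the 2-minute request timeout.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

public class AdminRetrySketch {

    // Hypothetical client-side cache of the last known controller address.
    static final class MetadataCache {
        private final Supplier<String> fetchFromCluster; // stands in for a metadata request
        private String controller;
        private boolean updateRequested = false;

        MetadataCache(String initialController, Supplier<String> fetchFromCluster) {
            this.controller = initialController;
            this.fetchFromCluster = fetchFromCluster;
        }

        // Marks the cached metadata as stale; the next lookup fetches fresh metadata.
        void requestUpdate() {
            updateRequested = true;
        }

        // Returns the controller address, refreshing first if an update was requested.
        String controller() {
            if (updateRequested) {
                controller = fetchFromCluster.get();
                updateRequested = false;
            }
            return controller;
        }
    }

    // Sends a controller-bound call, retrying up to maxAttempts times (a stand-in for
    // the request timeout). The target node is looked up again on every attempt.
    // With refreshOnFailure == false the cache is never invalidated, so every retry
    // goes to the same stale address, as described in the issue.
    static String callController(MetadataCache metadata, Predicate<String> connect,
                                 int maxAttempts, boolean refreshOnFailure) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String target = metadata.controller();
            if (connect.test(target)) {
                return target; // connection succeeded; the request is sent here
            }
            if (refreshOnFailure) {
                metadata.requestUpdate(); // connection failed: refresh before the next retry
            }
        }
        throw new IllegalStateException("Call timed out after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // The cached address of the old controller is stale; the newly elected
        // controller listens on a different (ephemeral) port.
        String liveController = "localhost:40123";
        MetadataCache metadata = new MetadataCache("localhost:9092", () -> liveController);
        String used = callController(metadata, addr -> addr.equals(liveController), 3, true);
        System.out.println("request sent to " + used); // prints "request sent to localhost:40123"
    }
}
```

With refreshOnFailure set to false, every attempt targets the stale localhost:9092 address and the call fails once the attempts are used up, mirroring the behavior described above. Looking up the target again on each retry only helps because the connection failure also invalidates the cached metadata, so the two changes are complementary.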

Issue Links

links to

GitHub Pull Request #5050

People

Assignee: Rajini Sivaram

Reporter: Rajini Sivaram

Votes: 0

Watchers: 3

Dates

Created: 18/May/18 09:35

Updated: 29/May/18 15:37

Resolved: 29/May/18 15:37