[KUDU-3341] Catalog Manager should stop retrying DeleteTablet when it receives a WRONG_SERVER_UUID error - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.16.0
Component/s: master
Labels: None

Description

A tablet server is sometimes shut down because of detected disk failures and is later re-added to the cluster with all of its data cleared.

The replicas that lived on that server are re-replicated elsewhere after --follower_unavailable_considered_failed_sec seconds. The master then sends DeleteTablet RPCs to this tserver, but it receives either an RPC failure (the tserver is shut down) or a WRONG_SERVER_UUID error (the tserver started with a new UUID). It keeps retrying the deletion until --unresponsive_ts_rpc_timeout_ms elapses (1 hour by default).

Retrying is unnecessary when the master receives a WRONG_SERVER_UUID error, because the server UUID can only be corrected by restarting the tablet server. On restart, the tablet server sends a full tablet report to the master, and any outdated replicas can then be deleted.
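
The proposed behavior can be illustrated with a minimal sketch of the retry decision. This is not Kudu's actual code: the `RpcOutcome` enum and the `should_retry` function are hypothetical names introduced only for illustration, and the only values taken from the description are the WRONG_SERVER_UUID error and the 1 hour default of --unresponsive_ts_rpc_timeout_ms.

```python
from enum import Enum, auto


class RpcOutcome(Enum):
    """Hypothetical outcomes of one DeleteTablet attempt (not Kudu's real types)."""
    OK = auto()
    RPC_FAILURE = auto()        # e.g. the tserver is shut down and unreachable
    WRONG_SERVER_UUID = auto()  # the tserver came back with a new UUID


# Default of --unresponsive_ts_rpc_timeout_ms as stated in the description: 1 hour.
DEFAULT_TIMEOUT_MS = 60 * 60 * 1000


def should_retry(outcome: RpcOutcome, elapsed_ms: int,
                 timeout_ms: int = DEFAULT_TIMEOUT_MS) -> bool:
    """Decide whether the master should send DeleteTablet again (illustrative only)."""
    if outcome is RpcOutcome.OK:
        # The tablet was deleted; nothing left to do.
        return False
    if outcome is RpcOutcome.WRONG_SERVER_UUID:
        # Retrying cannot succeed: the UUID only changes when the tserver
        # restarts, and a restart triggers a full tablet report anyway.
        return False
    # Other failures may be transient, so keep retrying until the timeout.
    return elapsed_ms < timeout_ms
```

With this policy a WRONG_SERVER_UUID response ends the retry loop immediately, while an ordinary RPC failure is still retried until the timeout elapses.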

People

Assignee: YifanZhang

Reporter: YifanZhang

Votes: 0

Watchers: 3

Dates

Created: 29/Nov/21 16:17

Updated: 04/Dec/21 01:34

Resolved: 04/Dec/21 01:34