Kudu / KUDU-3341

Catalog Manager should stop retrying DeleteTablet when it receives a WRONG_SERVER_UUID error


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.16.0
    • Component/s: master
    • Labels: None

    Description

      Sometimes a tablet server is shut down because disk failures were detected, and the server is later re-added to the cluster with all of its data cleared.

      Its replicas are re-replicated elsewhere after --follower_unavailable_considered_failed_sec seconds. The master then sends DeleteTablet RPCs to this tserver, but receives either an RPC failure (the tserver was shut down) or a WRONG_SERVER_UUID error (the tserver started with a new UUID), and keeps retrying the tablet deletion until --unresponsive_ts_rpc_timeout_ms (default: 1 hour) elapses.

      Retrying is unnecessary when the master receives a WRONG_SERVER_UUID error, because the server UUID can only be corrected by restarting the tablet server. At that point the tablet server sends a full tablet report to the master, and any outdated replicas can then be deleted.
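
The proposed behavior can be sketched roughly as follows. This is only an illustration of the retry decision, not the actual Catalog Manager code. The names `DeleteOutcome`, `should_retry`, and `DEFAULT_TIMEOUT_MS` are hypothetical, and the timeout value simply mirrors the 1-hour default mentioned above.

```python
from enum import Enum


class DeleteOutcome(Enum):
    """Hypothetical outcomes of one DeleteTablet attempt (not Kudu's real types)."""
    OK = "ok"
    RPC_FAILURE = "rpc_failure"              # e.g. the tserver is shut down
    WRONG_SERVER_UUID = "wrong_server_uuid"  # the tserver came back with a new UUID


# Assumption: mirrors the 1-hour default quoted for --unresponsive_ts_rpc_timeout_ms.
DEFAULT_TIMEOUT_MS = 60 * 60 * 1000


def should_retry(outcome, elapsed_ms, timeout_ms=DEFAULT_TIMEOUT_MS):
    """Decide whether the master should send DeleteTablet again."""
    if outcome is DeleteOutcome.OK:
        # The replica was deleted; nothing left to do.
        return False
    if outcome is DeleteOutcome.WRONG_SERVER_UUID:
        # Retrying cannot help: the UUID only changes again on a tserver
        # restart, which triggers a full tablet report to the master.
        return False
    # Plain RPC failures may be transient, so retry until the timeout expires.
    return elapsed_ms < timeout_ms
```

With this sketch, an RPC failure is retried while the timeout has not elapsed, whereas a WRONG_SERVER_UUID error ends the retry loop immediately.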

          People

            Assignee: zhangyifan27 YifanZhang
            Reporter: zhangyifan27 YifanZhang