Kudu / KUDU-1608

Catalog Manager DeleteTablet retry logic is broken

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: master
    • Labels: None

    Description

      The Catalog Manager's retry logic for DeleteTablet requests has several issues:

      1. The retries loop indefinitely.
      2. The RPC response is checked against a list of fatal errors instead of a list of retriable errors, so any error that is not on the list gets retried. The list is also missing many fatal errors, such as WRONG_SERVER_UUID and UNKNOWN_ERROR. I think we should instead retry only on errors that we know we can recover from.
      3. The catalog manager aggressively sends DeleteTablet requests to tablet servers as soon as tablets are ejected from the group. Arguably this should be done lazily, only when the dead tablets report in, since most of the time a tablet is ejected because of a failure (and will never be seen again).
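
The three points above can be illustrated with a minimal Python sketch. It is not Kudu's actual code (the catalog manager is written in C++), and everything in it is an assumption made for illustration: the `ErrorCode` enum reuses the two error names from the description and adds a hypothetical `SERVER_BUSY` code to stand in for a recoverable error, while `send_with_retries` and `LazyDeleter` are invented names. The sketch bounds the number of attempts, retries only on an explicit set of retriable errors, and defers deletion until a tablet server reports a stale tablet.

```python
import enum
import time
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


class ErrorCode(enum.Enum):
    # Named in the issue description as fatal errors.
    WRONG_SERVER_UUID = "WRONG_SERVER_UUID"
    UNKNOWN_ERROR = "UNKNOWN_ERROR"
    # Hypothetical placeholder for a transient, recoverable error.
    SERVER_BUSY = "SERVER_BUSY"


# Retry only on errors known to be recoverable; everything else is fatal.
RETRIABLE = frozenset({ErrorCode.SERVER_BUSY})


def send_with_retries(
    send: Callable[[], Optional[ErrorCode]],
    max_attempts: int = 5,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, int]:
    """Call send() until it succeeds, fails fatally, or attempts run out.

    send() returns None on success or an ErrorCode on failure.
    Returns (succeeded, attempts_made).
    """
    for attempt in range(1, max_attempts + 1):
        err = send()
        if err is None:
            return True, attempt
        if err not in RETRIABLE:
            # Unlisted errors are treated as fatal, never retried.
            return False, attempt
        if attempt < max_attempts:
            # Exponential backoff between bounded retries.
            sleep(base_delay_s * 2 ** (attempt - 1))
    return False, max_attempts


class LazyDeleter:
    """Defers deletes until the tablet server reports the stale tablet."""

    def __init__(self) -> None:
        # Tablet server UUID -> tablet IDs ejected from that server.
        self._pending: Dict[str, Set[str]] = {}

    def on_replica_ejected(self, server_uuid: str, tablet_id: str) -> None:
        # Record the ejection; send nothing yet.
        self._pending.setdefault(server_uuid, set()).add(tablet_id)

    def on_tablet_report(
        self, server_uuid: str, reported: Iterable[str]
    ) -> Set[str]:
        # Only tablets the server still holds need a delete request.
        pending = self._pending.get(server_uuid, set())
        stale = pending & set(reported)
        pending -= stale
        return stale
```

With this shape, a server that never comes back costs the master nothing beyond a map entry, and a fatal error such as `UNKNOWN_ERROR` ends the attempt immediately instead of being retried forever.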

            People

              Assignee: Dinesh Bhat (dineshabbi)
              Reporter: Dan Burkert (danburkert)