Kudu / KUDU-1991

Master does not retry TS maintenance task if target TS not registered

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.1
    • Fix Version/s: 1.4.0
    • Component/s: master
    • Labels: None

    Description

      If a master has just started up and receives a heartbeat from a tablet server that triggers an action on another tablet server (for example, tombstoning an evicted replica, or adding a new replica to an under-replicated tablet config), that action can be lost. If the target tablet server (for example, the replica that the master believed to be the leader) has not yet registered with the master since the restart, the action fails to be sent and is not retried.

      This is because:

      1. There is a logic error in the catalog manager task management code that assumes all tablet servers have registered with the master at the time a task is started; and
      2. These kinds of tasks are edge-triggered (based on a response to a tablet report) instead of level-triggered (based on periodic state polling) on the master side.
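
The two causes above can be illustrated with a toy model. This is a minimal sketch in Python, not Kudu's C++ catalog manager code and not the actual fix: `ToyMaster`, its methods, and the `"ts-b"` UUID are all hypothetical names made up for illustration. It contrasts a one-shot, edge-triggered send (the task is lost if the target has not registered) with one possible remedy, which keeps the task pending and re-attempts it later.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Hypothetical stand-in for a master; not Kudu's CatalogManager."""
    registered: set = field(default_factory=set)  # UUIDs of registered tablet servers
    pending: list = field(default_factory=list)   # (target_uuid, action) awaiting delivery
    sent: list = field(default_factory=list)      # (target_uuid, action) delivered

    def register(self, ts_uuid):
        self.registered.add(ts_uuid)

    def _try_send(self, target_uuid, action):
        """Deliver the action only if the target has registered."""
        if target_uuid not in self.registered:
            return False
        self.sent.append((target_uuid, action))
        return True

    def on_tablet_report_edge_only(self, target_uuid, action):
        """Behavior described in this issue: one attempt, failure is dropped."""
        self._try_send(target_uuid, action)

    def on_tablet_report_with_retry(self, target_uuid, action):
        """Assumed remedy: keep the task if the target is not yet registered."""
        if not self._try_send(target_uuid, action):
            self.pending.append((target_uuid, action))

    def retry_pending(self):
        """Re-attempt queued tasks, e.g. periodically or after a registration."""
        still_pending = []
        for target_uuid, action in self.pending:
            if not self._try_send(target_uuid, action):
                still_pending.append((target_uuid, action))
        self.pending = still_pending


def demo():
    # Edge-triggered only: the target registers too late and the task is gone.
    lost = ToyMaster()
    lost.on_tablet_report_edge_only("ts-b", "tombstone replica")
    lost.register("ts-b")

    # With retry: the task survives until the target registers.
    kept = ToyMaster()
    kept.on_tablet_report_with_retry("ts-b", "tombstone replica")
    kept.register("ts-b")
    kept.retry_pending()
    return lost.sent, kept.sent
```

In this sketch, `demo()` leaves the first master with nothing sent, while the second delivers the tombstone action once `ts-b` has registered. A level-triggered design would go further and recompute the needed actions from current state on each pass, rather than remembering individual tasks.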

            People

              Assignee: Mike Percy (mpercy)
              Reporter: Mike Percy (mpercy)