  1. Kudu
  2. KUDU-422 Replicated master process (no master SPOF)
  3. KUDU-1374

Operations triggered by TS heartbeats may go unperformed



    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.7.1
    • Fix Version/s: 0.10.0
    • Component/s: master
    • Labels: None


      (copying this from my multi-master design doc)

      The inclusion or exclusion of a tablet in an incremental tablet report is edge-triggered, and may result in a state-changing operation on the tserver, communicated via an out-of-band RPC. This RPC is retried until it succeeds. However, if the leader master dies after it has responded to the tserver's heartbeat but before the out-of-band RPC is sent, the edge-triggered tablet report is effectively lost, and the state-changing operation will not be performed until the next time the tablet is included in a tablet report. Because the inclusion criteria for tablet reports are narrow, operations may be "missed" for quite some time.

      These operations include:

      1. Some tablet deletions, such as tablets belonging to orphaned tables, or tablets whose deletion RPCs were sent and failed during an earlier DeleteTable() request.
      2. Some tablet alters, such as tablets whose alter RPCs were sent and failed during an earlier AlterTable() request.
      3. Config changes sent due to under-replicated tablets.
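
      The failure sequence described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Kudu code: the TabletServer and Master classes, the "dirty" set, and the in-memory pending_rpcs queue are all hypothetical stand-ins for the real heartbeat and RPC machinery.

```python
from dataclasses import dataclass, field


@dataclass
class TabletServer:
    """Toy tserver (hypothetical): tracks tablets and which changed since the last report."""
    tablets: dict = field(default_factory=dict)  # tablet_id -> state string
    dirty: set = field(default_factory=set)      # tablets changed since the last acked report

    def change(self, tablet_id, state):
        self.tablets[tablet_id] = state
        self.dirty.add(tablet_id)

    def incremental_report(self):
        # Edge-triggered: only tablets that changed since the last acked heartbeat.
        return {t: self.tablets[t] for t in sorted(self.dirty)}

    def heartbeat_acked(self):
        # The master responded, so the tserver treats the report as delivered.
        self.dirty.clear()


@dataclass
class Master:
    """Toy leader master (hypothetical): out-of-band RPCs are queued in memory only."""
    orphaned: set = field(default_factory=set)
    pending_rpcs: list = field(default_factory=list)

    def handle_report(self, report):
        for tablet_id in report:
            if tablet_id in self.orphaned:
                self.pending_rpcs.append(("DeleteTablet", tablet_id))


def simulate_missed_operation():
    ts = TabletServer()
    ts.change("tablet-1", "RUNNING")

    # The old leader processes the report and answers the heartbeat...
    old_leader = Master(orphaned={"tablet-1"})
    old_leader.handle_report(ts.incremental_report())
    ts.heartbeat_acked()
    queued_on_old_leader = list(old_leader.pending_rpcs)

    # ...then dies before sending the RPC. Its in-memory queue is gone.
    new_leader = Master(orphaned={"tablet-1"})
    new_leader.handle_report(ts.incremental_report())  # empty: nothing changed
    return queued_on_old_leader, new_leader.pending_rpcs
```

      In this model the old leader queued the deletion but never sent it, and the new leader never learns that the tablet needs deleting, because the next incremental report is empty.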

      A simple fix is to require that tservers send a full tablet report when they detect that a new leader master was elected.
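
      A minimal sketch of this fix follows, again as a toy model rather than the actual Kudu implementation. It assumes the tserver can learn the current leader's identity (the leader_id argument and the last_leader_id field are hypothetical) and falls back to a full report whenever that identity changes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TabletServer:
    """Toy tserver (hypothetical) that sends a full report after a leader change."""
    tablets: dict = field(default_factory=dict)  # tablet_id -> state string
    dirty: set = field(default_factory=set)      # tablets changed since the last acked report
    last_leader_id: Optional[str] = None         # assumed: identity of the last known leader

    def change(self, tablet_id, state):
        self.tablets[tablet_id] = state
        self.dirty.add(tablet_id)

    def build_report(self, leader_id):
        if leader_id != self.last_leader_id:
            # New leader detected: report every tablet, not only the changed ones.
            self.last_leader_id = leader_id
            return {"incremental": False, "tablets": dict(self.tablets)}
        changed = {t: self.tablets[t] for t in sorted(self.dirty)}
        return {"incremental": True, "tablets": changed}

    def heartbeat_acked(self):
        self.dirty.clear()


def demo():
    ts = TabletServer()
    ts.change("tablet-1", "RUNNING")

    first = ts.build_report("master-a")         # first contact: full report
    ts.heartbeat_acked()
    steady = ts.build_report("master-a")        # same leader: incremental, nothing changed
    ts.heartbeat_acked()
    failover = ts.build_report("master-b")      # new leader: full report again
    return first, steady, failover
```

      With a full report after each election, the new leader sees every tablet again and can reissue any operation the previous leader failed to send, at the cost of one larger heartbeat per tserver per failover.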




            Assignee: Adar Dembo
            Reporter: Adar Dembo