Kudu / KUDU-430 Consistent Operations / KUDU-1127

Avoid holding RPC handler threads on replicas that are part of a degraded tablet

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Private Beta
    • Fix Version/s: 1.2.0
    • Component/s: tserver
    • Labels: None

    Description

      If the client performs a snapshot scan, the replica may need to wait for the leader to tell it that the scan timestamp is "safe". If a majority of the tablet's replicas are down, this will never happen. After KUDU-689, we'll wait with a deadline, but even this multi-second wait will end up blocking a lot of RPC handler threads, potentially preventing other useful work from getting done.

      We should probably short-circuit the wait when we haven't heard from any leader within the election timeout, and just respond immediately. Alternatively, we could make this an async callback rather than a blocking wait on the handler thread.
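      The short-circuit idea above can be sketched roughly as follows. This is a minimal illustration, not Kudu's actual code: SafeTimeWaiter, OnLeaderUpdate, WaitUntilSafe, and WaitResult are hypothetical names, and timestamps are modeled as plain integers. The waiter gives up as soon as the time since the last leader contact reaches the election timeout, instead of holding the handler thread until the scan deadline.

```cpp
#include <algorithm>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

using Clock = std::chrono::steady_clock;

// Outcome of waiting for a snapshot timestamp to become safe.
enum class WaitResult { kSafe, kTimedOut, kNoLeader };

// Hypothetical helper (not a Kudu class): tracks the safe time advertised by
// the leader and when the leader was last heard from.
class SafeTimeWaiter {
 public:
  explicit SafeTimeWaiter(Clock::duration election_timeout)
      : election_timeout_(election_timeout),
        last_leader_contact_(Clock::now()) {}

  // Called whenever a message from the leader arrives with its safe time.
  void OnLeaderUpdate(uint64_t safe_time) {
    {
      std::lock_guard<std::mutex> l(mu_);
      last_leader_contact_ = Clock::now();
      safe_time_ = std::max(safe_time_, safe_time);
    }
    cv_.notify_all();
  }

  // Blocks until `ts` is safe, the deadline passes, or the leader has been
  // silent for a full election timeout, whichever happens first.
  WaitResult WaitUntilSafe(uint64_t ts, Clock::time_point deadline) {
    std::unique_lock<std::mutex> l(mu_);
    while (safe_time_ < ts) {
      const Clock::time_point now = Clock::now();
      const Clock::time_point leader_lost_at =
          last_leader_contact_ + election_timeout_;
      // Short-circuit: no leader heard from within the election timeout, so
      // the safe time is unlikely to advance. Free the handler thread now.
      if (now >= leader_lost_at) return WaitResult::kNoLeader;
      if (now >= deadline) return WaitResult::kTimedOut;
      // Sleep until the earlier of the two give-up points; a leader update
      // wakes us sooner and the loop re-evaluates all conditions.
      cv_.wait_until(l, std::min(deadline, leader_lost_at));
    }
    return WaitResult::kSafe;
  }

 private:
  const Clock::duration election_timeout_;
  std::mutex mu_;
  std::condition_variable cv_;
  Clock::time_point last_leader_contact_;
  uint64_t safe_time_ = 0;
};
```

      A scan handler would call WaitUntilSafe with its RPC deadline and return an error to the client on kNoLeader rather than continuing to wait. The async alternative mentioned above would instead register a callback to run when OnLeaderUpdate advances the safe time, so no handler thread blocks at all.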

          People

            Assignee: dralves David Alves
            Reporter: tlipcon Todd Lipcon
            Votes: 0
            Watchers: 2