Kudu / KUDU-2288

Client should fail fast upon access to an unavailable tablet


Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • supportability
    • None

    Description

      Currently, if a tablet has become unavailable for some reason (e.g., it has lost a majority of its replicas), the client will still faithfully retry a read or write operation until its maximum timeout expires. After that timeout, it sometimes reports a generic "timed out" error rather than one that points to the root cause.

      The retry-on-unavailability behavior is desirable in the case of transient unavailability (e.g., a node has just failed and a re-election is in progress). But if the tablet has been unavailable for a long time (e.g., longer than the client timeout, or longer than N heartbeat intervals for some N), then we can assume it is unlikely to recover within the timeout, and it would be preferable to fail fast with an appropriate exception.
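      A minimal sketch of the proposed fail-fast policy, written in Python for brevity (the actual Kudu clients are C++ and Java). Every name here (`UnavailabilityTracker`, `call_with_retries`, `max_missed_heartbeats`, `TabletUnavailableError`, `OperationTimedOutError`) is hypothetical and not part of the Kudu client API. It assumes the client remembers, per tablet, when it first observed that tablet as unavailable, and that one attempt returning `None` stands for "tablet unavailable". As one possible reading of the two criteria above, the sketch fails fast once the outage exceeds the smaller of the client timeout and N heartbeat intervals.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


class TabletUnavailableError(Exception):
    """Hypothetical error: the tablet has been down too long to expect recovery."""


class OperationTimedOutError(Exception):
    """Hypothetical error: the operation's own deadline expired."""


@dataclass
class UnavailabilityTracker:
    # Hypothetical knobs for illustration; these are not Kudu configuration flags.
    client_timeout_s: float
    heartbeat_interval_s: float
    max_missed_heartbeats: int
    clock: Callable[[], float] = time.monotonic
    # tablet id -> time the tablet was first seen unavailable
    _first_unavailable: Dict[str, float] = field(default_factory=dict, init=False)

    def threshold_s(self) -> float:
        # Exceeding either bound is enough, so the smaller one decides.
        return min(self.client_timeout_s,
                   self.max_missed_heartbeats * self.heartbeat_interval_s)

    def mark_unavailable(self, tablet_id: str) -> None:
        # Keep the earliest observation so the outage length keeps growing.
        self._first_unavailable.setdefault(tablet_id, self.clock())

    def mark_available(self, tablet_id: str) -> None:
        self._first_unavailable.pop(tablet_id, None)

    def should_fail_fast(self, tablet_id: str) -> bool:
        since = self._first_unavailable.get(tablet_id)
        return since is not None and self.clock() - since > self.threshold_s()


def call_with_retries(tablet_id: str,
                      attempt: Callable[[], Optional[str]],
                      tracker: UnavailabilityTracker,
                      deadline_s: float,
                      backoff_s: float = 0.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry `attempt` until it succeeds, the tablet looks long-dead, or the deadline passes."""
    start = tracker.clock()
    while True:
        result = attempt()
        if result is not None:
            tracker.mark_available(tablet_id)
            return result
        tracker.mark_unavailable(tablet_id)
        if tracker.should_fail_fast(tablet_id):
            # Root-cause error instead of a generic timeout.
            raise TabletUnavailableError(
                f"tablet {tablet_id} unavailable for over {tracker.threshold_s():.1f}s")
        if tracker.clock() - start >= deadline_s:
            # Transient-outage path: behaves like today's timeout.
            raise OperationTimedOutError(f"operation on tablet {tablet_id} timed out")
        sleep(backoff_s)
```

      Because the tracker is shared across operations, a new operation against a tablet that has already been down past the threshold fails on its first attempt instead of waiting out its own deadline, while outages shorter than the threshold still get the existing retry behavior.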


            People

              Assignee: Attila Bukor (abukor)
              Reporter: Todd Lipcon (tlipcon)
              Votes: 0
              Watchers: 2
