Uploaded image for project: 'Kudu'
  1. Kudu
  2. KUDU-1466

C++ client errors misreported as GetTableLocations timeouts

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Critical
    • Resolution: Unresolved
    • 0.8.0
    • None
    • client
    • None

    Description

      client-test is currently very flaky due to this issue:

      • we are injecting some kind of failure on the tablet server (eg DNS resolution failure)
      • when we fail to connect to the TS, we correctly re-trigger a lookup against the master
      • depending how the backoffs and retries line up, we sometimes end up triggering the lookup retry when the remaining operation budget is very short (eg <10ms)
        • this GetTabletLocations RPC times out since the master is unable to respond within the ridiculously short timeout

      During the course of retrying some operation, we should probably not replace the 'last_error' with a master error, so long as we have had at least one successful master lookup (thus indicating that the master is not the problem)

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              tlipcon Todd Lipcon
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated: