Kudu: KUDU-572

Better timeout handling for Kudu clients, especially for Master requests


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • M4.5
    • None
    • client, master
    • None
    • M5

    Description

      "Suppose the admin operation timeout is 10 seconds, and that is the max timeout we use. How, then, do we handle the case when:
      1) the server we're talking to is in an I/O pause (this is exactly what master_failover-itest tests), or
      2) we want to retry and find a new master in the meantime, yet still enforce a timeout on the whole operation?
      Another idea I had is to use two timeouts:
      1) a default RPC timeout (the minimum timeout before we retry the RPC), and
      2) an overall timeout.
      Right now, default_admin_operation_timeout effectively plays the role of (1),
      and select_master_timeout plays the role of (2).
      This way we keep default_timeout as usual, but also have a second timeout we can use to detect slow nodes.
      I think on the tablet server (TS) side we rely on the quorum reporting the new leader to the master, and I think we're changing that too.
      Any thoughts on this?"

      See: http://gerrit.sjc.cloudera.com:8080/?l=1399#/c/5483/20/src/kudu/client/meta_cache.cc for discussion
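      A minimal C++ sketch of the two-timeout idea from the description, under stated assumptions: every attempt gets the smaller of the per-RPC timeout (1) and whatever remains of the overall timeout (2), and a master that times out (e.g. one in an I/O pause) or is not the leader causes the client to rotate to the next master until the overall deadline expires. The names `TimeoutPolicy`, `CallWithRetries`, `AttemptFn`, `AttemptResult` and the `retry_backoff` field are hypothetical illustrations, not the Kudu client API or the fix that resolved this issue.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <optional>
#include <thread>

using Clock = std::chrono::steady_clock;
using Millis = std::chrono::milliseconds;

// Outcome of one RPC attempt against one master (hypothetical).
enum class AttemptResult { kOk, kTimedOut, kNotLeader };

// Hypothetical transport hook: send one request to master `idx`, waiting at most `timeout`.
using AttemptFn = std::function<AttemptResult(std::size_t idx, Millis timeout)>;

struct TimeoutPolicy {
  Millis rpc_timeout;      // (1) per-attempt timeout before retrying
  Millis overall_timeout;  // (2) budget for the whole operation, across all retries
  Millis retry_backoff;    // pause between attempts (assumption; not from the issue)
};

// Tries masters round-robin until one answers kOk or the overall deadline passes.
// Returns the index of the master that answered, or std::nullopt on overall timeout.
std::optional<std::size_t> CallWithRetries(std::size_t num_masters,
                                           const TimeoutPolicy& policy,
                                           const AttemptFn& attempt) {
  assert(policy.rpc_timeout > Millis(0));
  if (num_masters == 0) return std::nullopt;
  const auto deadline = Clock::now() + policy.overall_timeout;
  std::size_t idx = 0;
  while (true) {
    const auto remaining = std::chrono::duration_cast<Millis>(deadline - Clock::now());
    if (remaining <= Millis(0)) return std::nullopt;  // overall timeout (2) expired
    // Never let a single attempt outlive the overall deadline.
    const Millis attempt_timeout = std::min(policy.rpc_timeout, remaining);
    if (attempt(idx, attempt_timeout) == AttemptResult::kOk) return idx;
    // Paused or not the leader: move on to the next master.
    idx = (idx + 1) % num_masters;
    const auto left = std::chrono::duration_cast<Millis>(deadline - Clock::now());
    if (left > Millis(0)) std::this_thread::sleep_for(std::min(policy.retry_backoff, left));
  }
}
```

      Clamping each attempt to the remaining budget is the key design choice: with a 10 ms `rpc_timeout` and a 100 ms `overall_timeout`, a paused master costs one 10 ms attempt before the client tries the next master, and the whole lookup cannot run much past 100 ms no matter how many masters are unresponsive.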


          People

            Adar Dembo (adar)
            Alex Feinberg (avf)
            Votes: 0
            Watchers: 2
