  1. Kudu
  2. KUDU-2147

Unknown leader treated as valid TS UUID by catalog manager

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: consensus, master
    • Labels: None

    Description

      A bug was observed on a development cluster: an empty string reported as the leader in a TS heartbeat to the master was treated as a valid TS UUID by the TSPicker. As a result, the master tried to contact a cluster member with an empty-string UUID in order to add a new replica, and was unable to connect.
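
The failure mode described above can be illustrated with a minimal sketch. This is not Kudu's code (which is C++) and not the fix that was applied for this issue; `pick_leader_address`, `NoLeaderError`, and the `registered_servers` mapping are hypothetical names used only to show the idea that an empty leader UUID should be read as "leader unknown" rather than looked up as a server.

```python
from typing import Dict, Optional


class NoLeaderError(Exception):
    """Raised when a tablet report names no usable leader (hypothetical)."""


def pick_leader_address(leader_uuid: Optional[str],
                        registered_servers: Dict[str, str]) -> str:
    """Return the address of the reported leader, or raise if there is none.

    `registered_servers` is an assumed mapping of TS UUID -> "host:port".
    """
    # An empty or missing UUID means "leader unknown". It must not be
    # looked up as if it identified a real tablet server.
    if not leader_uuid:
        raise NoLeaderError("tablet report does not name a leader")
    try:
        return registered_servers[leader_uuid]
    except KeyError:
        raise NoLeaderError(
            f"leader {leader_uuid!r} is not a registered tablet server")
```

With a guard of this kind, a caller can defer the AddServer attempt until a later heartbeat names a real leader, instead of retrying against an empty UUID.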

      The ksck output looked like this:

      Tablet c4e8c3260dda48efbb7e182b569a7fc6 of table 'impala::tpch_1000_kudu.customer' is under-replicated: configuration has 2 replicas vs desired 3
        09d6bf7a02124145b43f43cb7a667b3d (host1.foo.example.com:7050): RUNNING
        a662440710624c02bd5612df32cb0235 (host2.foo.example.com:7050): RUNNING
      
      2 replicas' active configs differ from the master's.
        All the peers reported by the master and tablet servers are:
        A = 09d6bf7a02124145b43f43cb7a667b3d
        B = a662440710624c02bd5612df32cb0235
      
      The consensus matrix is:
       Config source |  Voters  | Current term | Config index | Committed?
      ---------------+----------+--------------+--------------+------------
       master        | A   B    |              |              | Yes
       A             | A*  B    | 101          | 5441         | Yes
       B             | A*  B    | 101          | 5441         | Yes
      Table impala::tpch_1000_kudu.customer has 1 under-replicated tablet(s)
      

      The catalog manager log contained accompanying messages like these:

      I0914 15:03:35.774370 22121 catalog_manager.cc:2988] Scheduling retry of AddServer ChangeConfig RPC for tablet c4e8c3260dda48efbb7e182b569a7fc6 with cas_config_opid_index 5441 with a delay of 24 ms (attempt = 1)
      W0914 15:03:35.774382 22121 catalog_manager.cc:3007] Async tablet task AddServer ChangeConfig RPC for tablet c4e8c3260dda48efbb7e182b569a7fc6 with cas_config_opid_index 5441 failed: Not found: Failed to reset TS proxy: Could not find TS for UUID
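
To check whether other tablets are affected, the master log can be scanned for the same signature. The sketch below is an assumption-laden helper, not a Kudu tool: it relies only on the message text shown in the excerpt above, and it assumes that the server UUID would normally follow "Could not find TS for UUID" at the end of the line, so that a line ending right after "UUID" indicates an empty one. The function name `tablets_with_empty_leader` is hypothetical.

```python
import re
from typing import Iterable, List

# Patterns based on the log excerpt above; the exact message layout is an
# assumption and may differ between Kudu versions.
_TABLET_RE = re.compile(r"for tablet ([0-9a-f]{32})")
_MISSING_TS_RE = re.compile(r"Could not find TS for UUID\s*(\S*)\s*$")


def tablets_with_empty_leader(log_lines: Iterable[str]) -> List[str]:
    """Return tablet IDs whose failed task names an empty TS UUID."""
    found: List[str] = []
    for line in log_lines:
        missing = _MISSING_TS_RE.search(line)
        tablet = _TABLET_RE.search(line)
        # Report only lines where nothing follows "UUID", i.e. the UUID
        # that could not be found was the empty string.
        if missing and tablet and missing.group(1) == "":
            if tablet.group(1) not in found:
                found.append(tablet.group(1))
    return found
```

Passing an open log file to the function (for example `tablets_with_empty_leader(open(path))`, where `path` is the master log location on your system) returns each affected tablet ID once, in order of first appearance.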
      

          People

            Assignee: mpercy Mike Percy
            Reporter: mpercy Mike Percy