Kudu / KUDU-2948

ksck claims output is different but it’s actually the same

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.0
    • Fix Version/s: None
    • Component/s: ksck
    • Labels:
    • Environment: RHEL 7.7

      Description

      I came across a scenario where ksck reports that a tablet has mismatched consensus and that the replicas' active configs disagree with the master's, yet the configs shown in the output are identical. Two examples follow:

      The consensus matrix is:
      Config source | Replicas | Current term | Config index | Committed?
      -------------------------------------------------------------
      master | A B* C | | | Yes
      A | A B* C | 2939 | 20571 | Yes
      B | A B* C | 2939 | 20571 | Yes
      C | A B* C | 2939 | 20571 | Yes
      Tablet 8137349615944d45a0897090d36d7a08 of table 'impala::<TableName>' is conflicted: Tablet 8137349615944d45a0897090d36d7a08 of table 'impala::<TableName>' replicas' active configs disagree with the master's
      6684505cec6f4a49b3442786cebdf06d (<serverFQDN>:7050): RUNNING
      8a3232953edd4ba79d20711d4ea3581d (<serverFQDN>:7050): RUNNING [LEADER]
      c8b68cb1366c45199668247d4d7c0295 (<serverFQDN>:7050): RUNNING

      All the peers reported by the master and tablet servers are:
      A = 6684505cec6f4a49b3442786cebdf06d
      B = 8a3232953edd4ba79d20711d4ea3581d
      C = c8b68cb1366c45199668247d4d7c0295

       

      The consensus matrix is:
      Config source | Replicas | Current term | Config index | Committed?
      -------------------------------------------------------------
      master | A B* C | | | Yes
      A | A B* C | 5030 | 6114195 | Yes
      B | A B* C | 5030 | 6114195 | Yes
      C | A B* C | 5030 | 6114195 | Yes
      Table impala::<TableName> has 1 tablet(s) with mismatched consensus

      1b6a44eadcd145f693390587c4e3308a (<serverFQDN>:7050): RUNNING
      36f1c8c3863a49778adeb6f40c73aa26 (<serverFQDN>:7050): RUNNING [LEADER]
      6c8fc8452f374672abdae3098a198101 (<serverFQDN>:7050): RUNNING

      0 replicas' active configs differ from the master's.
      All the peers reported by the master and tablet servers are:
      A = 1b6a44eadcd145f693390587c4e3308a
      B = 36f1c8c3863a49778adeb6f40c73aa26
      C = 6c8fc8452f374672abdae3098a198101
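
To make the claim above checkable, here is a minimal sketch that parses a consensus matrix pasted from the ksck output and lists any config source whose row differs from the master's. It is not ksck's own logic: the helper names (`parse_consensus_matrix`, `differing_sources`, `replica_terms_and_indexes`) are hypothetical, and comparing against the master on the Replicas column only is an assumption, made because the master row carries no term or index.

```python
def parse_consensus_matrix(text):
    """Parse a pasted consensus matrix into {source: (replicas, term, index, committed)}."""
    rows = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Skip the header, the dashed separator, and anything that is not a 5-column row.
        if len(cells) != 5 or cells[0] in ("", "Config source"):
            continue
        rows[cells[0]] = tuple(cells[1:])
    return rows


def differing_sources(rows):
    """Return the sources whose Replicas cell (members and leader mark) differs from the master's."""
    master_replicas = rows["master"][0]
    return [src for src, row in rows.items()
            if src != "master" and row[0] != master_replicas]


def replica_terms_and_indexes(rows):
    """Return the distinct (term, config index) pairs reported by the tablet servers."""
    return {(row[1], row[2]) for src, row in rows.items() if src != "master"}


# First matrix from this report, copied verbatim.
matrix = """\
Config source | Replicas | Current term | Config index | Committed?
-------------------------------------------------------------
master | A B* C | | | Yes
A | A B* C | 2939 | 20571 | Yes
B | A B* C | 2939 | 20571 | Yes
C | A B* C | 2939 | 20571 | Yes
"""

rows = parse_consensus_matrix(matrix)
print(differing_sources(rows))          # sources that disagree with the master
print(replica_terms_and_indexes(rows))  # one pair means the replicas agree with each other
```

For both matrices in this report the first list comes back empty and the second set holds a single pair, which matches the observation that the printed configs are the same even though ksck flags the tablet.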

       

      The tablet servers are under high load (many backpressure messages were observed), and the drives holding the data_dirs and wal_dir are encrypted using navencrypt.

    People

    • Assignee: Unassigned
    • Reporter: Abhishek (achennaka@cloudera.com)