  1. Kudu
  2. KUDU-1391

2 of 3 replicas alive but failed to elect a leader


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: NA
    • Component/s: consensus
    • Labels: None

    Description

      Last weekend many tablet servers (TS) hit a lot of "too many open files" errors (we had not yet upgraded to [...]). While I was using our internal deploy tool to restart the cluster (stop all TS, then start all TS), the control machine had a problem that seemed to block on reads or writes to the SSH terminal (maybe a USB driver issue, not related to this bug), so only about half of the TS (about 30) were shut down. After maybe 10 minutes, I switched to another control host and performed the whole restart.
      Then I saw that writes were blocked because 1 tablet had no leader. According to the web UI, 2 of its 3 replicas were in the follower state and 1 was TABLET_DATA_TOMBSTONED, but every election failed. I will attach the logs of the 2 followers.
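
For context, here is a minimal sketch of the majority arithmetic behind a Raft-style leader election, which is why 2 live replicas out of 3 would normally be expected to elect a leader. It assumes all 3 replicas (including the tombstoned one) still count as voters in the tablet's configuration. The function names are hypothetical and are not Kudu APIs, and the sketch does not model whatever actually caused the elections in this report to fail.

```python
def majority_size(num_voters: int) -> int:
    """Smallest number of votes that is a strict majority of num_voters."""
    if num_voters <= 0:
        raise ValueError("num_voters must be positive")
    return num_voters // 2 + 1


def can_elect_leader(num_voters: int, votes_granted: int) -> bool:
    """True if a candidate holding votes_granted votes (its own included) wins."""
    return votes_granted >= majority_size(num_voters)


# A 3-replica tablet needs 2 votes. A candidate that votes for itself
# must therefore also be granted the vote of the other live follower.
needed = majority_size(3)
wins_with_peer_vote = can_elect_leader(3, 2)
wins_alone = can_elect_leader(3, 1)
```

Under these assumptions the election can only succeed if one follower grants its vote to the other, so any condition that makes both followers refuse each other's vote requests leaves the tablet without a leader even though a majority is alive.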

      Attachments

        1. remote-bootstrap-tool.patch
          4 kB
          Todd Lipcon
        2. 6a32cfa0353e4175809c2aa67e16ac9e.log.st216
          844 kB
          Binglin Chang
        3. 6a32cfa0353e4175809c2aa67e16ac9e.log.st212.before
          1.12 MB
          Binglin Chang
        4. 6a32cfa0353e4175809c2aa67e16ac9e.log.st212
          354 kB
          Binglin Chang
        5. 6a32cfa0353e4175809c2aa67e16ac9e.log.st172
          422 kB
          Binglin Chang

          People

            Assignee: Unassigned
            Reporter: Binglin Chang (decster)
            Votes: 0
            Watchers: 4