  1. Ratis
  2. RATIS-2019

Fixed abnormal exit of StateMachineUpdater


Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 3.1.0
    • Leader
    • None

    Description

      When Ratis is restarted, startup occasionally fails with an error. We have seen two cases.

      For case 1

      Looking through the code, I found a problem.

      StateMachineUpdater reaches this assertion whenever it applies a membership-change log entry from a previous term while a Leader state exists. At that point the startupEntry for the current term may not have been initialized yet, so the assertion throws an error.

      We should only fire this assertion if the log matches the current term.
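
      The guard described above can be sketched as follows. This is a simplified, hypothetical model and not the actual Ratis code: the class StartupEntryCheck, its fields, and the signature of checkReady are assumptions made for illustration. It shows only the idea of skipping the startup-entry check for entries from an earlier term.

```java
import java.util.OptionalLong;

/** Hypothetical, simplified model of the term guard; not the Ratis implementation. */
final class StartupEntryCheck {
  /** Index of the current term's startup entry; empty until the leader has appended it. */
  private OptionalLong startupIndex = OptionalLong.empty();
  private final long currentTerm;

  StartupEntryCheck(long currentTerm) {
    this.currentTerm = currentTerm;
  }

  void setStartupIndex(long index) {
    startupIndex = OptionalLong.of(index);
  }

  /**
   * Returns true if the applied entry is the startup entry of the current term.
   * Entries from other terms return false before the startup entry is touched,
   * so an uninitialized startup entry cannot trigger a failure for them.
   */
  boolean checkReady(long entryTerm, long entryIndex) {
    if (entryTerm != currentTerm) {
      return false; // entry from a previous term: skip the check entirely
    }
    long startup = startupIndex.orElseThrow(
        () -> new IllegalStateException("startup entry not initialized"));
    return entryIndex == startup;
  }
}
```

      With this guard, applying an entry from term 179 on a leader in term 180 returns false even if setStartupIndex has not been called yet.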

      In addition, the current implementation triggers notifyLeaderReady once for every membership-change log entry of the current term. This is inconsistent with the semantics of the interface, because after the first notification the Leader is already in the ready state.
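
      One possible way to make the notification fire only once is a compare-and-set flag. The sketch below is an assumption about how this could be done, not a description of the actual fix: ReadyNotifier, markReady, and the counter that stands in for notifyLeaderReady are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical one-shot notifier; the counter stands in for notifyLeaderReady(). */
final class ReadyNotifier {
  private final AtomicBoolean ready = new AtomicBoolean(false);
  private final AtomicInteger notifications = new AtomicInteger();

  /**
   * May be called for every applied entry of the current term.
   * Only the call that flips the flag from false to true sends a notification.
   */
  boolean markReady() {
    if (ready.compareAndSet(false, true)) {
      notifications.incrementAndGet(); // a real implementation would notify the state machine here
      return true;
    }
    return false; // already ready: nothing to announce
  }

  int notificationCount() {
    return notifications.get();
  }
}
```

      Calling markReady three times in a row leaves notificationCount at 1.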

      For case 2

      I noticed that StateMachineUpdater fetches leaderState asynchronously with respect to the server's changeToFollower. As shown in the log, StateMachineUpdater obtains the leaderStateImpl for term 179 and starts executing checkReady. Meanwhile, the server receives a log request with a larger term, updates its term to 180, and sets leaderStateImpl to null. StateMachineUpdater then calls getCurrentTerm on the stale leaderStateImpl from term 179, which fails. In this case, we should read the latest term directly from server.getState().getCurrentTerm() to avoid the error.
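
      The interleaving in case 2 can be reproduced with a small model. Everything here is hypothetical (TermRace, LeaderSnapshot, and the method names only mirror the identifiers mentioned above); it is a sketch of why a captured leader state can carry a stale term while the server's own state holds the latest one.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical model of a server whose leader state can be cleared concurrently. */
final class TermRace {
  /** Stand-in for a leader state object created for one specific term. */
  static final class LeaderSnapshot {
    final long term;

    LeaderSnapshot(long term) {
      this.term = term;
    }
  }

  private final AtomicLong currentTerm = new AtomicLong();
  private final AtomicReference<LeaderSnapshot> leader = new AtomicReference<>();

  void becomeLeader(long term) {
    currentTerm.set(term);
    leader.set(new LeaderSnapshot(term));
  }

  /** A request with a larger term arrives: bump the term and drop the leader state. */
  void changeToFollower(long newTerm) {
    currentTerm.set(newTerm);
    leader.set(null);
  }

  /** May return a snapshot that becomes stale right after the call returns. */
  LeaderSnapshot getLeader() {
    return leader.get();
  }

  /** Always reflects the server's latest term, independent of any leader snapshot. */
  long getCurrentTerm() {
    return currentTerm.get();
  }
}
```

      If a thread captures getLeader() in term 179 and the server then calls changeToFollower(180), the captured snapshot still reports 179 while getCurrentTerm() reports 180, which is why reading the term from the server state is the safer choice.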

      Attachments

        1. image-2024-01-29-11-36-17-263.png
          372 kB
          Xinyu Tan
        2. screenshot-1.png
          546 kB
          Xinyu Tan
        3. screenshot-2.png
          242 kB
          Xinyu Tan

            People

              tanxinyu Xinyu Tan
              tanxinyu Xinyu Tan
              Votes: 0
              Watchers: 2

                Time Tracking

                  Original Estimate - Not Specified
                  Remaining Estimate - 0h
                  Time Spent - 50m