  1. Ignite
  2. IGNITE-18640

Implement placement driver best-effort single actor selector and fail-over


Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed

    Description

      Motivation

      As a prerequisite, it is worth mentioning that the placement driver (PD) itself should be reliable and have corresponding fail-over logic: the placement driver service should be distributed in such a way that if one of its nodes fails, another one picks up the flag. On the other hand, although it is valid to have more than one active PD actor (the one that checks topology, sends leaseGrant messages, etc.), it is better to have only one in order to reduce unnecessary calculations, duplicated messaging and so on. So, to sum up:

      • PD may work on top of meta storage, using it as a consensus provider.
      • There may be more than one active PD actor trying to evaluate the primary replica along with the corresponding lease, send leaseGrant messages, etc. This means that actions should be idempotent, or that we should be able to skip stale/concurrent triggers.
      • It is worth having at least best-effort single actor selection logic.
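
      To illustrate the second point, here is a minimal sketch of how a stale or concurrent trigger could be skipped with a revision-checked update. It is only an assumption about one possible approach: LeaseTable, tryGrant and revision are hypothetical names, and the in-memory map merely stands in for a conditional update against meta storage.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an in-memory stand-in for a conditional update in meta storage.
// None of these names are Ignite API.
final class LeaseTable {
    /** Lease holder plus the revision at which the lease was written. */
    private static final class Lease {
        final String holder;
        final long revision;

        Lease(String holder, long revision) {
            this.holder = holder;
            this.revision = revision;
        }
    }

    private final Map<String, Lease> leases = new HashMap<>();

    /** Revision a caller must present to update the group's lease; 0 means no lease yet. */
    synchronized long revision(String groupId) {
        Lease lease = leases.get(groupId);
        return lease == null ? 0 : lease.revision;
    }

    /** Current lease holder for the group, or null if no lease has been granted. */
    synchronized String holder(String groupId) {
        Lease lease = leases.get(groupId);
        return lease == null ? null : lease.holder;
    }

    /**
     * Grants the lease only if nobody has updated it since the caller read expectedRevision.
     * Returns false for a stale or concurrent trigger, which the caller simply drops.
     */
    synchronized boolean tryGrant(String groupId, String holder, long expectedRevision) {
        if (revision(groupId) != expectedRevision) {
            return false;
        }
        leases.put(groupId, new Lease(holder, expectedRevision + 1));
        return true;
    }
}
```

      With this shape, two active actors that both read revision 0 and both try to grant a lease for the same group cannot both succeed: the first update wins and the second is rejected, so having several active actors costs extra work but does not produce conflicting leases.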

      Definition of Done

      • Almost always (but, because of the best-effort nature, not always) there is only one active PD actor if there is a majority in the meta storage (ms) group.
      • If the active actor fails for some reason, another one picks up the flag as fast as possible.
      • It's still valid to have multiple active actors at the same time. If you guys have any ideas of how to guarantee no more than one actor, please share them.

      Implementation Notes

      Assuming that we have a distributed onLeaderElected(Peer leader, long term) callback, we may implement the following logic in PlacementDriverManager#start():

      • Register ms.onLeaderElected():
        ms.onLeaderElected((leader, term) -> {
            if (term > lastSeenTerm) {
                // Remember the newest term so that older notifications are recognised as stale.
                lastSeenTerm = term;

                if (leader.equals(localNode)) {
                    // Become an active actor.
                } else {
                    // Discard activeness.
                }
            } else {
                // No-op, just a stale update.
            }
        });
      • Call refreshLeader and apply exactly the same logic as above, in order to become an active actor if a leader had already been elected by the time the listener was registered.
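
      The two steps above can be sketched as a single term-checked handler that is used both by the election callback and for the result of the initial refreshLeader call. This is only an illustration under assumptions: ActiveActorTracker and isActive are hypothetical names, the node is represented by a plain String instead of Peer, and terms are assumed to be non-negative.

```java
// Hypothetical sketch of the activeness tracking described above; not Ignite API.
final class ActiveActorTracker {
    private final String localNode;

    /** Highest term processed so far; -1 assumes that real terms are non-negative. */
    private long lastSeenTerm = -1;

    private boolean active;

    ActiveActorTracker(String localNode) {
        this.localNode = localNode;
    }

    /**
     * Handles a leader notification. The same method is meant to be called from the
     * onLeaderElected listener and with the leader/term returned by refreshLeader,
     * so the order in which the two arrive does not matter.
     */
    synchronized void onLeaderElected(String leader, long term) {
        if (term <= lastSeenTerm) {
            return; // No-op, just a stale or duplicate update.
        }

        lastSeenTerm = term;

        // Become an active actor if the local node is the leader, otherwise discard activeness.
        active = localNode.equals(leader);
    }

    synchronized boolean isActive() {
        return active;
    }
}
```

      Because every notification goes through the same term comparison, a late callback for an older term cannot re-activate a node that has already seen a newer leader. This remains best-effort: a former leader that has not yet received the newer notification still considers itself active for a while.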


            People

              Denis Chudov
              alapin Alexander Lapin
              Vladislav Pyatkov


                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 0.5h