  1. Apache Ozone
  2. HDDS-3698 Ozone Non-Rolling upgrades
  3. HDDS-4227

Implement a "prepareForUpgrade" step that applies all committed transactions onto the OM state machine.




      Why is this needed?
      Through HDDS-4143, we have a generic factory that handles multiple versions of apply transaction implementations based on layout version. This factory can therefore be used to handle versioned requests across layout versions whenever both versions need to exist in the code (for example, for HDDS-2939).
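
      A minimal sketch of what such a layout-version-keyed factory could look like. This is an illustration only, not the HDDS-4143 implementation: the class name, method names, and the "CreateKey" request type are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical handler type: takes a request payload, returns a result string.
Handler = Callable[[dict], str]


class VersionedHandlerFactory:
    """Illustrative registry mapping (request type, layout version) to a handler."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Dict[int, Handler]] = {}

    def register(self, request_type: str, since_version: int, handler: Handler) -> None:
        # The handler applies from `since_version` until a newer one is registered.
        self._handlers.setdefault(request_type, {})[since_version] = handler

    def get(self, request_type: str, layout_version: int) -> Handler:
        versions = self._handlers[request_type]
        eligible = [v for v in versions if v <= layout_version]
        if not eligible:
            raise LookupError(
                f"no handler for {request_type} at layout version {layout_version}"
            )
        # Pick the newest implementation that the current layout version allows.
        return versions[max(eligible)]
```

      Every request type that changes would need two registered implementations under this scheme, which is the maintenance cost the next paragraph refers to.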

      However, the OM Ratis requests are still undergoing a lot of minor changes (HDDS-4007, HDDS-3903), and in these cases it becomes hard to maintain two versions of the code just to support clean upgrades.

      Hence, the plan is to build a pre-upgrade utility (client API) that makes sure that an OM instance has no "un-applied" transactions in its Raft log. Invoking this client API makes sure that the upgrade starts from a clean state. This is needed only in an HA setup. In a non-HA setup, it can either be skipped or, when invoked, will be a no-op (non-Ratis) or cause no harm (single-node Ratis).

      How does it work?
      Before updating the software bits, our goal is to bring every OM up to the latest state with respect to apply transaction (AT). The reason is to make sure that the same version of the code executes the AT step on all 3 OMs. At a high level, the flow is as follows.

      • Before upgrade, stop the OMs.
      • Start the OMs with a special flag, --prepareUpgrade. (This is similar to --init: a special mode in which an ephemeral OM instance does some work and then stops.)
      • When OM is started with the --prepareUpgrade flag, it does not start the RPC server, so no new requests can get in.
      • In this state, every OM is given time to apply transactions up to the last txn.
      • We know that at least 2 OMs will have the last client request transaction committed in their logs. Hence, those 2 OMs are expected to apply transactions up to that index faster.
      • At every OM, the Raft log will be purged after this wait period (so that the replay does not happen), and a Ratis snapshot will be taken at the last txn.
      • Even if there is a lagger OM that is unable to get to the last applied txn index, its logs will be purged after the wait time expires.
      • Now when the OMs are started with the newer version, all the OMs will start using the new code.
      • The lagger OM will get the new Ratis snapshot since there are no logs to replay from.
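
      The flow above can be sketched as a small simulation. This is a toy model under stated assumptions, not Ozone or Ratis code: the OM class, the per-OM "budget" standing in for the wait period, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OM:
    """Toy model of one Ozone Manager; all fields are illustrative."""
    name: str
    log: List[int]           # committed txn indices in the Raft log, ascending
    applied_index: int = 0   # last txn applied to the state machine
    snapshot_index: int = 0  # index covered by the latest snapshot

    def apply_until(self, last_index: int, budget: int) -> None:
        # Apply at most `budget` un-applied entries, never going past last_index.
        # `budget` models how much the OM gets done within the wait period.
        for idx in self.log:
            if budget == 0 or idx > last_index:
                break
            if idx > self.applied_index:
                self.applied_index = idx
                budget -= 1

    def snapshot_and_purge(self) -> None:
        # Snapshot at whatever was applied, then drop the log so that the
        # new software version has nothing to replay.
        self.snapshot_index = self.applied_index
        self.log.clear()


def prepare_for_upgrade(oms: List[OM], last_index: int, budget: Dict[str, int]) -> None:
    # Runs with the RPC server down, so last_index cannot move.
    for om in oms:
        om.apply_until(last_index, budget[om.name])
        om.snapshot_and_purge()


def restart_on_new_version(oms: List[OM]) -> None:
    # A lagger has no log to replay, so it installs the newest snapshot.
    newest = max(om.snapshot_index for om in oms)
    for om in oms:
        if om.snapshot_index < newest:
            om.applied_index = om.snapshot_index = newest
```

      In this model, a lagger that runs out of budget still ends the prepare step with an empty log, and only catches up through snapshot installation after the restart.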





              avijayan Aravindan Vijayan