Details

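    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.1, 2.0.3
    • Component/s: amv2, proc-v2
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: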
      hbase-2.2.x uses a new Procedure form for assigning/unassigning/moving Regions; it does not process hbase-2.1.x and earlier Unassign/Assign Procedure types. An upgrade requires that we first drain the Master Procedure Store of old-style Procedures before starting the new 2.2.x Master. The patch here facilitates the draining process.

      On your running hbase-2.1.1+ (or 2.0.3+) cluster, when upgrading:

      1. Shut down both active and standby masters (your cluster will continue to serve reads and writes without interruption).
      2. Set the property hbase.procedure.upgrade-to-2-2 to true in hbase-site.xml for the Master, and start only one Master, still using the 2.1.1+ (or 2.0.3+) binaries.
      3. Wait until the Master quits. Confirm that there is a 'READY TO ROLLING UPGRADE' message in the master log as the cause of the shutdown. The Procedure Store is now empty.
      4. Start new Masters with the new 2.2.0+ code.
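
For step 2, a minimal sketch of setting the flag programmatically, assuming hbase-site.xml uses the usual `<configuration>`/`<property>` layout. The property name is taken from the release note; the helper `set_property` and the sample file content are illustrative only, not part of the patch.

```python
import xml.etree.ElementTree as ET

# Property name as given in the release note.
UPGRADE_FLAG = "hbase.procedure.upgrade-to-2-2"


def set_property(site_xml: str, name: str, value: str) -> str:
    """Return site_xml with property `name` set to `value`, adding it if absent.

    Hypothetical helper: it assumes a <configuration> root holding
    <property><name/><value/></property> children.
    """
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            node = prop.find("value")
            if node is None:
                node = ET.SubElement(prop, "value")
            node.text = value
            break
    else:
        # No existing entry for this name: append a new property element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative input only; a real hbase-site.xml will contain more properties.
example = (
    "<configuration><property><name>hbase.rootdir</name>"
    "<value>hdfs://nn/hbase</value></property></configuration>"
)
updated = set_property(example, UPGRADE_FLAG, "true")
```

Note that this parser drops XML comments and the XML declaration, so editing the file by hand may be preferable on a real cluster. Remember to remove the flag (or set it to false) before starting the 2.2.0+ Masters if the same file is reused.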

        Activity

          zhangduo Duo Zhang added a comment -

          I think a possible way is that we introduce a config; if it is enabled, the master will wait until it is safe to do a rolling upgrade, and then just call System.exit to shut down the HMaster, so operators will know that it is OK to deploy HMaster with the new code.

          The instructions will be:
          1. Shut down both active and standby masters.
          2. Enable the rolling upgrade flag, and start only one master, still with the old code.
          3. Wait until the master quits, and confirm that there is a 'READY TO ROLLING UPGRADE' message in the master log.
          4. Start new masters with the new code.
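
The wait-then-exit idea proposed here can be sketched roughly as follows. This is a simplified illustration, not the patch itself: `drain_then_exit` and its `count_unfinished` callback are hypothetical names, and the real logic lives in the Java ProcedureExecutor.

```python
import sys
import time

# Message wording taken from the instructions in this comment.
READY_MESSAGE = "READY TO ROLLING UPGRADE"


def drain_then_exit(count_unfinished, log=print, exit_fn=sys.exit,
                    sleep=time.sleep, poll_seconds=1.0):
    """Poll until no procedures remain, log the marker, then exit.

    count_unfinished is a hypothetical callback returning how many
    procedures are still unfinished; log, exit_fn and sleep are
    injectable so the loop can be exercised without ending the process.
    """
    while count_unfinished() > 0:
        sleep(poll_seconds)
    log(READY_MESSAGE)
    exit_fn(0)
```

Logging the marker before exiting is what lets an operator tell a deliberate "safe to upgrade" shutdown apart from a crash.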

          WDYT sir? Michael Stack.

          Thanks.

          stack Michael Stack added a comment -

          Don't we need this even if it is a stop/start upgrade, not just for a rolling upgrade? We need to clear out the old-style Procedures for both cases?

          Could we make setting the flag dynamic? A page where you set the config and the Master then runs until it has cleared out the old stuff and then System.exits (Yeah, I like your System.exit idea).

          zhangduo Duo Zhang added a comment -

          I still prefer a config; it is simple and has no other side effects. I just saw that someone has put up an issue: on our web page, anyone can split a table, even without permission on that table... I think we would have the same problem here, and even worse, as this is an admin operation...

          The flag will only take effect on branch-2.0 and branch-2.1.

          zhangduo Duo Zhang added a comment -

          Will prepare a prototype soon.

          zhangduo Duo Zhang added a comment -

          Michael Stack PTAL sir. Will try to write a UT.

          stack Michael Stack added a comment -

          Looks great. Yeah, a UT and/or trying it on a cluster would be good. s/All existed/All existing/ in next patch. Needs RN on process. We'll have to doc it in upgrade section in book too once we are clear on process. Good stuff.

          stack Michael Stack added a comment -

          [~Apache9] Should 2.2 be 3.0?

          zhangduo Duo Zhang added a comment -

          HBASE-20881 has been pushed to branch-2, so I think it should be 2.2?

          stack Michael Stack added a comment -

          Agree [~Apache9] I changed it.

          zhangduo Duo Zhang added a comment -

          Oh, I thought you meant the name of the flag...

          For this issue, it should be a blocker for 2.1.x and 2.0.x, as we will only commit the patch to these two branches...

          stack Michael Stack added a comment -

          Yeah, you are right. I should have marked it 2.0.3.

          What's to be done here to prove this mechanism will work, [~Apache9]? Thanks.

          zhangduo Duo Zhang added a comment -

          I will try to write a UT first, and also test it on a cluster. Then we need to open an issue to write down the instructions in our ref guide.

          stack Michael Stack added a comment -

          I can help w/ the refguide bit. Could try it myself first so I knew what I was talking about (smile).

          stack Michael Stack added a comment -

          Trying to figure something that might be a little easier on the operator.

          A clean shutdown should also work? Wait for quiescent amv2.. no assigning, crashing, then do shutdown. Let me check if this makes for any assign/unassigns. A tool could look at master wal proc and report if any assign/unassigns outstanding.

          Or, tool could look at master proc wals to see if outstanding assign/unassigns. If none, kill masters (standby first). Re-run tool to be sure. Then start 2.2?
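
A rough sketch of the check such a tool would perform, assuming the procedure entries have already been read out of the master proc WALs into simple records. Everything here is hypothetical: the `ProcRecord` shape, the `finished` field, and the type names `AssignProcedure`/`UnassignProcedure` used to stand for the old-style procedures; reading the WAL files themselves is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumed names for the old-style procedure types; adjust to whatever
# the WAL reader actually reports.
OLD_STYLE_TYPES = frozenset({"AssignProcedure", "UnassignProcedure"})


@dataclass(frozen=True)
class ProcRecord:
    """Hypothetical, simplified view of one procedure entry."""
    proc_id: int
    type_name: str
    finished: bool


def outstanding_old_style(records: Iterable[ProcRecord],
                          old_types=OLD_STYLE_TYPES) -> List[ProcRecord]:
    """Return the unfinished procedures whose type is old-style."""
    return [r for r in records
            if r.type_name in old_types and not r.finished]


def safe_to_upgrade(records: Iterable[ProcRecord]) -> bool:
    """True when no old-style assign/unassign procedure is outstanding."""
    return not outstanding_old_style(records)
```

Running the check once before killing the masters and once after, as suggested above, guards against a procedure being scheduled in between.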

          zhangduo Duo Zhang added a comment -

          OK, maybe an external tool that writes a special file on zk, and when the master thinks it is safe to quit, it quits. The tool will also monitor the status of the master; if the master quits, it will check the proc wals to see if there are outstanding unsupported procedures. If not, it will output a message saying that it is safe to upgrade; otherwise you should restart from the beginning and try again.

          zhangduo Duo Zhang added a comment -

          The reason for using zk is that I do not want to add a new method to the master, which may introduce compatibility issues...

          stack Michael Stack added a comment -

          Ok. I can do that. I won't commit this patch to branch-2.0 just yet. Let me roll 2.0.2. Can add in the zk thingy for 2.0.3 and 2.1.1.

          zhangduo Duo Zhang added a comment -

          Fine. I think we need more tests for it. Let's not block 2.0.2.

          zhangduo Duo Zhang added a comment -

          Any progress here boss? Michael Stack. I think it is time to roll a 2.1.1?

          stack Michael Stack added a comment -

          I tried this. Seems to work:

          ...
          2018-10-19 11:06:09,371 INFO [ProcExecTimeout] procedure2.ProcedureExecutor: UPGRADE OK: All existed procedures have been finished, quit...
          2018-10-19 11:06:09,373 INFO [Thread-4] regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@263558c9
          2018-10-19 11:06:09,373 INFO [Thread-4] regionserver.HRegionServer: ***** STOPPING region server 'localhost,16020,1539972363652' *****
          ...

          I suppose we could start a backup master that is the new code so it could take over soon as the old one stops?

          Hopefully we can figure out a smoother transition by the time of 2.2, but let's commit this in the meantime. Thanks [~Apache9]
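
A small sketch for confirming the shutdown cause from the master log. The release note quotes a 'READY TO ROLLING UPGRADE' message, while the log excerpt in this comment shows 'UPGRADE OK', so this hypothetical helper accepts either wording; which one a given build emits is not settled by this thread.

```python
# Both wordings appear on this issue: the first in the release note,
# the second in the log excerpt pasted in this comment.
MARKERS = ("READY TO ROLLING UPGRADE", "UPGRADE OK")


def find_upgrade_marker(lines, markers=MARKERS):
    """Return the first log line containing any marker, or None."""
    for line in lines:
        if any(marker in line for marker in markers):
            return line.rstrip("\n")
    return None


# Sample input copied from the log excerpt in this comment.
sample_log = [
    "2018-10-19 11:06:09,371 INFO [ProcExecTimeout] "
    "procedure2.ProcedureExecutor: UPGRADE OK: All existed procedures "
    "have been finished, quit...\n",
    "2018-10-19 11:06:09,373 INFO [Thread-4] regionserver.HRegionServer: "
    "***** STOPPING region server 'localhost,16020,1539972363652' *****\n",
]
hit = find_upgrade_marker(sample_log)
```

If the helper returns None after the Master has stopped, the shutdown had some other cause and the drain should be repeated before starting 2.2.0+ Masters.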

          stack Michael Stack added a comment -

          Rebased patch. Trying against hadoopqa.

          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote  Subsystem    Runtime  Comment
          0     reexec       0m 17s   Docker mode activated.
                Prechecks
          +1    hbaseanti    0m 0s    Patch does not have any anti-patterns.
          +1    @author      0m 0s    The patch does not contain any @author tags.
          -0    test4tests   0m 0s    The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
                master Compile Tests
          +1    mvninstall   5m 20s   master passed
          +1    compile      0m 21s   master passed
          +1    checkstyle   0m 15s   master passed
          +1    shadedjars   4m 14s   branch has no errors when building our shaded downstream artifacts.
          +1    findbugs     0m 26s   master passed
          +1    javadoc      0m 14s   master passed
                Patch Compile Tests
          +1    mvninstall   5m 26s   the patch passed
          +1    compile      0m 22s   the patch passed
          +1    javac        0m 22s   the patch passed
          +1    checkstyle   0m 15s   the patch passed
          +1    whitespace   0m 0s    The patch has no whitespace issues.
          +1    shadedjars   4m 24s   patch has no errors when building our shaded downstream artifacts.
          +1    hadoopcheck  11m 17s  Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0.
          +1    findbugs     0m 34s   the patch passed
          +1    javadoc      0m 14s   the patch passed
                Other Tests
          +1    unit         3m 9s    hbase-procedure in the patch passed.
          +1    asflicense   0m 11s   The patch does not generate ASF License warnings.
                             37m 28s



          Subsystem Report/Notes
          Docker Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b
          JIRA Issue HBASE-21075
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12944775/0001-HBASE-21075-Confirm-that-we-can-rolling-upgrade-from.patch
          Optional Tests dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile
          uname Linux 9896e49013ff 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 GNU/Linux
          Build tool maven
          Personality /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
          git revision master / 05d22ed960
          maven version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z)
          Default Java 1.8.0_181
          findbugs v3.1.0-RC3
          Test Results https://builds.apache.org/job/PreCommit-HBASE-Build/14766/testReport/
          Max. process+thread count 279 (vs. ulimit of 10000)
          modules C: hbase-procedure U: hbase-procedure
          Console output https://builds.apache.org/job/PreCommit-HBASE-Build/14766/console
          Powered by Apache Yetus 0.8.0 http://yetus.apache.org

          This message was automatically generated.

          stack Michael Stack added a comment -

          Pushed to branch-2.0 + branch-2.1. Thanks for the patch [~Apache9] . Feel free to edit the RN I proffered.

          hudson Hudson added a comment -

          Results for branch branch-2.1
          build #494 on builds.a.o: -1 overall


          details (if available):

          +1 general checks
          – For more information see general report

          +1 jdk8 hadoop2 checks
          – For more information see jdk8 (hadoop2) report

          -1 jdk8 hadoop3 checks
          – For more information see jdk8 (hadoop3) report

          +1 source release artifact
          – See build output for details.

          +1 client integration test

          hudson Hudson added a comment -

          Results for branch branch-2.1
          build #492 on builds.a.o: -1 overall


          details (if available):

          +1 general checks
          – For more information see general report

          +1 jdk8 hadoop2 checks
          – For more information see jdk8 (hadoop2) report

          -1 jdk8 hadoop3 checks
          – For more information see jdk8 (hadoop3) report

          +1 source release artifact
          – See build output for details.

          +1 client integration test

          hudson Hudson added a comment -

          Results for branch branch-2.0
          build #978 on builds.a.o: -1 overall


          details (if available):

          +1 general checks
          – For more information see general report

          -1 jdk8 hadoop2 checks
          – For more information see jdk8 (hadoop2) report

          -1 jdk8 hadoop3 checks
          – For more information see jdk8 (hadoop3) report

          +1 source release artifact
          – See build output for details.

          hudson Hudson added a comment -

          Results for branch branch-2.1
          build #493 on builds.a.o: +1 overall


          details (if available):

          +1 general checks
          – For more information see general report

          +1 jdk8 hadoop2 checks
          – For more information see jdk8 (hadoop2) report

          +1 jdk8 hadoop3 checks
          – For more information see jdk8 (hadoop3) report

          +1 source release artifact
          – See build output for details.

          +1 client integration test


          People

            Assignee: zhangduo Duo Zhang
            Reporter: zhangduo Duo Zhang
            Votes: 0
