Details
-
Sub-task
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
None
-
None
-
Reviewed
-
hbase-2.2.x uses a new Procedure form for assigning/unassigning/moving Regions; it does not process hbase-2.1.x and earlier Unassign/Assign Procedure types. Upgrade requires that we first drain the Master Procedure Store of old-style Procedures before starting the new 2.2.x Master. The patch here facilitates the draining process.
On your running hbase-2.1.1+ (or 2.0.3+) cluster, when upgrading:
1. Shut down both active and standby masters (your cluster will continue to serve reads and writes without interruption).
2. Set the property hbase.procedure.upgrade-to-2-2 to true in hbase-site.xml for the Master, and start only one Master, still using the 2.1.1+ (or 2.0.3+) binaries.
3. Wait until the Master quits. Confirm that there is a 'READY TO ROLLING UPGRADE' message in the master log as the cause of the shutdown. The Procedure Store is now empty.
4. Start new Masters with the new 2.2.0+ code.
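Step 3 asks the operator to confirm a marker message in the master log. A minimal sketch of that check follows, assuming the log is a plain-text file. The function names are hypothetical and not part of HBase. The default marker string is the one quoted in the release note; a test run later in this thread logs a different wording ('UPGRADE OK: ...'), so the marker is left as a parameter.

```python
from pathlib import Path

# Marker quoted in the release note. The exact wording written by a given
# release may differ, so callers can pass their own markers.
DEFAULT_MARKERS = ("READY TO ROLLING UPGRADE",)


def find_upgrade_marker(log_lines, markers=DEFAULT_MARKERS):
    """Return the last log line containing any marker, or None if absent."""
    found = None
    for line in log_lines:
        if any(marker in line for marker in markers):
            found = line.rstrip("\n")
    return found


def master_log_ready(log_path, markers=DEFAULT_MARKERS):
    """Scan a master log file (path supplied by the operator) for the marker."""
    with Path(log_path).open(encoding="utf-8", errors="replace") as fh:
        return find_upgrade_marker(fh, markers) is not None
```

An operator would call `master_log_ready(...)` with the path to their own master log after the Master process has exited, and start the 2.2.0+ Masters only if it returns True.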
Description
Attachments
Issue Links
- relates to
  - HBASE-20881 Introduce a region transition procedure to handle all the state transition for a region (Resolved)
  - HBASE-21970 Document that how to upgrade from 2.0 or 2.1 to 2.2+ (Resolved)
Activity
Don't we need this even if it is a stop/start upgrade, not just for a rolling upgrade? We need to clear out the old-style Procedures for both cases?
Could we make setting the flag dynamic? A page where you set the config and the Master then runs until it has cleared out the old stuff and then System.exits (Yeah, I like your System.exit idea).
I still prefer a config; it is simple and has no other side effects. I just saw that someone has filed an issue saying that on our web page anyone can split a table, even without permission on that table... I think we would have the same problem here, and even worse, as this is an admin operation...
The flag will only take effect on branch-2.0 and branch-2.1.
Looks great. Yeah, a UT and/or try it on a cluster would be good. s/All existed/All existing/ in next patch. Needs RN on process. We'll have to doc it in upgrade section in book too once we clear on process. Good stuff.
Oh I thought you mean the name of the flag...
For this issue, it should be a blocker for 2.1.x and 2.0.x, as we will only commit the patch to these two branches...
Yeah, you are right. I should have marked it 2.0.3.
What's to be done here to prove this mechanism will work [~Apache9]? Thanks.
I will try to write a UT first, and also test it on a cluster. Then we need to open an issue to write down the instructions in our ref guide.
I can help w/ the refguide bit. Could try it myself first so I know what I am talking about (smile).
Trying to figure something that might be a little easier on the operator.
A clean shutdown should also work? Wait for quiescent amv2.. no assigning, crashing, then do shutdown. Let me check if this makes for any assign/unassigns. A tool could look at master wal proc and report if any assign/unassigns outstanding.
Or, tool could look at master proc wals to see if outstanding assign/unassigns. If none, kill masters (standby first). Re-run tool to be sure. Then start 2.2?
OK, maybe an external tool that writes a special file on zk, and when the master thinks it is safe to quit, it quits. The tool will also monitor the status of the master; if the master quits, it will check the proc wals to see if there are outstanding unsupported procedures. If not, it will output a message saying that it is safe to upgrade; otherwise you should restart from the beginning and try again.
I would use zk because I do not want to add a new method to the master, which may introduce compatibility issues...
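The check such a tool would perform can be sketched independently of how the proc WALs are actually read. The sketch below assumes the procedures have already been decoded into simple records. `ProcRecord`, both function names, and the set of old-style class names are hypothetical placeholders for illustration, not HBase APIs.

```python
from dataclasses import dataclass

# Assumption: simple class names of the old-style region procedures that a
# 2.2 Master does not process. Adjust to match the actual procedure types.
OLD_STYLE_TYPES = frozenset(
    {"AssignProcedure", "UnassignProcedure", "MoveRegionProcedure"}
)


@dataclass(frozen=True)
class ProcRecord:
    """Hypothetical decoded view of one procedure in the Master proc store."""
    proc_id: int
    class_name: str  # simple or fully-qualified class name
    finished: bool


def outstanding_old_style(records, old_types=OLD_STYLE_TYPES):
    """Return unfinished procedures whose type the new Master cannot handle."""
    return [
        r for r in records
        if not r.finished and r.class_name.rsplit(".", 1)[-1] in old_types
    ]


def safe_to_upgrade(records, old_types=OLD_STYLE_TYPES):
    """True when no old-style procedure is still outstanding."""
    return not outstanding_old_style(records, old_types)
```

Running the check a second time after the masters are down, as suggested above, guards against a procedure having been scheduled between the first check and the shutdown.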
Ok. I can do that. I won't commit this patch to branch-2.0 just yet. Let me roll 2.0.2. Can add in the zk thingy for 2.0.3 and 2.1.1.
Any progress here boss? Michael Stack. I think it is time to roll a 2.1.1?
I tried this. Seems to work:
...
2018-10-19 11:06:09,371 INFO [ProcExecTimeout] procedure2.ProcedureExecutor: UPGRADE OK: All existed procedures have been finished, quit...
2018-10-19 11:06:09,373 INFO [Thread-4] regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@263558c9
2018-10-19 11:06:09,373 INFO [Thread-4] regionserver.HRegionServer: ***** STOPPING region server 'localhost,16020,1539972363652' *****
...
I suppose we could start a backup master that is the new code so it could take over soon as the old one stops?
Hopefully we can figure out a smoother transition by the time of 2.2, but let's commit this in the meantime. Thanks [~Apache9]
+1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 17s | Docker mode activated. |
Prechecks | |||
+1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
-0 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
master Compile Tests | |||
+1 | mvninstall | 5m 20s | master passed |
+1 | compile | 0m 21s | master passed |
+1 | checkstyle | 0m 15s | master passed |
+1 | shadedjars | 4m 14s | branch has no errors when building our shaded downstream artifacts. |
+1 | findbugs | 0m 26s | master passed |
+1 | javadoc | 0m 14s | master passed |
Patch Compile Tests | |||
+1 | mvninstall | 5m 26s | the patch passed |
+1 | compile | 0m 22s | the patch passed |
+1 | javac | 0m 22s | the patch passed |
+1 | checkstyle | 0m 15s | the patch passed |
+1 | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 | shadedjars | 4m 24s | patch has no errors when building our shaded downstream artifacts. |
+1 | hadoopcheck | 11m 17s | Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. |
+1 | findbugs | 0m 34s | the patch passed |
+1 | javadoc | 0m 14s | the patch passed |
Other Tests | |||
+1 | unit | 3m 9s | hbase-procedure in the patch passed. |
+1 | asflicense | 0m 11s | The patch does not generate ASF License warnings. |
37m 28s |
Subsystem | Report/Notes |
---|---|
Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
JIRA Issue | |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944775/0001-HBASE-21075-Confirm-that-we-can-rolling-upgrade-from.patch |
Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
uname | Linux 9896e49013ff 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 GNU/Linux |
Build tool | maven |
Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
git revision | master / 05d22ed960 |
maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
Default Java | 1.8.0_181 |
findbugs | v3.1.0-RC3 |
Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14766/testReport/ |
Max. process+thread count | 279 (vs. ulimit of 10000) |
modules | C: hbase-procedure U: hbase-procedure |
Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14766/console |
Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
Pushed to branch-2.0 + branch-2.1. Thanks for the patch [~Apache9]. Feel free to edit the RN I proffered.
Results for branch branch-2.1
build #494 on builds.a.o: -1 overall
details (if available):
+1 general checks
– For more information see general report
+1 jdk8 hadoop2 checks
– For more information see jdk8 (hadoop2) report
-1 jdk8 hadoop3 checks
– For more information see jdk8 (hadoop3) report
+1 source release artifact
– See build output for details.
+1 client integration test
Results for branch branch-2.1
build #492 on builds.a.o: -1 overall
details (if available):
+1 general checks
– For more information see general report
+1 jdk8 hadoop2 checks
– For more information see jdk8 (hadoop2) report
-1 jdk8 hadoop3 checks
– For more information see jdk8 (hadoop3) report
+1 source release artifact
– See build output for details.
+1 client integration test
Results for branch branch-2.0
build #978 on builds.a.o: -1 overall
details (if available):
+1 general checks
– For more information see general report
-1 jdk8 hadoop2 checks
– For more information see jdk8 (hadoop2) report
-1 jdk8 hadoop3 checks
– For more information see jdk8 (hadoop3) report
+1 source release artifact
– See build output for details.
Results for branch branch-2.1
build #493 on builds.a.o: +1 overall
details (if available):
+1 general checks
– For more information see general report
+1 jdk8 hadoop2 checks
– For more information see jdk8 (hadoop2) report
+1 jdk8 hadoop3 checks
– For more information see jdk8 (hadoop3) report
+1 source release artifact
– See build output for details.
+1 client integration test
I think one possible way is to introduce a config. If it is enabled, the HMaster waits until it is safe to do a rolling upgrade and then just calls System.exit to shut itself down, so operators will know that it is OK to deploy the HMaster with the new code.
The instructions will be:
1. Shut down both active and standby masters.
2. Enable the rolling upgrade flag, and start only one master, still with the old code.
3. Wait until the master quits, and confirm that there is a 'READY TO ROLLING UPGRADE' message in the master log.
4. Start new masters with the new code.
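The wait-then-exit behaviour proposed here can be illustrated with a small simulation. This is only a sketch of the control flow, not the Master's actual Java code: `count_unfinished` is a hypothetical callable standing in for "how many old-style procedures are still unfinished", and the message text is the one quoted in step 3.

```python
import time


def wait_until_drained(count_unfinished, poll_seconds=1.0,
                       sleep=time.sleep, log=print):
    """Poll until count_unfinished() returns 0, then return exit status 0.

    count_unfinished is a hypothetical hook. The sleep and log parameters
    are injectable so the loop can be exercised without real delays.
    """
    while True:
        remaining = count_unfinished()
        if remaining == 0:
            log("READY TO ROLLING UPGRADE")
            return 0
        log(f"waiting for {remaining} unfinished procedure(s)")
        sleep(poll_seconds)
```

A process following this idea would pass the returned status to `sys.exit`, which is the signal the operator waits for before starting the new masters.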
WDYT sir? Michael Stack.
Thanks.