[HBASE-21075] Confirm that we can (rolling) upgrade from 2.0.x and 2.1.x to 2.2.x after HBASE-20881 - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.1.1, 2.0.3
Component/s: amv2, proc-v2
Labels: None
Hadoop Flags: Reviewed
Release Note:

hbase-2.2.x uses a new Procedure for assigning/unassigning/moving Regions; it does not process the Assign/Unassign Procedure types of hbase-2.1.x and earlier. Upgrading therefore requires that we first drain the Master Procedure Store of old-style Procedures before starting the new 2.2.x Master. The patch here facilitates that draining process.

On your running hbase-2.1.1+ (or 2.0.3+) cluster, when upgrading:

1. Shut down both the active and standby Masters (your cluster will continue to serve reads and writes without interruption).
2. Set the property hbase.procedure.upgrade-to-2-2 to true in hbase-site.xml for the Master, and start only one Master, still using the 2.1.1+ (or 2.0.3+) binaries.
3. Wait until the Master quits. Confirm that there is a 'READY TO ROLLING UPGRADE' message in the master log as the cause of the shutdown. The Procedure Store is now empty.
4. Start new Masters with the new 2.2.0+ code.
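The version precondition above can be expressed as a small check. This is a minimal Java sketch, not part of the patch: the class and method names are hypothetical, and it assumes version strings of the form `major.minor.patch` with an optional `-suffix`. It encodes only what this issue states, namely that the drain flag ships in 2.1.1+ and 2.0.3+ (the Fix Versions above).

```java
public class UpgradeFlagEligibility {
  /**
   * Returns true if the running version is one this issue lists as carrying the
   * hbase.procedure.upgrade-to-2-2 flag: 2.1.1 or a later 2.1.x, or 2.0.3 or a later 2.0.x.
   * Anything after the first '-' (for example "-SNAPSHOT") is ignored.
   */
  static boolean supportsDrainFlag(String version) {
    String core = version.split("-", 2)[0];
    String[] parts = core.split("\\.");
    if (parts.length < 3) {
      return false;
    }
    int major = Integer.parseInt(parts[0]);
    int minor = Integer.parseInt(parts[1]);
    int patch = Integer.parseInt(parts[2]);
    if (major != 2) {
      return false;
    }
    if (minor == 1) {
      return patch >= 1;
    }
    if (minor == 0) {
      return patch >= 3;
    }
    return false; // 2.2+ already runs the new assignment procedures
  }

  public static void main(String[] args) {
    for (String v : new String[] {"2.0.2", "2.0.3", "2.1.0", "2.1.1", "2.2.0"}) {
      System.out.println(v + " -> " + supportsDrainFlag(v));
    }
  }
}
```

A cluster on 2.0.0 to 2.0.2 or on 2.1.0 would, under this reading, first need a minor upgrade within its own line before the drain step is available.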

Attachments

0001-HBASE-21075-Confirm-that-we-can-rolling-upgrade-from.patch
19/Oct/18 18:10
4 kB
Michael Stack
HBASE-21075.patch
27/Aug/18 07:36
4 kB
Duo Zhang

Issue Links

relates to

HBASE-20881 Introduce a region transition procedure to handle all the state transition for a region

Resolved

HBASE-21970 Document that how to upgrade from 2.0 or 2.1 to 2.2+

Resolved

Activity

Duo Zhang added a comment - 20/Aug/18 22:29

I think one possible approach is to introduce a config: if it is enabled, the Master waits until it is safe to do a rolling upgrade and then simply calls System.exit to shut down the HMaster, so operators know it is OK to deploy the HMaster with the new code.

The instructions will be:
1. Shutdown both active and standby masters.
2. Enable the rolling upgrading flag, and start only one master, still with the old code.
3. Wait until the master quits, and confirm that there is a 'READY TO ROLLING UPGRADE' message in the master log.
4. Start new masters with the new code.

WDYT sir? Michael Stack.

Thanks.
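The proposal amounts to a flag-gated wait loop on the Master. Below is a minimal Java sketch, assuming the Master can query how many procedures are still unfinished; `DrainThenExitSketch`, `awaitDrained` and the simulated counter are hypothetical, and reading the flag from a JVM system property only stands in for hbase-site.xml. Per the proposal, the real Master would call System.exit at the point where the sketch returns.

```java
import java.util.function.IntSupplier;

public class DrainThenExitSketch {
  // Property name from the release note; here it is read from a system property as a stand-in.
  static final String UPGRADE_FLAG = "hbase.procedure.upgrade-to-2-2";

  /**
   * Polls the number of unfinished procedures until it reaches zero and returns how many
   * polls that took, or -1 if the thread was interrupted first. The proposed Master would
   * log a marker and call System.exit here instead of returning.
   */
  static int awaitDrained(IntSupplier unfinishedProcedures, long pollMillis) {
    int polls = 0;
    while (true) {
      polls++;
      if (unfinishedProcedures.getAsInt() == 0) {
        return polls;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return -1;
      }
    }
  }

  public static void main(String[] args) {
    boolean upgradeFlagEnabled = Boolean.parseBoolean(System.getProperty(UPGRADE_FLAG, "true"));
    if (!upgradeFlagEnabled) {
      System.out.println("Flag off: run as a normal Master");
      return;
    }
    int[] unfinished = {3};
    // Simulate one procedure finishing per poll: reports 3, 2, 1, then 0.
    int polls = awaitDrained(() -> Math.max(0, unfinished[0]--), 5);
    System.out.println("All existing procedures finished after " + polls + " polls; Master would now exit");
  }
}
```

Returning instead of exiting keeps the loop testable. In the patch as tried later in this thread, the exit shows up in the log as "UPGRADE OK: All existed procedures have been finished, quit...".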

Michael Stack added a comment - 20/Aug/18 23:16

Don't we need this even if it is a stop/start upgrade, not just for a rolling upgrade? We need to clear out the old-style Procedures for both cases?

Could we make setting the flag dynamic? A page where you set the config and the Master then runs until it has cleared out the old stuff and then System.exits (Yeah, I like your System.exit idea).

Duo Zhang added a comment - 21/Aug/18 00:32

I still prefer a config: it is simple and has no other side effects. I just saw that someone filed an issue saying that, on our web page, anyone can split a table even without permission on that table... I think we would have the same problem here, and it would be even worse since this is an admin operation...

The flag will only take effect on branch-2.0 and branch-2.1.

Duo Zhang added a comment - 26/Aug/18 10:29

Will prepare a prototype soon.

Duo Zhang added a comment - 27/Aug/18 07:36

Michael Stack PTAL sir. Will try to write a UT.

Michael Stack added a comment - 27/Aug/18 13:59

Looks great. Yeah, a UT and/or trying it on a cluster would be good. s/All existed/All existing/ in next patch. Needs an RN on the process. We'll have to doc it in the upgrade section of the book too, once we are clear on the process. Good stuff.

Michael Stack added a comment - 27/Aug/18 14:53

[~Apache9] Should 2.2 be 3.0?

Duo Zhang added a comment - 27/Aug/18 22:08

HBASE-20881 has been pushed to branch-2, so I think it should be 2.2?

Michael Stack added a comment - 27/Aug/18 22:36

Agreed, [~Apache9]. I changed it.

Duo Zhang added a comment - 27/Aug/18 22:39

Oh I thought you mean the name of the flag...

For this issue, it should be a blocker for 2.1.x and 2.0.x, as we will only commit the patch to these two branches...

Michael Stack added a comment - 28/Aug/18 01:15

Yeah, you are right. I should have marked it 2.0.3.

What's to be done here to prove this mechanism will work, [~Apache9]? Thanks.

Duo Zhang added a comment - 28/Aug/18 01:24

I will try to write a UT first, and also test it on a cluster. Then we need to open an issue to write down the instructions in our ref guide.

Michael Stack added a comment - 28/Aug/18 01:27

I can help w/ the refguide bit. Could try it myself first so I knew what I was talking about (smile).

Michael Stack added a comment - 28/Aug/18 15:59

Trying to figure out something that might be a little easier on the operator.

Should a clean shutdown also work? Wait for AMv2 to be quiescent (no assigning, no crash processing), then do the shutdown. Let me check whether the shutdown itself makes for any assigns/unassigns. A tool could look at the master proc WALs and report whether any assigns/unassigns are outstanding.

Or, the tool could look at the master proc WALs to see if there are outstanding assigns/unassigns. If none, kill the masters (standby first). Re-run the tool to be sure. Then start 2.2?
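A sketch of the report such a tool might produce, in Java. Reading the master proc WALs is not shown: the `ProcRecord` type is a hypothetical, simplified stand-in for a decoded WAL entry, and the list of old-style type names (`AssignProcedure`, `UnassignProcedure`) is an assumption based on the Assign/Unassign Procedure types named in the release note.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OutstandingAssignScan {
  /** Hypothetical, simplified view of one procedure as a tool might decode it from the proc WALs. */
  static final class ProcRecord {
    final long procId;
    final String type; // simple class name of the procedure
    final boolean finished;

    ProcRecord(long procId, String type, boolean finished) {
      this.procId = procId;
      this.type = type;
      this.finished = finished;
    }
  }

  // Assumed names of the old-style (pre-HBASE-20881) region procedures.
  static final List<String> OLD_STYLE = Arrays.asList("AssignProcedure", "UnassignProcedure");

  /** Returns the ids of old-style assign/unassign procedures that have not finished. */
  static List<Long> outstandingOldStyle(List<ProcRecord> records) {
    List<Long> ids = new ArrayList<>();
    for (ProcRecord r : records) {
      if (!r.finished && OLD_STYLE.contains(r.type)) {
        ids.add(r.procId);
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    List<ProcRecord> sample = Arrays.asList(
        new ProcRecord(1, "AssignProcedure", true),
        new ProcRecord(2, "UnassignProcedure", false),
        new ProcRecord(3, "ServerCrashProcedure", false));
    List<Long> outstanding = outstandingOldStyle(sample);
    if (outstanding.isEmpty()) {
      System.out.println("No outstanding assign/unassign procedures: safe to stop the masters");
    } else {
      System.out.println("Outstanding assign/unassign procedures: " + outstanding);
    }
  }
}
```

With the sample records this prints "Outstanding assign/unassign procedures: [2]"; the finished AssignProcedure and the ServerCrashProcedure are ignored by this filter. Running it twice, as suggested above, guards against a procedure being scheduled between the check and the shutdown.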

Duo Zhang added a comment - 29/Aug/18 01:48

OK, maybe an external tool that writes a special node to zk; when the master thinks it is safe to quit, it quits. The tool also monitors the status of the master: once the master quits, it checks the proc WALs for outstanding unsupported procedures. If there are none, it outputs a message saying that it is safe to upgrade; otherwise you should restart from the beginning and try again.
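The proposed tool is essentially a three-step driver: write the marker, wait for the Master to quit, then verify the proc WALs. A minimal Java sketch under those assumptions; `UpgradeToolSketch` and its three hooks (`writeMarker`, `masterAlive`, `unsupportedLeft`) are hypothetical placeholders for the ZooKeeper write, the process check and the WAL scan, none of which is implemented here.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

public class UpgradeToolSketch {
  /** Hypothetical driver for the proposed external upgrade tool; returns a verdict string. */
  static String run(Runnable writeMarker, BooleanSupplier masterAlive,
      Supplier<List<String>> unsupportedLeft, int maxPolls, long pollMillis) {
    // 1. Ask the Master to quit once it considers the upgrade safe.
    writeMarker.run();
    // 2. Monitor the Master until it exits or we give up.
    for (int i = 0; i < maxPolls && masterAlive.getAsBoolean(); i++) {
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return "INTERRUPTED: Master state unknown";
      }
    }
    if (masterAlive.getAsBoolean()) {
      return "TIMEOUT: Master has not quit yet";
    }
    // 3. The Master has quit: verify that nothing the new code cannot handle is left.
    List<String> left = unsupportedLeft.get();
    if (left.isEmpty()) {
      return "SAFE: no unsupported procedures left, OK to upgrade";
    }
    return "NOT SAFE: " + left + " still present, restart from the beginning and try again";
  }

  public static void main(String[] args) {
    int[] checksUntilExit = {2};
    String result = run(
        () -> System.out.println("marker written (simulated)"),
        () -> checksUntilExit[0]-- > 0, // simulated Master that exits after two checks
        Collections::emptyList,
        10, 1);
    System.out.println(result);
  }
}
```

Injecting the hooks keeps the control flow separate from the ZooKeeper and WAL details, which this discussion left open.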

Duo Zhang added a comment - 29/Aug/18 01:49

I suggest zk because I do not want to add a new method to the master, which may introduce compatibility issues...

Michael Stack added a comment - 29/Aug/18 03:25

Ok. I can do that. I won't commit this patch to branch-2.0 just yet. Let me roll 2.0.2. Can add in the zk thingy for 2.0.3 and 2.1.1.

Duo Zhang added a comment - 29/Aug/18 03:26

Fine. I think we need more tests for it. Let's not block 2.0.2.

Duo Zhang added a comment - 10/Sep/18 07:52

Any progress here boss? Michael Stack. I think it is time to roll a 2.1.1?

Michael Stack added a comment - 19/Oct/18 18:08

I tried this. Seems to work:

...
2018-10-19 11:06:09,371 INFO [ProcExecTimeout] procedure2.ProcedureExecutor: UPGRADE OK: All existed procedures have been finished, quit...
2018-10-19 11:06:09,373 INFO [Thread-4] regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@263558c9
2018-10-19 11:06:09,373 INFO [Thread-4] regionserver.HRegionServer: ***** STOPPING region server 'localhost,16020,1539972363652' *****
...

I suppose we could start a backup master running the new code so it could take over as soon as the old one stops?

Hopefully we can figure out a smoother transition by the time of 2.2, but let's commit this in the meantime. Thanks [~Apache9]
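Step 3 of the release note asks operators to confirm the shutdown cause in the master log. A minimal Java sketch of that check; `UpgradeLogCheck` is hypothetical, and it looks for two markers because the release note quotes 'READY TO ROLLING UPGRADE' while the test run above logged "UPGRADE OK: All existed procedures have been finished, quit...". Verify the exact text against your own master log before relying on it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class UpgradeLogCheck {
  // First marker is quoted in the release note; the second appears in the test-run log above.
  static final List<String> MARKERS = Arrays.asList("READY TO ROLLING UPGRADE", "UPGRADE OK");

  /** Returns true if any log line contains one of the upgrade markers. */
  static boolean sawUpgradeMarker(List<String> logLines) {
    for (String line : logLines) {
      for (String marker : MARKERS) {
        if (line.contains(marker)) {
          return true;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: java UpgradeLogCheck <master-log-file>");
      return;
    }
    // ISO-8859-1 decodes any byte sequence, and the markers themselves are plain ASCII.
    List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
    if (sawUpgradeMarker(lines)) {
      System.out.println("Upgrade marker found: the Master drained its Procedure Store before quitting");
    } else {
      System.out.println("No upgrade marker found: do not start 2.2.x Masters yet");
    }
  }
}
```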

Michael Stack added a comment - 19/Oct/18 18:10

Rebased patch. Trying against hadoopqa.

Hadoop QA added a comment - 19/Oct/18 18:58

+1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	0m 17s	Docker mode activated.
			Prechecks
+1	hbaseanti	0m 0s	Patch does not have any anti-patterns.
+1	@author	0m 0s	The patch does not contain any @author tags.
-0	test4tests	0m 0s	The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
			master Compile Tests
+1	mvninstall	5m 20s	master passed
+1	compile	0m 21s	master passed
+1	checkstyle	0m 15s	master passed
+1	shadedjars	4m 14s	branch has no errors when building our shaded downstream artifacts.
+1	findbugs	0m 26s	master passed
+1	javadoc	0m 14s	master passed
			Patch Compile Tests
+1	mvninstall	5m 26s	the patch passed
+1	compile	0m 22s	the patch passed
+1	javac	0m 22s	the patch passed
+1	checkstyle	0m 15s	the patch passed
+1	whitespace	0m 0s	The patch has no whitespace issues.
+1	shadedjars	4m 24s	patch has no errors when building our shaded downstream artifacts.
+1	hadoopcheck	11m 17s	Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0.
+1	findbugs	0m 34s	the patch passed
+1	javadoc	0m 14s	the patch passed
			Other Tests
+1	unit	3m 9s	hbase-procedure in the patch passed.
+1	asflicense	0m 11s	The patch does not generate ASF License warnings.
		37m 28s

Subsystem	Report/Notes
Docker	Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b
JIRA Issue	HBASE-21075
JIRA Patch URL	https://issues.apache.org/jira/secure/attachment/12944775/0001-HBASE-21075-Confirm-that-we-can-rolling-upgrade-from.patch
Optional Tests	dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile
uname	Linux 9896e49013ff 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 GNU/Linux
Build tool	maven
Personality	/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
git revision	master / 05d22ed960
maven	version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z)
Default Java	1.8.0_181
findbugs	v3.1.0-RC3
Test Results	https://builds.apache.org/job/PreCommit-HBASE-Build/14766/testReport/
Max. process+thread count	279 (vs. ulimit of 10000)
modules	C: hbase-procedure U: hbase-procedure
Console output	https://builds.apache.org/job/PreCommit-HBASE-Build/14766/console
Powered by	Apache Yetus 0.8.0 http://yetus.apache.org

This message was automatically generated.

Michael Stack added a comment - 19/Oct/18 19:36

Pushed to branch-2.0 + branch-2.1. Thanks for the patch [~Apache9] . Feel free to edit the RN I proffered.

Hudson added a comment - 20/Oct/18 04:01

Results for branch branch-2.1
build #494 on builds.a.o: -1 overall

details (if available):

+1 general checks
– For more information see general report

+1 jdk8 hadoop2 checks
– For more information see jdk8 (hadoop2) report

-1 jdk8 hadoop3 checks
– For more information see jdk8 (hadoop3) report

+1 source release artifact
– See build output for details.

+1 client integration test

Hudson added a comment - 20/Oct/18 04:24

Results for branch branch-2.1
build #492 on builds.a.o: -1 overall

details (if available):

+1 general checks
– For more information see general report

+1 jdk8 hadoop2 checks
– For more information see jdk8 (hadoop2) report

-1 jdk8 hadoop3 checks
– For more information see jdk8 (hadoop3) report

+1 source release artifact
– See build output for details.

+1 client integration test

Hudson added a comment - 20/Oct/18 04:32

Results for branch branch-2.0
build #978 on builds.a.o: -1 overall

details (if available):

+1 general checks
– For more information see general report

-1 jdk8 hadoop2 checks
– For more information see jdk8 (hadoop2) report

-1 jdk8 hadoop3 checks
– For more information see jdk8 (hadoop3) report

+1 source release artifact
– See build output for details.

Hudson added a comment - 20/Oct/18 04:38

Results for branch branch-2.1
build #493 on builds.a.o: +1 overall

details (if available):

+1 general checks
– For more information see general report

+1 jdk8 hadoop2 checks
– For more information see jdk8 (hadoop2) report

+1 jdk8 hadoop3 checks
– For more information see jdk8 (hadoop3) report

+1 source release artifact
– See build output for details.

+1 client integration test

People

Assignee: Duo Zhang

Reporter: Duo Zhang

Votes: 0

Watchers: 8

Dates

Created: 20/Aug/18 07:05

Updated: 28/Feb/19 07:58

Resolved: 19/Oct/18 19:36