Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.92.2
    • Component/s: master
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK:

      Reviewing 0.92v17

      isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere.

      Does isDeadRootServerInProgress need to be public? Ditto for meta version.

      The method param name 'definitiveRootServer' is not right; what is meant by definitive? Does it need this qualifier?

      Is there anything in place to stop us expiring a server twice if its carrying root and meta?

      What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta.

      I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all.

      It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here?
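The operation just described can be pictured as a set difference. The sketch below is illustrative only — class and method names are invented, not the actual ServerManager or ZooKeeper API: the candidate dead list is roughly the regionserver znodes found in ZooKeeper minus the servers that have checked in with the master.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: derive the "dead, not yet expired" candidates
// as the regionserver znodes found in ZooKeeper minus the servers that
// have checked in with the master. Not the actual HBase implementation.
class DeadServerSketch {
    static Set<String> candidates(Set<String> znodeServers,
                                  Set<String> checkedIn) {
        Set<String> dead = new HashSet<>(znodeServers);
        dead.removeAll(checkedIn); // left a znode but never reported in
        return dead;
    }
}
```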

      Though distributed log splitting is configured, with this patch we will do single-process splitting in the master under some conditions. It's not explained in the code why we would do this. Why do we think master log splitting is 'high priority' when it could very well be slower? Should we only go this route if distributed splitting is not going on? Do we know if concurrent distributed log splitting and master splitting work together?

      Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired?

      This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch?

      This patch needs to have the v18 differences applied.

      1. HBASE-5270-92v11.patch
        27 kB
        chunhui shen
      2. HBASE-5270v11.patch
        32 kB
        chunhui shen
      3. hbase-5270v10.patch
        32 kB
        chunhui shen
      4. hbase-5270v9.patch
        28 kB
        chunhui shen
      5. hbase-5270v8.patch
        28 kB
        chunhui shen
      6. hbase-5270v7.patch
        29 kB
        chunhui shen
      7. hbase-5270v6.patch
        30 kB
        chunhui shen
      8. hbase-5270v5.patch
        30 kB
        chunhui shen
      9. hbase-5270v4.patch
        30 kB
        chunhui shen
      10. 5270-90v3.patch
        39 kB
        chunhui shen
      11. 5270-90v2.patch
        38 kB
        chunhui shen
      12. 5270-90-testcasev2.patch
        15 kB
        chunhui shen
      13. 5270-testcasev2.patch
        15 kB
        chunhui shen
      14. hbase-5270v2.patch
        35 kB
        chunhui shen
      15. sampletest.txt
        6 kB
        stack
      16. 5270-90-testcase.patch
        9 kB
        chunhui shen
      17. 5270-90.patch
        31 kB
        chunhui shen
      18. 5270-testcase.patch
        10 kB
        chunhui shen
      19. hbase-5270.patch
        28 kB
        chunhui shen

          Activity

          Hudson added a comment -

          Integrated in HBase-0.92 #332 (See https://builds.apache.org/job/HBase-0.92/332/)
          HBASE-5270 Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler (Revision 1303219)
          HBASE-5270 Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler (Revision 1303218)

          Result = SUCCESS
          stack :
          Files :

          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java

          stack :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/CreateTableHandler.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
          stack added a comment -

          Committed to 0.92 branch. Thanks for the patch Chunhui.

          stack added a comment -

          Thanks Chunhui. Will apply after 0.92.1 leaves the gate so its there for 0.92.2 (unless 0.92.1RC0 gets squashed and then I'll include it).

          chunhui shen added a comment -

          patchv11 for 0.92

          ramkrishna.s.vasudevan added a comment -

          So this one went in.
          Thanks Stack, Ted and mainly Chunhui for pursuing on this.
          The 0.90 patch that Chunhui gave in HBASE-5179 is running in our test clusters.

          stack added a comment -

          The trunk patch went into 0.94. I committed it. So, just need patch for 0.92.2 now.

          stack added a comment -

          @Lars Yeah. Let me try this trunk patch. Its good for master joining a cluster w/ concurrent crashing regionservers.

          Lars Hofhansl added a comment -

          Stack, should I apply to 0.94?

          stack added a comment -

          I committed to trunk. Can you make a version for 0.92 please Chunhui? The trunk patch does not seem to apply. Thank you.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518153/HBASE-5270v11.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestImportTsv

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1172//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1172//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1172//console

          This message is automatically generated.

          chunhui shen added a comment -

          I have submitted the patch here --> HBASE-5270v11.patch

          I only added some notes for joinCluster(), changed some comments to explain what deadNotExpired servers are in ServerManager, and fixed the spaces around '='.

          stack added a comment -

          What did you change Chunhui? And rather, could you attach the patch here please. Thanks.

          chunhui shen added a comment -

          minor items addressed in patchv11
          https://reviews.apache.org/r/4021/

          stack added a comment -

          It is called by one of test cases.

          Ok. Its package private so that is good. Could add a comment that it is for tests if you are making another patch.

          I'd be fine w/ the other stuff in a new issue.

          Want to upload new patch w/ minor items addressed, add comments and fixing spaces around '=', etc., and we can commit.

          I still think this all needs refactor but this change at least has an accompanying test proving it fixes a particular circumstance (Ted also tested the circumstance above).

          Good stuff Chunhui

          chunhui shen added a comment -

          Who needs this: void joinCluster()

          It is called by one of test cases.

          What happens if when we call expireDeadNotExpiredServers, the server has already been expired?

          If the server has already been expired, it will do nothing in current logic because in ServerManager#expireServer

          if (!this.onlineServers.containsKey(serverName)){return;}

          We should call expireDeadNotExpiredServers only after the master sets serverShutdownHandlerEnabled, so we needn't check whether serverShutdownHandlerEnabled is true or not.

          We do serverShutdownHandlerEnabled = true; this.serverManager.expireDeadNotExpiredServers();
          to enable ServerShutdownHandler and expire servers that have died but have not yet been expired by ServerManager.

          At last, we do the above in another issue ?
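The guard quoted above can be sketched as follows. This is a hedged simplification — the class shape, field names, and counter are invented for illustration and are not the actual HBase source — showing why a second expiration of the same server is a harmless no-op:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch of the guard quoted from ServerManager#expireServer:
// expiring a server that is no longer in onlineServers is a no-op, so
// expiring the same server twice does nothing the second time. Names
// are simplified; this is not the actual HBase source.
class ExpireSketch {
    final Map<String, Long> onlineServers = new ConcurrentHashMap<>();
    int expired = 0; // stands in for queuing a ServerShutdownHandler

    void expireServer(String serverName) {
        if (!onlineServers.containsKey(serverName)) {
            return; // already expired: current logic does nothing
        }
        onlineServers.remove(serverName);
        expired++;
    }
}
```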

          stack added a comment -

          @Chunhui Sorry. I thought I'd provided comments already on this issue. It seems like my comments were lost by me.

          So, to me its odd that we disable one handler only; i.e. serverShutdownHandlerEnabled

          If a split or a close or an open comes in while the master is initializing, won't that mess up our view of the cluster? Why not have all handlers queue tasks during initialization?

          I would tolerate our doing this in another issue if thats what you think we should do.

          Who needs this:

          
          +  void joinCluster() throws IOException, KeeperException, InterruptedException {
          

          I think you need to add comments explaining what deadNotExpired servers are. It's a weird concept, and folks can go read where they are defined, but it would help if, up in HMaster and AssignmentManager where they are used, you explain what they are for those coming after trying to figure out what's going on. Maybe point at the explanation over in ServerManager?

          +    Iterator<ServerName> serverIterator=deadNotExpiredServers.iterator();
          

          Notice how there are spaces around operators in all of the surrounding code.

          What happens if when we call expireDeadNotExpiredServers, the server has already been expired?

          Why do the following:

          +    serverShutdownHandlerEnabled = true;
          +    this.serverManager.expireDeadNotExpiredServers();
          

          .. then in expireDeadNotExpiredServers, the first thing we do is check whether serverShutdownHandlerEnabled is true. Under what circumstance would it not be true if we just set it? This is the only place we call expireDeadNotExpiredServers.

          In general I'm in favor of committing this patch because it has a test. Thanks Chunhui.

          chunhui shen added a comment -

          @Stack:
          Could you see this issue with latest patch again.
          Thanks.

          chunhui shen added a comment -

          Thanks Ted for the tests.

          Ted Yu added a comment -

          Backported patch v10 to 0.92
          All unit tests passed.

          I then performed the following test on a 5 region server cluster:
          1. find the RS hosting ROOT: sea-lab-3
          2. ran 'bin/rolling-restart.sh' on master
          3. when I saw the following output I kill -9'ed RS on sea-lab-3:

          starting master, logging to /apache/hbase/bin/../logs/hbase-hduser-master-sea-lab-0.out
          Wait a minute for master to come up join cluster
          

          4. master came up after sometime - ROOT got served by sea-lab-5
          5. ROOT went through one more transition when sea-lab-5 got restarted.

          Cluster is in stable state afterwards.

          Ted Yu added a comment -

          @Stack:
          Can you provide your comments ?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517050/hbase-5270v10.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -129 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 155 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1100//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1100//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1100//console

          This message is automatically generated.

          chunhui shen added a comment -

          Patch v10 incorporates Ted's review comments on r/4021 (and passes the previously failing tests). For the logic, I think it's healthy now. Waiting for other suggestions.

          chunhui shen added a comment -

          @Ted
          Thanks for the review on r/4021. I will modify it later including the failed tests.

          For one of the review points: when should processing of ServerShutdownHandler be allowed, after complete initialization or after assignRootAndMeta completes?

          As per Stack's comment of 29/Feb/12 19:18, I think we should allow processing of ServerShutdownHandler once assignRootAndMeta completes, to fix the hole where the meta server dies between assignRootAndMeta and initialization. In such a case the master will wait for meta when assigning user regions, but SSH for the meta server is blocked until initialization completes, so a conflict arises.
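The ordering argued for here can be sketched roughly as below. All names (the class, the step strings, the method) are illustrative, not the actual HMaster code: the point is only that the SSH flag flips right after ROOT and META are assigned, not at the very end of initialization.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the startup ordering discussed above: enable the
// ServerShutdownHandler right after ROOT and META are assigned, not
// only once the master is fully initialized. Illustrative names only;
// this is not the actual HMaster code.
class StartupOrderSketch {
    volatile boolean serverShutdownHandlerEnabled = false;
    final List<String> steps = new ArrayList<>();

    void finishInitialization() {
        steps.add("splitLogsOfDeadServers");
        steps.add("assignRootAndMeta");
        // SSH enabled here: if the meta server dies while user regions
        // are assigned below, its shutdown can still be processed,
        // avoiding the deadlock described in the comment above.
        serverShutdownHandlerEnabled = true;
        steps.add("expireDeadNotExpiredServers");
        steps.add("assignUserRegions");
        steps.add("initialized");
    }
}
```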

          Ted Yu added a comment -

          I put some review on r/4021.

          chunhui shen added a comment -

          https://reviews.apache.org/r/4021/
          Review request for the new version patch

          chunhui shen added a comment -

          I'm sorry. Let's continue discussing in this jira.

          stack added a comment - 01/Mar/12 17:45 Is this patch right? You have a disablessh flag. What about other callbacks that could come in during the initialization? I don't see where you queue up callbacks for processing post-initialization.

          I don't change the logic of preventing processing of SSH, but I change the time of enabling SSH from 'initialized' to 'finished assigning ROOT and META' because of the following hole:

          Are there still holes? For example, say the .META. server crashes AFTER we've verified it up in assignRootAndMeta but before we get to do a scan of .META. to rebuild user regions list. Could .META. be assigned w/o log splitting finishing?

          stack added a comment -

          I have created a new issue and submit the new patch, what about move discussion to HBASE-5501 since this one has too long comments

          That the issue is long is not a good reason to open new issue. We need to keep the decisions and discussion in one place. It makes it easier on the folks who are following behind us to make sense of why we did what. Would suggest you close hbase-5501.

          chunhui shen added a comment -

          I have created a new issue and submit the new patch, what about move discussion to HBASE-5501 since this one has too long comments

          stack added a comment -

          Also, do you need to make a new version of this patch now hbase-5454 has gone in? Thanks.

          stack added a comment -

          I have thought about this issue for a long time, and I think preventing processing of SSH is a clear and simple solution; otherwise we would have to consider the many cases where the meta server dies at different times during master initialization.

          Would we do the above as part of another issue Chunhui?

          ramkrishna.s.vasudevan added a comment -

          I think preventing processing of SSH is a clear and simple solution

          I too think it is a good and simple idea.

          chunhui shen added a comment -

          @stack
          I think we could introduce initializing state first for HBASE-5454.

          Are there still holes? For example, say the .META. server crashes AFTER we've verified it up in assignRootAndMeta but before we get to do a scan of .META. to rebuild user regions list. Could .META. be assigned w/o log splitting finishing?

          Yes, it's another hole, but it's easy to solve: we could stop the processing of expired servers until the master has finished assigning ROOT and META, rather than until it is initialized.

          I have thought about this issue for a long time, and I think preventing processing of SSH is a clear and simple solution; otherwise we would have to consider the many cases where the meta server dies at different times during master initialization.

          stack added a comment -

          @Prakash

          .... then the next step, that you outlined in your scenario, cannot be allowed

          How should we do this boss?

          The problem you are outlining is probably still there but the scenario has to be refined.

          What should I add? If we allow that the split could take a long time, its possible that on entry to the log splitting the server was good but by the end it could have gone AWOL.

          And then it should initialize everything based on that knowledge which must not change during initialization.

          I think the root issue is that it needs to scan .META. and ROOT as part of the startup; they need to be assigned and up w/ all edits in place. Thats whats proving to be a little tough to ensure.

          (Thanks for the review P).

          Prakash Khemani added a comment -

          @Stack

          If we presume that the list of servers that joinClusters received contains
          the server hosting .META., then the next step, that you outlined in your
          scenario, cannot be allowed. If we are splitting logs for .META. then we
          have determined that meta-server was not running and therefore it cannot
          be taking edits. The problem you are outlining is probably still there but
          the scenario has to be refined.

          Anyway my point was - at startup master should determine once what servers
          are up and what are not. This should include whether ROOT and META are
          assigned or not. And then it should initialize everything based on that
          knowledge which must not change during initialization. Anything that
          changes during initialization should be taken care of by the normal
          Server-handlers. But I have to admit, I don't understand the assignment
          complexities very well … I will read up some more.

          stack added a comment -

          @Chunhui You have a new method splitLogIfOnline which will split the log if the server was online. Why do you not expire the server? (You remove the expireIfOnline method).

          Now we have this initializing state, do you think we should also stop the processing of expired servers during this startup phase and instead queue them up for processing after the master is up? Could do that in another issue maybe since this issue has been going on too long and your patch is at least an improvement on what we currently have (This startup sequence needs a big refactor IMO – it is way too complicated figuring the sequence in which stuff runs).

          Are there still holes? For example, say the .META. server crashes AFTER we've verified it up in assignRootAndMeta but before we get to do a scan of .META. to rebuild user regions list. Could .META. be assigned w/o log splitting finishing? (I don't think so... .META. would be offline until the servershutdown handler ran and it would first split logs).

          Good stuff.

          stack added a comment -

          @Prakash

          Presume we pass list from splitLogAfterStartup to joinClusters as you suggest and presume list of servers included the server that had been hosting .META.

          Allow that during or just after splitLogAfterStartup, .META. server 'crashes' – it becomes unresponsive. Also allow that somehow, just before it hung up, during a long running log split, .META. took on a couple of edits saying regions A, B, and C had split.

          In assignRootAndMeta, we'll notice the unresponsiveness, force the expiration of the server that was carrying .META. (this will queue a ServerShutdownHandler but will not wait on its completion), and we'll then reassign .META. It's very likely that .META. will go to one of the other 'good' servers. It's also likely that the SSH will not have completed its processing before this assign happens. Thus, on deploy, .META. will be missing the above A, B, and C split edits.

          chunhui shen added a comment -

          @Prakash
          In a live cluster, do the following steps:
          1. kill the master;
          2. start the master; the master begins initializing;
          3. the master completes splitLog;
          4. kill the META server;
          5. the master starts assigning ROOT and META;
          6. now META region data will be lost, because we can't ensure the META server's log from step 4 has been split.

          Prakash Khemani added a comment -

          Assuming that the master uses the saved region-server list in joinCluster,
          can you then please outline the scenario where problems can still happen?
          There is some handling of META and ROOT not being available in
          ServerShutdownHandler and I am wondering why that is not sufficient.


          chunhui shen added a comment -

          I feel that it might be a lot simpler if the master saves the list of region-servers that it had given to splitLogAfterStartup(), and later uses the same list for rebuilding user regions. That should fix this issue, won't it?

          Yes, that's right in most cases. But if the server carrying ROOT or META dies during the master's initialization, that's another problem. So we should fix both cases.

          Prakash Khemani added a comment -

          (I haven't read through the comments carefully and I am sorry for the noise if I am way off the mark)

          The problem as I see it is that the Master's understanding of which region servers are online changes from the time that it calls splitLogAfterStartup() to the time it calls rebuildUserRegions() in joinCluster().

          I feel that it might be a lot simpler if the master saves the list of region-servers that it had given to splitLogAfterStartup(), and later uses the same list for rebuilding user regions. That should fix this issue, won't it?
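Prakash's idea, stated as a sketch: take the online-server snapshot once and thread the identical, frozen list through both phases, so the master's view cannot drift between them. Method names are borrowed from the discussion; the bodies are purely illustrative, not the actual HBase code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class SnapshotSketch {
    // Split logs for every server NOT in the snapshot (work elided here),
    // then hand the very same snapshot onward.
    static Set<String> splitLogAfterStartup(Set<String> onlineSnapshot) {
        return onlineSnapshot;
    }

    // Rebuild user regions against exactly the same snapshot.
    static void rebuildUserRegions(Set<String> onlineSnapshot) {
        System.out.println("rebuilding against " + onlineSnapshot.size() + " servers");
    }

    public static void main(String[] args) {
        // Take the online-server list ONCE, then freeze it so it cannot
        // change between the two phases of master startup.
        Set<String> snapshot = Collections.unmodifiableSet(
                new HashSet<>(Arrays.asList("rs1,60020,1", "rs2,60020,1")));
        rebuildUserRegions(splitLogAfterStartup(snapshot));
    }
}
```

The point of the unmodifiable wrapper is that anything that changes after the snapshot is taken must be handled by the normal server-shutdown handlers, not by startup code.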

          chunhui shen added a comment -

          @Ted
          I have made the modifications above in patch v8.

          Ted Yu added a comment -

          Please remove space after the dot:

          +    mfs. splitLogAfterStartup(sm.getOnlineServers().keySet());
          

          I see the following code in several methods:

          +    if (isInitializing()) {
          +      throw new PleaseHoldException("Master is initializing");
          +    }
          

          Does creating a new method wrapping the above code make sense ?
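A minimal sketch of the wrapper method Ted is suggesting; the class, field, and method names here are illustrative stand-ins, not the actual patch code.

```java
public class MasterInitCheck {
    // Stand-in for org.apache.hadoop.hbase.PleaseHoldException.
    static class PleaseHoldException extends Exception {
        PleaseHoldException(String msg) { super(msg); }
    }

    // In the real master this would be flipped at the end of initialization.
    private volatile boolean initialized = false;

    // One shared method instead of the same if-block copied into every
    // admin operation.
    void checkInitialized() throws PleaseHoldException {
        if (!initialized) {
            throw new PleaseHoldException("Master is initializing");
        }
    }

    void enableTable(String name) throws PleaseHoldException {
        checkInitialized();   // every admin operation starts with the shared check
        // ... actual work ...
    }

    public static void main(String[] args) {
        MasterInitCheck m = new MasterInitCheck();
        try {
            m.enableTable("t1");
            System.out.println("no exception");
        } catch (PleaseHoldException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Besides removing duplication, a single method gives one place to change the exception type or message later.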

          Ted Yu added a comment -
          + * of the first instance, or when master is initializing and client call admin's
          + * operations
          

          should read:

          + * of the first instance, or when master is initializing and client calls admin
          + * operations
          

          Please fill javadoc for the following method:

          +  public RegionServerTracker createRegionServerTracker(final ZooKeeperWatcher zkw,
          

          Is the above used for testing ? I don't see it called in other classes.

          Ted Yu added a comment -

          @Chunhui:
          Can you upload patch v7 onto review board ?

          A test run through Hadoop QA would be helpful as well.

          chunhui shen added a comment -

          In v7 patch, use PleaseHoldException.

          chunhui shen added a comment -

          Do you think we should check to see if we have already split this server's log for the case where the server was carrying root and meta?

          I think it's not a problem: the server's log dir will be deleted after the split, so the second split will do nothing. Of course, we could do the check to prevent an unnecessary call to splitLogIfOnline.

          Thanks stack for the review; I will make a new patch following the advice above.

          stack added a comment -

          @Ted Yes. We can keep the prefix and change the rest of the sentence to be more generic. If Chunhui reuses it here, it'll be an exception the master throws when they want the client to come back in a while.

          Ted Yu added a comment -

          If we reuse PleaseHoldException, the javadoc for that exception should be modified:

           * This exception is thrown by the master when a region server was shut down
           * and restarted so fast that the master still hasn't processed the server
           * shutdown of the first instance.
          
          Jimmy Xiang added a comment -

          @Stack, I agree. I think we should reuse the existing exception if we can.

          stack added a comment -

          Do you think we should check to see if we have already split this server's log for the case where the server was carrying root and meta?

          +      splitLogIfOnline(currentMetaServer);
          

          Or will the above call become a noop because we just split it before we assigned root?

          Is this a 'safe mode' or is it the master 'initializing'? I think 'safe mode' makes folks think of hdfs. It is a little similar in that master is trying to make sense of the cluster but initializing might be a better name for this state.

          BTW, I think this is an improvement over previous versions of this patch. Its easier to reason about. Good stuff Chunhui.

          Make a method and put this duplicated code into it and call it from the two places its repeated:

          +    if (!deadNotExpiredServers.isEmpty()) {
          +      for (final ServerName server : deadNotExpiredServers) {
          +        LOG.debug("Removing dead but not expired server: " + server
          +            + " from eligible server pool.");
          +        servers.remove(server);
          +      }
          +    }
          
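The extraction stack asks for might look like the sketch below; types are simplified to String (the real code uses ServerName and a LOG), and the isEmpty() guard is dropped because iterating an empty set is already a no-op.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EligiblePool {
    // One helper called from both places the duplicated loop appeared.
    static void removeDeadNotExpired(List<String> servers, Set<String> deadNotExpired) {
        for (String server : deadNotExpired) {
            if (servers.remove(server)) {
                System.out.println("Removing dead but not expired server: " + server
                        + " from eligible server pool.");
            }
        }
    }

    public static void main(String[] args) {
        List<String> servers = new ArrayList<>(Arrays.asList("rs1", "rs2", "rs3"));
        removeDeadNotExpired(servers, new HashSet<>(Arrays.asList("rs2")));
        System.out.println(servers);
    }
}
```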

          Fix this bit of javadoc '... but not are expired now.'

          You don't need this:

          + * Copyright 2007 The Apache Software Foundation
          

          I think MasterInSafeModeException becomes MasterInitializingException?

          Good stuff Chunhui

          Regards Jimmy's comment:

          Instead of introducing safe mode, can we add something to the RPC server and not allow it to serve traffic before the actual server is ready, for example, fully initialized?

          We have a ServerNotRunningYetException down in the ipc. It's thrown by HBaseServer if RPC has not started yet. It seems a little related to this MasterInitializing. We also have a PleaseHoldException. Perhaps the Master should throw this instead of the MasterInitializing? We'd throw a PleaseHoldException and the message would detail that the master is initializing?

          chunhui shen added a comment -

          @stack
          Could you take a look at introducing safe mode to delay SSH until after the master is initialized?
          I think this solution is easier for this issue.

          chunhui shen added a comment -

          don't allow it to serve traffic before the actual server is ready.

          I think it's inconvenient. For example, before the master is fully initialized, we need to allow RegionserverReport but disallow admin operations. Also, server death is detected through ZK, not RPC.

          Jimmy Xiang added a comment -

          Instead of introducing safe mode, can we add something to the RPC server and not allow it to serve traffic before the actual server is ready, for example, fully initialized?

          chunhui shen added a comment -

          I'm sorry for the ConcurrentHashSet mistake.
          Thanks, Ted.

          Ted Yu added a comment -

          The compilation error was caused by the following in src/main/java/org/apache/hadoop/hbase/master/ServerManager.java:

          +import org.apache.mina.util.ConcurrentHashSet;
          

          See http://www.onkarjoshi.com/blog/201/concurrenthashset-in-java-from-concurrenthashmap/ for how to get a ConcurrentHashSet.
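For reference, the JDK-only way to get a concurrent set is to back it with a ConcurrentHashMap via Collections.newSetFromMap, which removes the need for the MINA dependency entirely. The server-name string below is purely illustrative.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSetDemo {
    public static void main(String[] args) {
        // A thread-safe Set backed by ConcurrentHashMap; no third-party
        // ConcurrentHashSet class required.
        Set<String> deadNotExpiredServers =
                Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
        deadNotExpiredServers.add("rs1,60020,1330000000000");
        deadNotExpiredServers.add("rs1,60020,1330000000000"); // duplicate is ignored
        System.out.println(deadNotExpiredServers.size());
    }
}
```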

          chunhui shen added a comment -

          I built the project again and didn't find any compilation error.

          [ERROR] /home/hduser/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java:[55,27] package org.apache.mina.util does not exist

          Why the package org.apache.mina.util? Is there some mistake?

          Ted Yu added a comment -

          There seems to be some compilation error:

          [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project hbase: Compilation failure: Compilation failure:
          [ERROR] /home/hduser/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java:[55,27] package org.apache.mina.util does not exist
          ...
          [ERROR] /home/hduser/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java:[107,54] cannot find symbol
          [ERROR] symbol  : class ConcurrentHashSet
          [ERROR] location: class org.apache.hadoop.hbase.master.ServerManager
          
          chunhui shen added a comment -

          @Ted
          I have created the review request:
          https://reviews.apache.org/r/4021/

          Thank you.

          Ted Yu added a comment -

          I was able to create a new request.
          Select hbase for Repository.
          Enter '/' for Base Directory.

          Leave Bugs field blank.
          Enter hbase to Groups field.

          chunhui shen added a comment -

          I can't add a review request; it throws this error: The file 'https://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java' (r1292711) could not be found in the repository.
          Why?

          chunhui shen added a comment -

          @Ted
          I have submitted patch v5.

          So a server could be in both deadNotExpiredServers and deadservers ? I don't see return statement in the if block.

          I'm sorry, I made a mistake and missed the return statement in the if block.

          Also we check that we're not in safe mode in expireDelayedServers().

          And the master is in safe mode only when it is initializing now.

          Ted Yu added a comment -
             public synchronized void expireServer(final ServerName serverName) {
          +    if (services.isSafeMode()) {
          +      LOG.info("Master is in safe mode, delay expiring server " + serverName);
          +      this.deadNotExpiredServers.add(serverName);
          +    }
          

          So a server could be in both deadNotExpiredServers and deadservers ? I don't see return statement in the if block.

          In expireDelayedServers(), should we check that we're not in safe mode ?
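To make the two points concrete, here is a toy sketch (all names illustrative, not the patch itself) of the flow Ted is asking for: the return after queueing, so a server cannot land in both sets, and the safe-mode flag cleared before the delayed servers are processed.

```java
import java.util.HashSet;
import java.util.Set;

public class DelayedExpiry {
    private boolean safeMode = true;
    private final Set<String> deadNotExpiredServers = new HashSet<>();
    private final Set<String> deadServers = new HashSet<>();

    synchronized void expireServer(String serverName) {
        if (safeMode) {
            // Queue for later and RETURN, so the server is not also added
            // to deadServers while the master is still initializing.
            deadNotExpiredServers.add(serverName);
            return;
        }
        deadServers.add(serverName);
        // ... queue a ServerShutdownHandler here ...
    }

    synchronized void expireDelayedServers() {
        safeMode = false;   // only runs once initialization is done
        for (String s : deadNotExpiredServers) {
            expireServer(s);
        }
        deadNotExpiredServers.clear();
    }

    public static void main(String[] args) {
        DelayedExpiry m = new DelayedExpiry();
        m.expireServer("rs1");
        System.out.println(m.deadServers.size()); // still delayed
        m.expireDelayedServers();
        System.out.println(m.deadServers.size()); // processed after init
    }
}
```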

          I recommend creating a review on review board. See an example in my first comment of this JIRA.

          chunhui shen added a comment -

          @Ted

          safe mode would make cluster startup longer, especially after a critical issue caused cluster shutdown.

          I think it just makes some admin operations unavailable during safe mode; it does not affect the data read and write service.
          It will make SSH take longer, but a server dying while the master is initializing is a low-probability event.

          chunhui shen added a comment -

          I define the master to be in safe mode when it is stopping or initializing.

          If a region server dies during the master's safe mode, ServerManager will add its ServerName to a set (Set<ServerName> deadNotExpiredServers), but not expire it until the master is initialized.

          So if it is a server which carries META or ROOT, we will split its log when assigning root and meta.

          Also, when assigning regions, we will remove this dead server from the destinations.

          Ted Yu added a comment -

          @Chunhui:
          Have you verified patch v4 in real, decent sized cluster ?
          My concern is that the safe mode would make cluster startup longer, especially after a critical issue caused cluster shutdown.

          chunhui shen added a comment -

          @Stack @Ted
          In hbase-5270v4.patch,
          I introduce a safe mode for the master when it is stopping or initializing.
          In safe mode, the master will delay processing ServerShutdownHandler and refuse many admin operations (see HBASE-5454).
          Through safe mode, we can ensure data safety and fix this issue much more easily.
          Could you review it again?
          Thanks.

          chunhui shen added a comment -

          Patch 5270-90v3.patch addresses Stack's comments.

          Ted Yu added a comment -

          w.r.t. Chunhui's comment @ 18/Feb/12 02:52
          We should correlate the 10s sleep after log splitting with the 20s sleep in the test through some shared constant. Otherwise the test would easily break.

          chunhui shen added a comment -
          So, what happens if a server had root and meta and its not expired when we do failover? We'll expire it processing root. Will we expire it a second time processing meta? Perhaps the answer is no because the first expiration will clear the meta state in master?
          

          I'm sorry, I was wrong in the comment above.

          If a server had root and meta, it will be expired when processing root, and we will not expire it a second time when processing meta, because in the following code metaServerInfo will be null:

          +      HServerInfo metaServerInfo = this.serverManager
          +          .getHServerInfo(metaServerAddress);
          +      if (metaServerInfo != null) {
          +        HServerLoad metaServerLoad = metaServerInfo.getLoad();
          +        if (metaServerLoad != null && metaServerLoad.getNumberOfRegions() > 0
          +            && !catalogTracker.getRootLocation().equals(metaServerAddress)) {
          +          // If metaServer is online && not start just now, we expire it
          +          this.serverManager.expireServer(metaServerInfo);
          +        }
          +      }
          
          chunhui shen added a comment -

          I will address the other suggestions in a later modification.
          Thanks for Stack's review!

          chunhui shen added a comment -
          +    // Remove regions in RIT, they are may being processed by the SSH.
          +    synchronized (regionsInTransition) {
          +      nodes.removeAll(regionsInTransition.keySet());
          +    }
          

          Perhaps SSH has put up something in RIT because its done an assign and here we are blanket removing them all?

          Yes, the SSH and the master's initializing thread may assign the same regions, so we should guard against a double assignment here.
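          The filtering step above can be sketched in isolation. This is a minimal, self-contained illustration of the idea (not HBase's actual AssignmentManager code): regions already in transition — possibly claimed by the ServerShutdownHandler — are dropped from the candidate set under the same lock that guards the RIT map. The region names and map types here are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RitFilter {
    // Remove any region that is already in transition (e.g. being handled by
    // the SSH) from the set of regions the master is about to assign.
    static Set<String> dropRegionsInTransition(Set<String> nodes,
                                               Map<String, String> regionsInTransition) {
        synchronized (regionsInTransition) {
            nodes.removeAll(regionsInTransition.keySet());
        }
        return nodes;
    }

    public static void main(String[] args) {
        Map<String, String> rit = new HashMap<>();
        rit.put("region-b", "OPENING"); // hypothetical region already claimed by SSH

        Set<String> nodes = new HashSet<>(Arrays.asList("region-a", "region-b"));
        dropRegionsInTransition(nodes, rit);
        System.out.println(nodes); // only region-a remains
    }
}
```

          As Stack notes below this quote, the open question is whether removing everything in RIT wholesale is safe when the SSH itself put the entry there mid-assign.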

          chunhui shen added a comment -

          So, what happens if a server had root and meta and its not expired when we do failover? We'll expire it processing root. Will we expire it a second time processing meta? Perhaps the answer is no because the first expiration will clear the meta state in master?

           if (metaServerLoad != null && metaServerLoad.getNumberOfRegions() > 0
          +            && !catalogTracker.getRootLocation().equals(metaServerAddress)) {
          +          // If metaServer is online && not start just now, we expire it
          +          this.serverManager.expireServer(metaServerInfo);
          +        }
          

          If a server carried both root and meta, the catalogTracker.getRootLocation().equals(metaServerAddress) check ensures we do not expire it a second time.

          chunhui shen added a comment -
          +   * Dead servers under processing by the ServerShutdownHander. 
          Whats that mean?  Its while the server is being processed by ServerShutdownHandler exclusively -- these are the inProgress servers?
          

          Yes, these are the in-progress servers.

          chunhui shen added a comment -

          Because this issue involves a bug where root is never assigned and the master blocks waiting for root during initialization,
          we set a timeout on the testcase.

          chunhui shen added a comment -
          Why the need for this timeout:
          
          +    Thread.sleep(10000 * 2);
          +    ((GatedNodeDeleteRegionServerTracker) master.getRegionServerTracker()).gate
          +        .set(false);
          

          Because we sleep 10s after splitLog, the test sleeps 20s to make sure the master is assigning (or has already assigned) root and meta. Only after that do we start processing the RS-node-deleted event.

          chunhui shen added a comment -
          Can you just do + super.nodeDeleted(path); instead of + GatedNodeDeleteRegionServerTracker.super.nodeDeleted(path);?
          

          If we block inside nodeDeleted(path) in GatedNodeDeleteRegionServerTracker, it will block all ZK event processing.
          So I delay only the RS-node-deleted event via a separate thread. However, inside the thread's run(), we must call GatedNodeDeleteRegionServerTracker.super.nodeDeleted(path);
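          The qualified-super point is easy to miss, so here is a minimal standalone sketch (class and path names are hypothetical, not the patch's actual classes): inside the anonymous Runnable, a plain super.nodeDeleted(path) refers to the Runnable's own superclass, so the enclosing class's superclass method can only be reached with the Outer.super form.

```java
import java.util.ArrayList;
import java.util.List;

public class QualifiedSuperDemo {
    static class Tracker {
        final List<String> processed = new ArrayList<>();
        void nodeDeleted(String path) {
            processed.add(path); // the "real" event handling
        }
    }

    // Delays a single deletion event onto its own thread so other ZK
    // callbacks are not blocked, mirroring the gated-tracker idea.
    static class GatedTracker extends Tracker {
        @Override
        void nodeDeleted(final String path) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    // Plain super.nodeDeleted(path) would not compile here:
                    // in the anonymous Runnable, "super" is Object. The
                    // qualified form reaches GatedTracker's superclass.
                    GatedTracker.super.nodeDeleted(path);
                }
            });
            t.start();
            try { t.join(); } catch (InterruptedException ignored) { }
        }
    }

    public static void main(String[] args) {
        GatedTracker gt = new GatedTracker();
        gt.nodeDeleted("/hbase/rs/server1");
        System.out.println(gt.processed);
    }
}
```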

          chunhui shen added a comment -
          +            // We set serverLoad with one region, it could differentiate with
          +            // regionserver which is started just now
          +            HServerLoad serverLoad = new HServerLoad();
          +            serverLoad.setNumberOfRegions(1);
          How you know it has a region?
          

          We do this to mark the RS as one that was already running, as opposed to a regionserver that just started.
          (A freshly started regionserver has no regions, so when the master runs assignRootAndMeta we needn't expire it. Only the 0.90 version needs this, because rootLocation does not contain a startcode, so we cannot tell from the HServerAddress alone whether it is the root server.)

          stack added a comment -

          Why we do this Chunhui?

          +            // We set serverLoad with one region, it could differentiate with
          +            // regionserver which is started just now
          +            HServerLoad serverLoad = new HServerLoad();
          +            serverLoad.setNumberOfRegions(1);
          

          How you know it has a region?

          This needs a bit of javadoc explaining its parsing a ServerName:

          getServerInfo

          Change the data member name from isLogSplitted to logSplit and then the method name becomes isLogSplit.

          Fix this long line:

          +      return getConfiguration().getBoolean("TestingMaster.sleep", false) ? new GatedNodeDeleteRegionServerTracker(
          

          Can you just do + super.nodeDeleted(path); instead of + GatedNodeDeleteRegionServerTracker.super.nodeDeleted(path);?

          Just wondering if you can do your test w/o timeouts but instead by using gates? For example, this wait:

          + cluster.getConfiguration().setInt("TestingMaster.sleep.duration", 10000);

          Or, can you interrupt the wait once your conditions are met so we don't have to wait the ten seconds?
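          The gate idea Stack is suggesting — wake up as soon as the condition holds instead of sleeping a fixed interval — can be sketched with a CountDownLatch (this is an illustrative sketch, not the patch's code; the "assignment done" event is a stand-in):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class GateWait {
    public static void main(String[] args) throws InterruptedException {
        // Hypothetical gate standing in for "root and meta are assigned".
        final CountDownLatch assigned = new CountDownLatch(1);

        Thread master = new Thread(new Runnable() {
            public void run() {
                // ... do the assignment work ...
                assigned.countDown(); // open the gate the moment work is done
            }
        });
        master.start();

        // The waiter proceeds as soon as the gate opens; the 20s is only a
        // worst-case bound, not a fixed delay like Thread.sleep(20000).
        boolean opened = assigned.await(20, TimeUnit.SECONDS);
        System.out.println(opened);
    }
}
```

          With this shape the test finishes in milliseconds on a fast machine yet still has a generous ceiling on a loaded one, which is exactly the flakiness trade-off Stack raises below.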

          For your trunk patch need to categorize your test (see other tests in trunk.... see how they have a bit of code on the end and then at top there is an annotation).

          Sleep for shorter here:

          +    while (!master.isLogSplittedAfterStartup()) {
          +      Thread.sleep(1000);
          

          See '13.4.1.2. Writing Tests' in the manual here http://hbase.apache.org/book.html#hbase.tests for some recommendations around timeouts, etc.

          Why the need for this timeout:

          +    Thread.sleep(10000 * 2);
          +    ((GatedNodeDeleteRegionServerTracker) master.getRegionServerTracker()).gate
          +        .set(false);
          

          Can we do w/o? Timing-dependent tests can be flaky. The timings make for different results in different contexts.

          What happens if this is not true:

          +    if (rootServerNum != metaServerNum) {
          

          Do we not test?

          There is some duplicated code in this method? Maybe break it out into a method?

          For example, this code:

          +      // First abort master
          +      for (MasterThread mt : cluster.getLiveMasterThreads()) {
          +        if (mt.getMaster().isActiveMaster()) {
          +          mt.getMaster().abort("Aborting for tests",
          +              new Exception("Trace info"));
          +          mt.join();
          +          break;
          +        }
          +      }
          +      LOG.debug("Master is aborted");
          +      master = (TestingMaster) cluster.startMaster().getMaster();
          +      while (!master.isLogSplittedAfterStartup()) {
          +        Thread.sleep(1000);
          +      }
          +      LOG.debug("splitted:" + master.isLogSplittedAfterStartup()
          +          + ",initialized:" + master.isInitialized());
          

          In dead servers, one returns a boolean and the other does not. One ups a counter and the other does not. Should they be the same?

          Can we be more specific in this note?

          +   * Dead servers under processing by the ServerShutdownHander. 
          

          Whats that mean? Its while the server is being processed by ServerShutdownHandler exclusively – these are the inProgress servers?

          Perhaps rename getLogDirIfExists as getServerLogDirIfExists?

          In HMaster, fix my bad javadoc. It says 'Used testing' for createRegionServerTracker. Should be 'Override to change master's RegionServerTracker creation. Used testing'. Similar for splitLogAfterStartup.

          So, what happens if a server had root and meta and its not expired when we do failover? We'll expire it processing root. Will we expire it a second time processing meta? Perhaps the answer is no because the first expiration will clear the meta state in master?

          Who is clearing + for (; serverName != null; ) { in the wait loop?

          Should we check the master stopped flag in this loop too in case its been stopped while this is running?
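          A wait loop that honors the stopped flag, as suggested here, might look like the following minimal sketch (field and method names are hypothetical, not the patch's):

```java
public class StoppableWaitLoop {
    volatile boolean stopped = false;
    volatile String deadServerName = "rs1,60020,1"; // hypothetical pending server

    // Waits for the dead-server reference to clear, but also checks the
    // master's stopped flag each pass so the loop cannot outlive the master.
    boolean waitForProcessing(long intervalMs) throws InterruptedException {
        while (deadServerName != null) {
            if (stopped) {
                return false; // master was stopped while we were waiting
            }
            Thread.sleep(intervalMs);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        final StoppableWaitLoop w = new StoppableWaitLoop();
        new Thread(new Runnable() {
            public void run() { w.deadServerName = null; }
        }).start();
        System.out.println(w.waitForProcessing(10));
    }
}
```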

          Does this not already exist in HMaster? getRegionServerTracker Does it have to be public (maybe it does for tests?)

          Do we need to add this? getOnlineServerNames Can you not get it from getOnlineServers? This is a convenience utility?

          Is there possibility of race here?

          +    // Remove regions in RIT, they are may being processed by the SSH.
          +    synchronized (regionsInTransition) {
          +      nodes.removeAll(regionsInTransition.keySet());
          +    }
          

          Perhaps SSH has put up something in RIT because its done an assign and here we are blanket removing them all?

          Otherwise, patch is coming along Chunhui. Thanks.

          stack added a comment -

          @Chunhui Pardon me. I did not get to review this today. I will do it tomorrow.

          chunhui shen added a comment -

          Testcase and patch for 90 version

          chunhui shen added a comment -

          Optimized the testcase following Stack's sample.
          And hbase-5270v2 is a patch that fixes the issue for trunk, including the testcase.

          ramkrishna.s.vasudevan added a comment -

          @Chunhui

          Appreciate your work on preparing test cases.

          stack added a comment -

          @Chunhui Excellent.

          I am all good w/ modifying core to make it more testable. Would suggest instead though that you make more generic changes up in Master, etc.

          For example, the attached patch makes it so a subclass of HMaster can observe log splitting and insert pauses and override the RegionServerTracker to pause on deletes until a gate is cleared. This might make more sense than the custom changes made to RegionServerTracker and HMaster in this patch.

          On the patch, I think the HMaster and RegionServerTracker changes are too specific to this test case. Would suggest making more generic changes to these core classes because other tests will be able to make use of a more generic change (I think its good to mod core classes to make them more testable).
          On the test, can we have a better name than TestHRegionserverKilled? It doesn't say what this test does (testDataCorrectnessWhenMasterFailOver might be better as class name).

          Why do this:

          + Configuration conf = HBaseConfiguration.create();

          Why not use whats in your HBaseTestingUtility (do a getConfiguration – see the attached test).

          Also, you might use the JUnit primitives for test setup and teardown as per the attached test.

          Ted Yu added a comment -

          Reattaching patch v2 for TRUNK with --no-prefix

          chunhui shen added a comment -

          I have written a testcase for this issue to demonstrate the existing problem, in 5270-testcase.patch.

          And hbase-5270.patch is the combination of the testcase and HBASE-5179's latest patch.

          5270-90.patch and 5270-90-testcase.patch are for 90 version

          For now, the testcase may not cover all the situations that cause problems, and it runs slowly.

          I will optimize the testcase later.

          stack added a comment -

          So when assigning regions, some regionplans whose destination is the dead server will be failed.

          True. They'll fail and get retried elsewhere which shouldn't be too bad. But need to keep it in mind. This all doesn't work (I think) if we need to scan meta and its on a server that has just gone down.

          chunhui shen added a comment -

          If we don't process zookeeper events that come in during master failover, a dead regionserver will be considered online in ServerManager.
          So when assigning regions, region plans whose destination is the dead server will fail.

          I agree with lessening the number of code paths, and we should settle on the approach first.

          stack added a comment -

          HBASE-3171 is the issue to purge root.

          stack added a comment -

          I was taking a look through HBASE-5179 and HBASE-4748 again, the two issues that spawned this one (Both are in synopsis about master failover with concurrent servershutdown handler running). I have also been looking at "HBASE-5344
          [89-fb] Scan unassigned region directory on master failover".

          HBASE-5179 starts out as we can miss edits if a server is discovered to be dead AFTER master failover has started up splitting logs because we'll notice it dead so will assign out its regions but before we've had a chance to split its logs. The way fb deal with this in hbase-5344 is not to process zookeeper events that come in during master failover. They queue them instead and only start in on the processing after master is up.
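          The queue-then-drain technique described above can be sketched as follows. This is an illustrative sketch only (class, method, and server names are hypothetical, not the hbase-5344 code): expiry events arriving during failover are queued, and drained in arrival order once the master is up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class DeferredExpirations {
    private final Queue<String> pending = new ArrayDeque<>();
    private boolean masterUp = false;
    final List<String> expired = new ArrayList<>();

    // ZK callback: queue expirations while the master is still failing over.
    synchronized void onServerExpired(String server) {
        if (!masterUp) {
            pending.add(server);
        } else {
            expire(server);
        }
    }

    // Once failover (including log splitting) completes, drain the queue in order.
    synchronized void markMasterUp() {
        masterUp = true;
        String s;
        while ((s = pending.poll()) != null) {
            expire(s);
        }
    }

    private void expire(String server) {
        expired.add(server); // stand-in for ServerManager.expireServer(...)
    }

    public static void main(String[] args) {
        DeferredExpirations d = new DeferredExpirations();
        d.onServerExpired("rs1"); // during failover: queued, not processed
        d.markMasterUp();         // rs1 is expired now
        d.onServerExpired("rs2"); // processed immediately
        System.out.println(d.expired);
    }
}
```

          The point of deferring, as the paragraph above explains, is that log splitting for a server that dies mid-failover always happens before its regions are reassigned.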

          Chunhui does something like this in his original patch by adding any server currently being processed by server shutdown to the list of regionservers whose logs we should not split. The fb way of halting temporarily the callback processing seems more airtight.

          HBASE-5179 is then extended to include as in scope, the processing of servers carrying root and meta (hbase-4748) that crash during master failover. We need to consider the cases where a server crashes AFTER master failover distributed log splitting has started but before we run the verifications of meta and root locations.

          Currently we'll expire the server that is unresponsive when we go to verify root and meta locations. The notion is that the meta regions will be assigned by the server shutdown handler. The fb technique of turning off processing zk events would mess with our existing handling code here – but I'm not too confident the code is going to do the right thing since it has no tests of this predicament and the scenarios look like they could be pretty varied (root is offline only, meta server has crashed only, a server with both root and meta has crashed, etc). In hbase-5344, fb will go query each regionserver for the regions its currently hosting (and look in zk to see what rs are up). Maybe we need some of this from 89-fb in trunk but I'm not clear on it just yet; would need more study of the current state of trunk and then of what is happening over in 89-fb.

          One thing I think we should do to lessen the number of code paths we can take on failover is to do the long-talked of purge of the root region. This should cut down on the number of states we need to deal with and make reasoning about failure states on failover easier to reason about.

          stack added a comment -

          What if the region server hosting .META. went down ?

          Yes... was just thinking about that. In this case we'd run the splitter in-line, in SSH, not via executor.... let me look at code. I'm trying to write tests and catch-up on all the stuff that was done over on previous issue.

          Ted Yu added a comment -

          though it may have been the same server

          This has been handled in patch on reviewboard, line 628:

                    !currentMetaServer.equals(currentRootServer) &&
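          The quoted condition guards against treating the meta carrier as a second dead server when root and meta live on the same regionserver. A minimal sketch of that guard (hypothetical helper, not the actual reviewboard patch code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: expire the root carrier, and expire the meta
// carrier only when it is a different server, so a server carrying both
// root and meta is processed exactly once.
public class CatalogExpiry {
  static List<String> serversToExpire(String currentRootServer,
                                      String currentMetaServer) {
    List<String> toExpire = new ArrayList<>();
    if (currentRootServer != null) {
      toExpire.add(currentRootServer);
    }
    if (currentMetaServer != null
        && !currentMetaServer.equals(currentRootServer)) {
      toExpire.add(currentMetaServer);
    }
    return toExpire;
  }
}
```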
          

          Could we hold up shutdownserverhandler until master is up?

          What if the region server hosting .META. went down ?

          stack added a comment -

          So, the param 'definitiveRootServer' is used in this case to ensure the dead root server is carryingRoot when it is being expired.

          Whats 'definitive' about it? Is it that we know for sure the server was carrying root or meta? How?

          Is it even possible to expire a server twice if it's carrying root and meta now? I don't think so.

          You are saying that this patch does nothing new here? We COULD expire the server that was carrying root, wait on its log split, then expire the server carrying meta (though it may have been the same server)... it might be ok but we might kill a server that has just started. I'm ok if fixing this is outside scope of this patch.

          I don't find this operation earlier in master setup, and this operation is not introduced by this issue; I only ported this logic to 0.90 from trunk.

          So, you copied this to 0.90 from TRUNK (so my notion that we already had this is my remembering how things work on TRUNK.. that would make sense).

          I think we need to explain it, but whether we should skip distributed log splitting here, I'm not very sure.

          If we are not sure, we shouldn't do it.

          When the master is initializing, if one RS is killed and restarted, then a dead server is in progress during master startup.

          This seems like a small window. Or do you think it could happen frequent? Could we hold up shutdownserverhandler until master is up?

          chunhui shen added a comment -

          Thanks for stack's comment

          This method param names are not right 'definitiveRootServer'; what is meant by definitive?  Do they need this qualifier?
          

          When master is initializing, carryingMeta and carryingRoot are always false (we can see it from the code of ServerManager#expire()). So, the param 'definitiveRootServer' is used in this case to ensure the dead root server is carryingRoot when it is being expired.
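          The shape of the problem chunhui describes can be sketched like this. The names below are hypothetical (this is not the actual ServerManager code): during initialization the assignment manager has not learned region locations yet, so asking it isCarryingRoot always answers false, and the caller, which has just verified the root location itself, must pass what it knows as an explicit flag.

```java
// Hypothetical sketch: why expire() needs an explicit "this server was
// carrying root" flag during master initialization.
public class ExpireSketch {
  // Before the assignment manager is initialized it knows no locations.
  private boolean assignmentManagerInitialized = false;

  // Stand-in for AssignmentManager#isCarryingRoot: always false pre-init.
  boolean isCarryingRoot(String server) {
    return assignmentManagerInitialized && lookupCarriesRoot(server);
  }

  private boolean lookupCarriesRoot(String server) {
    return false; // placeholder for the real region-location lookup
  }

  // Without the definitive flag this would always be false at startup,
  // and the root log would not get the special handling it needs.
  boolean treatAsRootCarrier(String deadServer, boolean definitiveRootServer) {
    return definitiveRootServer || isCarryingRoot(deadServer);
  }
}
```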

          Is there anything in place to stop us expiring a server twice if its carrying root and meta?
          

          Is it even possible to expire a server twice if it's carrying root and meta now? I don't think so.

          onlineServers needs to be explained

          I think so. This param was not passed into joinCluster before, so when executing joinCluster, onlineServers was simply the set of currently online servers. That had a problem: while joinCluster was executing, some server might be being processed as a dead server, so its log was being split by SSH while its regions were being assigned by joinCluster(), causing data loss.
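          The race chunhui describes can be sketched as a simple set subtraction: joinCluster should only consider servers that are not already being handled as dead servers, so SSH and joinCluster never both act on the same server's regions. This is a hypothetical illustration with invented names, not the actual joinCluster code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: compute the servers whose regions joinCluster may
// safely assign, excluding any server the ServerShutdownHandler (SSH) is
// already processing (its log split and region assignment belong to SSH).
public class JoinClusterSketch {
  static Set<String> serversSafeToJoin(Set<String> serversInZk,
                                       Set<String> deadServersInProgress) {
    Set<String> safe = new HashSet<>(serversInZk);
    safe.removeAll(deadServersInProgress); // SSH handles these itself
    return safe;
  }
}
```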

          It looks like we get the list by trawling zk for regionserver znodes that have not checked in.  Don't we do this operation earlier in master setup?  Are we doing it again here?
          

          I don't find this operation earlier in master setup, and this operation is not introduced by this issue; I only ported this logic to 0.90 from trunk.

          Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch.  Its not explained in code why we would do this.

          I think we need to explain it, but whether we should skip distributed log splitting here, I'm not very sure.

          Why would we have dead servers in progress here in master startup?  Because a servershutdownhandler fired?
          

          When the master is initializing, if one RS is killed and restarted, then a dead server is in progress during master startup.

          Ted Yu added a comment -

          Updated patch to address Stack's comments up till 'Is there anything in place to stop us expiring a server twice if its carrying root and meta?'

          Ted Yu added a comment -

          https://reviews.apache.org/r/3601 is created to track the review process for this JIRA.


            People

            • Assignee:
              chunhui shen
              Reporter:
              Ted Yu
            • Votes:
              0
              Watchers:
              7
