Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.99.0
    • Component/s: master, Region Assignment
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      With this patch, HMaster is also an HRegionServer. The active master can serve some small tables per configuration (by default, it doesn't). Backup masters don't serve any regions.

      HMaster and HRegionServer share the same RPC server. That's transparent to clients, since the master information is in ZK. The master and the regionserver also share the same web UI server.

      There is no protocol or API change, so this feature is backward compatible and rolling upgradable. In case they are needed, some configurations (HBASE-10815) will be added so that (1) we can have two web UI servers in the master (and backup masters), one for the master and the other for the region server; and (2) we can leave backup masters alone and not put any regions on them, so that things are the same as before.

      With this patch, the following configurations are removed (and no longer used):

      hbase.master.dns.interface
      hbase.master.dns.nameserver
      hbase.master.port
      hbase.master.ipc.address
      hbase.master.zksession.recover.timeout
      fail.fast.expired.active.master
      hbase.master.handler.count
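
      Since the master now shares the regionserver's RPC and web UI servers, the removed hbase.master.* settings are replaced by their hbase.regionserver.* counterparts. The hbase-site.xml sketch below illustrates how an operator might tune the combined process; the hbase.balancer.tablesOnMaster property name is an assumption based on the HBASE-10815-era configuration and may differ in your release:

```xml
<!-- Sketch only: property names below are assumptions from the
     HBASE-10815-era configuration and may differ by release. -->
<configuration>
  <!-- Which tables the active master may serve (e.g. hbase:meta). -->
  <property>
    <name>hbase.balancer.tablesOnMaster</name>
    <value>hbase:meta</value>
  </property>
  <!-- The master listens on the regionserver ports now, so tune these
       instead of the removed hbase.master.port / ipc.address settings. -->
  <property>
    <name>hbase.regionserver.port</name>
    <value>16020</value>
  </property>
</configuration>
```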




      Description

      I was thinking of simplifying/improving the region assignments. The first step is to co-locate meta and the master, as many people agreed on HBASE-5487.

      1. master_rs.pdf
        72 kB
        Jimmy Xiang
      2. Co-locateMetaAndMasterHBASE-10569.pdf
        200 kB
        stack
      3. hbase-10569_v3.1.patch
        660 kB
        Jimmy Xiang
      4. hbase-10569_v3.patch
        660 kB
        Jimmy Xiang
      5. hbase-10569_v2.patch
        674 kB
        Jimmy Xiang
      6. hbase-10569_v1.patch
        654 kB
        Jimmy Xiang

        Issue Links

        There are no Sub-Tasks for this issue.

          Activity

          Enis Soztutar added a comment -

          Closing this issue after 0.99.0 release.

          Hudson added a comment -

          FAILURE: Integrated in HBase-1.1 #152 (See https://builds.apache.org/job/HBase-1.1/152/)
          HBASE-12956 Binding to 0.0.0.0 is broken after HBASE-10569 (enis: rev fc7f53f240b3a5a83333a1819b0ec2f0f7d8e3aa)

          • hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
          • hbase-server/src/test/java/org/apache/hadoop/hbase/TestHBaseTestingUtility.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-1.0 #718 (See https://builds.apache.org/job/HBase-1.0/718/)
          HBASE-12956 Binding to 0.0.0.0 is broken after HBASE-10569 (enis: rev 15140bf48491d92dae2d514f2cc84c09205d87b7)

          • hbase-server/src/test/java/org/apache/hadoop/hbase/TestHBaseTestingUtility.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
          Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK #6100 (See https://builds.apache.org/job/HBase-TRUNK/6100/)
          HBASE-12956 Binding to 0.0.0.0 is broken after HBASE-10569 (enis: rev 3b56d2a0bc36f9dcb901bb709b8d9ae58df955ff)

          • hbase-server/src/test/java/org/apache/hadoop/hbase/TestHBaseTestingUtility.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
          Jimmy Xiang added a comment -

          Updated the release notes because HBASE-12034 changed the behavior a little bit.

          Mikhail Antonov added a comment -

          Regarding some discussion about hosting META on master

          • if we want to run cluster with 1M or 50M regions (HBASE-11165), we may (will?) have to host META regions on multiple servers
          • on the zk-less assignments (HBASE-11059), short-circuit atomic updates of meta were mentioned (making use of the fact that meta is local to master).

          That said, how do we now define that: is hosting meta on the master a very-nice-to-have optimization, or a prerequisite for the cluster to function?

          Jonathan Hsieh added a comment -

          Point of order: If I read it right, you can only do 'agreement' on mailing list ("Community decisions must be reached on the mailing list. " [1]).

          Thanks for the reminder. Duly noted.

          Francis Liu added a comment -

          I think the main motivation is the colocation of all the components that are usually involved in a single "transaction". The
          main example of this is the Assignment, which involves: Master, META, ZooKeeper.

          I see but the notion of consolidation is counter-intuitive if we want hbase to be horizontally scalable. See the-node-that-must-not-be-named as an example.

          #1 and #2 are part of a generic notification system, which will be used to propagate ACLs, Visibility, Quotas. (In theory the base of this system is also the one behind the ZK-less Assignment)

          We should be able to implement a notification system without #1 and #2? It would be good to have a doc describing and motivating why we need the dependency, and why it is better than a generic version of the current approach.

          For the Horizontal scalability, I think that we are going to have Multiple Master each one operating on its subsection of "meta" (and the notification system). This means that you will have concurrent assignments on different masters.
          The best case is where you can fit a full table (regions metadata) on a single master, the other case is where your table is split on multiple master which means that operation that requires to work on the full set of regions e.g. delete, disable, enable need some sort of coordination to provide the full consistency that you'll get with a full table that fits on a single master.

          Yep, instead of avoiding distributed coordination tasks, we should make distributed operations a first-class operation.

          stack added a comment -

          I thought that is what we agreed upon old topology default for 1.0 and new topology default for trunk/2.0 during the pow-wow.

          Point of order: If I read it right, you can only do 'agreement' on mailing list ("Community decisions must be reached on the mailing list. " [1]).

          At the powwow we raised this issue, yeah. We said stuff like it probably more palatable, as noted above, that "...for 1.0 we should just go w/ the old topology and do clean switch to new layout in 2.0..." and that we needed to bring this issue to the 1.0 RM's attention. I'd be in favor of keeping our current default for 1.0 apache hbase but would be fine if folks want to flip the option so master carrying regions is not on by default.

          1. https://blogs.apache.org/comdev/entry/how_apache_projects_use_consensus

          Enis Soztutar added a comment -

          I think having a brand new deployment style with gotcha's we haven't figured out at scale with yet is risky for what we want to be a super stable release.

          +1 for that. It was also one of my concerns for 1.0.

          Jimmy Xiang added a comment -

          The default is on now.

          Jonathan Hsieh added a comment -

          I thought that is what we agreed upon old topology default for 1.0 and new topology default for trunk/2.0 during the pow-wow.

          My vote is to keep the old topology for hbase 1.0, but have the option to go new in 1.0. For 2.0/trunk, we default to the new topology. The old topology is well understood operationally for deploys. Though straightforward for us devs, I think having a brand new deployment style with gotchas we haven't figured out at scale yet is risky for what we want to be a super stable release.

          stack added a comment -

          Jimmy Xiang Should we default on co-located master and meta for apache hbase 1.0 (nice new 'feature'?) If defaulted on, it'll be tested. We (and vendors) can also test old topology still works. We should probably ask out on list.....

          Jimmy Xiang added a comment - - edited

          Totally agree with what Stack and Matteo said.

          As to Enis Soztutar's question:

          2) if we are also allowing to completely disable this feature (as in the other jira), will there still be benefit for this?

          Big features usually come in with an option to disable it at first. This is a basic idea to introduce great features with smooth migration paths for users at the beginning, right?

          stack added a comment -

          ...what is the benefit of having this complexity.

          Complex because we need to be able to support both topologies? (The patch is taking us toward a simpler deploy and a master state that is easier to reason about; there is also some nice refactoring of the megalithic master and regionserver classes.)

          I could buy the argument that for 1.0 we should just go w/ the old topology and do a clean switch to the new layout in 2.0. I would like it to be on sooner than this, but we are short on testing as it is, so one less option is likely the way to go.

          We should not do this for saving on RPC's, but actually getting rid of master in-memory state.

          This is not about saving RPCs. It is not about getting rid of master-in-memory state either. If anything, it is about more of the cluster state being owned by master.

          If we have the state in meta and only in meta, then we won't need colocation requirements or meta-as-a-single-region.

          Not sure how this would work. meta is dumb. You need an agent of some kind. I'd like to hear more E.

          This issue has gotten a bit messy (Francis Liu called it first in attached doc saying we seem to be talking about what this issue is not about). It is raising loads of good stuff that we need to break out and make sure it all gets covered (I opened HBASE-11165 at Francis prompting). I could give this a pass and add links to our 1.0 issue.

          Enis Soztutar added a comment -

          It is required we can rolling upgrade from 0.98 to 1.0 (do we have that in our 1.0 scope doc? If not, lets do so).

          I thought unless we break it, it should be de facto from release to release. Opened HBASE-11164.

          At hackathon discussion was a means of maintaining the current topology – i.e. Masters do mastering and nothing else, BU Masters sit idle

          This is why I was arguing that backup masters should default to hosting no regions unless explicitly specified.

          because at least for vendors who are not at a major version juncture when hbase 1.0 ships, they'll probably want to keep the old layout.

          Agreed, that is what we are going to do.

          I think it on by default in 1.0 apache hbase. If not then, if others think differently, on by default in apache hbase 2.0.

          My concern is that if we cannot immediately get the benefits of co-locating, and the vendors or some big deployments won't turn this on, what is the benefit of having this complexity. We should not do this for saving on RPC's, but actually getting rid of master in-memory state. If we have the state in meta and only in meta, then we won't need colocation requirements or meta-as-a-single-region.

          stack added a comment -

          Sorry, assigned myself by mistack.

          stack added a comment -

          Enis Soztutar

          Will this break rolling upgrades from 0.98 -> 0.99? We want to keep 0.98 to 1.0 rolling restart support.

          It is required we can rolling upgrade from 0.98 to 1.0 (do we have that in our 1.0 scope doc? If not, let's do so). This feature should not preclude it. At hackathon the discussion was about a means of maintaining the current topology – i.e. Masters do mastering and nothing else, BU Masters sit idle – because at least for vendors who are not at a major version juncture when hbase 1.0 ships, they'll probably want to keep the old layout. I think that as long as we can do a rolling upgrade and we broadcast it loud enough in the release notes, a master's role changing going from 0.98 to 0.99 such that it hosts regions would be fine (in apache hbase).

          if we are also allowing to completely disable this feature (as in the other jira), will there still be benefit for this?

          I think it on by default in 1.0 apache hbase. If not then, if others think differently, on by default in apache hbase 2.0.

          I updated the attached doc. to address some of the comments. I also put there as something to keep in mind Francis's requirement that hbase master be able to do 1M regions (host and assign fast) very soon and 50M regions not too long after that.

          Matteo Bertozzi added a comment -

          I think the main motivation is the colocation of all the components that are usually involved in a single "transaction". The main example of this is the Assignment, which involves: Master, META, ZooKeeper.

          #1 and #2 are part of a generic notification system, which will be used to propagate ACLs, Visibility, Quotas. (In theory the base of this system is also the one behind the ZK-less Assignment)

          For the Horizontal scalability, I think that we are going to have Multiple Master each one operating on its subsection of "meta" (and the notification system). This means that you will have concurrent assignments on different masters.
          The best case is where you can fit a full table (regions metadata) on a single master, the other case is where your table is split on multiple master which means that operation that requires to work on the full set of regions e.g. delete, disable, enable need some sort of coordination to provide the full consistency that you'll get with a full table that fits on a single master.

          Francis Liu added a comment -

          Thanks for the doc. It'd be great if we could have the list of use cases we are trying to solve, so we have something to motivate the design decisions.

          During the discussion and my chat with Matteo and Jimmy here is what I got:

          1. A method to guarantee security ACL changes are fully propagated when an ACL change is requested
          2. Same as #1 but for quota
          3. Remove master daemon to simplify deployment/ops
          4. Have a designated set of servers system tables will be hosted on. To isolate it from user region workloads.

          Feel free to add if I missed anything.

          #3 and #4 directly motivate this patch. Though it seems there was an agreement to still have designated hosts as masters?

          It seems to me #1 and #2 are use cases for a synchronous coordination framework (consensus discussion), which may or may not require system table colocation. Having fault-tolerant coordination as a first-class primitive is sorely missing, and I believe it would enable us to avoid design choices which would impede horizontal scalability.

          Enis Soztutar added a comment -

          Thanks Jimmy for the doc. Sorry I was not at the hackathon, so I missed the discussions there. A couple of questions:
          1) Will this break rolling upgrades from 0.98 -> 0.99? We want to keep 0.98 to 1.0 rolling restart support.
          2) if we are also allowing to completely disable this feature (as in the other jira), will there still be benefit for this?

          Jimmy Xiang added a comment -

          Added a doc to clarify some about master/rs and deployment impact related to this issue.

          Andrew Purtell added a comment -

          Thanks. I can export the doc + comments as DOCX and attach when the discussion there is done.

          stack added a comment -

          I attached the file. Comments are missing. Need to go to Google Doc for that or I can copy them here.

          Andrew Purtell added a comment -

          Please capture the Google document and its comments on this JIRA, all discussions should eventually appear on ASF resources. Thanks!

          Francis Liu added a comment -

          Thanks for the writeup. Left some comments, mainly concerned about splittability, as it seems the main draw of this approach is local writes to the meta which seems to be at odds with meta splittability?

          Show
          Francis Liu added a comment - Thanks for the writeup. Left some comments, mainly concerned about splittability, as it seems the main draw of this approach is local writes to the meta which seems to be at odds with meta splittability?
          Lars Hofhansl added a comment -

          Thanks for writing this up. If the active HMaster only holds META, that should be fine; it'd be moved/recovered quickly. Local writes are great: no cross-machine coordination needed.
          stack added a comment -

          So, chatted with Jimmy and Matteo, and we intend to double down on this issue's original description, i.e. colocating meta and master: no meta if no master and vice versa. So ignore my suggestions that we change the subject of this issue.

          See the one-pager here for more argument on why we think this is the way to go. It includes answers to the above concerns: https://docs.google.com/document/d/1xC-bCzAAKO59Xo3XN-Cl6p-5CM_4DMoR-WpnkmYZgpw/edit#heading=h.99evfnn62059
          Sergey Shelukhin added a comment -

          Can we still have a split meta in this case, with a distributed master?
          stack added a comment -

          Is this issue misnamed? Should it be "One process to do both regionserver and master duties" or some such?

          Colocating master and meta has gotten pushback from both Francis and Lars just recently, and also earlier up in rb (https://reviews.apache.org/r/19198/ – see the tail of the first review comment).

          Yes, we need to be able to split meta....
          Jimmy Xiang added a comment -

          I filed HBASE-10923 to make it configurable as to where to assign the meta region.
          Jimmy Xiang added a comment -

          Meta regions can of course be assigned to other region servers too. As to Lars' concern, I was thinking of making it a load balancer decision where to put meta regions, so it can be changed easily.
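The load-balancer-driven placement described above can be sketched as a toy decision function. This is not HBase's actual LoadBalancer API; the class name, method, and inputs below are hypothetical, purely to illustrate how a single configuration flag could route a meta region either to the active master or to the least-loaded regionserver:

```java
import java.util.Arrays;
import java.util.List;

// Toy sketch, NOT HBase's real LoadBalancer interface: models the idea
// that meta placement becomes a pluggable, config-driven balancer choice.
public class MetaPlacementPolicy {
    private final boolean metaOnMaster; // hypothetical config switch

    public MetaPlacementPolicy(boolean metaOnMaster) {
        this.metaOnMaster = metaOnMaster;
    }

    /**
     * Pick a host for a meta region: the active master if so configured,
     * otherwise the regionserver currently carrying the fewest regions.
     */
    public String chooseServer(String activeMaster, List<String> regionServers,
                               int[] regionCounts) {
        if (metaOnMaster) {
            return activeMaster;
        }
        int best = 0;
        for (int i = 1; i < regionCounts.length; i++) {
            if (regionCounts[i] < regionCounts[best]) {
                best = i;
            }
        }
        return regionServers.get(best);
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("rs-1", "rs-2", "rs-3");
        int[] load = {10, 3, 7};
        // With meta-on-master enabled, the active master wins.
        System.out.println(new MetaPlacementPolicy(true)
            .chooseServer("master-1", servers, load));  // master-1
        // Otherwise the least-loaded regionserver wins.
        System.out.println(new MetaPlacementPolicy(false)
            .chooseServer("master-1", servers, load));  // rs-2
    }
}
```

The point of routing this through the balancer, as the comment notes, is that the policy can later be changed (or made per-table) without touching assignment code.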
          Lars Hofhansl added a comment -

          I share the concern. Part of the benefit of splitting META is that we serve portions of it from different RegionServers.
          Also, seeing the other fallout from this (eager assignment of regions to the master, moving META during an HMaster failover, which was lightweight before)... I am no longer sure that this is a good idea at all.
          Francis Liu added a comment -

          Sorry, just to clarify: the regions also have to be servable by different servers; this way we have a much stronger case for horizontal scalability.
          Jimmy Xiang added a comment -

          We can still split the meta. A master can host several regions, right?
          Francis Liu added a comment -

          Sorry, late seeing this patch. Would this patch prevent us from ever splitting the meta in the future? HDFS is moving towards separating namenode responsibilities for better scalability; just concerned we are doing the reverse (consolidation) and will make scaling harder to do down the road?
          Jimmy Xiang added a comment -

          So when we failover the master we'll force a move of meta. Won't that lead to a bigger "blip" than before?

          Namespace and ACL are small tables cached in ZK/RS. For META, yes, it's a bigger "blip". If we remove ZK from the assignment in the future, then while the master is failing over, nobody can update the meta. In this case, the meta info cached on the client side won't change.

          Seems like this will wreak havoc on data locality.

          In HBASE-10815, we will introduce a configuration so that we can exclude backup masters from serving regions. That should help.
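In case a concrete shape helps, a deployment knob like the one HBASE-10815 proposes could look roughly like the following hbase-site.xml fragment. The property names below are illustrative assumptions, not the final keys (those were settled in HBASE-10815/HBASE-10923; check the shipped hbase-default.xml before relying on any of them):

```xml
<!-- Illustrative only: the property names below are hypothetical sketches
     of the knobs discussed in HBASE-10815 / HBASE-10923. -->
<property>
  <!-- Which tables, if any, the active master may serve; empty = none. -->
  <name>hbase.balancer.tablesOnMaster</name>
  <value>hbase:meta</value>
</property>
<property>
  <!-- Keep backup masters region-free so master failover stays lightweight. -->
  <name>hbase.balancer.excludeBackupMasters</name>
  <value>true</value>
</property>
```

With backup masters excluded from serving regions, promoting one to active does not strand user-region locality on the new master, which addresses the data-locality concern raised above.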
          Lars Hofhansl added a comment -

          Just noticed this issue now.

          Active master serves table META, namespace, and ACL in a secure installation.

          So when we failover the master we'll force a move of meta. Won't that lead to a bigger "blip" than before?

          Backup masters are regionservers too. They serve regions, while trying to be the next active master. Once a backup master becomes the active one, it will serve META and the namespace table. The load balancer will move user regions to other regionservers.

          Seems like this will wreak havoc on data locality.

          Master failover used to be a lightweight event. I fear this makes it much more heavyweight.
          Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK #5044 (See https://builds.apache.org/job/HBase-TRUNK/5044/)
          HBASE-10840 Fix findbug warn induced by HBASE-10569.(Anoop) (anoopsamjohn: rev 1582242)

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
          Enis Soztutar added a comment -

          Did minor edit to release notes. Nice work Jimmy!
          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK #5040 (See https://builds.apache.org/job/HBase-TRUNK/5040/)
          HBASE-10569 Co-locate meta and master - ADDENDUM (jxiang: rev 1581513)

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java
            HBASE-10569 Co-locate meta and master (jxiang: rev 1581479)
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionAdapter.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionUtils.java
          • /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/DistributedHBaseCluster.java
          • /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/Action.java
          • /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/RestartRsHoldingMetaAction.java
          • /hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/BackupMasterStatusTmpl.jamon
          • /hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/MasterStatusTmpl.jamon
          • /hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/RegionServerListTmpl.jamon
          • /hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RSStatusTmpl.jamon
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/CoprocessorHConnection.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServerInterface.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterDumpServlet.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterStatusServlet.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MetricsMasterWrapperImpl.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/ClusterLoadState.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/CreateTableHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/DisableTableHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ModifyTableHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/TableDeleteFamilyHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/monitoring/StateDumpServlet.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/AnnotationReadingPriorityFunction.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServerCommandLine.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSDumpServlet.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransaction.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RpcSchedulerFactory.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SimpleRpcSchedulerFactory.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
          • /hbase/trunk/hbase-server/src/main/resources/hbase-webapps/master/zk.jsp
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseCluster.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/MockRegionServerServices.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestNamespace.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestClientScannerRPCTimeout.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMultiParallel.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestHTableWrapper.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithAbort.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerObserver.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/fs/TestBlockReorder.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFilesSplitRecovery.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/MockRegionServer.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/Mocking.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterMetrics.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterMetricsWrapper.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterStatusServlet.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRollingRestart.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestTableLockManager.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestZKBasedOpenCloseRegion.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/BalancerTestBase.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/cleaner/TestSnapshotFromMaster.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/OOMERegionServer.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestClusterId.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestEndToEndSplitTransaction.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestPriorityRpc.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQosFunction.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenInitializing.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSStatusServlet.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerNoMaster.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogFiltering.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestNamespaceCommands.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/snapshot/SnapshotTestingUtils.java
          Jimmy Xiang added a comment -

          Integrated into trunk. Thanks Stack a lot for the review.

          stack added a comment -

          I'm +1 on this going into trunk. Jimmy is working on ensuring rolling restart works in other issues. Any objections? This thing needs a fat release note Jimmy.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12636254/hbase-10569_v3.1.patch
          against trunk revision .
          ATTACHMENT ID: 12636254

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 183 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 6 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces the following lines longer than 100:
          + t.getMessage().contains("org/apache/hadoop/hdfs/protocol/FSConstants$SafeModeAction")) {
          + for(HTableDescriptor htd: master.listTableDescriptorsByNamespace(request.getNamespaceName())) {
          + Threads.setDaemonThreadRunning(this.healthCheckChore.getThread(), getName() + ".healthChecker",
          + Threads.setDaemonThreadRunning(this.nonceManagerChore.getThread(), getName() + ".nonceCleaner",
          + final RegionActionResult.Builder builder, List<CellScannable> cellsToReturn, long nonceGroup) {
          + boolean closed = regionServer.closeRegion(encodedRegionName, false, zk, versionOfClosingNode, sn);
          + Boolean closing = regionServer.regionsInTransitionInRS.get(region.getEncodedNameAsBytes());
          + regionServer.nonceManager.reportOperationFromWal(nonceGroup, nonce, entry.getKey().getWriteTime());
          + MasterCoprocessorHost cpHost = util.getMiniHBaseCluster().getMaster().getMasterCoprocessorHost();
          + for (HRegionInfo r : ProtobufUtil.getOnlineRegions(t.getRegionServer().getRSRpcServices())) {

          +1 site. The mvn site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9077//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9077//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9077//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9077//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9077//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9077//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9077//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9077//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9077//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9077//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9077//console

          This message is automatically generated.

          Jimmy Xiang added a comment -

          Attached v3.1, rebased to trunk latest.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12636132/hbase-10569_v3.patch
          against trunk revision .
          ATTACHMENT ID: 12636132

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 183 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9073//console

          This message is automatically generated.

          Jimmy Xiang added a comment -

          In patch v3, I reverted the change to MasterServices. ActiveMasterManager doesn't have a thread now. The thread creation is moved to HMaster. Reverted the change in master web UI about the master info port so that it will be easier to support rolling restart.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12635924/hbase-10569_v2.patch
          against trunk revision .
          ATTACHMENT ID: 12635924

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 183 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 8 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces the following lines longer than 100:
          + t.getMessage().contains("org/apache/hadoop/hdfs/protocol/FSConstants$SafeModeAction")) {
          + for(HTableDescriptor htd: master.listTableDescriptorsByNamespace(request.getNamespaceName())) {
          + Threads.setDaemonThreadRunning(this.healthCheckChore.getThread(), getName() + ".healthChecker",
          + Threads.setDaemonThreadRunning(this.nonceManagerChore.getThread(), getName() + ".nonceCleaner",
          + final RegionActionResult.Builder builder, List<CellScannable> cellsToReturn, long nonceGroup) {
          + boolean closed = regionServer.closeRegion(encodedRegionName, false, zk, versionOfClosingNode, sn);
          + Boolean closing = regionServer.regionsInTransitionInRS.get(region.getEncodedNameAsBytes());
          + regionServer.nonceManager.reportOperationFromWal(nonceGroup, nonce, entry.getKey().getWriteTime());
          + MasterCoprocessorHost cpHost = util.getMiniHBaseCluster().getMaster().getMasterCoprocessorHost();
          + for (HRegionInfo r : ProtobufUtil.getOnlineRegions(t.getRegionServer().getRSRpcServices())) {

          +1 site. The mvn site goal succeeds with this patch.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster

          -1 core zombie tests. There are 1 zombie test(s): at org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:368)

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9056//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9056//console

          This message is automatically generated.

          stack added a comment -

          Jimmy Xiang ACK'ing our comments. Sounds good to me.

          Jimmy Xiang added a comment -

          Thanks for reviewing it.

          Make it symmetrical? s/getMetrics/getRegionServerMetrics/

          Sure, will do it.

          Why expose these? Were they exposed before?

          The reason is that HRegionServer doesn't implement the regionserver RPC interface any more. To access the RPC functionalities directly, we need to expose it. Before, HRegionServer implemented the RPC interface, so there was no such issue.

          When I ask the master for its rpc engine, will it be the same as the regionservers?

          Yes, it is the same. Actually, here we are not trying to get the RPC engine. We are just trying to invoke the RPC functions directly, bypassing the RPC layer.

          add it to the Server Interface?

          Probably doesn't apply. Perhaps the method name is confusing. I meant to get the RPC interface.

          Do you mean RpcServiceInterface or RpcServerInterface?

          You are right, the RpcServerInterface.

          This was a bad idea in the first place? Or rather, it improved our 'usability' when a single Master only in that Master would 'come back to life'....but if backup Masters, it was racing the backup Master? (IIRC).

          I think it is a good idea. But we are going to move away from ZK for leader selection. So I didn't fix this part.
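The "bypass the RPC" idea above can be pictured with a toy sketch. All names here are hypothetical stand-ins, not the real HBase classes: the point is only that once the service implementation object is exposed via a getter, code running in the same JVM (such as the co-located master) can call its methods directly, with no serialization or network round trip.

```java
import java.util.Arrays;
import java.util.List;

// Toy sketch, not the real HBase API: a stand-in for the regionserver
// RPC service interface.
interface RegionAdminSketch {
    List<String> getOnlineRegions();
}

// Stand-in for the RPC service implementation object.
class RpcServicesSketch implements RegionAdminSketch {
    public List<String> getOnlineRegions() {
        return Arrays.asList("region-a", "region-b");
    }
}

class RegionServerProcessSketch {
    private final RpcServicesSketch rpcServices = new RpcServicesSketch();

    // Analogous in spirit to HRegionServer#getRpcServices: exposes the
    // service object so in-process callers can skip the RPC stack entirely.
    RegionAdminSketch getRpcServices() { return rpcServices; }
}
```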

          stack added a comment -

          # Due to 2, HMaster#getMetrics is renamed to getMasterMetrics to avoid naming conflict with HRegionServer#getMetrics. The same has been done to HMaster#getCoprocessors, #getCoprocessorHost.

          Make it symmetrical? s/getMetrics/getRegionServerMetrics/

          Added HRegionServer#getRpcServices and HMaster#getMasterRpcServices to expose the RPC functionalities.

          Why expose these? Were they exposed before? When I ask the master for its rpc engine, will it be the same as the regionservers? (It sounds like they will be the same going by #6). If so, should the method be getRpcServices whether on Master or on HRegionServrer (add it to the Server Interface?).

          Do you mean RpcServiceInterface or RpcServerInterface? If the latter, change it all you want. What is there currently is a bit of a mess.

          Master recovery in case of ZK connection loss is removed since it doesn’t recover listeners added in HRegionServer.

          This was a bad idea in the first place? Or rather, it improved our 'usability' when a single Master only in that Master would 'come back to life'....but if backup Masters, it was racing the backup Master? (IIRC).

          Let me look at the patch.

          This is great Jimmy.

          Jimmy Xiang added a comment -

          I see. That's a good idea. Thanks.

          Enis Soztutar added a comment -

          Ted has a fix for it in HBASE-10691, but I think it is good to keep the hadoop1 compile around for some more time, since we are still backporting bug fixes etc to 0.98 and 0.96. Maybe we can make it so that the pre-commit test will continue, and just report that the compilation failed as an FYI.

          Jimmy Xiang added a comment -

          We are not going to support hadoop 1.0 any more, right? http://search-hadoop.com/m/DHED4OxD4C/Next+releases+of+HBase+will+drop+Hadoop-1.x+support&subj=+ANNOUNCE+Next+releases+of+HBase+will+drop+Hadoop+1+x+support

          Can we fix the pre-commit test?
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12634484/hbase-10569_v1.patch
          against trunk revision .
          ATTACHMENT ID: 12634484

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 176 new or modified tests.

          -1 hadoop1.0. The patch failed to compile against the hadoop 1.0 profile.
          Here is snippet of errors:

          [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hbase-server: Compilation failure
          [ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java:[87,24] cannot find symbol
          [ERROR] symbol  : method setMiniClusterMode(boolean)
          [ERROR] location: class org.apache.hadoop.metrics2.lib.DefaultMetricsSystem
          [ERROR] -> [Help 1]
          org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hbase-server: Compilation failure
          /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java:[87,24] cannot find symbol
          symbol  : method setMiniClusterMode(boolean)
          location: class org.apache.hadoop.metrics2.lib.DefaultMetricsSystem
          
          	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213)
          --
          Caused by: org.apache.maven.plugin.CompilationFailureException: Compilation failure
          /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java:[87,24] cannot find symbol
          symbol  : method setMiniClusterMode(boolean)
          location: class org.apache.hadoop.metrics2.lib.DefaultMetricsSystem
          
          	at org.apache.maven.plugin.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:729)

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8974//console

          This message is automatically generated.

          Jimmy Xiang added a comment -

          Patch v1 is on RB now: https://reviews.apache.org/r/19198/

          Jimmy Xiang added a comment -

          Yes, we can put them on master too.

          Ted Yu added a comment -

          the active master holds just the meta and the namespace regions

          What about ACL and visibility regions ?

          Jimmy Xiang added a comment -

          Fixed. Thanks.

          Can you elaborate a bit more about what 'moves users regions away' means ?

          Since a backup master is also a region server, it could hold many user regions. After it becomes the active master, we try to move these user regions to other region servers so that the active master holds just the meta and the namespace regions. The purpose is to reduce the load on the active master. We could add other regions to the active master later on.
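A minimal sketch of that unload policy (a hypothetical helper, not the actual HBase balancer code) might look like this: regions of system tables stay on the new active master, everything else is queued to move to other region servers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the policy described above: when a backup master
// becomes active, user regions are unloaded to other region servers, while
// system regions (meta, namespace) stay on the active master.
class ActiveMasterRegionPolicy {
    private static final Set<String> TABLES_KEPT_ON_MASTER =
        new HashSet<>(Arrays.asList("hbase:meta", "hbase:namespace"));

    /** Given the tables whose regions this server hosts, return the ones
     *  whose regions should be moved off once it becomes the active master. */
    static List<String> tablesToMoveAway(List<String> hostedTables) {
        List<String> toMove = new ArrayList<>();
        for (String table : hostedTables) {
            if (!TABLES_KEPT_ON_MASTER.contains(table)) {
                toMove.add(table);
            }
        }
        return toMove;
    }
}
```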

          Ted Yu added a comment -

          Backup master moves user regions away (and meta/namespace region to the master if already assigned somewhere else) after becoming active.

          Can you elaborate a bit more about what 'moves users regions away' means ?

          nit: items are under bullets but referenced by B, C, etc
          It would be easier to read if items are labeled alphabetically.

          Jimmy Xiang added a comment -

          The patch contains several bug fixes. I will create separate issues (already created some actually) so that I can push the fixes to 0.96 and 0.98.

          Jimmy Xiang added a comment - - edited

          Attached a patch that passed unit tests, integration tests (including ITBLL), and some live cluster tests. Will put it on RB soon when RB is up.

          Here is what I have done in this patch:

          1. Moved RPC-related code out of HRegionServer and HMaster so that they are smaller and easier to change/maintain.
          2. Made HMaster extend HRegionServer, so that HMaster is also an HRegionServer; removed duplicate code/parameters.
          3. Due to 2, HMaster#getMetrics is renamed to getMasterMetrics to avoid a naming conflict with HRegionServer#getMetrics. The same has been done to HMaster#getCoprocessors and #getCoprocessorHost.
          4. Added HRegionServer#getRpcServices and HMaster#getMasterRpcServices to expose the RPC functionality.
          5. Changed references related to 3 and 4 (a lot, especially in tests).
          6. HMaster and HRegionServer share one RPC server and one InfoServer.
          7. RpcServiceInterface is changed a little. Methods #startThreads and #openServer are removed since a backup master doesn’t hold the RPC server any more. A flag HMaster#serviceStarted is introduced to indicate whether a master is active, so that ServerNotRunningYetException can be thrown before the master becomes active.
          8. Master recovery in case of ZK connection loss is removed, since it doesn’t recover listeners added in HRegionServer. We can get this feature back if needed. The other reason I didn’t try to get it back is that we are going to use Raft to choose the active master instead of relying on ZK.
          9. The HRegionServer inside the active HMaster communicates with the master directly instead of going through RPC; the shortcut helps.
          10. The master (active/backup) web UI contains info about the corresponding region server.
          11. A backup master moves user regions away (and moves the meta/namespace regions to the master if already assigned somewhere else) after becoming active.
          12. Integration testing doesn’t restart the master as a region server, or restart the region server that holds the meta. One reason is that the startup script can’t tell whether a region server should be a master.
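Points 2, 3, and 7 above can be illustrated with a toy class hierarchy. The class bodies below are simplified stand-ins invented for illustration, not the actual HBase implementation; only the names `serviceStarted`, `getMasterMetrics`, and `ServerNotRunningYetException` come from the description above:

```java
// Toy model of HMaster extending HRegionServer (point 2), the renamed
// master-side accessor (point 3), and the serviceStarted gate (point 7).
class ServerNotRunningYetException extends RuntimeException {
    ServerNotRunningYetException(String msg) { super(msg); }
}

class ToyRegionServer {
    String getMetrics() { return "regionserver-metrics"; }
}

class ToyMaster extends ToyRegionServer {
    // Set once this master wins the active-master election.
    volatile boolean serviceStarted = false;

    // Renamed so it does not clash with the inherited getMetrics().
    String getMasterMetrics() { return "master-metrics"; }

    // Master RPCs are rejected until the master is active.
    String handleMasterRpc() {
        if (!serviceStarted) {
            throw new ServerNotRunningYetException("master is not active yet");
        }
        return "ok";
    }
}
```

Because the master is-a region server, clients hitting the shared RPC server get region-server behavior immediately, while master-only calls fail fast until activation.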

          Here is a list of things to be done (in separate issues):

          1. Need to make sure the master listens on the old ports (RPC + web UI) too, so as to support rolling upgrades from older versions (0.96+) and remain backward compatible.
          2. Need to consolidate chores/threads/handlers in the master/regionserver, and give the active master manager in the backup master a high priority so it can grab the ZK node faster, until we move to Raft.
          3. Clean up MetaServerShutdownHandler and HMaster#assignMeta in the next major release, when rolling upgrade is no longer an issue. This should be done much later.
          Jimmy Xiang added a comment -

          If co-locating, are you thinking of opening the region (and rest of RS machinery in master), or move some of the assignment logic to the RS who hosts meta?

          The first one. I was thinking of making a master also a region server, so that all nodes are region servers. By configuration, some region servers can host the master processes.

          Enis Soztutar added a comment -

          I think this would be a good incremental step towards a more sane master implementation. We can alternatively decide not to co-locate meta, but make meta read-only for region servers and clients, written only by the master. RS operations will be carried out through the master (the RS sends an RPC to the master, and the master does the meta update).

          If co-locating, are you thinking of opening the region (and rest of RS machinery in master), or move some of the assignment logic to the RS who hosts meta?
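The alternative Enis describes (meta writable only by the master, with region servers routing updates through master RPC) could be modeled roughly as a write guard. `GuardedMeta`, `Role`, and the method names below are invented for illustration and are not actual HBase code:

```java
import java.util.*;

// Sketch of the "meta is read-only except via the master" alternative.
// Caller roles and method names are made up for illustration.
class GuardedMeta {
    enum Role { MASTER, REGION_SERVER, CLIENT }

    private final Map<String, String> rows = new HashMap<>();

    String read(Role caller, String region) {
        return rows.get(region); // anyone may read meta
    }

    // Only the master writes meta directly; region servers and clients
    // would send their updates to the master over RPC instead.
    void write(Role caller, String region, String location) {
        if (caller != Role.MASTER) {
            throw new IllegalStateException("meta is read-only for " + caller);
        }
        rows.put(region, location);
    }
}
```

This keeps a single writer for meta, trading an extra RPC hop on updates for simpler consistency reasoning.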

          Elliott Clark added a comment -

          +1


            People

            • Assignee: Jimmy Xiang
            • Reporter: Jimmy Xiang
            • Votes: 0
            • Watchers: 33
