Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: HBASE-12259
    • Component/s: Consensus, wal
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This is the first patch for the HydraBase consensus protocol implemented according to the Raft consensus protocol (https://ramcloud.stanford.edu/raft.pdf), as advertised in (HBASE-12259)

      1. 0001-HydraBase-consensus-protocol.patch
        1.39 MB
        Gaurav Menghani
      2. 0002-HydraBase-consensus-protocol.patch
        1.39 MB
        Gaurav Menghani
      3. RaftProtocolImplementionDoc.pdf
        980 kB
        Rishit Shroff

        Activity

        Hide
        gaurav.menghani Gaurav Menghani added a comment -

        This patch applies cleanly on top of

        commit bbd681541428d3983a9c4e114b1ee479ab22f020
        Author: Misty Stanley-Jones <mstanleyjones@cloudera.com>
        Date: Mon Nov 3 14:28:31 2014 +1000

        HBASE-12409 Add actual tunable parameters to regions per RS calculations

        Show
        gaurav.menghani Gaurav Menghani added a comment - This patch applies cleanly on top of commit bbd681541428d3983a9c4e114b1ee479ab22f020 Author: Misty Stanley-Jones <mstanleyjones@cloudera.com> Date: Mon Nov 3 14:28:31 2014 +1000 HBASE-12409 Add actual tunable parameters to regions per RS calculations
        Hide
        yuzhihong@gmail.com Ted Yu added a comment -

        This is a big patch.

        Can you put it on reviewboard or Phabricator ?

        Show
        yuzhihong@gmail.com Ted Yu added a comment - This is a big patch. Can you put it on reviewboard or Phabricator ?
        Hide
        gaurav.menghani Gaurav Menghani added a comment -
        Show
        gaurav.menghani Gaurav Menghani added a comment - Here it is: https://reviews.facebook.net/D28941
        Hide
        enis Enis Soztutar added a comment -

        This patch contains all of the client level code + RAFT from what I see. Can we break it down to the RAFT protocol itself, then incrementally bring in the client level changes. As it is, it is duplicating a lot of code.

        Show
        enis Enis Soztutar added a comment - This patch contains all of the client level code + RAFT from what I see. Can we break it down to the RAFT protocol itself, then incrementally bring in the client level changes. As it is, it is duplicating a lot of code.
        Hide
        gaurav.menghani Gaurav Menghani added a comment -

        Enis Soztutar I assume the client level code you are referring to is, QuorumThriftClient etc. That code is required for the protocol to work, otherwise the Leader node would not receive edits. What is the code duplication you are referring to?

        Show
        gaurav.menghani Gaurav Menghani added a comment - Enis Soztutar I assume the client level code you are referring to is, QuorumThriftClient etc. That code is required for the protocol to work, otherwise the Leader node would not receive edits. What is the code duplication you are referring to?
        Hide
        rshroff Rishit Shroff added a comment -

        If we don't have the QuorumThriftClient, then we cannot send transactions to the protocol. So, the unit tests will need QuorumThriftClient to actually test the protocol.

        Show
        rshroff Rishit Shroff added a comment - If we don't have the QuorumThriftClient, then we cannot send transactions to the protocol. So, the unit tests will need QuorumThriftClient to actually test the protocol.
        Hide
        enis Enis Soztutar added a comment -

        I was mentioning the HConstants, HRegionInfo, etc. All the classes that are already in hbase.

        Show
        enis Enis Soztutar added a comment - I was mentioning the HConstants, HRegionInfo, etc. All the classes that are already in hbase.
        Hide
        gaurav.menghani Gaurav Menghani added a comment -

        Since we moved the code from the HydraBase branch based off 0.89-fb, we had to copy a couple of classes (about 8) to avoid the code from breaking when we copy to trunk. These will be removed in the very new future (after we spend time refactoring), so feel free to ignore them.

        Show
        gaurav.menghani Gaurav Menghani added a comment - Since we moved the code from the HydraBase branch based off 0.89-fb, we had to copy a couple of classes (about 8) to avoid the code from breaking when we copy to trunk. These will be removed in the very new future (after we spend time refactoring), so feel free to ignore them.
        Hide
        enis Enis Soztutar added a comment -

        Ok cool.

        Show
        enis Enis Soztutar added a comment - Ok cool.
        Hide
        busbey Sean Busbey added a comment -

        please use reviewboard instead of phabricator. phabricator requires an account to view the review and we shouldn't require ASF contributors to get non-ASF accounts to participate.

        Show
        busbey Sean Busbey added a comment - please use reviewboard instead of phabricator. phabricator requires an account to view the review and we shouldn't require ASF contributors to get non-ASF accounts to participate.
        Hide
        eclark Elliott Clark added a comment -

        Sean Busbey Phabricator's been an acceptable way for patches to be posted on the HBase project for quite a while. I don't think that we need to be perscriptive on what tools to use. In addition the patch is posted here, so if people don't want to sign up they don't have to.

        Per the discussion in HBASE-12259 and the mailing list and in person I'm +1 committing this to a separate branch so that people can start playing with it, without jeopardizing trunk.

        Show
        eclark Elliott Clark added a comment - Sean Busbey Phabricator's been an acceptable way for patches to be posted on the HBase project for quite a while. I don't think that we need to be perscriptive on what tools to use. In addition the patch is posted here, so if people don't want to sign up they don't have to. Per the discussion in HBASE-12259 and the mailing list and in person I'm +1 committing this to a separate branch so that people can start playing with it, without jeopardizing trunk.
        Hide
        ryanobjc ryan rawson added a comment -

        If there is any choice, when committing to a feature branch, not having the '1 giant patch commit' would be nice.

        It's probably impossible to bring forward the hundreds of commits this represents, but it'd be nice to get some structure. For the rest of us looking at this code, it's going to be fairly imperative that you take the effort to structure it, outline the modules, and generally make the code self-instructing.

        Show
        ryanobjc ryan rawson added a comment - If there is any choice, when committing to a feature branch, not having the '1 giant patch commit' would be nice. It's probably impossible to bring forward the hundreds of commits this represents, but it'd be nice to get some structure. For the rest of us looking at this code, it's going to be fairly imperative that you take the effort to structure it, outline the modules, and generally make the code self-instructing.
        Hide
        eclark Elliott Clark added a comment -

        From my understanding this is the smallest module the Hydrabase guys could get up and running enough to pass some tests. Anything less and they were just putting up code that was broken.

        Show
        eclark Elliott Clark added a comment - From my understanding this is the smallest module the Hydrabase guys could get up and running enough to pass some tests. Anything less and they were just putting up code that was broken.
        Hide
        busbey Sean Busbey added a comment -

        Sean Busbey Phabricator's been an acceptable way for patches to be posted on the HBase project for quite a while. I don't think that we need to be perscriptive on what tools to use. In addition the patch is posted here, so if people don't want to sign up they don't have to.

        Thanks for the correction. I've seen phabricator a few times on other ASF projects but hadn't noticed it among the HBase stuff so far. I'll try to revive HBASE-4896.

        Show
        busbey Sean Busbey added a comment - Sean Busbey Phabricator's been an acceptable way for patches to be posted on the HBase project for quite a while. I don't think that we need to be perscriptive on what tools to use. In addition the patch is posted here, so if people don't want to sign up they don't have to. Thanks for the correction. I've seen phabricator a few times on other ASF projects but hadn't noticed it among the HBase stuff so far. I'll try to revive HBASE-4896 .
        Hide
        ndimiduk Nick Dimiduk added a comment -

        I'm with ryan rawson on this one; 1.39mb patch is impenetrable for reviewers. Even phabricator itself has a warning of the excessive size of the patch.

        Re: phabricator, I agree with Sean Busbey in principal, but given it supports both Facebook- and Github-based authentication, and ASF supports Github integration for quite a bit of other tooling, I think the barrier to use is sufficiently low.

        Show
        ndimiduk Nick Dimiduk added a comment - I'm with ryan rawson on this one; 1.39mb patch is impenetrable for reviewers. Even phabricator itself has a warning of the excessive size of the patch. Re: phabricator, I agree with Sean Busbey in principal, but given it supports both Facebook- and Github-based authentication, and ASF supports Github integration for quite a bit of other tooling, I think the barrier to use is sufficiently low.
        Hide
        stack stack added a comment -

        What about this comment?

        From my understanding this is the smallest module the Hydrabase guys could get up and running enough to pass some tests. Anything less and they were just putting up code that was broken.

        Putting up broke or incomplete contribs doesn't work because in review it becomes... "hey...this ain't going to work?" or 'Where is XYZ? Why ain't it included?"

        (We just committed a 1.4MB patch yesterday. Ten pages on RB. It took a lot of reviewing but the contributor was responsive so it was a useful exercise. It added desirable functionality so was worth the effort.)

        Suggest those interested in reviewing start reviewing. First pass might suggest how the patch might be broken up.

        Show
        stack stack added a comment - What about this comment? From my understanding this is the smallest module the Hydrabase guys could get up and running enough to pass some tests. Anything less and they were just putting up code that was broken. Putting up broke or incomplete contribs doesn't work because in review it becomes... "hey...this ain't going to work?" or 'Where is XYZ? Why ain't it included?" (We just committed a 1.4MB patch yesterday. Ten pages on RB. It took a lot of reviewing but the contributor was responsive so it was a useful exercise. It added desirable functionality so was worth the effort.) Suggest those interested in reviewing start reviewing. First pass might suggest how the patch might be broken up.
        Hide
        stack stack added a comment -

        Thanks for putting the code into a module (thanks also for posting)

        What direction do you think we should take this contrib? Toward the layout described in the hydrabase doc or are you thinking we'd recast it as an in-cluster quorum-of-regions? If the latter, would it be on all the time – a different sort of hbase – or would it be something you could enable? HBase 2.0 or HBase 1.0?

        Sorry for the big questions. No need of an answer now.... but we need to all chat.

        What should we do about swift out here in apache hbase? We'd redo it as pb and rpc calls?

        airlift is not for REST services, right? You are just using it for stats? You like it? Better than the histograms we have going on out herein apache hbase? You have a Counter, Duration, TimeDistribution, and ExpotentialDecay.. Need airlift for this or could redo doing our current metrics dependencies?

        Should apache hbase make use of jmxutils too? How you folks using it (Not reviewed the patch yet).

        Let me get you more feedback. Above are first impression. Thanks.

        Show
        stack stack added a comment - Thanks for putting the code into a module (thanks also for posting) What direction do you think we should take this contrib? Toward the layout described in the hydrabase doc or are you thinking we'd recast it as an in-cluster quorum-of-regions? If the latter, would it be on all the time – a different sort of hbase – or would it be something you could enable? HBase 2.0 or HBase 1.0? Sorry for the big questions. No need of an answer now.... but we need to all chat. What should we do about swift out here in apache hbase? We'd redo it as pb and rpc calls? airlift is not for REST services, right? You are just using it for stats? You like it? Better than the histograms we have going on out herein apache hbase? You have a Counter, Duration, TimeDistribution, and ExpotentialDecay.. Need airlift for this or could redo doing our current metrics dependencies? Should apache hbase make use of jmxutils too? How you folks using it (Not reviewed the patch yet). Let me get you more feedback. Above are first impression. Thanks.
        Hide
        rshroff Rishit Shroff added a comment -

        stack: Thanks for looking through the code and providing the inputs. Please find my responses below:

        What direction do you think we should take this contrib?Toward the layout described in the hydrabase doc or are you thinking we'd recast it as an in-cluster quorum-of-regions? If the latter, would it be on all the time – a different sort of hbase – or would it be something you could enable? HBase 2.0 or HBase 1.0?

        Rishit: The RAFT module by itself is pretty independent of in-cluster vs multi-cluster setup. It just deals with peers with ranks and replication. I think the question will be about how we want to take the integration of this protocol into HBase. IMO, we can target for in-cluster mode first and then make it applicable to cross-cluster. Also, since HBase 1.0 is a soon to be released version, I would prefer to put into into HBase 2.0. For sure, we will need to meetup and chat about this. I would prefer to have this as an opt-in option.

        What should we do about swift out here in apache hbase? We'd redo it as pb and rpc calls?

        Rishit: The swift jars used in current HBase are old, we definitely need to upgrade them and fix the changed dependencies.

        airlift is not for REST services, right? You are just using it for stats? You like it? Better than the histograms we have going on out herein apache hbase? You have a Counter, Duration, TimeDistribution, and ExpotentialDecay.. Need airlift for this or could redo doing our current metrics dependencies?

        Rishit: Yes, even in my profiling, I saw airlift as a hot-spot. I will file a JIRA to move the stats out of airlift and choose the other options (hadoop style, or Yi Deng work for histogram at the very least.

        Should apache hbase make use of jmxutils too? How you folks using it (Not reviewed the patch yet).

        Rishit: AFAIK, they are used by airlift package. I will double check.

        Thanks,
        Rishit

        Show
        rshroff Rishit Shroff added a comment - stack : Thanks for looking through the code and providing the inputs. Please find my responses below: What direction do you think we should take this contrib?Toward the layout described in the hydrabase doc or are you thinking we'd recast it as an in-cluster quorum-of-regions? If the latter, would it be on all the time – a different sort of hbase – or would it be something you could enable? HBase 2.0 or HBase 1.0? Rishit: The RAFT module by itself is pretty independent of in-cluster vs multi-cluster setup. It just deals with peers with ranks and replication. I think the question will be about how we want to take the integration of this protocol into HBase. IMO, we can target for in-cluster mode first and then make it applicable to cross-cluster. Also, since HBase 1.0 is a soon to be released version, I would prefer to put into into HBase 2.0. For sure, we will need to meetup and chat about this. I would prefer to have this as an opt-in option. What should we do about swift out here in apache hbase? We'd redo it as pb and rpc calls? Rishit: The swift jars used in current HBase are old, we definitely need to upgrade them and fix the changed dependencies. airlift is not for REST services, right? You are just using it for stats? You like it? Better than the histograms we have going on out herein apache hbase? You have a Counter, Duration, TimeDistribution, and ExpotentialDecay.. Need airlift for this or could redo doing our current metrics dependencies? Rishit: Yes, even in my profiling, I saw airlift as a hot-spot. I will file a JIRA to move the stats out of airlift and choose the other options (hadoop style, or Yi Deng work for histogram at the very least. Should apache hbase make use of jmxutils too? How you folks using it (Not reviewed the patch yet). Rishit: AFAIK, they are used by airlift package. I will double check. Thanks, Rishit
        Hide
        stack stack added a comment -

        Sounds good Rishit Shroff. We can file patches to move over to our rpc+pbs and to purge libs that maybe don't add that much in the end when you fellas get up on a feature branch out here in apache (I believe that is the plan... kick Elliott Clark!)

        Show
        stack stack added a comment - Sounds good Rishit Shroff . We can file patches to move over to our rpc+pbs and to purge libs that maybe don't add that much in the end when you fellas get up on a feature branch out here in apache (I believe that is the plan... kick Elliott Clark !)
        Hide
        gaurav.menghani Gaurav Menghani added a comment -

        Final patch of the squashed commits after addressing Ted's comments.

        Show
        gaurav.menghani Gaurav Menghani added a comment - Final patch of the squashed commits after addressing Ted's comments.
        Hide
        eclark Elliott Clark added a comment -

        I'm +1 on committing this to a feature branch.
        I'll take the kick from stack as the same.

        Pushing this to HBASE-12259

        Show
        eclark Elliott Clark added a comment - I'm +1 on committing this to a feature branch. I'll take the kick from stack as the same. Pushing this to HBASE-12259
        Hide
        stack stack added a comment -

        +1 on creating feature branch

        Show
        stack stack added a comment - +1 on creating feature branch
        Hide
        eclark Elliott Clark added a comment -

        Pushed to feature branch. Small tweak to the pom file to get everything wired up.

        Show
        eclark Elliott Clark added a comment - Pushed to feature branch. Small tweak to the pom file to get everything wired up.
        Hide
        stack stack added a comment -

        Here are some notes on the 0002 patch itself. Mostly questions. No hurry w/ response.

        Purge versions from sub-poms if the dependency is common with other modules. Inherit from parent where you can. I found this in the consensus pom:

        <version>2.15</version>

        and
        <version>4.1</version>

        ... for what I believe are common dependencies (Its fine to have dependencies in modules with versions if the dependency only used by the submodule)

        In the consensus module, there is an hbase-default.xml also? How does it relate to the one over in hbase-common?

        You can purge hadoop-1 references (unless that is how you refer to your hdfs.) We don't support it in hbase 1.0.

        Others have already remarked on undoing the local copies of HColumnDescriptor, etc. You had them in here so this module could 'work' without dependency? (Kill HServerAddress!) Do we need to move stuff around in hbase so you can have minimal dependencies? All the classes you have copied local here, should we move them back up and into hbase-common module if not there already?

        Run the rat tool on your feature branch. Some of these files are still missing licenses.

        Nice. You have already ported over ConfigurationObserver. Good.

        We'll have to redo FetchTask, etc., as protobuf? (the swifty annotations look very nice)

        In FetchTask I see in the run that it has this:

        // TODO in part 2: fetch log files from the peer!

        Does that mean this facility of raft – catchup I presume – is yet to be implemented or is it rather that this class gets subclassed later?

        If the former, is there more to do to have full implementation?

        Each regionserver stands up an instance of a QuorumClient per region?

        A quorum name is different from region name?

        We'd have to move all of the quorum communication up on to hbase rpc and pb? What you lot think? Keep swift?

        The FSM stuff could do w/ a bit of package doc on how it works. That said, it is easy to read. (Maybe we could use these primitives elsewhere in hbase)

        Region splits handled?

        We'd have to figure how to do the RMap stuff in master? And bootstrapping the cluster?

        A regionserver would have to do consensuservice?

        WHen you say 'favored_nodes' in the rmap, does that imply we need 'favored nodes' working out in apache hbase or is this just how you describe quorum members for a region?

        Is the read-side, reading from the quorum, a patch as big as this one (smile)?

        Nice one.

        Show
        stack stack added a comment - Here are some notes on the 0002 patch itself. Mostly questions. No hurry w/ response. Purge versions from sub-poms if the dependency is common with other modules. Inherit from parent where you can. I found this in the consensus pom: <version>2.15</version> and <version>4.1</version> ... for what I believe are common dependencies (Its fine to have dependencies in modules with versions if the dependency only used by the submodule) In the consensus module, there is an hbase-default.xml also? How does it relate to the one over in hbase-common? You can purge hadoop-1 references (unless that is how you refer to your hdfs.) We don't support it in hbase 1.0. Others have already remarked on undoing the local copies of HColumnDescriptor, etc. You had them in here so this module could 'work' without dependency? (Kill HServerAddress!) Do we need to move stuff around in hbase so you can have minimal dependencies? All the classes you have copied local here, should we move them back up and into hbase-common module if not there already? Run the rat tool on your feature branch. Some of these files are still missing licenses. Nice. You have already ported over ConfigurationObserver. Good. We'll have to redo FetchTask, etc., as protobuf? (the swifty annotations look very nice) In FetchTask I see in the run that it has this: // TODO in part 2: fetch log files from the peer! Does that mean this facility of raft – catchup I presume – is yet to be implemented or is it rather that this class gets subclassed later? If the former, is there more to do to have full implementation? Each regionserver stands up an instance of a QuorumClient per region? A quorum name is different from region name? We'd have to move all of the quorum communication up on to hbase rpc and pb? What you lot think? Keep swift? The FSM stuff could do w/ a bit of package doc on how it works. That said, it is easy to read. (Maybe we could use these primitives elsewhere in hbase) Region splits handled? We'd have to figure how to do the RMap stuff in master? And bootstrapping the cluster? A regionserver would have to do consensuservice? WHen you say 'favored_nodes' in the rmap, does that imply we need 'favored nodes' working out in apache hbase or is this just how you describe quorum members for a region? Is the read-side, reading from the quorum, a patch as big as this one (smile)? Nice one.
        Hide
        stack stack added a comment -

        Any feedback on the above notes Gaurav Menghani? Thanks.

        Show
        stack stack added a comment - Any feedback on the above notes Gaurav Menghani ? Thanks.
        Hide
        gaurav.menghani Gaurav Menghani added a comment -

        Stack, sorry about this. Didn't see your comment during the holidays.

        Purge versions from sub-poms if the dependency is common with other modules. Inherit from parent where you can. I found this in the consensus pom:
        <version>2.15</version>
        and
        <version>4.1</version>
        ... for what I believe are common dependencies (Its fine to have dependencies in modules with versions if the dependency only used by the submodule)

        Sure, will do.

        In the consensus module, there is an hbase-default.xml also? How does it relate to the one over in hbase-common?
        You can purge hadoop-1 references (unless that is how you refer to your hdfs.) We don't support it in hbase 1.0.
        Others have already remarked on undoing the local copies of HColumnDescriptor, etc. You had them in here so this module could 'work' without dependency? (Kill HServerAddress!) Do we need to move stuff around in hbase so you can have minimal dependencies? All the classes you have copied local here, should we move them back up and into hbase-common module if not there already?

        I didn't find hbase-default.xml in the consensus package. Yes, we are cleaning up all that dependency cruft in HBASE-12510.

        Run the rat tool on your feature branch. Some of these files are still missing licenses.

        Yes, we didn't add licenses to mostly any of the code. Will do this.

        Nice. You have already ported over ConfigurationObserver. Good.

        Hehe, I wrote the original ConfigurationObserver in 0.89-fb

        We'll have to redo FetchTask, etc., as protobuf? (the swifty annotations look very nice)
        In FetchTask I see in the run that it has this:
        // TODO in part 2: fetch log files from the peer!
        Does that mean this facility of raft – catchup I presume – is yet to be implemented or is it rather that this class gets subclassed later?
        If the former, is there more to do to have full implementation?

        As per Rishit, the FetchTask is for advanced offline log recovery. We aren't using that part yet. We can add more comments to highlight that.

        Each regionserver stands up an instance of a QuorumClient per region?

        Yes, a QuorumClient is used to replicate edits. It is the most clear usage of the protocol inside RS.

        A quorum name is different from region name?

        It need not be. Because the consensus module is intended to be independent of WAL, and even HBase (which it is not right now, with all the dependencies). So today the quorum is being done on the WAL, so the quorum name is the same as region name. Tomorrow, we can do a quorum on something else, in which case quorum name would be something else. It is just a unique identifier for the quorum.

        We'd have to move all of the quorum communication up on to hbase rpc and pb? What you lot think? Keep swift?

        I would prefer swift, because we don't have to write IDL specs for the service and data-structs. That said, if it doesn't play well with the bigger project, we should discuss this in detail.

        The FSM stuff could do w/ a bit of package doc on how it works. That said, it is easy to read. (Maybe we could use these primitives elsewhere in hbase)

        What do you mean by a package doc? Is it synonymous to Javadocs?

        Region splits handled?

        Hydrabase cannot handle region splits as of now.

        We'd have to figure how to do the RMap stuff in master? And bootstrapping the cluster?

        Regarding Bootstrapping, we can share some ideas. There are some basic tools to bootstrap the RMap. I am moving the RMap part to hbase-server in HBASE-12510, since this is part of the integration of HBase + Protocol. Regarding the master part, there is a diff coming out soon. We worked on something known as HydrabaseMaster which is a thin HMaster which alters the RMap as nodes die and join the quorum.

        A regionserver would have to do consensuservice?

        That's correct. Coming up soon.

        WHen you say 'favored_nodes' in the rmap, does that imply we need 'favored nodes' working out in apache hbase or is this just how you describe quorum members for a region?

        Actually favored_nodes is used for hinting the DFS where to replicate the blocks. I believe we don't have that ability in open source HDFS right? In the next diff, we are just ignoring the favored nodes part.

        Is the read-side, reading from the quorum, a patch as big as this one (smile)?
        Nice one.

        Not quite, I think the patch sizes would go down exponentially from here. This probably was the largest chunk of code.

        Show
        gaurav.menghani Gaurav Menghani added a comment - Stack, sorry about this. Didn't see your comment during the holidays. Purge versions from sub-poms if the dependency is common with other modules. Inherit from parent where you can. I found this in the consensus pom: <version>2.15</version> and <version>4.1</version> ... for what I believe are common dependencies (Its fine to have dependencies in modules with versions if the dependency only used by the submodule) Sure, will do. In the consensus module, there is an hbase-default.xml also? How does it relate to the one over in hbase-common? You can purge hadoop-1 references (unless that is how you refer to your hdfs.) We don't support it in hbase 1.0. Others have already remarked on undoing the local copies of HColumnDescriptor, etc. You had them in here so this module could 'work' without dependency? (Kill HServerAddress!) Do we need to move stuff around in hbase so you can have minimal dependencies? All the classes you have copied local here, should we move them back up and into hbase-common module if not there already? I didn't find hbase-default.xml in the consensus package. Yes, we are cleaning up all that dependency cruft in HBASE-12510 . Run the rat tool on your feature branch. Some of these files are still missing licenses. Yes, we didn't add licenses to mostly any of the code. Will do this. Nice. You have already ported over ConfigurationObserver. Good. Hehe, I wrote the original ConfigurationObserver in 0.89-fb We'll have to redo FetchTask, etc., as protobuf? (the swifty annotations look very nice) In FetchTask I see in the run that it has this: // TODO in part 2: fetch log files from the peer! Does that mean this facility of raft – catchup I presume – is yet to be implemented or is it rather that this class gets subclassed later? If the former, is there more to do to have full implementation? As per Rishit, the FetchTask is for advanced offline log recovery. We aren't using that part yet. We can add more comments to highlight that. Each regionserver stands up an instance of a QuorumClient per region? Yes, a QuorumClient is used to replicate edits. It is the most clear usage of the protocol inside RS. A quorum name is different from region name? It need not be. Because the consensus module is intended to be independent of WAL, and even HBase (which it is not right now, with all the dependencies). So today the quorum is being done on the WAL, so the quorum name is the same as region name. Tomorrow, we can do a quorum on something else, in which case quorum name would be something else. It is just a unique identifier for the quorum. We'd have to move all of the quorum communication up on to hbase rpc and pb? What you lot think? Keep swift? I would prefer swift, because we don't have to write IDL specs for the service and data-structs. That said, if it doesn't play well with the bigger project, we should discuss this in detail. The FSM stuff could do w/ a bit of package doc on how it works. That said, it is easy to read. (Maybe we could use these primitives elsewhere in hbase) What do you mean by a package doc? Is it synonymous to Javadocs? Region splits handled? Hydrabase cannot handle region splits as of now. We'd have to figure how to do the RMap stuff in master? And bootstrapping the cluster? Regarding Bootstrapping, we can share some ideas. There are some basic tools to bootstrap the RMap. I am moving the RMap part to hbase-server in HBASE-12510 , since this is part of the integration of HBase + Protocol. Regarding the master part, there is a diff coming out soon. We worked on something known as HydrabaseMaster which is a thin HMaster which alters the RMap as nodes die and join the quorum. A regionserver would have to do consensuservice? That's correct. Coming up soon. WHen you say 'favored_nodes' in the rmap, does that imply we need 'favored nodes' working out in apache hbase or is this just how you describe quorum members for a region? Actually favored_nodes is used for hinting the DFS where to replicate the blocks. I believe we don't have that ability in open source HDFS right? In the next diff, we are just ignoring the favored nodes part. Is the read-side, reading from the quorum, a patch as big as this one (smile)? Nice one. Not quite, I think the patch sizes would go down exponentially from here. This probably was the largest chunk of code.
        Hide
        stack stack added a comment -

        Gaurav Menghani Thank you for the replies. All look good.

        Favored nodes is 'there' in apache hdfs, just not used/tested. The last bit is still to be hooked up in apache hbase. It has been a TODO for a long while.

        What I mean by package-doc is something like this http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/package-summary.html#package_description (yes, javadoc but at package level giving overview... I think this might be all that is needed in FSM case since it is 'easy' to read at the method/class level... its just the high-level story that needs a bit of filling in).

        I understand why you've not done splits for messages but you think it will be possible out in apache hbase atop what you have here? Can be later... just asking.

        On swift, yeah, the annotations are nice but it'd be a headache having thrift for one communication channel and pb/hbase-rpc for another. We can help w/ the conversion here since we have a bit of experience, np. Its fine as swift for now.

        Thanks for putting this up.

        Show
        stack stack added a comment - Gaurav Menghani Thank you for the replies. All look good. Favored nodes is 'there' in apache hdfs, just not used/tested. The last bit is still to be hooked up in apache hbase. It has been a TODO for a long while. What I mean by package-doc is something like this http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/package-summary.html#package_description (yes, javadoc but at package level giving overview... I think this might be all that is needed in FSM case since it is 'easy' to read at the method/class level... its just the high-level story that needs a bit of filling in). I understand why you've not done splits for messages but you think it will be possible out in apache hbase atop what you have here? Can be later... just asking. On swift, yeah, the annotations are nice but it'd be a headache having thrift for one communication channel and pb/hbase-rpc for another. We can help w/ the conversion here since we have a bit of experience, np. Its fine as swift for now. Thanks for putting this up.
        Hide
        gaurav.menghani Gaurav Menghani added a comment -

        Favored nodes is 'there' in apache hdfs, just not used/tested. The last bit is still to be hooked up in apache hbase. It has been a TODO for a long while.

        Cool, we should mark this as a TODO in HydraBase integration as well.

        What I mean by package-doc is something like this http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/package-summary.html#package_description (yes, javadoc but at package level giving overview... I think this might be all that is needed in FSM case since it is 'easy' to read at the method/class level... its just the high-level story that needs a bit of filling in).

        Sure.

        I understand why you've not done splits for messages but you think it will be possible out in apache hbase atop what you have here? Can be later... just asking.

        I think it will be slightly involved because essentially we are stoping the parent quorum and starting two daughter quorums. This could be done as a configuration change, and we can do this in the next HydraBase version.

        On swift, yeah, the annotations are nice but it'd be a headache having thrift for one communication channel and pb/hbase-rpc for another. We can help w/ the conversion here since we have a bit of experience, np. Its fine as swift for now.

        Yes, I understand the concern. Let's follow up on this later then.

        Thanks.

        Show
        gaurav.menghani Gaurav Menghani added a comment - Favored nodes is 'there' in apache hdfs, just not used/tested. The last bit is still to be hooked up in apache hbase. It has been a TODO for a long while. Cool, we should mark this as a TODO in HydraBase integration as well. What I mean by package-doc is something like this http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/package-summary.html#package_description (yes, javadoc but at package level giving overview... I think this might be all that is needed in FSM case since it is 'easy' to read at the method/class level... its just the high-level story that needs a bit of filling in). Sure. I understand why you've not done splits for messages but you think it will be possible out in apache hbase atop what you have here? Can be later... just asking. I think it will be slightly involved because essentially we are stoping the parent quorum and starting two daughter quorums. This could be done as a configuration change, and we can do this in the next HydraBase version. On swift, yeah, the annotations are nice but it'd be a headache having thrift for one communication channel and pb/hbase-rpc for another. We can help w/ the conversion here since we have a bit of experience, np. Its fine as swift for now. Yes, I understand the concern. Let's follow up on this later then. Thanks.
        Hide
        gaurav.menghani Gaurav Menghani added a comment -

        This could be done as a configuration change, and we can do this in the next HydraBase version.

        I meant a quorum configuration change, to be clear.

        Show
        gaurav.menghani Gaurav Menghani added a comment - This could be done as a configuration change, and we can do this in the next HydraBase version. I meant a quorum configuration change, to be clear.
        Hide
        rshroff Rishit Shroff added a comment -

        Attached is the details about the implementation of the RAFT protocol in the hbase-consensus.

        Show
        rshroff Rishit Shroff added a comment - Attached is the details about the implementation of the RAFT protocol in the hbase-consensus.

          People

          • Assignee:
            gaurav.menghani Gaurav Menghani
            Reporter:
            gaurav.menghani Gaurav Menghani
          • Votes:
            0 Vote for this issue
            Watchers:
            32 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development