Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.0
    • Component/s: Coprocessors
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change

      Description

      Support column family scoping via a new HCD attribute. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. Also identify every HLogKey with the original cluster's ID.

      1. HBASE-1728.patch
        21 kB
        Jean-Daniel Cryans
      2. HLogKey-scoping.patch
        31 kB
        Andrew Purtell
      3. HCD-family-scoping.patch
        28 kB
        Andrew Purtell

          Activity

          Jean-Daniel Cryans added a comment -

          Committed to trunk.

          Jean-Daniel Cryans added a comment -

          bq. Does this mean that only one cluster can be associated with one zk instance? Or is the notion that if multiple clusters are sharing the one zk ensemble, then they will be homed (gaol'd) at different areas up in zk (I suppose that makes sense, huh, they'd have to be)?

          I like the way you are able to answer yourself. Yes, they have different home dirs, so it's ok.

          bq. The change to HLogKey means we can't read old logs. That's probably fine, right? Migration requires that there be no hlogs in filesystem?

          Right, major version change. Although I did this:

              try {
                this.clusterId = in.readByte();
                this.scope = in.readInt();
              } catch(EOFException e) {
                // Means it's an old key, just continue
              }
          

          Will commit with your comments-related fixes. Thanks!
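
          The fallback quoted above can be sketched in isolation. This is a minimal illustration, not the committed HLogKey code: SketchLogKey, its single logSeqNum field, the default values, and the encode/decode helpers are all hypothetical, and it assumes each key is deserialized from its own buffer, so that an old-format key simply ends before the two new fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for HLogKey, reduced to what the fallback needs.
class SketchLogKey {
  long logSeqNum;
  byte clusterId = 0;  // assumed default for keys written before the change
  int scope = 0;       // assumed default: local, not replicated

  void readFields(DataInput in) throws IOException {
    this.logSeqNum = in.readLong();
    try {
      // New-format keys carry two trailing fields.
      this.clusterId = in.readByte();
      this.scope = in.readInt();
    } catch (EOFException e) {
      // Old-format key: the buffer ended before the new fields, keep defaults.
    }
  }

  // Serializes a key in either the old or the new layout (illustration only).
  static byte[] encode(long seqNum, boolean newFormat, byte clusterId, int scope) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeLong(seqNum);
      if (newFormat) {
        out.writeByte(clusterId);
        out.writeInt(scope);
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reads a key back from a buffer that holds exactly one key.
  static SketchLogKey decode(byte[] data) {
    try {
      SketchLogKey key = new SketchLogKey();
      key.readFields(new DataInputStream(new ByteArrayInputStream(data)));
      return key;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

          The catch only helps when the input really ends at the key boundary. If several keys were concatenated in one stream, the two reads would consume bytes of the next record instead of failing.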

          stack added a comment -

          bq. Ryan's point is valid (to be able to change HCD without disable) but resolving that is outside of the scope of this jira (he agrees on that). This should not be a blocker.

          I agree. Master rewrite should get this. All of HCD and table definition will be up in zk.

          bq. The KV should not carry the scoping information since it's only needed in HLog where we already have access to the HTD.

          Agreed.

          bq. Also note the danger in having a code-oriented 'default' scope that is being written into every single record. Not that the default would ever change, but we should observe the lessons of https://issues.apache.org/jira/browse/HBASE-2213 in recording defaults in every HCD.

          This should be fixed in master rewrite where we record only deviations from default.

          Regarding the patch:

          +    String clusterIdName =
          +            conf.get("zookeeper.znode.clusterId", "clusterId");
          

          Does this mean that only one cluster can be associated with one zk instance? Or is the notion that if multiple clusters are sharing the one zk ensemble, then they will be homed (gaol'd) at different areas up in zk (I suppose that makes sense, huh, they'd have to be)?

          Aside, 'isMaster' is a bad name for a data member. Should be 'master' at least by JavaBeans convention.

          Why does the method name deviate from the data member name below:

          +  public byte getRepId() {
          +    return this.clusterId;
          

          Align them? Make method name getClusterId?

          + public static final String SCOPE = "SCOPE"; ... is too generic. Make it REPLICATION_SCOPE or REP_SCOPE or something.

          The change to HLogKey means we can't read old logs. That's probably fine, right? Migration requires that there be no hlogs in filesystem?

          +1 on commit, fixing the above beforehand.

          Jean-Daniel Cryans added a comment -

          bq. Also note the danger in having a code-oriented 'default' scope that is being written into every single record. Not that the default would ever change, but we should observe the lessons of https://issues.apache.org/jira/browse/HBASE-2213 in recording defaults in every HCD.

          I will be happy to change it when we have a better solution for all configurations.

          bq. What about table level attributes? For those tables with a lot of families it might make life easier.

          Will it be easier or more complicated? Is setting the scope on 5 families that hard? My feeling is that we can add it later if we really have people asking for it; in the meantime the code is less complicated.

          Andrew Purtell added a comment -

          @Ryan: Have you not seen the comments on several replication related issues where I have mentioned using scope > 0 as global, with scope also used to sort a priority queue? That's what I have in mind.

          ryan rawson added a comment -

          As an aside, what other values for scope could there be than 'local' and 'global'?

          Also note the danger in having a code-oriented 'default' scope that is being written into every single record. Not that the default would ever change, but we should observe the lessons of https://issues.apache.org/jira/browse/HBASE-2213 in recording defaults in every HCD.

          What about table level attributes? For those tables with a lot of families it might make life easier.

          Jean-Daniel Cryans added a comment -

          Patch that adds both the scope on HCD/HLogKey and the clusterId on HLogKey. It's very minimalist on the core side because replication is contrib and I didn't change anything in other contribs. The handling of the cluster ID is scoped in HBASE-2195 and will have its own set of tests there.

          Jean-Daniel Cryans added a comment -

          I'm changing the title and description of this issue to reflect the new scope. We won't change stargate and thrift since replication is first added as a contrib. Also I'm bringing in the concept of cluster identification since that change is done around the exact same parts of the code.

          Jean-Daniel Cryans added a comment -

          Got it, I didn't see it like that.

          Andrew Purtell added a comment -

          As for what we would build into core, I am thinking 1) simple binary scheme – local or global; and, possibly 2) extend the binary scheme such that, for example, a scope of 0 means local, and a scope > 0 means global, with the desired priority of the replication set by the natural ordering of the int.

          Following up with the latter, a generic framework could e.g. read a class name from a family attribute, instantiate that object to make replication/routing decisions (via dynamic load from classpath using hdfs classloader, or using coprocessors at some future time), hand each kv to the object via an interface method, and use an int result as replication scope and priority as described above. This is like mixing filters with replication and a priority queue. Some, but not too much, additional work in return for affording users a lot of function to build upon.

          Note the framework can be flexible enough for someone to go even further and encode destination as well as priority in the int and substitute their own replication engine capable of complex routing even if we think that is out of scope of core. We just need to make the right bits of the replication logic subclassable.
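
          The scope-as-priority idea can be sketched as follows. This is only an illustration under stated assumptions: ScopedEdit and ScopeQueueSketch are hypothetical names, scopes of 0 or below are treated as local, and "natural ordering" is read as "smaller positive scope ships first", which the thread does not settle.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical edit carrying the scope of its column family.
class ScopedEdit {
  final String row;
  final int scope;

  ScopedEdit(String row, int scope) {
    this.row = row;
    this.scope = scope;
  }
}

// Hypothetical replication queue: drops local edits, orders the rest by scope.
class ScopeQueueSketch {
  static final int SCOPE_LOCAL = 0;

  // Natural int ordering: the smallest positive scope is polled first (assumption).
  private final PriorityQueue<ScopedEdit> pending =
      new PriorityQueue<>(Comparator.comparingInt((ScopedEdit e) -> e.scope));

  // Returns true if the edit was queued for replication, false if it stays local.
  boolean offer(ScopedEdit edit) {
    if (edit.scope <= SCOPE_LOCAL) {
      return false;
    }
    return pending.add(edit);
  }

  // Empties the queue and returns the rows in shipping order.
  List<String> drainRows() {
    List<String> rows = new ArrayList<>();
    ScopedEdit next;
    while ((next = pending.poll()) != null) {
      rows.add(next.row);
    }
    return rows;
  }
}
```

          A pluggable policy, as described above, would replace the fixed family scope with an int computed per kv; the queue itself would not need to change.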

          Andrew Purtell added a comment -

          bq. I don't see any reason why we should not at least first have only local or global scope.

          This is my thought as well, at least at first.

          bq. I'm not sure I agree that we should be able to set destinations on the family scope. What kind of mess are you creating if all families from all tables are going different ways?

          Well, the idea behind using an int for scoping, scoping at the family, and building kv routing as a pluggable framework is to be generic enough to separate mechanism from policy.

          As for what we would build into core, I am thinking 1) simple binary scheme – local or global; and, possibly 2) extend the binary scheme such that, for example, a scope of 0 means local, and a scope > 0 means global, with the desired priority of the replication set by the natural ordering of the int.

          Jean-Daniel Cryans added a comment -

          This jira is the next on my list. Here are my thoughts:

          Ryan's point is valid (to be able to change HCD without disable) but resolving that is outside of the scope of this jira (he agrees on that). This should not be a blocker.

          I'm not sure I agree that we should be able to set destinations on the family scope. What kind of mess are you creating if all families from all tables are going different ways? I don't see any reason why we should not at least first have only local or global scope.

          The KV should not carry the scoping information since it's only needed in HLog where we already have access to the HTD.

          As Andrew was saying in HBASE-2129, we need to be able to trace where an edit is coming from and a Byte would be enough to hold that value. It should not go in KV since that means we would store that in HFiles. I think the best would be to put it in HLogKey. Here is how chained clusters should handle that when we have:

          master1 => slave1 & master2 => slave2

          The second node should use a new special field in Put and Delete to set the original cluster Byte which will be passed down to HLog in order to create new HLogKey with that same value. So slave2 will still receive the location of the original cluster which may be master1 or master2. If we have a cycle:

          ... => slave3 & master1 => slave1 & master2 => slave2 & master3 => slave3 & master1 =>...

          Then each master needs to consider if the slave cluster it's pushing to is the same as the one in the Byte of every edit it's about to replicate.
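
          The cycle check described above can be sketched like this. It is a minimal illustration, not code from the patch: CycleFilterSketch and both method names are hypothetical, and cluster IDs are modeled as a single byte as proposed in this comment.

```java
// Hypothetical helper illustrating origin tracking across chained clusters.
class CycleFilterSketch {
  // A locally written edit has no origin yet, so it takes the local cluster's ID.
  // A replayed edit keeps the origin it arrived with (passed down via Put/Delete).
  static byte originFor(Byte incomingOrigin, byte localClusterId) {
    return incomingOrigin != null ? incomingOrigin : localClusterId;
  }

  // An edit is never pushed back to the cluster it originally came from.
  static boolean shouldShip(byte originClusterId, byte peerClusterId) {
    return originClusterId != peerClusterId;
  }
}
```

          With clusters 1 => 2 => 3 => 1, an edit written on cluster 1 keeps origin 1 at every hop, so cluster 3 declines to push it to cluster 1 and the loop ends there.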

          Andrew Purtell added a comment -

          Reopen because it is still useful to have the region server set the scope on KVs to some default which can be configured on a per column family basis.

          Andrew Purtell added a comment -

          This isn't the way to do it. Ultimately there are several places where scoping information must be added to KeyValues to avoid breaking abstractions. Might as well just scope the KVs from the beginning.

          Andrew Purtell added a comment -

          @Jim: Regarding policy routing performance I think the choice here was between Integer and String. I suggest the former so policy routing only needs to do integer comparison or bit operations, not operations on arbitrary strings.

          Andrew Purtell added a comment -

          @Jim: A byte would probably suffice. That said, the value is just an HCD attribute and also a value in some policy table. We wouldn't be tagging kvs with 32 bits as part of the replication stream. The policy mechanism is within the replicator. Either it forwards a kv to a peer or does not. So the width of this value has no impact on performance.

          Jim Kellerman added a comment -

          I admit I haven't been following this closely (my other job keeps getting in the way).

          However, from what I understand, scoping is currently either

          { local | global }

          is that correct?

          Do we envision other types of scoping?

          If not, wouldn't a byte suffice instead of an int?

          I understand that you are envisioning routing policies, but in that case is an int enough to express that? I suppose the int could select the routing policy, but realistically how many policies will there be? 2**32 - 1? It would be hard to imagine that there would be more than 127, so a byte would suffice, and you could use the sign bit to indicate local or global.
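
          The byte layout suggested here could look like the sketch below. It is hypothetical throughout: ByteScopeSketch is an invented name, and which value of the sign bit means "global" is an assumption, since the comment does not say.

```java
// Hypothetical encoding: sign bit = local/global flag, low 7 bits = policy number.
class ByteScopeSketch {
  // Packs the flag and a policy number in the range 0..127 into one byte.
  static byte encode(boolean global, int policy) {
    if (policy < 0 || policy > 127) {
      throw new IllegalArgumentException("policy must fit in 7 bits: " + policy);
    }
    return (byte) (global ? (policy | 0x80) : policy);
  }

  // Assumption: a set sign bit (negative byte) marks the scope as global.
  static boolean isGlobal(byte scope) {
    return scope < 0;
  }

  // Masks off the sign bit to recover the policy number.
  static int policy(byte scope) {
    return scope & 0x7F;
  }
}
```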

          In the Yahoo user database (UDB), we mostly wanted every user's data on every server farm in the US, but we often restricted which international server farms could replicate (or even access) data for users whose home locale was not serviced by that international server farm. To handle that, each farm had a name (3-4 characters), and for replication outside the US, the destinations for a user's data were just a list of farms. That meant that we only had to send updates to those foreign farms where the user's data was present. Admittedly, this policy was specific to the UDB, but I wanted to share a perspective of replication that came from my deep dark distant past.

          FWIW.

          Andrew Purtell added a comment -

          @Ryan:

          1) Yes, currently, but we should not need to take a table offline to update HCD or HTD attributes, so that can be handled orthogonally. One option for that is putting HTDs and HCDs up into ZK, with the on-disk catalog tables kept as a mirror to be used only for cold init scenarios, as discussed on IRC.

          2) This change set only associates a 32-bit integer with column families. We should support pluggable replication policies. Each can encode state into that 32-bit value however it likes. This issue anticipates a simple default policy of yes/no.

          ryan rawson added a comment -

          Thanks for making a start. Here are two thoughts:

          • If it goes in HCD, does that mean we have to take a table outage to enable/disable replication? That might not be acceptable to some people (me included).
          • We want to capture multiple replication destinations, with individual control over each one. Replication will eventually form the backbone of our DR and data analysis strategy, so we will be expecting to have multiple replication streams.

            People

            • Assignee:
              Jean-Daniel Cryans
              Reporter:
              Andrew Purtell