HBASE-7709: Infinite loop possible in Master/Master replication

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.94.6, 0.95.1
    • Fix Version/s: 0.98.0, 0.94.12, 0.96.0
    • Component/s: Replication
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      We just discovered the following scenario:

      1. Cluster A and B are setup in master/master replication
      2. By accident we had Cluster C replicate to Cluster A.

      Now all edits originating from C will bounce between A and B. Forever!
      The reason is that when the edits come in from C, the cluster ID is already set and won't be reset.

      We have a couple of options here:

      1. Optionally only support master/master (not cycles of more than two clusters). In that case we can always reset the cluster ID in the ReplicationSource. That means that cycles of more than two clusters would have their data cycle forever. This is the only option that requires no changes to the HLog format.
      2. Instead of a single cluster ID per edit, maintain an (unordered) set of cluster IDs that have seen this edit. Then in ReplicationSource we drop any edit that the sink has already seen. This is the cleanest approach, but it might need a lot of data stored per edit if there are many clusters involved (see the sketch after this list).
      3. Maintain a configurable counter for the maximum cycle size we want to support. Could default to 10. Store a hop-count in the WAL and have ReplicationSource increase that hop-count on each hop. If we're over the max, just drop the edit.
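
      For illustration, a minimal sketch of the source-side filtering that option #2 implies (the class, method, and parameter names are assumptions, not the actual ReplicationSource API):

          import java.util.List;
          import java.util.UUID;

          final class CycleFilterSketch {
            /** True if the entry should still be shipped to the given peer (option #2). */
            static boolean shouldReplicate(List<UUID> clusterIdsSeen, UUID peerClusterId) {
              // If the peer's id is already in the set carried by the edit, shipping it
              // again would close a loop, so drop it at the source.
              return !clusterIdsSeen.contains(peerClusterId);
            }
          }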

      Attachments

      1. 0.95-trunk-rev1.patch
        86 kB
        Vasu Mariyala
      2. 0.95-trunk-rev2.patch
        103 kB
        Vasu Mariyala
      3. 0.95-trunk-rev3.patch
        102 kB
        Vasu Mariyala
      4. 0.95-trunk-rev4.patch
        104 kB
        Vasu Mariyala
      5. 095-trunk.patch
        86 kB
        Vasu Mariyala
      6. 7709-0.94-rev6.txt
        35 kB
        Lars Hofhansl
      7. HBASE-7709.patch
        12 kB
        Vasu Mariyala
      8. HBASE-7709-rev1.patch
        16 kB
        Vasu Mariyala
      9. HBASE-7709-rev2.patch
        36 kB
        Vasu Mariyala
      10. HBASE-7709-rev3.patch
        36 kB
        Vasu Mariyala
      11. HBASE-7709-rev4.patch
        36 kB
        Vasu Mariyala
      12. HBASE-7709-rev5.patch
        37 kB
        Vasu Mariyala

        Activity

        lhofhansl Lars Hofhansl added a comment -

        Thanks to Cody Marcel, Ian Varley, and Jesse Yates for finding the issue.

        yuzhihong@gmail.com Ted Yu added a comment -

        Option #2 seems the best.
        I think the number of clusters exceeding 10 in master/master replication would be rare.

        lhofhansl Lars Hofhansl added a comment -

        I'd agree. Need to check if this is possible in 0.94 while keeping the HLog backwards compatible. If that is tricky for 0.94 we might need option #1.
        Also, I cannot promise that I will get to this any time soon.

        ivarley Ian Varley added a comment -

        Would another option be to do some kind of checking at add_peer time, to make sure no pernicious cycles are created? I.e. when I add a peer, first walk the graph of current master/peer relationships and refuse to add if I detect a cycle I'm not part of? It would require an API to ask that question, but that's probably a good thing anyway.

        ivarley Ian Varley added a comment -

        (Because cycles > 2 are still fine, they just have to include all nodes. It can go A -> B -> C -> A; when an edit from A gets to C, it won't re-send to A, and the cycle will stop. The problem is just when it's a cycle from A -> (B -> C -> B).)

        yuzhihong@gmail.com Ted Yu added a comment -

        The new API would allow specification of more than one cluster, right?
        What about (B -> C -> B) -> A, where B replicates to A unidirectionally?

        I think option #2 is the general solution.

        yuzhihong@gmail.com Ted Yu added a comment -

        For HLog.Entry:

            public void write(DataOutput dataOutput) throws IOException {
              this.key.write(dataOutput);
              this.edit.write(dataOutput);
            }
        

        where the first integer for WALEdit is versionOrLength. If it is > 0, it is a length; otherwise it should be -1.
        We can introduce a new marker (WALEdit.VERSION_3 == -2) after which additional cluster IDs can be serialized.

        This is an incompatible change, which should be acceptable for the singularity (0.96) release.
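
        A rough sketch of the write path being described (the VERSION_3 value, field names, and layout here are assumptions taken from this comment, not committed code):

            // Illustrative WALEdit.write() with a hypothetical VERSION_3 (-2) marker,
            // after which the extra cluster ids would be appended.
            public void write(DataOutput out) throws IOException {
              out.writeInt(VERSION_3);                 // e.g. -2: signals the new layout
              out.writeInt(kvs.size());                // KeyValue count, as before
              for (KeyValue kv : kvs) {
                kv.write(out);                         // existing serialization
              }
              out.writeInt(extraClusterIds.size());    // new: trailing cluster-id list
              for (UUID id : extraClusterIds) {
                out.writeLong(id.getMostSignificantBits());
                out.writeLong(id.getLeastSignificantBits());
              }
            }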

        ivarley Ian Varley added a comment -

        Re: (B -> C -> B) -> A, that's fine; no participants are detecting a cycle they're not part of (A isn't adding any peers, it's the slave). B detects a cycle it's part of (B -> C -> B) and C does as well.

        The API would be simple, and would let the caller walk the graph of clusters: ask the peer you're trying to add for all of its peers, then ask each of them in turn, and build up a graph structure that you can interrogate. Only call is "Tell me your current peers".

        I suppose this could cause problems if not all clusters can communicate; say, if B is visible to A, and C is visible to B, but C is not visible to A. And I guess there might be race conditions if you try to add peers on multiple clusters simultaneously; there's not really a way to avoid that.
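
        A sketch of what that add_peer-time walk could look like (PeerDirectory and every other name here are assumptions, not an existing API; it only checks the two-node cycles that cause this bug):

            import java.util.ArrayDeque;
            import java.util.Collections;
            import java.util.Deque;
            import java.util.HashMap;
            import java.util.Map;
            import java.util.Set;

            final class PeerGraphCheckSketch {
              /** Assumed API: "tell me your current peers". */
              interface PeerDirectory {
                Set<String> peersOf(String clusterId);
              }

              /** True if adding newPeer exposes a cycle that does not include localCluster. */
              static boolean hasForeignCycle(String localCluster, String newPeer, PeerDirectory dir) {
                // Walk everything reachable from the peer we are about to add.
                Deque<String> stack = new ArrayDeque<>(Collections.singleton(newPeer));
                Map<String, Set<String>> graph = new HashMap<>();
                while (!stack.isEmpty()) {
                  String c = stack.pop();
                  if (graph.containsKey(c)) continue;
                  Set<String> peers = dir.peersOf(c);
                  graph.put(c, peers);
                  stack.addAll(peers);
                }
                // Look for two-node cycles (B <-> C) that exclude the local cluster; a
                // fuller implementation would enumerate longer cycles the same way.
                for (Map.Entry<String, Set<String>> e : graph.entrySet()) {
                  for (String p : e.getValue()) {
                    Set<String> back = graph.getOrDefault(p, Collections.<String>emptySet());
                    if (back.contains(e.getKey())
                        && !e.getKey().equals(localCluster) && !p.equals(localCluster)) {
                      return true;
                    }
                  }
                }
                return false;
              }
            }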

        yuzhihong@gmail.com Ted Yu added a comment -

        What if C -> B -> A is established, and after some time, a potential cycle is formed with B -> C ?

        Along the lines of my comment above, we can PB the metadata in WALEdit and HLogKey, with the cluster id declared as repeated in the .proto. A tool would be provided to convert pre-0.96 WAL files into the new format.

        ivarley Ian Varley added a comment -

        Ah, touché. That particular arrangement is still fine (the resulting graph, (B -> C -> B) -> A, doesn't have any bad cycles). However, you raise a good point; if you start with:

        A -> B -> C

        and then later add "C -> B", you'd get:

        A -> (B -> C -> B)

        which is a bad cycle. And C has no way of knowing about A -> B; as a peer, you only know who you replicate to, not who replicates to you.

        A cluster could keep track of who is replicating TO it; in ReplicationSink, we could track all the cluster IDs that have ever sent data in, and report that through the "who do you replicate with" API. So then it would let you build a full graph, because you get the backwards edges.

        Of course, there's still plenty of catches: the race conditions, plus the possibility that someone is set up to replicate to you, but they just haven't sent any edits yet.

        Meh. With this level of complication, a solution in the direction you're talking about (adding info to the WAL) might be safer.

        yuzhihong@gmail.com Ted Yu added a comment -

        Thanks Ian for correcting my example given @ 29/Jan/13 21:53

        My point was that replication topology can grow quite complex. If we cannot enumerate all the intricacies, we'd better design something that suits future development.

        ivarley Ian Varley added a comment -

        Yes, I agree. While it's possible (in theory) to interrogate the actual topology at runtime, a solution that makes such problems impossible is much better.

        lhofhansl Lars Hofhansl added a comment -

        Any good ideas for 0.94? We cannot change the HLog format in a non-backwards-compatible way there.
        Maybe in 0.94 we can do something simple along Ian's line of thinking. I don't care if it blows up in this case; even the RSs just aborting is better than an infinite back-and-forth of replication data (it will fill up the memstores with useless versions, forever).

        ivarley Ian Varley added a comment -

        At a minimum, this should be called out in the reference guide & replication page. Replication is still a pretty advanced feature, and replication for > 2 clusters even more so; if a patch doesn't go into 0.94.5, it's not the end of the world.

        yuzhihong@gmail.com Ted Yu added a comment -

        Looking at HLogKey#readFields():

            if (version.atLeast(Version.INITIAL)) {
              if (in.readBoolean()) {
        

        From the javadoc of readBoolean():

        Reads one input byte and returns true if that byte is nonzero, false if that byte is zero.

        I think there is room to implement option #3 from the description. We can introduce a new version (two, considering compression) where we write, instead of true, the number of hops that the HLog.Entry has gone through, starting with 1. A byte should suffice for this purpose.
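
        A tiny sketch of the byte-level trick (method and class names are assumptions, not the actual HLogKey code):

            import java.io.DataInput;
            import java.io.DataOutput;
            import java.io.IOException;

            final class HopCountByteSketch {
              // Old readers call readBoolean(): any non-zero byte reads as "true",
              // so writing a hop count >= 1 stays readable by unpatched servers.
              static void writeHopCount(DataOutput out, int hops) throws IOException {
                out.writeByte(Math.min(hops, Byte.MAX_VALUE));
              }

              static int readHopCount(DataInput in) throws IOException {
                return in.readUnsignedByte();   // 0 = no cluster id, >0 = hop count
              }
            }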

        +1 on documenting this intricacy for 0.94.x in the refguide.

        I think we should create several subtasks for this JIRA.

        yuzhihong@gmail.com Ted Yu added a comment -

        TestRowProcessorEndpoint shows another potential implementation for option #3:

                // We can also inject some meta data to the walEdit
                KeyValue metaKv = new KeyValue(
                    row, HLog.METAFAMILY,
                    Bytes.toBytes("num of hops"),
                    Bytes.toBytes(hops));
                walEdit.add(metaKv);
        
        lhofhansl Lars Hofhansl added a comment -

        Any bright ideas? I can't think of a forward and backward compatible way to make this happen in 0.94. Forward and backwards compatibility is needed, because during an upgrade via rolling restarts we might have logs of both formats in a single cluster.
        In 0.96 we do not have this restriction.
        Moving out to 0.94.7.

        jeffreyz Jeffrey Zhong added a comment -

        I have another idea which IMHO is better. The basic idea is the following:

        1) We maintain a counter called RD ("replication distance") which represents how far a WAL edit has traveled from its source cluster to the current cluster, like the hop-counter mentioned in option 3.
        2) Each replaying & receiving region server maintains an in-memory ClusterDistanceMap <clusterId, MIN(RD)>. Whenever it sees a WAL edit with an RD smaller than the one it has seen so far, it updates the internal map with the smaller RD value.
        3) Drop all WAL edits from a cluster whose RD is greater than the one the current region server has in the ClusterDistanceMap.

        Initially we could duplicate data for the first several WAL edits, but it will be corrected soon, so we don't need to persist any data for the failover scenario.

        The above idea is similar to option 3, but without always double-replicating data on some clusters; also, maintaining the max-hop value is error-prone, since we might forget to bump it up when more clusters join the replication cycle.

        Why does it work? Loop detection: the quick walker will catch up with the slow walker but will have traveled farther.
        When we have infinite-loop replication as described in this JIRA, the data from a source must reach the destination by multiple paths with different RDs. Because loops are involved, the RDs won't be the same (otherwise there is no loop). Since the RDs differ, we just need to keep the data from the source with the minimum distance.
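
        A minimal sketch of that receiving-side filter (class and method names are assumptions for illustration, not HBase code):

            import java.util.UUID;
            import java.util.concurrent.ConcurrentHashMap;
            import java.util.concurrent.ConcurrentMap;

            // Keep the smallest replication distance (RD) ever seen per source
            // cluster and drop anything that arrives via a longer (looped) path.
            final class ClusterDistanceMapSketch {
              private final ConcurrentMap<UUID, Integer> minRd = new ConcurrentHashMap<>();

              /** True if an edit from sourceCluster carrying this RD should be applied. */
              boolean accept(UUID sourceCluster, int rd) {
                minRd.merge(sourceCluster, rd, Math::min);
                return rd <= minRd.get(sourceCluster);
              }
            }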

        You may ask about the diamond situation, like the following:

        a->b->d
        a->c->d

        where the data from a will be replicated to d twice. That is because we configured d to receive a's data twice. If a loop is involved, the looped-back data will be dropped in the way described above.

        This is a general loop-detection strategy, so we can implement it in 0.96 or above. For 0.94:

        1) we can introduce a new version (3) in HLogKey
        2) use the top two bytes of the UUID to store the RD value and the remaining 14 bytes as a hash of the original 16-byte UUID value, without compromising uniqueness, because in most cases we have tens of clusters involved in replication and the collision probability is less than 10^-18
        OR
        use Ted's suggestion to overload the boolean byte.
        3) we can introduce a configuration setting that defaults to true. When we want to revert the new behavior, we can turn it off.

        Please let me know what you think. Assign the ticket to me in case we agree to implement it the way I'm proposing.

        Thanks,
        -Jeffrey

        lhofhansl Lars Hofhansl added a comment -

        Hey Jeffrey, that is my option #3 in the description, right?

        jeffreyz Jeffrey Zhong added a comment -

        Lars Hofhansl My proposal is similar to option 3 because we both use a hop-counter (replication distance in my proposal). However, as I mentioned in the proposal:

        The above idea is similar to option 3, but without always double-replicating data on some clusters; also, maintaining the max-hop value is error-prone, since we might forget to bump it up when more clusters join the replication cycle.

        In the new proposal, region servers dynamically discover & maintain the MIN(RD) from a cluster and drop all edits with higher RDs from the same cluster.

        lhofhansl Lars Hofhansl added a comment -

        Ah yes. Cool. I should read the entire text before replying. Yes, that should work. I like it. The distance data does not have to be persisted as you say, upon restart an RS would just relearn.

        Generally, do you like this better than option #2? #2 would store too much data?

        As for 0.94. I like the config option, but it needs to be default off, so that we can do rolling restarts by default.

        enis Enis Soztutar added a comment -

        I like option #2 better than this. It is simpler. Jeff's idea is good, but has the problem of dealing with topology changes. If the topology changes in a way that makes the normal route to a cluster longer, then all the updates afterwards will be dropped unless we somehow clear the cached mappings. This brings in an operational burden of cleaning the caches of downstream clusters once the admin changes the topology upstream.

        A -> B <-> C is changed to A -> B -> D -> C -> B 
        

        Orthogonal to this, we should also be dropping the edits at the replication source, not the sink. We are doubling the network cost in cyclic cases. #2 also helps with this condition, because we can detect the sink cluster's id and filter it out.

        We can do a similar dynamic dictionary encoding for storing set of cluster ids. We can do it as a follow up optimization.

        jeffreyz Jeffrey Zhong added a comment -

        I agree #2 is simpler, but it comes at the cost of replaying more data and storing more data.

        As Enis mentioned, my proposal would need special handling when a new cluster joins: either dynamically encode a special token to let downstream RSs reset their internal caches, or ask operators to reset replication. That makes my proposal less appealing.

        jeffreyz Jeffrey Zhong added a comment -

        Continue with more proposals...

        The disadvantages of option #2 are as obvious as its advantages. Even in cases (maybe the majority of replication usage cases) where there is no loop at all, just a long replication chain, the downstream RSs still need to replay and store a long list of clusterIds for each WALEdit. Encoding may help compress the clusterId list on the sending side, but not when storing.

        Let me first try to show how we can do better than option #2, and then an alternative that is good in most cases without needing more storage. Both options are good IMHO.

        As we know, a loop is caused by a back-edge in the graph. We can roughly identify back-edges by whether a region server sees more than one path from the same source. If that's the case, a loop situation is likely. Only then do we need to append the current cluster id to a WAL edit's source cluster id for later loop detection. Therefore, in most cases we don't need to store a long clusterId list, as long as there is no loop or just a simple master-master-master… cycle setup.

        I call the above updated option #2 "adaptive option #2": it only needs more storage when there is a need. We can implement it as follows:

        1) Maintain a hash PathChecksum (= Hash(receivedPathChecksum + current clusterId)) for a WAL edit.
        2) Each replaying & receiving region server maintains an in-memory ClusterDistanceMap <clusterId, Set<PathChecksums seen so far>>.
        2.a) Whenever it sees a new PathChecksum (one that isn't in Set<PathChecksums>), it adds the new PathChecksum to Set<PathChecksums>; a stale one is dropped from Set<PathChecksums> when it expires, i.e. after a configurable time period in which the region server doesn't see any data coming in from that path.
        3) When Set<PathChecksums>'s size is > 1, append the current cluster id to the WAL edit for later replication-loop detection.

        We can use the top 8 bytes of the clusterId to store the PathChecksum and the remaining 8 bytes as a hash of the original clusterId value. After this update, we only pay the cost when there is a need.
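
        A rough sketch of that bookkeeping (all names here are illustrative; expiry of stale checksums is omitted):

            import java.util.HashMap;
            import java.util.HashSet;
            import java.util.Map;
            import java.util.Objects;
            import java.util.Set;
            import java.util.UUID;

            final class AdaptivePathSketch {
              private final Map<UUID, Set<Integer>> pathsPerSource = new HashMap<>();

              /** Checksum for the outgoing hop: hash of the incoming checksum plus our id. */
              static int nextChecksum(int receivedChecksum, UUID localClusterId) {
                return Objects.hash(receivedChecksum, localClusterId);
              }

              /** True once more than one distinct path from this source has been seen,
               *  i.e. the edit should start carrying the full cluster-id list. */
              synchronized boolean multiplePathsSeen(UUID sourceCluster, int pathChecksum) {
                Set<Integer> seen = pathsPerSource.computeIfAbsent(sourceCluster, k -> new HashSet<>());
                seen.add(pathChecksum);
                return seen.size() > 1;
              }
            }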

        As you can imagine, a real-life replication setup normally doesn't involve a complicated graph, so option #2 spends extra storage to deal with situations that most likely won't happen. Therefore, in the following I want to propose a solution that doesn't change the current WAL format and is good for most cases, including the situation that triggered this JIRA. In extreme cases it reports errors for an infinite loop.

        The new proposal (option #6) is as follows:
        1) Maintain a hash PathChecksum (= Hash(receivedPathChecksum + current clusterId)) for a WAL edit.
        2) Each replaying & receiving region server maintains an in-memory ClusterDistanceMap <clusterId, Set<PathChecksums seen so far>>.
        2.a) Whenever it sees a new PathChecksum (one that isn't in Set<PathChecksums>), it adds the new PathChecksum to Set<PathChecksums>; a stale one is dropped from Set<PathChecksums> when it expires, i.e. after a configurable time period in which the region server doesn't see any data coming in from that path.
        3) When Set<PathChecksums>'s size is > 1, reset the WAL edit's clusterId to the current clusterId and increment a counter (ResetCounter) marking how many times the current WAL edit's clusterId has been reset.
        4) When ResetCounter > 64, report an error (we could drop the WAL edits as well, because ResetCounter > 64 means we have at least 64 back-edges or duplicated sources; I think it's very unlikely to have such cases).

        The advantage of the above option is that it can possibly use the existing HLog format to prevent loop situations in real-life cases.

        To implement:
        1) we can introduce a new version (3) in HLogKey
        2) use the top 7 bytes of the UUID to store the PathChecksum, the following 1 byte to store the RD, and the remaining 8 bytes as a hash of the original 16-byte UUID value, without compromising uniqueness, because in most cases we have tens of clusters involved in replication and the collision probability is less than 10^-18
        3) we can introduce a configuration setting that defaults to false (suggested by Lars). After we roll out the feature, we can turn it on, and turn it off in a revert scenario.

        Thanks,
        -Jeffrey

        lhofhansl Lars Hofhansl added a comment -

        Sorry, I missed this. I need to read through and digest it.
        In any event, moving to 0.94.8.

        lhofhansl Lars Hofhansl added a comment -

        The proposal sounds good. Are you still planning to work on this Jeffrey Zhong?

        jeffreyz Jeffrey Zhong added a comment -

        Lars Hofhansl Sure. Is it all right to implement option #6 for 0.94 and adaptive option #2 for trunk?

        lhofhansl Lars Hofhansl added a comment -

        Would #6 still allow rolling upgrades from a prior version of HBase? It looks like it would not, since we have to increase the HLogKey version.

        Moving to 0.94.9.

        lhofhansl Lars Hofhansl added a comment -

        Nobody working on this. Moving out.

        lhofhansl Lars Hofhansl added a comment -

        It seems for 0.94 we can either do option #1 or nothing at all.

        So I'd like to introduce a config option: hbase.enable.cyclic.replication. The default is "true" to maintain the current functionality.
        If set to false we'd reset the cluster id at each source and hence would only support master-master replication (cycles involving more than 2 nodes would lead to infinite loops).
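
        A sketch of how such a switch might be consulted when shipping an edit (only the key name comes from this comment; the surrounding code is illustrative, not the actual 0.94 source):

            import java.util.UUID;
            import org.apache.hadoop.conf.Configuration;

            final class ResetClusterIdSketch {
              /** Pick the cluster id to ship with an edit under the proposed flag. */
              static UUID clusterIdToShip(Configuration conf, UUID originClusterId, UUID localClusterId) {
                boolean cyclic = conf.getBoolean("hbase.enable.cyclic.replication", true);
                // true  -> keep the originating cluster id (current behavior, supports rings)
                // false -> reset at each source, so only master<->master pairs are loop-safe
                return cyclic ? originClusterId : localClusterId;
              }
            }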

        lhofhansl Lars Hofhansl added a comment -

        Any opinions?

        jeffreyz Jeffrey Zhong added a comment -

        For 0.94, I think it'd be better to introduce a configuration setting "hbase.replication.reset.clusterid=<clusterId which a user specifies>". Only the cluster specified here resets the clusterId to itself, so that we can still support master-master replication involving more than 2 nodes without bumping up the log key version.

        We could possibly bump up the HLogKey version with an upgrade configuration setting like "upgrade.logkey" plus two rounds of rolling restarts. Originally we set the config setting to false. The first rolling restart upgrades the RS bits (the new RSs still write the HLogKey in the old version); after all RSs are upgraded, we set the configuration to true and do a second rolling restart.

        The above complicates the upgrade scenario a little bit and requires that all clusters involved in the replication are upgraded.

        lhofhansl Lars Hofhansl added a comment -

        That would be more flexible, but at the same time more tedious to manage.

        jdcryans Jean-Daniel Cryans added a comment -

        So I'd like to introduce a config option: hbase.enable.cyclic.replication. The default is "true" to maintain the current functionality.
        If set to false we'd reset the cluster id at each source and hence would only support master-master replication (cycles involving more than 2 nodes would lead to infinite loops).

        This seems like a lose-lose. The current functionality has the problem that 7709 is about and setting the config to false would just make it worse?

        lhofhansl Lars Hofhansl added a comment - - edited

        It would allow an A -> B <-> C scenario, which is currently not possible.
        At the same time it would break setups like A -> B -> C -> A.

        lhofhansl Lars Hofhansl added a comment - - edited

        In fact we will have the following setup:

        A <-> B, C <-> D, E <-> F, ... (these are all pairs of DR clusters; we keep both as masters so that a failover for other reasons, even just as an exercise, does not need further configuration).
        We sometimes migrate an entire cluster, say A. In that case we'd also replicate A -> C. Currently we can't do that, because the data from A would bounce between C and D forever.

        jdcryans Jean-Daniel Cryans added a comment -

        Ah ok that's a fancy setup you got there. Sounds ok to me.

        stack stack added a comment -

        What is to be done on this for 0.95.2?

        jdcryans Jean-Daniel Cryans added a comment -

        I'm +1 on Lars Hofhansl's proposition, let's do it in a different jira?

        jeffreyz Jeffrey Zhong added a comment -

        I think for 0.95 and onwards, we should store the relay cluster ids along the replication path in the HLogKey to solve the issue. Since the list of relay cluster ids is added to each WALEdit, the storage & network traffic overhead isn't trivial when we have a long replication path.

        We can use an optimization (mentioned above as adaptive #2). We introduce a 4-byte path checksum field into HLogKey; a cluster only adds its cluster id to the relay cluster id list when it finds there exist multiple paths from a single cluster id. In most cases, such as a simple replication loop or an acyclic replication path, the relay cluster id list is empty. The overhead is just the 4-byte path checksum.

        For 0.94, we can either use Lars' approach (configuration option "hbase.enable.cyclic.replication") or introduce a new configuration option "hbase.replication.reset.clusterid=<clusterId which a user specifies | >". Only the cluster specified here resets the clusterId to itself. When hbase.replication.reset.clusterid= is left empty, it is equivalent to Lars' approach.
        In addition, we can leverage the existing HLogKey.writeTime field to detect a loop in 0.94 when a WALEdit has been stale too long for replication (say a configurable 30 minutes). We can pass the writeTime as an attribute, the way the cluster id is passed during replication, so that we can check the original writeTime to see whether we have a possible infinite-loop situation.
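
        A small sketch of that staleness guard (the config key and helper are assumptions, not existing settings):

            import org.apache.hadoop.conf.Configuration;

            final class StaleEditGuardSketch {
              /** True if the edit has been in flight so long that a loop is suspected. */
              static boolean looksLikeLoop(Configuration conf, long originalWriteTime, long now) {
                long maxAgeMs = conf.getLong("hbase.replication.edit.max.age.ms", 30 * 60 * 1000L);
                return now - originalWriteTime > maxAgeMs;
              }
            }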

        Thanks.

        lhofhansl Lars Hofhansl added a comment -

        Reading back through the comments here, how about we follow Ted's approach?
        Instead of writing a boolean we write the number of hops into the HLogKey. 0 will still be interpreted as false in the old code, >0 as true. Thus we can store the number of hops and still not break the existing code (although the old code would not be able to stop the bouncing).
        In our Salesforce scenario we would limit the hop count to 3 and would be able to support our setup that way.

        Yet another option is to make this configurable. At this point we're still able to fully bounce our clusters. So we can do the hop count and optionally (per a config option) store the full path; this might even be applicable to trunk, as the user then has the choice between capping loops at some limit with little extra storage or being precise at the expense of more storage.

        jdcryans Jean-Daniel Cryans added a comment -

        My impression regarding configuring path or hops count is that if you start changing the clusters the upkeep becomes very expensive and it's not clear what happens while it's being changed (or if a cluster just goes down).

        jeffreyz Jeffrey Zhong added a comment -

        Due to the 0.94 upgrade complications, the configuration approach is a practical one. Basically we shift the duty of breaking the infinite-loop situation to users, via the newly introduced configuration.

        Meanwhile, we have to provide a way to detect a possible infinite-loop situation in 0.94 so that a user can act upon it. A max hop counter is better used only for detection, not as the way to break an infinite loop, because it's error-prone when the replication path changes, as JD pointed out above.

        lhofhansl Lars Hofhansl added a comment -

        Are you referring to switching via config option from storing just the hop count to storing the path?
        Yep, an admin would need to make the call and bounce the cluster (no rolling restart when that option is enabled the first time). Not ideal.
        I'm just looking for ways to avoid local Salesforce-only patches, but maybe backporting a trunk patch that stores the path would not be so bad (it sets a bad precedent here, though).

        The hop count we can always do safely (I think). In our case we'd enable the path option and bounce the cluster(s).

        vasu.mariyala@gmail.com Vasu Mariyala added a comment -

        Can we add the information about which cluster ids already contain the change to the scopes variable of the WALEdit? Scopes, being a navigable map of byte array to integer, could map the byte array of a cluster id to 1 (indicating that cluster has already received the change).

        lhofhansl Lars Hofhansl added a comment -

        Jeffrey Zhong The hop count is always safe to do, no? We'd default it to a reasonably large value (say 1000). This should be immune to topology changes. Without it, edits would bounce within the replication ring forever, with no way to stop it (other than disabling replication or deleting the WAL files); that is almost worse than downtime.

        lhofhansl Lars Hofhansl added a comment -

        The scope idea might just work. It's only read and matched against column families, so it works as long as we prefix it with something that cannot be contained in a column family name.
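
        A minimal sketch of that encoding (the prefix byte and helper names are assumptions; the scopes map itself is the existing NavigableMap<byte[], Integer>):

            import java.util.NavigableMap;
            import java.util.UUID;
            import org.apache.hadoop.hbase.util.Bytes;

            final class ScopesClusterIdSketch {
              // Column family names are printable, so a 0x00 prefix can never clash with one.
              private static final byte[] PREFIX = new byte[] { 0 };

              static void markConsumed(NavigableMap<byte[], Integer> scopes, UUID clusterId) {
                scopes.put(Bytes.add(PREFIX, Bytes.toBytes(clusterId.toString())), 1);
              }

              static boolean hasConsumed(NavigableMap<byte[], Integer> scopes, UUID clusterId) {
                return scopes.containsKey(Bytes.add(PREFIX, Bytes.toBytes(clusterId.toString())));
              }
            }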

        jeffreyz Jeffrey Zhong added a comment -

        Lars Hofhansl The hop count is only safe when using a big value. Let's say you use 1000. For a 3-cluster setup, it means the same data will be rewritten 300+ times before we stop. This affects a cluster's performance and slows down regular replication as well.

        Yeah, I think the scope idea can fly, as column family names only allow printable characters, so it's possible to come up with a special prefix character to store the cluster id.

        lhofhansl Lars Hofhansl added a comment -

        Vasu Mariyala, wanna work out a patch? We can work on that together if you like.

        vasu.mariyala@gmail.com Vasu Mariyala added a comment -

        Lars Hofhansl, working on it. Sure, will take your help.

        lhofhansl Lars Hofhansl added a comment -

        We also need to keep HBASE-9158 in mind (a bug I just discovered). Here we need to group the edits by path and apply them strictly in these groups.

        vasu.mariyala@gmail.com Vasu Mariyala added a comment -

        Please review the patch on top of 0.94.11.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12597230/HBASE-7709.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6689//console

        This message is automatically generated.

        lhofhansl Lars Hofhansl added a comment -

        HadoopQA only works against trunk. But since the trunk patch would presumably be quite a bit different it wouldn't help with 0.94.
        Will have a look at the patch over the weekend.

        lhofhansl Lars Hofhansl added a comment -

        Patch looks good. Smaller than expected
        Two comments:

        1. Maybe change the new methods to addClusterId and getAllClusterIds?
        2. In ReplicationSink we need to group by unique path, I think, (just like I do now by ClusterId) so that the path is maintained during intermediary replication.
        lhofhansl Lars Hofhansl added a comment -

        In any event, we should probably have separate issues for the "real" 0.95+ fix and the backwards-compatible 0.94 patch. Maybe we can try to keep the API the same, or at least similar.

        jeffreyz Jeffrey Zhong added a comment -

        Vasu Mariyala I reviewed your patch. The following lines:

        -      if (!logKey.getClusterId().equals(peerClusterId)) {
        +      // don't replicate if the log entries has already been consumed by the peer cluster
        +      if (!edit.hasClusterConsumed(peerClusterId)) {
        

        I think we still need to keep the old check around. The above change may cause issues during upgrade. For example, say we have an (A -> B, B -> A) replication setup. If we just upgrade cluster A, the above change may cause an infinite loop before we upgrade cluster B.

        Rest code looks good to me.

        vmariyala Vasu Mariyala added a comment -

        Jeffrey Zhong

        The cluster ids in the WALEdit contain all the cluster ids, including the cluster id on which the entry was created first. When ReplicationSource runs on A, it marks cluster A itself as consumed in the WALEdit. This entry is then sent on to B. ReplicationSource running on B sees that cluster A has already consumed it and therefore does not send the entry back to A. If there are any possible scenarios where this would not work, please do let me know.

        jeffreyz Jeffrey Zhong added a comment -

        Vasu Mariyala What you described above will work if both clusters A and B are upgraded to the latest bits. Let's say cluster B is NOT upgraded yet while A is upgraded. When B sends edits to A, the scope of a WALEdit won't have cluster ids, so A's WALEdits won't have cluster id B in their scope. Then the check edit.hasClusterConsumed(peerClusterId) in cluster A will evaluate to false and the edits of B will be sent back to cluster B.
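        (To make the mixed-version concern concrete, here is a minimal sketch, with assumed names, of how a source could apply both the old single-cluster-id check and the new consumed-clusters check; it is not the attached patch:)

        import java.util.Set;
        import java.util.UUID;

        // Sketch only: an edit is shipped to a peer unless it originated there (old
        // check, the only one a non-upgraded cluster maintains) or the peer already
        // appears in the set of consuming clusters recorded by upgraded clusters.
        final class ShipDecisionSketch {
          static boolean shouldShip(UUID originClusterId, Set<UUID> consumedClusterIds, UUID peerClusterId) {
            if (peerClusterId.equals(originClusterId)) {
              return false; // old check: never send an edit back to where it was written
            }
            return !consumedClusterIds.contains(peerClusterId); // new check
          }
        }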

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12597557/HBASE-7709-rev1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6706//console

        This message is automatically generated.

        vasu.mariyala@gmail.com Vasu Mariyala added a comment -

        Thanks Lars Hofhansl and Jeffrey Zhong for the review comments. I have attached the patch HBASE-7709-rev1.patch with the fixes based on your comments.
        Also added a test case for A -> B -> C -> A replication. Please do review this patch.

        For the 0.95+ and trunk fixes, I would like to either remove or deprecate the setClusterId and getClusterId methods of Mutation and also HLogKey, as these are primarily maintained to avoid cyclic replication. Please do let me know your opinion on this so that I can work on providing the patches for the same.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12597567/HBASE-7709-rev1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6707//console

        This message is automatically generated.

        jdcryans Jean-Daniel Cryans added a comment -

        The logic here is getting big enough that it should be encapsulated, but I can't think right now of a nice way to do it.

        WALEdit will need more javadoc.

        The reason WALEdit.scopes wasn't instantiated is that you wouldn't need to keep an extra TreeMap around if replication wasn't enabled. It seems this patch changes that assumption. If it makes more sense to always instantiate it, then it should be final; there are a bunch of "if (scopes == null)" checks that aren't needed anymore, and "if (scopes != null)" would always be true.

        A -> B -> C -> A replication

        I think you meant the last cluster to be B? It seems we should refactor TestMasterReplication a bit because with this patch it would just look like the same code is running 3 times (to the untrained eye).

        davelatham Dave Latham added a comment - - edited

        Thanks all for the great work on this.

        We currently have a pair of clusters in two datacenters in a master/master setup and want to migrate one of them to a new datacenter. I'm trying to determine if this patch will be required for us and would love it if someone would be willing to double-check my thinking.

        Currently we have
        A -> B, B -> A

        1. Setup C, create presplit tables with replication_scope enabled on them.
        2. Add peer B -> C (New state A -> B, B -> A, B -> C)
        3. Run CopyTable on each table from B -> C
        4. Stop applications in A
        5. Wait for queues from A -> B to clear
        6. Remove peer A -> B (New state B -> A, B -> C)
        7. Remove peer B -> A (New state B -> C)
        8. Add peer C -> B (New state B -> C, C -> B)
        9. Start applications in C

        Given that we can live with applications only running in a single datacenter for a period of time, we don't ever need to have writes from one cluster replicate to a downstream loop. Therefore I don't think this patch is required for this migration. Does that sound correct? Also, does the state of (A <-> B) -> C still trigger the problem?

        Edit by LarsH fix formatting.
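        (The peer changes in steps 2, 6, 7 and 8 above could be scripted roughly as below. This assumes the 0.94-era ReplicationAdmin API with addPeer(id, clusterKey) and removePeer(id); the peer ids and ZooKeeper cluster keys are placeholders, not values from this issue:)

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;
        import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;

        // Sketch of the peer add/remove sequence; each Configuration is assumed to
        // point at the named cluster's ZooKeeper quorum.
        public class MigrationPeersSketch {
          public static void main(String[] args) throws Exception {
            Configuration confA = HBaseConfiguration.create(); // cluster A
            Configuration confB = HBaseConfiguration.create(); // cluster B
            Configuration confC = HBaseConfiguration.create(); // cluster C

            ReplicationAdmin adminB = new ReplicationAdmin(confB);
            adminB.addPeer("2", "zkC-quorum:2181:/hbase");      // step 2: B -> C

            ReplicationAdmin adminA = new ReplicationAdmin(confA);
            adminA.removePeer("1");                             // step 6: drop A -> B

            adminB.removePeer("1");                             // step 7: drop B -> A

            ReplicationAdmin adminC = new ReplicationAdmin(confC);
            adminC.addPeer("1", "zkB-quorum:2181:/hbase");      // step 8: C -> B
          }
        }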

        lhofhansl Lars Hofhansl added a comment - - edited

        You are correct, you do not need this patch as long as you do step #6 before step #8. (A <-> B) -> C is fine. C -> (A <-> B) is not.

        Edit: Formatting again.

        vmariyala Vasu Mariyala added a comment -

        Sorry for the confusion in the earlier emails. Yes, the test case was for A -> B -> C -> B.

        Attaching the patch for 0.94 (revision 2)

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12598350/HBASE-7709-rev2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6775//console

        This message is automatically generated.

        vmariyala Vasu Mariyala added a comment -

        Attaching the patch for 0.95 release

        vmariyala Vasu Mariyala added a comment -

        0.95 patch works for trunk as well.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12598360/HBASE-7709-095.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.regionserver.wal.TestHLog
        org.apache.hadoop.hbase.migration.TestNamespaceUpgrade

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//console

        This message is automatically generated.

        vmariyala Vasu Mariyala added a comment -

        I ran all the test cases on my local machine with the trunk patch and they all pass. But every time it is run on Jenkins, it throws

        FATAL: Unable to delete script file /tmp/hudson5964600500647866956.sh
        hudson.util.IOException2: remote file operation failed: /tmp/hudson5964600500647866956.sh at hudson.remoting.Channel@5ce45886:hadoop1
        at hudson.FilePath.act(FilePath.java:902)
        at hudson.FilePath.act(FilePath.java:879)
        at hudson.FilePath.delete(FilePath.java:1288)
        at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101)
        at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
        at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
        at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
        at hudson.model.Build$BuildExecution.build(Build.java:199)
        at hudson.model.Build$BuildExecution.doRun(Build.java:160)
        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
        at hudson.model.Run.execute(Run.java:1597)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:247)
        Caused by: hudson.remoting.ChannelClosedException: channel is already closed
        at hudson.remoting.Channel.send(Channel.java:516)
        at hudson.remoting.Request.call(Request.java:129)
        at hudson.remoting.Channel.call(Channel.java:714)
        at hudson.FilePath.act(FilePath.java:895)
        ... 13 more
        Caused by: java.io.IOException: Unexpected termination of the channel
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
        Caused by: java.io.EOFException
        at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at hudson.remoting.Command.readFrom(Command.java:92)
        at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:72)
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
        FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
        hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
        at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
        at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
        at hudson.remoting.Request.call(Request.java:174)
        at hudson.remoting.Channel.call(Channel.java:714)
        at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:167)
        at com.sun.proxy.$Proxy40.join(Unknown Source)
        at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:925)
        at hudson.Launcher$ProcStarter.join(Launcher.java:360)
        at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91)
        at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
        at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
        at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
        at hudson.model.Build$BuildExecution.build(Build.java:199)
        at hudson.model.Build$BuildExecution.doRun(Build.java:160)
        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
        at hudson.model.Run.execute(Run.java:1597)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:247)
        Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
        at hudson.remoting.Request.abort(Request.java:299)
        at hudson.remoting.Channel.terminate(Channel.java:774)
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
        Caused by: java.io.IOException: Unexpected termination of the channel
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
        Caused by: java.io.EOFException
        at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at hudson.remoting.Command.readFrom(Command.java:92)
        at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:72)
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

        https://builds.apache.org/job/PreCommit-HBASE-Build/6784/console
        https://builds.apache.org/job/PreCommit-HBASE-Build/6781/console

        Can anyone please let me know how I can resolve this issue?

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12598500/095-trunk.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.client.TestAdmin

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//console

        This message is automatically generated.

        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12598525/0.95-trunk-rev1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//console

        This message is automatically generated.

        vasu.mariyala@gmail.com Vasu Mariyala added a comment -

        0.95-trunk-rev1.patch contains the javadoc fix and is the latest patch for the trunk and 0.95 branches. HBASE-7709-rev2.patch is the updated patch on top of 0.94 which addresses the comments made by Lars Hofhansl, Jean-Daniel Cryans and Jeffrey Zhong.

        lhofhansl Lars Hofhansl added a comment -

        The 0.94 patch looks good. Bit large, but then again this is a bad bug to have (when it hits you, you'll have useless load on your cluster forever, throw your versions off, etc.).
        Nice refactoring of the replication test.

        Few nits:

        • PREFIX_CLUSTER_KEY in WALEdit could just be '_', right? No need to store that longer prefix everywhere.
        • Similarly maybe make PREFIX_CONSUMED_CLUSTER_IDS in Mutation just "_cs.id"
        • The comment for scopes in WALEdit could be a bit more explicit that we're overloading scopes with the cluster id for backwards compatibility.

        +1 otherwise (assuming the full 0.94 test suite passes)

        Looking at trunk patch now.

        lhofhansl Lars Hofhansl added a comment - - edited

        In trunk:

        • should repeated UUID clusters = 8 be optional in WAL.proto? Otherwise we can't read old log entries. But maybe that's not a problem...?
        • in Import:
          +        clusters = new HashSet<UUID>();
          +        clusters.add(ZKClusterId.getUUIDForCluster(zkw));
          

          Can be written as clusters = Collections.singleton(ZKClusterId.getUUIDForCluster(zkw))

        • Is this right?
          +      for(UUID clusterId : key.getClusters()) {
                   uuidBuilder.setLeastSigBits(clusterId.getLeastSignificantBits());
                   uuidBuilder.setMostSigBits(clusterId.getMostSignificantBits());
          +        keyBuilder.addClusters(uuidBuilder.build());
          

          addClusters expects a Set.

        • Where is HLogKey.PREFIX_CLUSTER_KEY used? Just to read old versions of WALEdits? Need to discuss if that is necessary. stack? This has to do with upgrading WALEdits from pre 0.95.

        Otherwise looks great.

        vasu.mariyala@gmail.com Vasu Mariyala added a comment -

        Attached the patches for 0.94 (HBASE-7709-rev3.patch) and 0.95/trunk (0.95-trunk-rev2.patch), which address the nits mentioned by Lars.

        0.94

        a) Changed PREFIX_CLUSTER_KEY to '.' (period, as column family names can't start with it)

        b) PREFIX_CONSUMED_CLUSTER_IDS changed to "_cs.id"

        c) A comment has been added in WALEdit mentioning that it is done for backwards compatibility and has been removed in 0.95.2+ releases

        trunk/0.95

        a) From protobuf documentation

        "repeated: this field can be repeated any number of times (including zero) in a well-formed message. The order of the repeated values will be preserved.".
        "optional: a well-formed message can have zero or one of this field (but not more than one)."

        So does repeated imply it is optional? Also, from WALProtos.java, the clusters list is initialized to an empty list in the initFields() method, so we would not get any NullPointerException. Maybe I will do more research on this.

        b) clusters in Import has been changed to use singleton

        c) addClusters has a method public Builder addClusters(org.apache.hadoop.hbase.protobuf.generated.HBaseProtos.UUID value) which takes the UUID as the parameter.

        d) Yes, this is used only to read the older log entries when migrating from 0.94 to 0.95.2.

        v.himanshu Himanshu Vashishtha added a comment -

        + repeated UUID clusters = 8;
        /*
        - optional CustomEntryType custom_entry_type = 8;
        + optional CustomEntryType custom_entry_type = 9;

        Is this re-ordering OK because 0.96.0 is not released yet?

        I think we should have the flexibility to read older edits, as a clean shutdown is a stringent requirement (especially for larger clusters). Also, when replication is enabled, there may be some old logs left to replicate.

        vasu.mariyala@gmail.com Vasu Mariyala added a comment -

        Himanshu Vashishtha There is no re-ordering done with the patch. The entry "custom_entry_type" is and was commented out. I changed the number to 9 just in case someone un-comments it in the future. Please let me know if I missed anything.

        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12599256/0.95-trunk-rev2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//console

        This message is automatically generated.

        yuzhihong@gmail.com Ted Yu added a comment -
        +  public void setClusters(Set<UUID> clusterIds) {
        

        Would setClusterIds() be a better name for the above method?

        +   * @return the set of clusters that have consumed the mutation
        

        'set of clusters' -> 'set of cluster Ids'

        +  public Set<UUID> getClusters() {
        

        getClusters -> getClusterIds

        -    private UUID clusterId;
        +    private Set<UUID> clusters;
        

        clusters -> clusterIds

        If you agree with the above comments, please modify names in other places as well.

        vasu.mariyala@gmail.com Vasu Mariyala added a comment -

        Thanks Ted Yu for the review. Yes, I feel clusterIds may be better than just clusters. Attached the patches HBASE-7709-rev4.patch (0.94) and 0.95-trunk-rev3.patch (0.95 & trunk) which contain the method name changes.

        jeffreyz Jeffrey Zhong added a comment -

        I reviewed the trunk patch. One thing I noticed is that the trunk patch deprecates clusterId and related code. I think we should still keep it around. The reason is that one of the semantics of clusterId is the "original cluster id" where the changes were generated. This information will be very useful when we build a monitoring dashboard to show how many edits come from each source cluster. Similarly, we could combine the original cluster id and the write time to measure replication latency from the source cluster to the current cluster.
        The rest looks good. Thanks.
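
        As a rough illustration of that monitoring idea (hypothetical names, not part of the patch), a sink-side metric could attribute each edit to its originating cluster and estimate latency from the write time:

        import java.util.List;
        import java.util.UUID;
        import java.util.concurrent.ConcurrentHashMap;
        import java.util.concurrent.ConcurrentMap;
        import java.util.concurrent.atomic.AtomicLong;

        // Sketch only: counts replicated edits per originating cluster and estimates source-to-here latency.
        public class ReplicationSinkMetrics {
          private final ConcurrentMap<UUID, AtomicLong> editsPerSource =
              new ConcurrentHashMap<UUID, AtomicLong>();

          /** @return the estimated replication latency in ms, or -1 for a locally generated edit */
          public long record(List<UUID> clusterIds, long writeTimeMs) {
            if (clusterIds == null || clusterIds.isEmpty()) {
              return -1; // locally generated edit, nothing to attribute
            }
            UUID origin = clusterIds.get(0); // first entry = originating cluster
            AtomicLong counter = editsPerSource.get(origin);
            if (counter == null) {
              AtomicLong created = new AtomicLong();
              counter = editsPerSource.putIfAbsent(origin, created);
              if (counter == null) {
                counter = created;
              }
            }
            counter.incrementAndGet();
            return System.currentTimeMillis() - writeTimeMs; // latency estimate for a dashboard
          }
        }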

        vasu.mariyala@gmail.com Vasu Mariyala added a comment -

        Jeffrey Zhong This cluster information is only stored as part of the HLog, which gets rolled. Do you think that is the place from which we would read the originating cluster information to build such metrics?

        lhofhansl Lars Hofhansl added a comment -

        This does raise a good point. Maybe we should store the cluster ids in order of traversal. That would later allow us to reconstruct the replication path between clusters and display it in the shell.
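
        A small sketch of what the ordered list would enable (hypothetical helper, not part of the patch): rendering the traversal path for the shell or for debugging.

        import java.util.List;
        import java.util.UUID;

        // Turns the cluster ids, kept in traversal order, into a readable path such as "A -> B -> C".
        public class ReplicationPath {
          public static String format(List<UUID> clusterIdsInOrder) {
            StringBuilder sb = new StringBuilder();
            for (UUID id : clusterIdsInOrder) {
              if (sb.length() > 0) {
                sb.append(" -> ");
              }
              sb.append(id);
            }
            return sb.toString();
          }
        }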

        yuzhihong@gmail.com Ted Yu added a comment -

        we should store the cluster ids in order of traversal

        +1

        vasu.mariyala@gmail.com Vasu Mariyala added a comment -

        Attaching the patch 0.95-trunk-rev4.patch (0.95 and trunk), which stores the cluster ids as a list rather than a set. The first cluster id in the list is the originating cluster and the subsequent entries indicate the replication path.

        The patch HBASE-7709-rev5.patch (0.94) has the changes to ensure the API of 0.94 is the same as the API of 0.95 and trunk.

        These patches primarily address the monitoring concerns mentioned by Jeffrey Zhong.
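
        For context, a minimal sketch (hypothetical names, not the patch itself) of how such an ordered list can break the loop described in this issue: before shipping an edit to a peer, the source drops it if that peer has already seen it, and otherwise appends its own cluster id so downstream clusters can apply the same check.

        import java.util.ArrayList;
        import java.util.List;
        import java.util.UUID;

        // Sketch of a loop-breaking filter keyed on the ordered list of cluster ids.
        public class LoopFilter {
          private final UUID localClusterId;

          public LoopFilter(UUID localClusterId) {
            this.localClusterId = localClusterId;
          }

          /** @return the cluster ids to ship with the edit, or null to drop the edit */
          public List<UUID> filterForPeer(List<UUID> clusterIds, UUID peerClusterId) {
            if (clusterIds.contains(peerClusterId)) {
              return null; // the peer already consumed this edit; shipping it again would re-cycle it
            }
            List<UUID> shipped = new ArrayList<UUID>(clusterIds);
            if (!shipped.contains(localClusterId)) {
              shipped.add(localClusterId); // record this hop while preserving traversal order
            }
            return shipped;
          }
        }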

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12599751/HBASE-7709-rev5.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6865//console

        This message is automatically generated.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12599751/HBASE-7709-rev5.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6866//console

        This message is automatically generated.

        vmariyala Vasu Mariyala added a comment -

        Re-attaching patch version 4 so that Hadoop QA can run.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12599751/HBASE-7709-rev5.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6896//console

        This message is automatically generated.

        vasu.mariyala@gmail.com Vasu Mariyala added a comment -

        The patch HBASE-7709-rev5.patch is on top of 0.94, and hence Hadoop QA will always fail while applying it to trunk. Can anyone please run the Hadoop QA build for the patch "0.95-trunk-rev4.patch" (which is the trunk and 0.95 patch)?

        yuzhihong@gmail.com Ted Yu added a comment -

        Please attach 0.95-trunk-rev4.patch one more time - Hadoop QA picks up the latest attachment

        vasu.mariyala@gmail.com Vasu Mariyala added a comment -

        Attaching the patch again

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12599990/0.95-trunk-rev4.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings).

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//testReport/
        Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//console

        This message is automatically generated.

        vasu.mariyala@gmail.com Vasu Mariyala added a comment -

        The release audit warnings are not related to the patch. They are caused by missing licenses in the files below. After correcting the license info in these files, the release audit succeeds.

        *******************************
        
        Unapproved licenses:
        
          /home/vmariyala/bigdata-dev/testhbase/hbase-server/src/main/resources/hbase-webapps/static/css/bootstrap-theme.min.css
          /home/vmariyala/bigdata-dev/testhbase/hbase-server/src/main/resources/hbase-webapps/static/css/bootstrap-theme.css
        
        *******************************
        
        jeffreyz Jeffrey Zhong added a comment -

        I reviewed the 0.94 and trunk patches. They both look good to me! +1 from me. Thanks.
        In trunk, we currently carry all clusterIds in the replication path; we could optimize this later if there is a need.

        yuzhihong@gmail.com Ted Yu added a comment -

        +1

        jdcryans Jean-Daniel Cryans added a comment -

        +1

        stack stack added a comment -

        Applied to 0.95 and to trunk. Want this in 0.94 Lars Hofhansl? Vasu Mariyala Thanks boss. Any chance of a release note on this issue?

        lhofhansl Lars Hofhansl added a comment -

        Yeah, will sync up with Vasu offline and probably commit to 0.94 soon.

        lhofhansl Lars Hofhansl added a comment -

        Discussed offline with Vasu.
        Proposing one small change: don't store the first cluster id in an edit twice. The clusterId as used now holds the first cluster id, and the new clusterIds in Mutation and the scopes on the WALEdit hold the 2nd and 3rd cluster ids, if any.
        As master <-> master replication will be the most common scenario, this seems an important optimization.
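
        A sketch of that split representation, with hypothetical names: the single clusterId keeps the originating cluster, the list only carries later hops, and the full traversal path can still be rebuilt when needed.

        import java.util.ArrayList;
        import java.util.List;
        import java.util.UUID;

        // Rebuilds the full traversal path from the compact representation described above.
        public class ClusterIdCompaction {
          public static List<UUID> fullPath(UUID firstClusterId, List<UUID> laterClusterIds) {
            List<UUID> path = new ArrayList<UUID>();
            path.add(firstClusterId);
            if (laterClusterIds != null) {
              path.addAll(laterClusterIds);
            }
            return path;
          }
        }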

        lhofhansl Lars Hofhansl added a comment -

        Will commit to 0.94 later today if there are no objections.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12600457/7709-0.94-rev6.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6953//console

        This message is automatically generated.

        hudson Hudson added a comment -

        SUCCESS: Integrated in hbase-0.95 #500 (See https://builds.apache.org/job/hbase-0.95/500/)
        HBASE-7709 Infinite loop possible in Master/Master replication (stack: rev 1518334)

        • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java
        • /hbase/branches/0.95/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java
        • /hbase/branches/0.95/hbase-protocol/src/main/protobuf/WAL.proto
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ReplicationProtbufUtil.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/BaseRowProcessor.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RowProcessor.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotLogSplitter.java
        • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java
        • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in HBase-TRUNK #4441 (See https://builds.apache.org/job/HBase-TRUNK/4441/)
        HBASE-7709 Infinite loop possible in Master/Master replication (stack: rev 1518335)

        • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java
        • /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java
        • /hbase/trunk/hbase-protocol/src/main/protobuf/WAL.proto
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ReplicationProtbufUtil.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/BaseRowProcessor.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RowProcessor.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotLogSplitter.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java
        lhofhansl Lars Hofhansl added a comment -

        Committed to 0.94 as well. Many thanks for the great patch, Vasu!

        hudson Hudson added a comment -

        SUCCESS: Integrated in hbase-0.95-on-hadoop2 #276 (See https://builds.apache.org/job/hbase-0.95-on-hadoop2/276/)
        HBASE-7709 Infinite loop possible in Master/Master replication (stack: rev 1518334)

        • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java
        • /hbase/branches/0.95/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java
        • /hbase/branches/0.95/hbase-protocol/src/main/protobuf/WAL.proto
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ReplicationProtbufUtil.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/BaseRowProcessor.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RowProcessor.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotLogSplitter.java
        • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java
        • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in HBase-0.94-security #274 (See https://builds.apache.org/job/HBase-0.94-security/274/)
        HBASE-7709 Infinite loop possible in Master/Master replication (Vasu Mariyala) (larsh: rev 1518410)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Mutation.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in HBase-0.94 #1128 (See https://builds.apache.org/job/HBase-0.94/1128/)
        HBASE-7709 Infinite loop possible in Master/Master replication (Vasu Mariyala) (larsh: rev 1518410)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Mutation.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java
        hudson Hudson added a comment -

        FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #700 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/700/)
        HBASE-7709 Infinite loop possible in Master/Master replication (stack: rev 1518335)

        • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java
        • /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java
        • /hbase/trunk/hbase-protocol/src/main/protobuf/WAL.proto
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ReplicationProtbufUtil.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/BaseRowProcessor.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RowProcessor.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotLogSplitter.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java

          People

          • Assignee:
            vmariyala Vasu Mariyala
          • Reporter:
            lhofhansl Lars Hofhansl
          • Votes:
            0
          • Watchers:
            24
