HBase
HBASE-6165

Replication can overrun .META. scans on cluster re-start

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.2
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When restarting a large set of regions on a reasonably small cluster, the replication from another cluster tied up every xceiver, meaning nothing could be onlined.

      1. 6165-v6.txt
        17 kB
        Lars Hofhansl
      2. HBase-6165-94-v1.patch
        16 kB
        Himanshu Vashishtha
      3. HBase-6165-94-v2.patch
        16 kB
        Himanshu Vashishtha
      4. HBase-6165-v1.patch
        6 kB
        Himanshu Vashishtha
      5. HBase-6165-v2.patch
        6 kB
        Himanshu Vashishtha
      6. HBase-6165-v3.patch
        14 kB
        Himanshu Vashishtha
      7. HBase-6165-v4.patch
        14 kB
        Himanshu Vashishtha
      8. HBase-6165-v5.patch
        14 kB
        Himanshu Vashishtha

        Activity

        Lars Hofhansl added a comment -

        What's a good approach to avoid this?

        Elliott Clark added a comment -

        Upping the number of privileged ipc threads is the workaround that we're going to deploy soon.

        Jean-Daniel Cryans added a comment -

        The other solution is to have a different set of handlers, but this requires either hacking HBaseServer to add another queue and priority level, or refactoring it to make it more configurable.

        Himanshu Vashishtha added a comment -

        I hit this problem while testing a long-running replication setup. All priority handlers were blocked by the replicateLogEntries method, and the cluster became unresponsive.

        Attached is a patch which does the following:
        a) Adds a different QOS level, customQOS. Methods with this attribute will be processed by a new set of handlers.
        b) Adds customPriorityHandlers, a new set of handlers in the RegionServer.

        ReplicationSink#replicateLogEntries uses this attribute.

        Testing: Jenkins is green. I have a long-running replication setup, and it's been up for a few days.
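
        For readers following along, below is a minimal sketch (plain Java, not the actual patch) of the annotation-driven dispatch being described; the constant value, the parameter type, and the class name are illustrative assumptions.

        import java.lang.annotation.Retention;
        import java.lang.annotation.RetentionPolicy;

        // Hypothetical QOS attribute, mirroring HBase's @QosPriority style of tagging RPC methods.
        @Retention(RetentionPolicy.RUNTIME)
        @interface QosPriority {
          int priority() default 0;
        }

        class ReplicationSinkRpcSketch {
          // Illustrative level between normal requests (0) and QOS_THRESHOLD (10).
          static final int CUSTOM_QOS = 5;

          // A method tagged with the custom level would be dispatched to the new
          // customPriorityHandlers pool instead of the meta/priority handlers.
          @QosPriority(priority = CUSTOM_QOS)
          public void replicateLogEntries(byte[][] entries) {
            // apply the shipped WAL edits on the sink cluster
          }
        }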

        Elliott Clark added a comment -

        A better name is probably needed for the queue. Custom doesn't really get across what can go into that QOS level (replication).
        Since this starts 0 "custom" priority handlers by default, it will add another undocumented step when enabling replication. We should either make the default number of handlers > 0, or have the number depend on whether replication is enabled.
        Why choose the number 5 for the priority, given that QOS_THRESHOLD is 10? (Even if the values are arbitrary, it seems like we should have some reason and a comment about the numbering scheme.)

        Thanks for doing this.

        Lars Hofhansl added a comment -

        Patch looks good generally. Few comments:

        1. The naming is weird. These are not "Custom" QOS, but "Medium" QOS methods, right?
        2. Is there a way to generalize this to sets of Handlers with different priority (not important, though)?
        3. By default now (if hbase.regionserver.custom.priority.handler.count is not set), replicateWALEntry would use non-priority handlers... which is not right, I think. It should revert back to the current behavior in that case (which is to use the priorityQOS).

        What I still do not understand... Does this problem always happen? Does it happen because replicateWALEntry takes too long to finish? Does this only happen when the slave is already degraded for other reasons? Should we also work on replicateWALEntry failing faster in case of problems (shorter/fewer retries, etc)?

        Ted Yu added a comment -

        w.r.t. the default value for hbase.regionserver.custom.priority.handler.count, I agree with Lars and Elliott that the default should be > 0.
        Actually, we should perform a check on the actual value: if the user specifies 0 and either replication or security is enabled, we should raise the value to, say, 3.

        Elliott Clark added a comment -

        @Lars
        We had this happen when a large cluster is replicating to a small cluster.
        Source (Large Cluster)
        Sink (Small cluster)

        After the sink goes down or restarts, the source waits for meta to come up. After that, lots of replicated WAL edits are shipped to all the servers; so many, in fact, that the server holding meta does not have any handlers left to answer meta scans or edits.

        Himanshu Vashishtha added a comment -

        Elliott Clark: I used custom because the current naming scheme is not appropriate in my opinion (I started with medium/semi QOS, but then changed it to Custom). Using priority is kind of a misnomer as there is no priority as such; it's just a different set of handlers that serves the requests.
        Though we call them priorityHandlers, etc., they are just like regular handlers, but for meta operations. I think we should change their name to metaOpsHandlers (or metaHandlers). Yeah, I just used a threshold between 0 and 10.

        Since this starts 0 "custom" priority handlers by default it will add another undocumented step when enabling replication. We should either make the number of handlers start by default > 0, or have the number depend on if replication is enabled.

        I am ok with a >0 default; I don't think it should be tied to replication, as these handlers can be used for other methods too (such as security, etc.).

        @Lars:

        The naming is weird. These are not "Custom"QOS, but "Medium"QOS methods, right?

        Hope the rationale is clearer now.

        By default now (if hbase.regionserver.custom.priority.handler.count is not set), replicateWALEntry would use non-priority handlers... Which is not right, I think. It should revert back to the current behavior in that case (which is to use the priorityQOS).

        default > 0 sounds good?

        What I still do not understand... Does this problem always happen? Does it happen because replicateWALEntry takes too long to finish? Does this only happen when the slave is already degraded for other reasons? Should we also work on replicateWALEntry failing faster in case of problems (shorter/fewer retries, etc)?

        It can occur when the slave cluster is slow, and whenever it happens, it will make the entire cluster unresponsive. I have a patch which adds fail-fast behavior in the sink and have been testing it too. It looks good so far. I tried creating a new JIRA but got an IOE while creating it (see INFRA-5131). Will attach the patch once it's created.

        Lars Hofhansl added a comment -

        @Himanshu: Thanks. Yes makes sense. I like MetaHandlers.
        Re: failing fast: I think instead of using an HTablePool, the sink should create a Connection and ThreadPool and then create HTables on demand using these (see HBASE-4805), together with short timeouts and few retries.
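
        A rough sketch of that idea, assuming the HTable(byte[], HConnection, ExecutorService) constructor from HBASE-4805; the timeout and retry values are illustrative only.

        import java.util.List;
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;
        import org.apache.hadoop.hbase.client.HConnection;
        import org.apache.hadoop.hbase.client.HConnectionManager;
        import org.apache.hadoop.hbase.client.HTable;
        import org.apache.hadoop.hbase.client.Row;

        class FailFastSinkSketch {
          void applyBatch(byte[] tableName, List<Row> edits) throws Exception {
            // Replication-specific configuration with a short timeout and few retries,
            // so a slow sink fails fast instead of parking a handler for minutes.
            Configuration sinkConf = HBaseConfiguration.create();
            sinkConf.setInt("hbase.rpc.timeout", 10000);
            sinkConf.setInt("hbase.client.retries.number", 1);

            HConnection connection = HConnectionManager.getConnection(sinkConf);
            ExecutorService pool = Executors.newFixedThreadPool(4);

            // Create the HTable on demand from the shared connection and pool
            // instead of holding an HTablePool.
            HTable table = new HTable(tableName, connection, pool);
            try {
              table.batch(edits);
            } finally {
              table.close();
              pool.shutdown();
            }
          }
        }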

        Lars Hofhansl added a comment -

        This should be in 0.94

        Ted Yu added a comment -

        +1 on shifting away from using HTablePool in the JIRA for fail-fast.

        Himanshu Vashishtha added a comment -

        Lars, Ted and Elliott: Thanks for the feedback.

        @Lars: Changing the name is beyond the scope of this jira, no? Another jira for that?
        Re: fail-fast: Yeah, the patch still uses HTablePool, but submits the batch to a thread pool (of ReplicationSink). Meanwhile, the handler keeps checking whether the client is still alive while waiting for the task to finish. If the client has gone away, it cancels the task.
        Also, ReplicationSink now has its own conf object which it can decorate with its own timeout, number of retries, etc. Is there an open jira for ReplicationSink (I can't create a jira yet)?
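
        A hedged sketch of that "watch the caller while the batch runs" idea; isCallerStillConnected() is a hypothetical stand-in for however the handler checks that the client is still alive.

        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.Future;
        import java.util.concurrent.TimeUnit;
        import java.util.concurrent.TimeoutException;

        class SinkHandlerSketch {
          private final ExecutorService sinkPool = Executors.newFixedThreadPool(4);

          void applyBatchWatchingCaller(Runnable batch) throws Exception {
            Future<?> task = sinkPool.submit(batch);
            while (true) {
              try {
                task.get(1, TimeUnit.SECONDS);   // wait in short slices
                return;                          // batch finished
              } catch (TimeoutException stillRunning) {
                if (!isCallerStillConnected()) { // caller gave up; stop working on its behalf
                  task.cancel(true);
                  return;
                }
              }
            }
          }

          // Hypothetical check for a dropped client; not the actual patch code.
          boolean isCallerStillConnected() {
            return true;
          }
        }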

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12540074/HBase-6165-v1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

        -1 findbugs. The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.coprocessor.TestClassLoading
        org.apache.hadoop.hbase.master.TestAssignmentManager
        org.apache.hadoop.hbase.TestLocalHBaseCluster

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//console

        This message is automatically generated.

        Himanshu Vashishtha added a comment -

        Created fail-fast replicationSink jira HBase-6550 (https://issues.apache.org/jira/browse/HBASE-6550)

        Elliott Clark added a comment -

        Using priority is kind of a misnomer as there is no priority as such

        The actual handlers don't imply some sort of QOS, but the naming does correspond to a {low|medium|high} priority set of operations that can be in that handler's queue.

        Himanshu Vashishtha added a comment -

        Yeah, and I think it should be changed to reflect what they actually do. So, changing the QOS levels and respective handlers along the lines of CLIENT_OPS, CUSTOM_OPS, and META_OPS seems more appropriate.

        Himanshu Vashishtha added a comment -

        So, shall I upload with a positive default value for the number of custom handlers then? For the naming of the existing handlers, I can file another jira? Thoughts?

        Ted Yu added a comment -

        Sounds good.
        Consider renaming "hbase.regionserver.custom.priority.handler.count" to "hbase.regionserver.custom.handler.count"

        Himanshu Vashishtha added a comment -

        Made the default number of custom handlers 5 instead of 0; renamed the property as per Ted's suggestion.

        Ted Yu added a comment -

        From https://builds.apache.org/job/PreCommit-HBASE-Build/2556/console:

        /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/dev-support/test-patch.sh: line 353:   393 Aborted
        
        Himanshu Vashishtha added a comment -

        On current trunk (with commit 7b9cbf0c0b35468591b3a1cf5c93951461590f8c), it applied cleanly. Shall I upload again?

        Ted Yu added a comment -

        Yes, please.
        An aborted test run is different from a compilation error.

        Elliott Clark added a comment -

        I still don't understand the naming. There's nothing "custom" about these handlers. They handle replication. REPLICATION_OPS, MISC_OPS, INTERNAL_OPS: any of those seems to convey more about the type of operations these threads will handle.

        Himanshu Vashishtha added a comment -

        @Elliott: I don't want to tie them to replication. As you can see, they have a positive default value now, so it would not be correct to call them REPLICATION_OPS.
        Any method with the CUSTOM_OPS attribute will be handled by these handlers. The nearest candidates to use this are security-related methods, I think.
        MISC/INTERNAL don't convey anything specific either.
        Don't know, but CUSTOM still looks ok to me... but I will be glad to change it to a more appropriate name.

        @Ted: What does that error mean btw?

        Elliott Clark added a comment -

        It's not that custom doesn't convey enough meaning (I could live with that). Custom implies that there's been some modification from normal or stock. That is not the case. These handlers are there for things that are built in. Replication and security are core pieces of functionality. Naming things custom gives the impression that they are not as supported as other operations, which is not the case.

        Ted Yu added a comment -

        @Himanshu:
        I don't know the root cause of the aborted QA run.

        w.r.t. queue naming, can I assume that misc(ellaneous) is acceptable to everyone?

        Andrew Purtell added a comment -

        MISC doesn't have any meaning.

        Neither does "custom".

        IMO, name these after what they actually do. If this is for replication, name it REPLICATION_QOS.

        Himanshu Vashishtha added a comment -

        Thanks Andrew.

        Revised patch with the following changes:
        a) Call queue name and QOS are now replication-specific
        b) Default number of replication handlers is 3
        c) Moved QOS attribute constants to HConstants.

        Tested replication on a cluster; Jenkins is green.
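
        For operators following along, a small sketch of the resulting knob; treat the exact property key as an assumption based on the renaming discussion above.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;

        class ReplicationHandlerConfigSketch {
          // Assumed final property name after dropping "custom" from the key.
          static final String REPLICATION_HANDLER_COUNT_KEY =
              "hbase.regionserver.replication.handler.count";

          static int replicationHandlerCount(Configuration conf) {
            // Default of 3 dedicated replication handlers, per the patch description above.
            return conf.getInt(REPLICATION_HANDLER_COUNT_KEY, 3);
          }

          public static void main(String[] args) {
            System.out.println(replicationHandlerCount(HBaseConfiguration.create()));
          }
        }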

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12541512/HBase-6165-v3.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

        -1 findbugs. The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.client.TestFromClientSide

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//console

        This message is automatically generated.

        Ted Yu added a comment -

        Patch v3 looks clean.
        nit:

        +    if(handlers != null) {
        +      for(Handler h : handlers) {
        

        Space should be added immediately before '('

        Himanshu Vashishtha added a comment -

        You are right Sir! Done.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12541517/HBase-6165-v4.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

        -1 findbugs. The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//console

        This message is automatically generated.

        Himanshu Vashishtha added a comment -

        Looking forward to more suggestions/comments, as we are talking about 0.94.2 on the list now.

        Lars Hofhansl added a comment -

        v4 looks good. +1

        Jean-Daniel Cryans added a comment -

        I'm -1 for 0.94 until I test it on one of our clusters, which I'll try to do this afternoon.

        Jean-Daniel Cryans added a comment -

        FWIW the v4 patch really doesn't apply on 0.94:

        su-jdcryans-2:hbase-git-su jdcryans$ patch -p1 -F 10 --dry-run < HBase-6165-v4.patch 
        patching file src/main/java/org/apache/hadoop/hbase/HConstants.java
        Hunk #1 succeeded at 650 with fuzz 2 (offset -42 lines).
        patching file src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java
        Hunk #1 succeeded at 98 (offset -11 lines).
        patching file src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        Hunk #1 succeeded at 225 (offset -51 lines).
        Hunk #2 succeeded at 1304 (offset -360 lines).
        Hunk #3 succeeded at 1335 with fuzz 1 (offset -414 lines).
        Hunk #4 succeeded at 1356 (offset -415 lines).
        Hunk #5 succeeded at 1526 (offset -405 lines).
        Hunk #6 succeeded at 1630 with fuzz 3 (offset -415 lines).
        Hunk #7 succeeded at 1652 (offset -423 lines).
        Hunk #8 succeeded at 1664 (offset -423 lines).
        patching file src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        Hunk #1 succeeded at 449 with fuzz 2 (offset 153 lines).
        Hunk #2 FAILED at 658.
        Hunk #3 succeeded at 486 (offset -87 lines).
        Hunk #4 succeeded at 504 (offset -87 lines).
        Hunk #5 succeeded at 520 (offset -87 lines).
        Hunk #6 succeeded at 536 (offset -87 lines).
        Hunk #7 succeeded at 3159 (offset 1061 lines).
        Hunk #8 succeeded at 3170 with fuzz 1 (offset 1059 lines).
        Hunk #9 succeeded at 3630 with fuzz 3 (offset 529 lines).
        Hunk #10 FAILED at 3836.
        Hunk #11 FAILED at 3883.
        Hunk #12 FAILED at 3911.
        Hunk #13 FAILED at 3998.
        Hunk #14 FAILED at 4037.
        Hunk #15 FAILED at 4068.
        Hunk #16 FAILED at 4097.
        Hunk #17 FAILED at 4131.
        9 out of 17 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java.rej
        
        Himanshu Vashishtha added a comment -

        The above patch was for trunk;
        will upload a 0.94 one.

        Himanshu Vashishtha added a comment -

        0.94 patch

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12542698/HBase-6165-94-v1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2710//console

        This message is automatically generated.

        Jean-Daniel Cryans added a comment -

        The 0.94 patch doesn't set the proper QOS:

        -  @QosPriority(priority=HIGH_QOS)
        +  @QosPriority(priority=HConstants.HIGH_QOS)
           public void replicateLogEntries(final HLog.Entry[] entries)
        
        Himanshu Vashishtha added a comment -

        replicateLogEntries with replication QOS
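
        A sketch of what the corrected annotation would look like on the 0.94 method, assuming the replication-specific constant settled on above ends up in HConstants (a fragment only, mirroring the diff quoted earlier):

          @QosPriority(priority = HConstants.REPLICATION_QOS)
          public void replicateLogEntries(final HLog.Entry[] entries)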

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12542957/HBase-6165-94-v2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2724//console

        This message is automatically generated.

        Jean-Daniel Cryans added a comment -

        I've been running the 0.94 patch since yesterday, +1.

        Jean-Daniel Cryans added a comment -

        The trunk patch needs a refresh.

        Lars Hofhansl added a comment -

        I'll make an updated patch.

        Himanshu Vashishtha added a comment -

        patch refreshed

        Lars Hofhansl added a comment -

        You beat me

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12543165/HBase-6165-v5.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2742//console

        This message is automatically generated.

        Himanshu Vashishtha added a comment -

        No, I can't. You have the superpower to take it one step forward

        Lars Hofhansl added a comment -

        Unfortunately v5 still does not apply cleanly to trunk.

        Himanshu Vashishtha added a comment -

        It is based on top of commit e554fa9b0cc06c7a364c38bed53139da5e354b36; I took an update before creating it. Are there more commits after this which git doesn't have yet?

        Feel free to create the new patch then.

        stack added a comment -

        git can lag svn. I would suggest you get an svn checkout and make sure the patch applies there.

        Lars Hofhansl added a comment -

        The canonical repository is the SVN repository, Himanshu.

        Himanshu Vashishtha added a comment -

        Good to know; will set up an svn/eclipse environment.

        Lars Hofhansl added a comment -

        I'll make a patch for now. For folks who like git, svn is a pain (or so I heard)

        Lars Hofhansl added a comment -

        Trunk patch that I am going to commit. Also fixed TestPriorityRpc, which didn't compile.

        Lars Hofhansl added a comment -

        Committed to 0.94 and 0.96.
        Thanks for the patch Himanshu.

        Himanshu Vashishtha added a comment -

        Thanks for the final patch Lars

        Hudson added a comment -

        Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #155 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/155/)
        HBASE-6165 Replication can overrun .META. scans on cluster re-start (Himanshu Vashishtha) (Revision 1379235)

        Result = FAILURE
        larsh :
        Files :

        • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestPriorityRpc.java
        Hudson added a comment -

        Integrated in HBase-0.94 #443 (See https://builds.apache.org/job/HBase-0.94/443/)
        HBASE-6165 Replication can overrun .META. scans on cluster re-start (Himanshu Vashishtha) (Revision 1379236)

        Result = SUCCESS
        larsh :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        Hudson added a comment -

        Integrated in HBase-0.94-security #51 (See https://builds.apache.org/job/HBase-0.94-security/51/)
        HBASE-6165 Replication can overrun .META. scans on cluster re-start (Himanshu Vashishtha) (Revision 1379236)

        Result = FAILURE
        larsh :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        Hudson added a comment -

        Integrated in HBase-0.94-security-on-Hadoop-23 #7 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/7/)
        HBASE-6165 Replication can overrun .META. scans on cluster re-start (Himanshu Vashishtha) (Revision 1379236)

        Result = FAILURE
        larsh :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        Jeff Whiting added a comment -

        I may be a little late to the party, but why is replication using any kind of higher-than-normal priority handlers?

        It looks like we all agree that they shouldn't be using the high priority handlers. It looks like they now have their own medium priority handlers. But I don't see an argument as to why they don't just use the normal priority handlers.

        Hudson added a comment -

        Integrated in HBase-0.92 #558 (See https://builds.apache.org/job/HBase-0.92/558/)
        HBASE-6724 Port HBASE-6165 'Replication can overrun .META. scans on cluster re-start' to 0.92 (Revision 1381451)

        Result = FAILURE
        Tedyu :
        Files :

        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/HConstants.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        stack added a comment -

        @Jeff IIRC they need to be on a channel other than the user priority queue because they can overwhelm user loadings (e.g. a big cluster replicating into a small cluster). We've been learning a bunch of late about replication, and it's fair to say that some pieces need a bit of a rethink to make them more robust around cases such as the aforementioned large-into-small, or one we ran into ourselves recently where we couldn't start the small cluster because the high priority handlers were all occupied by replication soon after startup (this patch would help with that scenario). I see that this patch has just been backported to 0.92; hopefully that will be of help to you in your current predicament.

        Jean-Daniel Cryans added a comment -

        Jeff Whiting, originally replication was using the normal handlers and was just deadlocking the clusters in a different way. ReplicationSink uses the HBase client, which can block for ungodly amounts of time, so it would fill up the handlers and the RS would stop serving requests. HBASE-6550 changed that a bit by setting low timeouts via replication-specific client-side configuration parameters (if it were using the normal client configuration, that would also affect all the other clients). With HBASE-6165 it's even safer since replication is sandboxed.

        Jeff Whiting added a comment -

        @stack and @jdcryans Thanks for the explanation. I can see how it would deadlock on itself. I also found HBASE-3401 which talks about the deadlock. We patched our cdh4 cluster with HBASE-6724 and it has been running much smoother.

        Himanshu Vashishtha added a comment -

        Jeff Whiting: Specifically, the replication-specific jira about deadlocking on normal handlers is HBASE-4280.

        Hudson added a comment -

        Integrated in HBase-0.92-security #143 (See https://builds.apache.org/job/HBase-0.92-security/143/)
        HBASE-6724 Port HBASE-6165 'Replication can overrun .META. scans on cluster re-start' to 0.92 (Revision 1381451)

        Result = FAILURE
        tedyu :
        Files :

        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/HConstants.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        stack added a comment -

        Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl)


          People

          • Assignee:
            Himanshu Vashishtha
            Reporter:
            Elliott Clark
          • Votes:
            0
            Watchers:
            13
