  Hadoop HDFS / HDFS-7858

Improve HA Namenode Failover detection on the client

    Details

      Description

      In an HA deployment, clients are configured with the hostnames of both the Active and Standby Namenodes. A client will first try one of the NNs (non-deterministically), and if it happens to reach the Standby, that NN responds telling the client to retry the request on the other Namenode.
      If the client happens to talk to the Standby first, and the Standby is undergoing a GC pause or is otherwise busy, the client might not get a response soon enough to try the other NN.

      Proposed approach to solve this:
      1) Use hedged RPCs to simultaneously call multiple configured NNs to decide which is the active Namenode.
      2) Subsequent calls will invoke the previously successful NN.
      3) On failover of the currently active NN, the remaining NNs will be invoked again to decide which is the new active.
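
      For illustration, here is a minimal sketch of the hedged-call idea, assuming a list of pre-built proxy invocations (the class and method names below are hypothetical, not actual patch code). ExecutorService#invokeAny returns the result of the first call that completes successfully and cancels the rest, which matches the "first NN to answer wins" behavior described above:

        import java.util.List;
        import java.util.concurrent.Callable;
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;

        // Hypothetical sketch: fire the same request at every configured NN and keep
        // whichever answers first. A standby that throws an exception is discarded,
        // and a NN that is slow or stuck in GC is cancelled once another NN answers.
        public class HedgedNamenodeCall {
          public static <T> T hedge(List<Callable<T>> calls) throws Exception {
            ExecutorService executor = Executors.newFixedThreadPool(calls.size());
            try {
              // Blocks until one task succeeds, cancelling the others.
              return executor.invokeAny(calls);
            } finally {
              executor.shutdownNow();
            }
          }
        }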

      1. HDFS-7858.9.patch
        33 kB
        Arun Suresh
      2. HDFS-7858.8.patch
        33 kB
        Arun Suresh
      3. HDFS-7858.7.patch
        33 kB
        Arun Suresh
      4. HDFS-7858.6.patch
        30 kB
        Arun Suresh
      5. HDFS-7858.5.patch
        30 kB
        Arun Suresh
      6. HDFS-7858.4.patch
        30 kB
        Arun Suresh
      7. HDFS-7858.3.patch
        18 kB
        Arun Suresh
      8. HDFS-7858.2.patch
        19 kB
        Arun Suresh
      9. HDFS-7858.2.patch
        19 kB
        Arun Suresh
      10. HDFS-7858.13.patch
        37 kB
        Arun Suresh
      11. HDFS-7858.12.patch
        38 kB
        Arun Suresh
      12. HDFS-7858.11.patch
        33 kB
        Arun Suresh
      13. HDFS-7858.10.patch
        33 kB
        Arun Suresh
      14. HDFS-7858.10.patch
        33 kB
        Arun Suresh
      15. HDFS-7858.1.patch
        22 kB
        Arun Suresh

        Issue Links

          Activity

          kihwal Kihwal Lee added a comment -

          ZK may not scale to support thousands of clients. I think it will be better to use more aggressive timeout and proper retry policy to get around such problems.
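
          As a rough illustration of that direction, the client-side connect and failover settings could be tightened along these lines (the key names are the standard IPC/HDFS client settings, but the values are illustrative only and defaults vary by release, so treat this as a sketch rather than a recommendation):

            import org.apache.hadoop.conf.Configuration;

            // Sketch: fail connect attempts to an unresponsive NN quickly so the
            // client moves on to the other NN instead of waiting out a GC pause.
            public class AggressiveFailoverConf {
              public static Configuration tune(Configuration conf) {
                conf.setInt("ipc.client.connect.timeout", 5000);  // ms per connect attempt
                conf.setInt("ipc.client.connect.max.retries.on.timeouts", 2);
                conf.setInt("dfs.client.failover.max.attempts", 10);
                conf.setInt("dfs.client.failover.sleep.base.millis", 200);
                conf.setInt("dfs.client.failover.sleep.max.millis", 5000);
                return conf;
              }
            }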

          bikassaha Bikas Saha added a comment -

          We need to be careful about how many clients can be supported by ZK (either pinging for info or watchers). ZK is typically a shared service with YARN/HBase etc.

          asuresh Arun Suresh added a comment -

          Kihwal Lee, Bikas Saha, I understand your concerns over the use of ZK. But consider the following:

          1. Most DFSClients are cached (since the FileSystem objects are generally cached). Thus long-lived clients will probably not have more than 1-2 persistent connections to ZK.
          2. Short-lived clients will first check if there is a cached entry (possibly in the home directory, something like ~/.lastNN) that contains the last accessed active NN. The client will proceed to connect to that NN first (thereby removing the non-determinism from the current scheme) and will most probably succeed. It will contact ZK only if the connection was unsuccessful, and we can limit this to just a ping (not a watch registration) so the connection is not persistent.
          3. A client that has connected to a NN without the need for a ZK connection can continue to NOT talk to ZK until either:
            1. the client dies, in which case no ZK connection is ever made, or
            2. a configurable time (maybe an hour) has passed, after which it is established that it is a long-lived client.

          Do you think this might reduce the total number of connections to ZK at any point in time?

          bikassaha Bikas Saha added a comment -

          The client will proceed to connect to that NN first (thereby removing non-determinism from the current scheme).. and will most probably succeed. It will contact ZK only if the connection was unsuccessful..

          Yes. It will most probably succeed. But when will it not succeed? When that NN has failed over or has crashed, right? Which means that every time a known primary NN becomes unavailable, there will be a surge of failed connections to it (from cached entries that point to it), and then these connections will be redirected to ZK. For a proxy for the number of connections, consider MR jobs, where every Map task running on every machine has a DFS client to read from HDFS and every Reduce task on every machine has a DFS client to write to HDFS. MR tasks are typically short-lived clients.

          asuresh Arun Suresh added a comment -

          Bikas Saha, you make a very valid point.

          I guess the situation you mentioned can be alleviated as follows:
          Considering that a client knows a priori both the Active and the Standby, what if we do the following: if the locally cached active Namenode entry has become unavailable, yes, there will be an initial surge of requests to the failed NN, but the client can directly retry against the Standby without consulting ZK. ZK connections will happen only in the following cases:

          1. If no cached entry is present in the user's home directory.
          2. Long-lived clients.

          Also, I was thinking maybe we break this into 2 separate JIRAs:

          1. Adding a cached entry to the user's home dir to pick the last active NN. If the entry is not present, the client picks the Standby from the configuration. No ZK involvement for this; it only brings some determinism into which Namenode is picked first.
          2. Have another JIRA to add the ZK client optimization. In addition to the ZK watch feature for long-lived clients, this could bring additional benefits, such as having only the logical nameservice name in the Configuration. Each Namenode, when it starts up, would register under a ZNode, and clients would find out the actual URIs of the Active and Standby directly from ZK (like HBase clients do). Short-lived clients would then first query ZK, find the active and standby NN URIs and cache them (rather than reading them from the Configuration), so subsequent client invocations do not hit ZK.
          bikassaha Bikas Saha added a comment -

          What are long lived client examples? How many such clients would be there in a large busy cluster? Will they be setting watches on ZK?

          Adding a cached entry to the user's home dir to pick the last active NN. If the entry is not present, the client picks the Standby from the configuration.

          This seems like a reasonable improvement to the current scheme which will allow a client to connect to the current active directly (even though it may be listed later in the NN names list).

          Please do keep in mind that ZK is just a notifier in the leader election scheme. The real control lies in the FailoverController, which is pluggable. A different FailoverController may not use ZK. The status of the master flag may not be valid, or may be empty, while the FailoverController is fencing the old master and bringing up the new master.

          Getting configuration from ZK is related but probably orthogonal. The entire config for HDFS could be downloaded from ZK based on a well known HDFS service name.

          asuresh Arun Suresh added a comment -

          Bikas Saha, thank you for your comments.

          Based on our discussion, I guess the immediate improvement for client failover detection is possibly via the use of a cache file, on the local filesystem, containing the active Namenode. The ZK aspect can be explored in more depth, maybe as part of a new JIRA?

          I am therefore attaching a patch to allow clients to check a cached file and decide which Namenode to contact first. This file will live (by default) in the temp directory (since it is possible that not all users have a home directory) and can thus be accessed by all users.
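
          To make that concrete, here is a simplified sketch of how such a hint file could be read and written (the file name, location and format are assumptions for illustration, not necessarily what the patch does):

            import java.io.IOException;
            import java.nio.charset.StandardCharsets;
            import java.nio.file.Files;
            import java.nio.file.Path;
            import java.nio.file.Paths;

            // Hypothetical sketch: a tiny hint file holding the last known active NN
            // address, so the client tries that NN first instead of guessing.
            public class LastActiveNamenodeHint {
              private static final Path HINT =
                  Paths.get(System.getProperty("java.io.tmpdir"), ".hdfs-last-active-nn");

              /** Returns the cached NN address, or null if no usable hint exists. */
              public static String read() {
                try {
                  return Files.exists(HINT)
                      ? new String(Files.readAllBytes(HINT), StandardCharsets.UTF_8).trim()
                      : null;
                } catch (IOException e) {
                  return null;  // a missing or corrupt hint just means "no preference"
                }
              }

              /** Records the NN that last answered as the active. */
              public static void write(String nnAddress) {
                try {
                  Files.write(HINT, nnAddress.getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                  // best effort; losing the hint only costs one extra failover attempt
                }
              }
            }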

          atm Aaron T. Myers added a comment -

          Hey folks, sorry to come into this discussion so late.

          Given that some folks choose to use HDFS HA without auto failover at all, and thus without ZKFCs or ZK in sight, I think we should target any solution to this problem to work without ZK. I'm also a little leery of using a cache file, as I'm afraid of thundering herd effects (if the file is in HDFS or in a home dir which is network mounted), and also don't like the fact that in a large cluster all users on all machines might need to populate this cache file.

          As such, I'd propose that we pursue either of the following two options:

          1. Optimistically try to connect to both configured NNs simultaneously, thus allowing that one (the standby) may take a while to respond, but also expecting that the active will always respond rather promptly. This is similar to Kihwal's suggestion.
          2. Have the client connect to the JNs to determine which NN is likely the active. In my experience, even those who don't use automatic failover basically always use the QJM. I think those that continue to use NFS-based HA are very few and far between.

          Thoughts?

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12701992/HDFS-7858.1.patch
          against trunk revision ca1c00b.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 4 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common:

          org.apache.hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9699//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9699//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9699//console

          This message is automatically generated.

          jingzhao Jing Zhao added a comment -

          Optimistically try to connect to both configured NNs simultaneously

          I like this better than letting clients connect to JNs. The JNs are on the critical path for writing the editlog, and failing to write to a quorum of JNs can cause the NN to kill itself, so maybe we need to be more careful about letting all clients connect to them directly.

          asuresh Arun Suresh added a comment -

          Aaron T. Myers, Jing Zhao, thank you for your suggestions.

          As per the discussion, I am uploading a patch that includes a special RequestHedgingProxyProvider that:

          • Sends requests simultaneously to all the NNs and waits for at least one response within a configurable timeout.
          • On receipt of a valid response from any NN, the other outstanding requests are immediately cancelled.
          • If all NNs return an exception, then an exception is returned to the client.
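
          Roughly, the hedging handler behaves like the following simplified sketch (hypothetical code, not the actual patch; the real RequestHedgingProxyProvider differs in how it aggregates exceptions and remembers the winning proxy):

            import java.lang.reflect.InvocationHandler;
            import java.lang.reflect.InvocationTargetException;
            import java.lang.reflect.Method;
            import java.util.HashMap;
            import java.util.Iterator;
            import java.util.Map;
            import java.util.concurrent.Callable;
            import java.util.concurrent.ExecutionException;
            import java.util.concurrent.ExecutorService;
            import java.util.concurrent.Executors;
            import java.util.concurrent.Future;

            // Hypothetical sketch: every method call is submitted against all target
            // proxies; the first successful result wins and the rest are cancelled.
            public class HedgingInvocationHandler implements InvocationHandler {
              private final Object[] targets;
              private volatile Object lastSuccessful;  // could be preferred for later calls

              public HedgingInvocationHandler(Object... targets) {
                this.targets = targets;
              }

              @Override
              public Object invoke(Object proxy, final Method method, final Object[] args)
                  throws Throwable {
                ExecutorService executor = Executors.newFixedThreadPool(targets.length);
                Map<Future<Object>, Object> inFlight = new HashMap<Future<Object>, Object>();
                try {
                  for (final Object target : targets) {
                    inFlight.put(executor.submit(new Callable<Object>() {
                      @Override
                      public Object call() throws Exception {
                        return method.invoke(target, args);
                      }
                    }), target);
                  }
                  Throwable lastError = null;
                  while (!inFlight.isEmpty()) {
                    Iterator<Map.Entry<Future<Object>, Object>> it =
                        inFlight.entrySet().iterator();
                    while (it.hasNext()) {
                      Map.Entry<Future<Object>, Object> entry = it.next();
                      if (!entry.getKey().isDone()) {
                        continue;
                      }
                      it.remove();
                      try {
                        Object result = entry.getKey().get();
                        lastSuccessful = entry.getValue();  // remember the winning proxy
                        return result;
                      } catch (ExecutionException e) {
                        // unwrap the reflection wrapper to the proxy's real exception
                        Throwable cause = e.getCause();
                        lastError = (cause instanceof InvocationTargetException)
                            ? ((InvocationTargetException) cause).getTargetException()
                            : cause;
                      }
                    }
                    Thread.sleep(10);  // simple polling; a CompletionService avoids this
                  }
                  throw lastError;     // every target failed
                } finally {
                  executor.shutdownNow();  // cancels any calls still outstanding
                }
              }
            }

          In the actual provider, once a proxy has answered successfully, subsequent calls go only to that proxy until it fails, at which point hedging kicks in again.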
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12702429/HDFS-7858.2.patch
          against trunk revision 3560180.

          -1 patch. Trunk compilation may be broken.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9727//console

          This message is automatically generated.

          asuresh Arun Suresh added a comment -

          Re-attaching to kick off Jenkins again...

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12702433/HDFS-7858.2.patch
          against trunk revision 3560180.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.TestFileTruncate

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9730//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9730//console

          This message is automatically generated.

          asuresh Arun Suresh added a comment -

          This test case failure seems unrelated.

          asuresh Arun Suresh added a comment -

          Updating the patch with minor refactorings.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12702886/HDFS-7858.3.patch
          against trunk revision 952640f.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestDecommission
          org.apache.hadoop.hdfs.server.balancer.TestBalancer

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9752//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9752//console

          This message is automatically generated.

          kasha Karthik Kambatla added a comment -

          If possible, it would be nice to make the solution here accessible to YARN as well.

          Simultaneously connecting to all the masters (NNs in HDFS and RMs in YARN) might work most of the time. How do we plan to handle a split-brain? In YARN, we don't use an explicit fencing mechanism. IIRR, one is not required to configure a fencing mechanism when using QJM?

          asuresh Arun Suresh added a comment -

          Karthik Kambatla,

          one is not required to configure a fencing mechanism when using QJM ?

          Yup, QJM ensures only one Namenode can write, but fencing is still recommended since there is still a possibility of stale reads from the old Active NN before it goes down (I am hoping this will not be too much of an issue).

          it would be nice to make the solution here accessible to YARN as well.

          The current patch extends the ConfiguredFailoverProxyProvider in the HDFS code base. The ConfiguredRMFailoverProxyProvider looks like it belongs to the same class hierarchy, so it shouldn't be too hard. But like you mentioned, if YARN is not deployed with the ZKRMStateStore, there is a possibility of split-brain, which leads me to think: wouldn't it be nice to incorporate QJM and JNs into the YARN deployment? Thoughts?

          asuresh Arun Suresh added a comment -

          ping Aaron T. Myers, Jing Zhao, Bikas Saha,
          Was wondering if I might get a review for the current patch.

          jingzhao Jing Zhao added a comment -

          Thanks for working on this, Arun Suresh.

          One concern is where we should put the new logic. Looks like the current patch wraps things in the following way:

          RequestHedgingInvocationHandler --> proxy returned by RequestHedgingProxyProvider#getProxy --> RetryInvocationHandler

          I'm not sure if this is the best way to go. RetryInvocationHandler has its own logic for retry and failover, which is usually based on the type of the exception thrown by the invocation. With the new design, the exception caught by RetryInvocationHandler is derived from the exceptions thrown by all the targets inside RequestHedgingInvocationHandler. Since different targets may return different exceptions, it looks like we cannot guarantee that RetryInvocationHandler finally gets the exception from the correct target.

          I'm wondering: how about providing RequestHedgingInvocationHandler as a replacement for RetryInvocationHandler? We would need to add the retry logic into RequestHedgingInvocationHandler, but the whole layering may look cleaner.

          asuresh Arun Suresh added a comment -

          Thanks for the review Jing Zhao.

          I'm wondering: how about providing RequestHedgingInvocationHandler as a replacement for RetryInvocationHandler? We would need to add the retry logic into RequestHedgingInvocationHandler, but the whole layering may look cleaner.

          Yup, that makes sense. Let me take a shot at refactoring and I will post an updated patch shortly.

          cnauroth Chris Nauroth added a comment -

          This is very interesting. Thanks for working on it, Arun!

          Yup, QJM ensures only one Namenode can write, but fencing is still recommended since there is still a possibility of stale reads from the old Active NN before it goes down (I am hoping this will not be too much of an issue)

          I don't think the patch introduces any new problems here. If two NameNodes think they are active, there is already a risk of reads being served by the wrong node.

          arpitagarwal Arpit Agarwal added a comment -

          Hi Arun Suresh, were you thinking of posting an updated patch? The overall approach looks good.

          One comment from a quick look - RequestHedgingProxyProvider sends all requests to both NNs. Should it skip the standby for subsequent requests?

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 patch 0m 0s The patch command could not apply the patch during dryrun.



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12702886/HDFS-7858.3.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 979c9ca
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11704/console

          This message was automatically generated.

          asuresh Arun Suresh added a comment -

          Arpit Agarwal, apologies for sitting on this...

          I was trying to refactor this as per Jing Zhao's suggestion (replacing RetryInvocationHandler with RequestHedgingInvocationHandler). Unfortunately, it turned out to have a far more wide-reaching impact (technically, request hedging is different from retry, so the whole policy framework etc. would need to be refactored).

          If everyone is OK with the current approach, we can punt the larger refactoring to another JIRA, and I can incorporate Arpit Agarwal's suggestion (skip the standby for subsequent requests) and provide a quick patch.

          asuresh Arun Suresh added a comment -

          Updating patch:

          Modified failure handling based on Jing Zhao's comments. The behavior now is:

          • If any one of the proxies succeeds, the operation succeeds.
          • As per Arpit Agarwal's comments, all subsequent calls will be sent to the successful proxy.
          • If all proxies fail, ALL the exceptions are returned to the RetryInvocationHandler:
            • If at least one of the RetryDecisions (derived from the corresponding exceptions) results in RETRY or FAILOVER_AND_RETRY, then the request is retried on all proxies.
            • If all returned exceptions result in a FAIL decision, then the operation is failed.

          Thoughts?
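
          For illustration, the all-proxies-failed case could surface the per-NN failures through an aggregate exception along these lines (a hypothetical sketch, similar in spirit to the MultiException mentioned later in this thread; the actual class and its wiring into RetryInvocationHandler differ):

            import java.io.IOException;
            import java.util.Collections;
            import java.util.Map;

            // Hypothetical aggregate exception carrying one failure per NN, so the
            // retry layer can inspect each cause and decide RETRY / FAILOVER / FAIL.
            public class AggregatedProxyException extends IOException {
              private final Map<String, Exception> failures;  // NN id -> its exception

              public AggregatedProxyException(Map<String, Exception> failures) {
                super("All configured namenodes failed: " + failures.keySet());
                this.failures = Collections.unmodifiableMap(failures);
              }

              public Map<String, Exception> getFailures() {
                return failures;
              }
            }

          A retry handler would then walk getFailures(), derive a RetryDecision from each cause, and retry on all proxies if any of them says RETRY or FAILOVER_AND_RETRY; only if every cause maps to FAIL does the operation fail.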

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 18m 51s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          -1 javac 7m 39s The applied patch generated 1 additional warning messages.
          -1 javadoc 9m 45s The applied patch generated 1 additional warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 1m 50s The applied patch generated 6 new checkstyle issues (total was 7, now 12).
          -1 whitespace 0m 1s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 32s mvn install still works.
          +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
          -1 findbugs 4m 26s The patch appears to introduce 4 new Findbugs (version 3.0.0) warnings.
          -1 common tests 21m 29s Tests failed in hadoop-common.
          -1 hdfs tests 177m 1s Tests failed in hadoop-hdfs.
              243m 49s  



          Reason Tests
          FindBugs module:hadoop-common
          Failed unit tests hadoop.io.retry.TestFailoverProxy
            hadoop.ipc.TestIPC
            hadoop.io.retry.TestRetryProxy
            hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes
            hadoop.hdfs.server.namenode.ha.TestHASafeMode
            hadoop.hdfs.TestEncryptionZonesWithHA
            hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes
            hadoop.hdfs.server.namenode.ha.TestHAAppend
            hadoop.hdfs.tools.TestDFSZKFailoverController
            hadoop.hdfs.server.namenode.ha.TestHAStateTransitions
            hadoop.hdfs.TestDistributedFileSystem
            hadoop.hdfs.server.namenode.ha.TestXAttrsWithHA
            hadoop.hdfs.server.namenode.ha.TestStandbyIsHot
            hadoop.hdfs.server.namenode.TestCheckpoint
            hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
            hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
            hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics
            hadoop.hdfs.TestGetBlocks
            hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA
            hadoop.hdfs.server.namenode.ha.TestFailoverWithBlockTokensEnabled
            hadoop.hdfs.server.namenode.TestBlockUnderConstruction
            hadoop.hdfs.TestDFSClientRetries
            hadoop.hdfs.TestDFSInotifyEventInputStream
            hadoop.hdfs.TestDFSClientFailover
            hadoop.hdfs.server.namenode.ha.TestHAFsck
            hadoop.hdfs.server.namenode.ha.TestDNFencing
          Timed out tests org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12745705/HDFS-7858.4.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 0bda84f
          javac https://builds.apache.org/job/PreCommit-HDFS-Build/11733/artifact/patchprocess/diffJavacWarnings.txt
          javadoc https://builds.apache.org/job/PreCommit-HDFS-Build/11733/artifact/patchprocess/diffJavadocWarnings.txt
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/11733/artifact/patchprocess/diffcheckstylehadoop-common.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/11733/artifact/patchprocess/whitespace.txt
          Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/11733/artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/11733/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11733/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11733/testReport/
          Java 1.7.0_55
          uname Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11733/console

          This message was automatically generated.

          arpitagarwal Arpit Agarwal added a comment -

          Thanks Arun Suresh, I will review your patch by next week.

          asuresh Arun Suresh added a comment -

          Updating patch to fix failed test-cases

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 pre-patch 17m 6s Findbugs (version ) appears to be broken on trunk.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          -1 javac 7m 34s The applied patch generated 1 additional warning messages.
          -1 javadoc 9m 35s The applied patch generated 1 additional warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 1m 24s The applied patch generated 6 new checkstyle issues (total was 7, now 12).
          -1 whitespace 0m 1s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 30s mvn install still works.
          +1 eclipse:eclipse 0m 35s The patch built with eclipse:eclipse.
          +1 findbugs 4m 23s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 common tests 22m 31s Tests passed in hadoop-common.
          -1 hdfs tests 160m 43s Tests failed in hadoop-hdfs.
              226m 4s  



          Reason Tests
          Failed unit tests hadoop.hdfs.TestAppendSnapshotTruncate
            hadoop.hdfs.TestDistributedFileSystem
            hadoop.hdfs.server.namenode.ha.TestStandbyIsHot



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12745987/HDFS-7858.5.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 176131f
          javac https://builds.apache.org/job/PreCommit-HDFS-Build/11747/artifact/patchprocess/diffJavacWarnings.txt
          javadoc https://builds.apache.org/job/PreCommit-HDFS-Build/11747/artifact/patchprocess/diffJavadocWarnings.txt
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/11747/artifact/patchprocess/diffcheckstylehadoop-common.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/11747/artifact/patchprocess/whitespace.txt
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/11747/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11747/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11747/testReport/
          Java 1.7.0_55
          uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11747/console

          This message was automatically generated.

          asuresh Arun Suresh added a comment -

          Updating patch to fix the javadoc, javac and checkstyle issues.
          The three remaining test case failures are unrelated (they seem to fail intermittently on my laptop).

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 18m 44s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 34s There were no new javac warning messages.
          +1 javadoc 9m 37s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 2m 27s The applied patch generated 11 new checkstyle issues (total was 426, now 436).
          -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 20s mvn install still works.
          +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
          +1 findbugs 4m 23s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 common tests 22m 26s Tests passed in hadoop-common.
          -1 hdfs tests 160m 22s Tests failed in hadoop-hdfs.
              227m 53s  



          Reason Tests
          Failed unit tests hadoop.hdfs.TestAppendSnapshotTruncate
            hadoop.hdfs.TestDistributedFileSystem
            hadoop.hdfs.server.namenode.ha.TestStandbyIsHot



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12745991/HDFS-7858.6.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 176131f
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/11748/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/11748/artifact/patchprocess/whitespace.txt
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/11748/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11748/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11748/testReport/
          Java 1.7.0_55
          uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11748/console

          This message was automatically generated.

          jingzhao Jing Zhao added a comment -

          Thanks for updating the patch, Arun Suresh! The current approach looks good to me. Some quick comments about the patch:

          1. In RequestHedgingInvocationHandler#invoke, instead of polling the tasks every 10ms, can we use CompletionService here? (See the sketch after this list.)
          2. For RequestHedgingProxyProvider#performFailover, if the original successfulProxy is not null, we can exclude it for the next retry.
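
          To make point 1 concrete, here is a minimal, hypothetical sketch of the CompletionService idea (not the actual patch; class and method names are invented): submit the call to every proxy and block on take(), which returns as soon as the first task finishes, instead of polling the pending futures on a timer.

          import java.lang.reflect.Method;
          import java.util.List;
          import java.util.concurrent.Callable;
          import java.util.concurrent.CompletionService;
          import java.util.concurrent.ExecutorCompletionService;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;
          import java.util.concurrent.Future;

          public class HedgingSketch {
            // Invoke 'method' on every proxy in parallel and return the first completed result.
            static Object invokeFirst(final List<Object> proxies, final Method method,
                final Object[] args) throws Exception {
              ExecutorService executor = Executors.newFixedThreadPool(proxies.size());
              CompletionService<Object> completion =
                  new ExecutorCompletionService<Object>(executor);
              try {
                for (final Object proxy : proxies) {
                  completion.submit(new Callable<Object>() {
                    @Override
                    public Object call() throws Exception {
                      return method.invoke(proxy, args);
                    }
                  });
                }
                // take() blocks until the first submitted task completes -- no 10ms polling loop.
                Future<Object> first = completion.take();
                // get() returns the result, or throws ExecutionException if that call failed.
                // (A full implementation would keep calling take() until one call succeeds.)
                return first.get();
              } finally {
                executor.shutdownNow();
              }
            }
          }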
          arpitagarwal Arpit Agarwal added a comment -

          Thanks for updating the patch Arun. The MultiException approach looks like a good alternative to refactoring RetryPolicy.

          A few comments:

          1. I didn't understand the call to super.performFailover in RequestHedgingProxyProvider#getProxy.
          2. The documentation in HDFSHighAvailabilityWithQJM.md and HDFSHighAvailabilityWithNFS.md should be updated as it states The only implementation which currently ships with Hadoop is the ConfiguredFailoverProxyProvider. Okay to do this in a separate Jira.
          3. Agree with Jing's suggestion to use a CompletionService.

          Also we should file a task to make RequestHedgingProxyProvider the default eventually.

          Nitpicks:

          1. getDelayMillis javadoc is wrong.
          2. successfullproxy should be successfulproxy.
          3. new LinkedList<RetryAction> - explicit type argument redundant.
          4. static interface ProxyFactory - static is redundant.
          asuresh Arun Suresh added a comment -

          Thanks Arpit Agarwal and Jing Zhao for your reviews.

          Uploading patch addressing your suggestions.

          w.r.t. Using CompletionService.
          Yup.. thanks, it did make the implementation more readable.

          I didn't understand the call to super.performFailover in RequestHedgingProxyProvider#getProxy.

          Yeah.. I wanted to increment the proxy index. Agreed, it does look out of place. I've created an explicit method to make it more readable.

          For RequestHedgingProxyProvider#performFailover, if the original successfulProxy is not null, we can exclude it for the next time retry.

          So, in the case of the ReqHedgingProxy, performFailover will be called only if ALL the proxies have failed (with retry/failover_and_retry..), in which case the next attempt will again send the request to all the NameNodes, so I don't think it makes sense to exclude it.

          new LinkedList<RetryAction> - explicit type argument redundant.

          Oh.. I was thinking we should keep trunk Java 7 compilable?

          jingzhao Jing Zhao added a comment - - edited

          Thanks for updating the patch, Arun Suresh.

          So, in the case of the ReqHedgingProxy, performFailover will be called only if ALL the proxies have failed (with retry/failover_and_retry.. )

          But I guess if "the original successfulProxy is not null", this try only uses that single proxy, thus the failover_and_retry only refers to it?

          Oh.. I was thinking we should keep trunk Java 7 compilable ?

          Java 7 does not require an explicit type argument; Java 6 does.
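
          For reference, a tiny illustration of the diamond-operator point (using String rather than RetryAction just to keep it self-contained):

          import java.util.LinkedList;
          import java.util.List;

          class DiamondExample {
            // Java 6 style: the type argument has to be repeated on the right-hand side.
            List<String> java6 = new LinkedList<String>();

            // Java 7+: the diamond operator <> lets the compiler infer the type argument.
            List<String> java7 = new LinkedList<>();
          }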

          asuresh Arun Suresh added a comment -

          Jing Zhao, I think I get your point.

          Updating patch

          • Now, if ALL proxies return FAILOVER_AND_RETRY, the operation fails (since there are no proxies to fail over to)
          • In the performFailover method, the last successful proxy (which has now failed, and thus performFailover is called) is recorded and excluded in the next retry (see the sketch below this list)
          • Removed some explicit type arguments (apologies, I don't know why I kept thinking it's a Java 8 feature... will take care of it in future patches)
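
          A rough, hypothetical sketch of the exclusion behavior described in the second bullet (not the patch itself; names are invented): the proxy against which performFailover() was invoked is remembered and left out of the next hedged round.

          import java.util.LinkedHashMap;
          import java.util.Map;

          class ExcludeFailedProxySketch<T> {
            private final Map<String, T> allProxies = new LinkedHashMap<String, T>();
            private volatile String toIgnore = null;

            // Called when the proxy that last succeeded has now failed.
            void performFailover(String failedProxyName) {
              toIgnore = failedProxyName;
            }

            // Targets for the next hedged attempt: every proxy except the one that just failed.
            Map<String, T> targetsForNextAttempt() {
              Map<String, T> targets = new LinkedHashMap<String, T>(allProxies);
              if (toIgnore != null) {
                targets.remove(toIgnore);
              }
              return targets;
            }
          }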
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 22m 3s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 39s There were no new javac warning messages.
          +1 javadoc 9m 38s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 site 2m 59s Site still builds.
          +1 checkstyle 2m 4s There were no new checkstyle issues.
          -1 whitespace 0m 1s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 30s mvn install still works.
          +1 eclipse:eclipse 0m 31s The patch built with eclipse:eclipse.
          +1 findbugs 4m 21s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 common tests 22m 25s Tests passed in hadoop-common.
          -1 hdfs tests 161m 5s Tests failed in hadoop-hdfs.
              234m 42s  



          Reason Tests
          Failed unit tests hadoop.hdfs.TestDistributedFileSystem
            hadoop.hdfs.server.namenode.ha.TestStandbyIsHot



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12746903/HDFS-7858.7.patch
          Optional Tests javadoc javac unit findbugs checkstyle site
          git revision trunk / 1d3026e
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/11816/artifact/patchprocess/whitespace.txt
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/11816/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11816/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11816/testReport/
          Java 1.7.0_55
          uname Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11816/console

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 21m 48s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 32s There were no new javac warning messages.
          +1 javadoc 9m 36s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 site 2m 58s Site still builds.
          +1 checkstyle 2m 2s There were no new checkstyle issues.
          -1 whitespace 0m 1s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 29s mvn install still works.
          +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse.
          -1 findbugs 4m 25s The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings.
          -1 common tests 21m 31s Tests failed in hadoop-common.
          -1 hdfs tests 171m 16s Tests failed in hadoop-hdfs.
              243m 36s  



          Reason Tests
          FindBugs module:hadoop-hdfs
          Failed unit tests hadoop.io.retry.TestFailoverProxy
            hadoop.hdfs.TestDFSInotifyEventInputStream
            hadoop.hdfs.tools.TestDFSZKFailoverController
            hadoop.hdfs.TestDFSClientFailover
            hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
            hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
            hadoop.hdfs.server.namenode.ha.TestFailoverWithBlockTokensEnabled
            hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
            hadoop.hdfs.server.namenode.ha.TestDNFencing
            hadoop.hdfs.server.namenode.ha.TestHAMetrics
            hadoop.hdfs.server.namenode.ha.TestXAttrsWithHA
            hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes
            hadoop.hdfs.TestDistributedFileSystem
            hadoop.hdfs.server.namenode.ha.TestHAFsck
            hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes
            hadoop.hdfs.server.namenode.ha.TestHAAppend
            hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA
            hadoop.hdfs.server.namenode.ha.TestHASafeMode
            hadoop.hdfs.server.namenode.ha.TestStandbyIsHot
            hadoop.hdfs.server.namenode.ha.TestHAStateTransitions
            hadoop.hdfs.TestEncryptionZonesWithHA
          Timed out tests org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12746923/HDFS-7858.8.patch
          Optional Tests javadoc javac unit findbugs checkstyle site
          git revision trunk / ab3197c
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/11817/artifact/patchprocess/whitespace.txt
          Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/11817/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/11817/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11817/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11817/testReport/
          Java 1.7.0_55
          uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11817/console

          This message was automatically generated.

          asuresh Arun Suresh added a comment -

          Updating patch :

          • Fixing whitespace issues
          • Now, if ALL proxies return FAILOVER_AND_RETRY, the operation fails (since there are no proxies to fail over to)

            Reverting this behavior, since I think the assumption (at least in many of the test cases) is that FAILOVER_AND_RETRY implies just RETRY if no other proxy exists to fail over to.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 29m 21s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 9m 34s There were no new javac warning messages.
          +1 javadoc 12m 13s There were no new javadoc warning messages.
          +1 release audit 0m 30s The applied patch does not increase the total number of release audit warnings.
          +1 site 3m 33s Site still builds.
          +1 checkstyle 2m 36s There were no new checkstyle issues.
          -1 whitespace 0m 1s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 59s mvn install still works.
          +1 eclipse:eclipse 0m 40s The patch built with eclipse:eclipse.
          -1 findbugs 5m 36s The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings.
          -1 common tests 24m 55s Tests failed in hadoop-common.
          -1 hdfs tests 0m 25s Tests failed in hadoop-hdfs.
              91m 30s  



          Reason Tests
          FindBugs module:hadoop-hdfs
          Failed unit tests hadoop.io.retry.TestFailoverProxy
          Failed build hadoop-hdfs



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12746923/HDFS-7858.8.patch
          Optional Tests javadoc javac unit findbugs checkstyle site
          git revision trunk / 02c0181
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/11821/artifact/patchprocess/whitespace.txt
          Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/11821/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/11821/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11821/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11821/testReport/
          Java 1.7.0_55
          uname Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11821/console

          This message was automatically generated.

          vinayrpet Vinayakumar B added a comment -

          I have a small question here.
          I believe all client operations will only succeed when talking to the Active NameNode.
          In the current ConfiguredFailoverProxyProvider, only at the beginning, when the client initializes, is there a need to try both nodes, if the standby happens to be tried first.
          During failover, if the ANN goes down and the SNN has not yet taken over, then the client has to retry the previous ANN and come back to the current SNN to check for the failover one more time. Once the successful proxy is found, all subsequent requests will go there.

          In the case of the proposed RequestHedgingProxyProvider, only at the beginning will there be no failed proxy; at that time hedged requests will go to both NNs.
          During failover, the currently failed proxy (the previous ANN) will be ignored for hedged requests, i.e. in the case of an HA failover, only one request (to the SNN) will be invoked in the hedged invocations. Am I right?

          This way I feel both ConfiguredFailoverProxyProvider and RequestHedgingProxyProvider work the same way, except at the very first time. And yes, if the number of proxies to try is more than 2, then RequestHedgingProxyProvider will be best.

          Am I missing something here?

          asuresh Arun Suresh added a comment -

          in case of failover of HA, only one request will be invoked (SNN) in hedged invocations. Am I right?

          Yup.. although in the case of more than 2 NNs, the subsequent request will be hedged to ALL the remaining NNs except the NN that was just failed over from.

          This way I feel both ConfiguredFailoverProxyProvider and RequestHedgingProxyProvider work same way, except at the very first time. ..

          Yup.. as well as in the above-mentioned condition.

          ..if no. of proxies to try to are more than 2 then RequestHedgingProxyProvider will be best.

          Yup.. now that HDFS-6440 is resolved, I am hoping RequestHedging would become the default. It is also useful in cases where there are a large number of ad-hoc clients (MR jobs), where many of the calls will be one-time calls. RequestHedgingProxyProvider will ensure that these tasks don't have to wait for a timed-out request / exception from a failed NN before failing over to the SNN.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 26m 54s Pre-patch trunk compilation is healthy.
          +1 @author 0m 1s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 9m 44s There were no new javac warning messages.
          +1 javadoc 11m 53s There were no new javadoc warning messages.
          +1 release audit 0m 27s The applied patch does not increase the total number of release audit warnings.
          +1 site 3m 37s Site still builds.
          +1 checkstyle 2m 39s There were no new checkstyle issues.
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 45s mvn install still works.
          +1 eclipse:eclipse 0m 43s The patch built with eclipse:eclipse.
          -1 findbugs 5m 33s The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings.
          -1 common tests 25m 29s Tests failed in hadoop-common.
          -1 hdfs tests 161m 41s Tests failed in hadoop-hdfs.
              250m 31s  



          Reason Tests
          FindBugs module:hadoop-hdfs
          Failed unit tests hadoop.ha.TestZKFailoverController
            hadoop.hdfs.TestAppendSnapshotTruncate
            hadoop.hdfs.server.namenode.ha.TestRequestHedgingProxyProvider
            hadoop.hdfs.TestDistributedFileSystem
          Timed out tests org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12746976/HDFS-7858.9.patch
          Optional Tests javadoc javac unit findbugs checkstyle site
          git revision trunk / e202efa
          Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/11826/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/11826/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11826/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11826/testReport/
          Java 1.7.0_55
          uname Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11826/console

          This message was automatically generated.

          jingzhao Jing Zhao added a comment -

          Thanks again for updating the patch, Arun Suresh! Some minor comments on the latest patch:

          1. Do we need the latch in RequestHedgingInvocationHandler#invoke?
          2. I'm not sure if we need requestTimeout. The Client/DN already set their specific socket timeouts for their connections to the NameNode, so it seems redundant to have an extra 2 min timeout when polling the CompletionService.
          3. We can use the ExecutionException thrown by callResultFuture.get() to get the exception thrown by the invocation.
          4. Maybe we should use debug/trace here?
            +            LOG.info("Invocation successful on ["
            +                    + callResultFuture.get().name + "]");
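
          As an aside, a guarded debug-level variant of that log line might look like the following sketch (commons-logging, with a hypothetical class name):

          import org.apache.commons.logging.Log;
          import org.apache.commons.logging.LogFactory;

          class LoggingSketch {
            private static final Log LOG = LogFactory.getLog(LoggingSketch.class);

            static void logSuccess(String proxyName) {
              // Only build and emit the message when debug logging is actually enabled.
              if (LOG.isDebugEnabled()) {
                LOG.debug("Invocation successful on [" + proxyName + "]");
              }
            }
          }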
            
          asuresh Arun Suresh added a comment -

          Thanks again for the review Jing Zhao,

          Uploading patch addressing your suggestions..

          do we need the latch in RequestHedgingInvocationHandler#invoke ?

          Not necessarily.. just wanted to ensure all requests are started at almost the same time. But yeah, since the size of the thread pool is equal to the number of proxies, they should technically start simultaneously… I've removed it.

          w.r.t. the requestTimeout..
          Hmmm.. Agreed, it's not really necessary. (But I think we have to document that if this is refactored as a general handler, where we are not sure of the underlying client/server protocol and assumptions, a bounding timeout would be good/necessary.)

          We can use the ExecutionException thrown by callResultFuture.get() to get the exception thrown by the invocation.

          So, if you notice, I have a CallResult object, which is what is actually returned by callResultFuture.get(). I need this to get the name of the proxy that was successful (so I can key into the targetProxies map). CallResult catches the exception and sets it as the result.

          jingzhao Jing Zhao added a comment -

          I need this to get name of the proxy which was successful (so i can key into the targetProxies map). CallResult catches the exception and sets it as the result.

          Yeah, I did notice the exception has been captured by CallResult. But maybe we can use a future-->proxy map here? In this way we do not need to have a wrapper class like CallResult so maybe the code can be further simplified.
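
          A minimal sketch of that idea (hypothetical names; it assumes the map was populated when the calls were submitted, e.g. futureToProxyName.put(completionService.submit(task), proxyName)): the Future handed back by the CompletionService identifies the proxy directly, so no CallResult-style wrapper is needed, and the failure comes out of get() as an ExecutionException.

          import java.util.Map;
          import java.util.concurrent.CompletionService;
          import java.util.concurrent.ExecutionException;
          import java.util.concurrent.Future;

          class FutureToProxySketch {
            // Wait for the first successful call; the map tells us which proxy answered.
            static Object firstSuccessful(CompletionService<Object> completion,
                Map<Future<Object>, String> futureToProxyName) throws Exception {
              Exception lastFailure = null;
              for (int i = 0; i < futureToProxyName.size(); i++) {
                Future<Object> done = completion.take();   // first completed call
                try {
                  Object result = done.get();              // success
                  System.out.println("Invocation successful on ["
                      + futureToProxyName.get(done) + "]");
                  return result;
                } catch (ExecutionException e) {
                  lastFailure = e;                         // that proxy failed; keep waiting
                }
              }
              throw lastFailure;                           // assumes at least one call was submitted
            }
          }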

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 21m 46s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 35s There were no new javac warning messages.
          +1 javadoc 9m 43s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 site 3m 0s Site still builds.
          +1 checkstyle 2m 2s There were no new checkstyle issues.
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 29s mvn install still works.
          +1 eclipse:eclipse 0m 31s The patch built with eclipse:eclipse.
          -1 findbugs 4m 26s The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings.
          +1 common tests 22m 16s Tests passed in hadoop-common.
          -1 hdfs tests 158m 42s Tests failed in hadoop-hdfs.
              231m 57s  



          Reason Tests
          FindBugs module:hadoop-hdfs
          Failed unit tests hadoop.hdfs.TestDistributedFileSystem



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12747095/HDFS-7858.10.patch
          Optional Tests javadoc javac unit findbugs checkstyle site
          git revision trunk / d19d187
          Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/11833/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/11833/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11833/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11833/testReport/
          Java 1.7.0_55
          uname Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11833/console

          This message was automatically generated.

          asuresh Arun Suresh added a comment -

          Uploading patch

          • As per Jing Zhao's suggestion, removing the CallResult class
          • Minor refactoring, since I had missed a case earlier, which was resulting in an NPE
          • Updated test cases
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 21m 50s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 34s There were no new javac warning messages.
          +1 javadoc 9m 39s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          +1 site 2m 59s Site still builds.
          +1 checkstyle 1m 59s There were no new checkstyle issues.
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 29s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          -1 findbugs 4m 24s The patch appears to introduce 2 new Findbugs (version 3.0.0) warnings.
          +1 common tests 22m 27s Tests passed in hadoop-common.
          -1 hdfs tests 163m 44s Tests failed in hadoop-hdfs.
              237m 6s  



          Reason Tests
          FindBugs module:hadoop-hdfs
          Failed unit tests hadoop.hdfs.TestDistributedFileSystem
            hadoop.hdfs.server.namenode.ha.TestStandbyIsHot
          Timed out tests org.apache.hadoop.hdfs.server.mover.TestStorageMover



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12747173/HDFS-7858.10.patch
          Optional Tests javadoc javac unit findbugs checkstyle site
          git revision trunk / adcf5dd
          Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/11838/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/11838/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11838/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11838/testReport/
          Java 1.7.0_55
          uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11838/console

          This message was automatically generated.

          asuresh Arun Suresh added a comment -

          Fixing findbugs warnings. The test case failures are unrelated.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 pre-patch 19m 52s Findbugs (version ) appears to be broken on trunk.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 34s There were no new javac warning messages.
          +1 javadoc 9m 32s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          +1 site 2m 59s Site still builds.
          +1 checkstyle 1m 39s There were no new checkstyle issues.
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 28s mvn install still works.
          +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse.
          +1 findbugs 4m 21s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 common tests 22m 21s Tests passed in hadoop-common.
          -1 hdfs tests 160m 51s Tests failed in hadoop-hdfs.
              231m 36s  



          Reason Tests
          Failed unit tests hadoop.hdfs.TestAppendSnapshotTruncate
            hadoop.hdfs.TestDistributedFileSystem



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12747199/HDFS-7858.11.patch
          Optional Tests javadoc javac unit findbugs checkstyle site
          git revision trunk / 156f24e
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/11840/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11840/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11840/testReport/
          Java 1.7.0_55
          uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11840/console

          This message was automatically generated.

          arpitagarwal Arpit Agarwal added a comment -

          I am +1 on the v11 patch. RequestHedgingProxyProvider is disabled by default so remaining issues can be addressed separately to avoid spinning on this forever.

          1. One optimization with your new approach - In the common HA case with two NameNodes, after performFailover is called, toIgnore will be non-null. We don't need to create a thread pool/completion service; we can simply send the request to the single proxy on the caller's thread (see the sketch after this comment).
          2. The TODO is not technically a TODO. We can just document in the class Javadoc that it can block indefinitely and depends on the caller implementing a timeout.
          3. Couple of documentation nitpicks:
            1. The two implementations which currently ships -> The two implementations which currently ship
            2. so use these --> so use one of these unless you are using a custom proxy provider

          Will hold off committing in case Jing Zhao has any further comments. Thanks for working on this Arun Suresh.
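
          A hypothetical sketch of the optimization in point 1 (invented names, not the patch): when only one candidate proxy remains after performFailover, skip the executor entirely and invoke it on the caller's thread.

          import java.lang.reflect.Method;
          import java.util.Map;

          class SingleProxyShortcutSketch {
            static <T> Object invoke(Map<String, T> targetProxies, Method method,
                Object[] args) throws Exception {
              if (targetProxies.size() == 1) {
                // Only one candidate left (the other NN is being ignored): call it inline,
                // no thread pool or CompletionService required.
                T onlyProxy = targetProxies.values().iterator().next();
                return method.invoke(onlyProxy, args);
              }
              // Otherwise fall back to the hedged path (executor + CompletionService).
              return hedgedInvoke(targetProxies, method, args);
            }

            private static <T> Object hedgedInvoke(Map<String, T> proxies, Method method,
                Object[] args) {
              throw new UnsupportedOperationException("hedged path elided in this sketch");
            }
          }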

          asuresh Arun Suresh added a comment -

          Thanks for the rev Arpit Agarwal,

          Updated patch :

          • Incorporated your optimization with a minor modification (with consideration for the case where you might have more than 2 proxies configured). Also updated
          • Updated docs
          • Updated test cases

          Will commit it by tomorrow, if you and Jing Zhao are OK with the latest patch.

          Show
          asuresh Arun Suresh added a comment - Thanks for the rev Arpit Agarwal , Updated patch : Incorporated your optimization with a minor modification (with consideration for case where you might have more than 2 proxies configured). Also updated Updated docs Updated testcases Will commit it by tomorrow, if you and Jing Zhao are ok with the latest patch
          Hide
          jingzhao Jing Zhao added a comment -

          Thanks for working on this, Arun Suresh! The latest patch looks good to me. +1. Also agree with Arpit Agarwal that we can keep testing and improving this, since this is currently not the default.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 21m 42s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          -1 javac 7m 31s The applied patch generated 1 additional warning messages.
          +1 javadoc 9m 36s There were no new javadoc warning messages.
          +1 release audit 0m 24s The applied patch does not increase the total number of release audit warnings.
          +1 site 2m 59s Site still builds.
          +1 checkstyle 2m 1s There were no new checkstyle issues.
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 27s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 4m 19s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          -1 common tests 22m 20s Tests failed in hadoop-common.
          -1 hdfs tests 162m 20s Tests failed in hadoop-hdfs.
              235m 17s  



          Reason Tests
          Failed unit tests hadoop.ipc.TestRPC
            hadoop.fs.TestLocalFsFCStatistics
            hadoop.hdfs.server.namenode.ha.TestStandbyIsHot
            hadoop.hdfs.server.namenode.ha.TestRequestHedgingProxyProvider
            hadoop.hdfs.server.namenode.TestFsck



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12747399/HDFS-7858.12.patch
          Optional Tests javadoc javac unit findbugs checkstyle site
          git revision trunk / f36835f
          javac https://builds.apache.org/job/PreCommit-HDFS-Build/11848/artifact/patchprocess/diffJavacWarnings.txt
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/11848/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11848/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11848/testReport/
          Java 1.7.0_55
          uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11848/console

          This message was automatically generated.

          asuresh Arun Suresh added a comment -

          The test case failures seem spurious.
          Attaching a patch to fix the javac warnings and clean up some imports.

          Thanks for the reviews, Jing Zhao, Arpit Agarwal, Aaron T. Myers & Bikas Saha.
          Will be committing after the next Jenkins run.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 22m 33s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          -1 javac 7m 38s The applied patch generated 2 additional warning messages.
          +1 javadoc 9m 51s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          +1 site 2m 57s Site still builds.
          +1 checkstyle 2m 2s There were no new checkstyle issues.
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 27s mvn install still works.
          +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse.
          +1 findbugs 4m 24s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 common tests 22m 20s Tests passed in hadoop-common.
          -1 hdfs tests 158m 53s Tests failed in hadoop-hdfs.
              233m 4s  



          Reason Tests
          Failed unit tests hadoop.hdfs.server.namenode.ha.TestRequestHedgingProxyProvider



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12747474/HDFS-7858.13.patch
          Optional Tests javadoc javac unit findbugs checkstyle site
          git revision trunk / 3572ebd
          javac https://builds.apache.org/job/PreCommit-HDFS-Build/11852/artifact/patchprocess/diffJavacWarnings.txt
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/11852/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11852/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11852/testReport/
          Java 1.7.0_55
          uname Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11852/console

          This message was automatically generated.

          asuresh Arun Suresh added a comment -

          The test case error was due to a timing issue. Modified the test case to ensure it doesn't recur.

          Committed to trunk and branch-2.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8231 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8231/)
          HDFS-7858. Improve HA Namenode Failover detection on the client. (asuresh) (Arun Suresh: rev 030fcfa99c345ad57625486eeabedebf2fd4411f)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRequestHedgingProxyProvider.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/RequestHedgingProxyProvider.java
          • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithNFS.md
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
          • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithQJM.md
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ConfiguredFailoverProxyProvider.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/MultiException.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #270 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/270/)
          HDFS-7858. Improve HA Namenode Failover detection on the client. (asuresh) (Arun Suresh: rev 030fcfa99c345ad57625486eeabedebf2fd4411f)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/MultiException.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/RequestHedgingProxyProvider.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRequestHedgingProxyProvider.java
          • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithQJM.md
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ConfiguredFailoverProxyProvider.java
          • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithNFS.md
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #1000 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1000/)
          HDFS-7858. Improve HA Namenode Failover detection on the client. (asuresh) (Arun Suresh: rev 030fcfa99c345ad57625486eeabedebf2fd4411f)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRequestHedgingProxyProvider.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithQJM.md
          • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithNFS.md
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/MultiException.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ConfiguredFailoverProxyProvider.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/RequestHedgingProxyProvider.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2197 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2197/)
          HDFS-7858. Improve HA Namenode Failover detection on the client. (asuresh) (Arun Suresh: rev 030fcfa99c345ad57625486eeabedebf2fd4411f)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/MultiException.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/RequestHedgingProxyProvider.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithQJM.md
          • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithNFS.md
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ConfiguredFailoverProxyProvider.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRequestHedgingProxyProvider.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #259 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/259/)
          HDFS-7858. Improve HA Namenode Failover detection on the client. (asuresh) (Arun Suresh: rev 030fcfa99c345ad57625486eeabedebf2fd4411f)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithQJM.md
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRequestHedgingProxyProvider.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/MultiException.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/RequestHedgingProxyProvider.java
          • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithNFS.md
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ConfiguredFailoverProxyProvider.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #267 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/267/)
          HDFS-7858. Improve HA Namenode Failover detection on the client. (asuresh) (Arun Suresh: rev 030fcfa99c345ad57625486eeabedebf2fd4411f)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/MultiException.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRequestHedgingProxyProvider.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithQJM.md
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ConfiguredFailoverProxyProvider.java
          • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithNFS.md
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/RequestHedgingProxyProvider.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2216 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2216/)
          HDFS-7858. Improve HA Namenode Failover detection on the client. (asuresh) (Arun Suresh: rev 030fcfa99c345ad57625486eeabedebf2fd4411f)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ConfiguredFailoverProxyProvider.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRequestHedgingProxyProvider.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/RequestHedgingProxyProvider.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
          • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithNFS.md
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/MultiException.java
          • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithQJM.md
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          szetszwo Tsz Wo Nicholas Sze added a comment -

          > ... then those clients might not get a response soon enough to try the other NN.

          Arun Suresh, do you recall how long you saw the client wait? I might have hit a similar problem recently.

          asuresh Arun Suresh added a comment -

          Tsz Wo Nicholas Sze, unfortunately I do not remember the specifics, but I think it went into minutes.

          szetszwo Tsz Wo Nicholas Sze added a comment -

          Never mind. Thanks for the response.

          hexiaoqiao He Xiaoqiao added a comment -

          Hi Arun Suresh, when I apply this patch to 2.7.1, it throws the following exception when submitting a job:

          2016-07-01 17:45:37,497 WARN [pool-9-thread-2] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : java.nio.channels.ClosedByInterruptException
          2016-07-01 17:45:37,542 WARN [pool-10-thread-2] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : java.nio.channels.ClosedByInterruptException
          2016-07-01 17:45:37,571 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Emitting job history data to the timeline server is not enabled
          2016-07-01 17:45:37,572 WARN [pool-11-thread-2] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : java.nio.channels.ClosedByInterruptException
          2016-07-01 17:45:37,573 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Recovery is enabled. Will try to recover from previous life on best effort basis.
          2016-07-01 17:45:37,633 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [viewfs://ha/]
          2016-07-01 17:45:37,698 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous history file is at viewfs://ha/hadoop-yarn/staging/yarn/.staging/job_1467365572539_3212/job_1467365572539_3212_1.jhist
          2016-07-01 17:45:37,713 WARN [main] org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider: Invocation returned exception on [nn1host/ip:port]
          2016-07-01 17:45:37,716 WARN [pool-12-thread-2] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
          2016-07-01 17:45:37,717 WARN [main] org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider: Invocation returned exception on [nn2host/ip:port]
          2016-07-01 17:45:37,725 WARN [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Unable to parse prior job history, aborting recovery
          MultiException[{java.util.concurrent.ExecutionException}]
          at org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider$RequestHedgingInvocationHandler.invoke(RequestHedgingProxyProvider.java:133)
          at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:606)
          at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
          at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
          at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
          at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1226)
          at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
          at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201)
          at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:303)
          at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:269)
          at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:261)
          at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1526)
          at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:303)
          at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
          at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
          at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:299)
          at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:161)
          at org.apache.hadoop.fs.viewfs.ChRootedFileSystem.open(ChRootedFileSystem.java:257)
          at org.apache.hadoop.fs.viewfs.ViewFileSystem.open(ViewFileSystem.java:423)
          at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:788)
          at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.getPreviousJobHistoryStream(MRAppMaster.java:1199)
          at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.parsePreviousJobHistory(MRAppMaster.java:1203)
          at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.processRecovery(MRAppMaster.java:1175)
          at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1039)
          at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
          at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1519)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:415)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
          at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
          at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)

          Is this behavior expected?


            People

            • Assignee:
              asuresh Arun Suresh
              Reporter:
              asuresh Arun Suresh
            • Votes:
              0
              Watchers:
              24
