Hadoop HDFS / HDFS-1623 High Availability Framework for HDFS NN / HDFS-1973

HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: HA branch (HDFS-1623)
    • Fix Version/s: HA branch (HDFS-1623)
    • Component/s: ha, hdfs-client
    • Labels:
      None

      Description

      During failover, a client must detect the failure of the current active namenode and switch over to the new active namenode. The switchover might make use of IP failover or something more elaborate, such as ZooKeeper, to discover the new active.

      1. hdfs-1973.0.patch
        20 kB
        Aaron T. Myers
      2. HDFS-1973-HDFS-1623.patch
        23 kB
        Aaron T. Myers
      3. HDFS-1973-HDFS-1623.patch
        23 kB
        Aaron T. Myers
      4. HDFS-1973-HDFS-1623.patch
        23 kB
        Aaron T. Myers
      5. HDFS-1973-HDFS-1623.patch
        23 kB
        Aaron T. Myers
      6. HDFS-1973-HDFS-1623.patch
        23 kB
        Aaron T. Myers
      7. HDFS-1973-HDFS-1623.patch
        23 kB
        Aaron T. Myers
      8. HDFS-1973-HDFS-1623.patch
        28 kB
        Aaron T. Myers

        Issue Links

          Activity

          atm Aaron T. Myers added a comment -

          If we end up going with a ZK-based solution, this JIRA may be of interest.

          harivishnu Hari A V added a comment -

          Hi Aaron,

          Can you please elaborate a little bit on your area of interest in ZOOKEEPER-1080?

          If it is for NN HA, I would be very happy to get your comments on the design I have suggested, so that I can consider them for the patch.
          I referred to the NameNode+HA_v2_1.pdf uploaded at HDFS-1623 while preparing the design doc (it helped me a lot). The use cases are almost the same. I am currently in the process of making the patch ready for submission.

          • Hari
          atm Aaron T. Myers added a comment -

          Hi Hari,

          Can you please elaborate a little bit on your area of interest with ZOOKEEPER-1080?

          As noted in Sanjay's design doc, one proposal for detecting NN failure would be to use an external ZK service. The HDFS proposal doesn't go into great detail on this, but it suggests using ZK with a heartbeat mechanism to see if the NN is still alive. I personally like the ZK recipe better (i.e. using ephemeral + sequence nodes).

          Another possible use for ZK in the implementation of NN HA would be to use ZK as the source of truth for clients to determine the active NN. This would seem to flow naturally from the part of the ZK recipe which says "Applications may consider creating a separate znode to acknowledge that the leader has executed the leader procedure." If NN HA were to utilize an implementation of the ZK leader election recipe, then perhaps this "leader-procedure-complete znode" could store the IP or hostname of the active NN, which clients could use.
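
          As a rough illustration only (nothing here is an agreed design), a client lookup against such a znode might look like the sketch below; the znode path and the host:port encoding are pure assumptions:

          // Hypothetical sketch: read the active NN's address from a well-known znode
          // that the leader is assumed to write after completing the leader procedure.
          // The znode path and data format are made up for illustration.
          import org.apache.zookeeper.WatchedEvent;
          import org.apache.zookeeper.Watcher;
          import org.apache.zookeeper.ZooKeeper;
          import org.apache.zookeeper.data.Stat;

          public class ActiveNameNodeLookup {
            private static final String ACTIVE_NN_ZNODE = "/hdfs-ha/active-nn"; // assumed path

            public static String lookupActiveNameNode(String zkQuorum) throws Exception {
              // Connect to the ZK ensemble; the no-op watcher ignores session events here.
              ZooKeeper zk = new ZooKeeper(zkQuorum, 30000, new Watcher() {
                public void process(WatchedEvent event) { /* ignore for this sketch */ }
              });
              try {
                Stat stat = new Stat();
                // The active NN is assumed to have stored its RPC address as "host:port".
                byte[] data = zk.getData(ACTIVE_NN_ZNODE, false, stat);
                return new String(data, "UTF-8");
              } finally {
                zk.close();
              }
            }
          }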

          I haven't read the design doc posted on ZOOKEEPER-1080 yet. I'll go ahead and do that and post my comments there.

          I should also mention that we have not settled upon what strategy we'll take to do NN failure detection or client failover. As noted in Sanjay's design doc, we're also strongly considering using virtual IPs for client failover.

          harivishnu Hari A V added a comment -

          Hi Aaron,

          Thanks for the answer. I will watch these issues to get more information.

          -Hari

          atm Aaron T. Myers added a comment -

          Client Failover overview

          On failover between active and standby NNs, it's necessary for clients to be redirected to the new active NN. The goal of HDFS-1623 is to provide a framework for HDFS HA which can in fact support multiple underlying mechanisms. As such, the client failover approach should support multiple options.

          Cases to support

          1. Proxy-based client failover. Clients always communicate with an in-band proxy service which forwards all RPCs on to the correct NN. On failure, a process causes this proxy to begin sending requests to the now-active NN.
          2. Virtual IP-based client failover. Clients always connect to a hostname which resolves to a particular IP address. On failure of the active NN, a process is initiated to move that IP address so that packets destined for it are received by the now-active NN. (From a client's perspective, this case is equivalent to case #1.)
          3. Zookeeper-based client failover. The URI to contact the active NN is stored in Zookeeper or some other highly-available service. Clients look up which NN to talk to by communicating with ZK to discern the currently active NN. On failure, some process causes the address stored in ZK to be changed to point to the now-active NN.
          4. Configuration-based client failover. Clients are configured with a set of NN addresses to try until an operation succeeds. This configuration might exist in client-side configuration files, or perhaps in DNS via a SRV record that lists the NNs with different priorities.

          Assumptions

          This proposal assumes that NN fencing works, and that after a failover any standby NN is either unreachable or will throw a StandbyException on any RPC from a client. That is, a client cannot receive incorrect results if it contacts the wrong NN. This proposal also presumes that no direct coordination is required between any central failover coordinator and clients, i.e. there's an intermediate name resolution system of some sort (ZK, DNS, local configuration, etc.).

          Proposal

          The commit of HADOOP-7380 already introduced a facility whereby an IPC RetryInvocationHandler can utilize a FailoverProxyProvider implementation to perform the appropriate client-side action in the event of failover. At the moment, the only implementation of a FailoverProxyProvider is the DefaultFailoverProxyProvider, which does nothing in the case of failover. HADOOP-7380 also added an @Idempotent annotation which can be used to identify which methods can be safely retried during a failover event.
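
          For reference, the facility added by HADOOP-7380 has roughly the following shape (paraphrased here from memory; see that JIRA for the authoritative definition):

          // Approximate shape of the FailoverProxyProvider contract from HADOOP-7380
          // (paraphrased; not copied from the actual source).
          public interface FailoverProxyProvider {
            // The protocol interface the proxies implement, e.g. ClientProtocol.
            Class<?> getInterface();

            // The proxy object that RetryInvocationHandler should currently dispatch to.
            Object getProxy();

            // Invoked when a failover-triggering exception is seen, so the provider
            // can switch which underlying proxy getProxy() returns.
            void performFailover(Object currentProxy);
          }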

          What remains, then, is:

          1. To implement FailoverProxyProviders which can support the cases outlined above (and perhaps others).
          2. To provide a mechanism to select which FailoverProxyProvider implementation to use for a given HDFS URI.
          3. To annotate the appropriate HDFS ClientProtocol interface methods with the @Idempotent tag.

          FailoverProxyProvider implementations

          Cases 1 and 2 above can be achieved by implementing a single FailoverProxyProvider which simply retries to reconnect to the previous hostname/IP address on failover. Cases 3 and 4 can be implemented as distinct custom FailoverProxyProviders.
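
          As a sketch of case 4 only (a configured list of NN addresses), such a provider could rotate through a set of pre-built proxies, implementing the interface shape shown above. This is illustrative, not the attached patch; proxy construction and resource cleanup are deliberately omitted:

          // Illustrative only: rotate through a fixed list of NN proxies on failover.
          // Proxy construction (RPC.getProxy, etc.) and cleanup are left out.
          import java.util.List;

          public class RoundRobinFailoverProxyProvider implements FailoverProxyProvider {
            private final Class<?> protocol;
            private final List<Object> proxies; // one pre-built proxy per configured NN

            private int currentIndex = 0;

            public RoundRobinFailoverProxyProvider(Class<?> protocol, List<Object> proxies) {
              this.protocol = protocol;
              this.proxies = proxies;
            }

            @Override
            public Class<?> getInterface() {
              return protocol;
            }

            @Override
            public synchronized Object getProxy() {
              return proxies.get(currentIndex);
            }

            @Override
            public synchronized void performFailover(Object currentProxy) {
              // Advance to the next configured NN; wraps around so repeated failures
              // keep cycling through the list.
              currentIndex = (currentIndex + 1) % proxies.size();
            }
          }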

          A mechanism to select the appropriate FailoverProxyProvider implementation

          I propose we add a mechanism to configure a mapping from URI authority -> FailoverProxyProvider implementation. Absolute URIs which previously specified the NN host name will instead contain a logical cluster name (which might be chosen to be identical to one of the NN's host names), which will be used by the chosen FailoverProxyProvider to determine the appropriate host to connect to. Introducing the concept of a cluster name is a useful abstraction in general: if, for example, someone develops a fully distributed NN in the future, the cluster name still applies.

          On instantiation of a DFSClient (or other user of an HDFS URI, e.g. HFTP), the mapping would be checked to see if there's an entry for the given URI authority. If there is not, then a normal RPC client with a connected socket to the given authority will be created, as is done today with the DefaultFailoverProxyProvider. If there is an entry, then the authority will be treated as a logical cluster name, a FailoverProxyProvider of the correct type will be instantiated (via a factory class), and an RPC client will be created which utilizes this FailoverProxyProvider. The various FailoverProxyProvider implementations are responsible for their own configuration.

          As a straw man example, consider the following configuration:

          <configuration>
            <property>
              <name>fs.defaultFS</name>
              <value>cluster1.foo.com</value>
            </property>
          
            <property>
              <name>dfs.ha.client.failover-method.cluster1.foo.com</name>
              <value>org.apache.hadoop.ha.ZookeeperFailoverProxyProvider</value>
            </property>
          </configuration>
          

          This would cause URIs which begin with hdfs://cluster1.foo.com to use the ZookeeperFailoverProxyProvider. Slash-relative URIs would also default to using this. An absolute URI which, for example, referenced an NN in another cluster (e.g. nn.cluster2.foo.com) which was not HA-enabled would default to using the DefaultFailoverProxyProvider.
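
          To make the selection mechanism concrete, the client-side lookup might look roughly like the following; the key prefix follows the straw man above, and the factory itself and its null-means-no-HA convention are assumptions rather than a final design:

          // Sketch: resolve a FailoverProxyProvider class from configuration for a
          // given URI authority, following the straw-man key above.
          import java.net.URI;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.util.ReflectionUtils;

          public class FailoverProxyProviderFactory {
            private static final String KEY_PREFIX = "dfs.ha.client.failover-method.";

            // Returns a provider for the URI's authority, or null if none is configured,
            // in which case the caller falls back to a plain RPC proxy as is done today.
            public static FailoverProxyProvider create(URI uri, Configuration conf) {
              Class<? extends FailoverProxyProvider> clazz = conf.getClass(
                  KEY_PREFIX + uri.getAuthority(), null, FailoverProxyProvider.class);
              if (clazz == null) {
                return null; // no mapping: treat the authority as a real NN host
              }
              return ReflectionUtils.newInstance(clazz, conf);
            }
          }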

          Questions

          1. I believe this scheme will work transparently with viewfs. Instead of configuring the mount table to communicate with a particular NN for a given portion of the name space, one would configure viewfs to use the logical cluster name, which when paired with the configuration from URI authority -> FailoverProxyProvider will cause the appropriate FailoverProxyProvider to be selected and the appropriate NN to be located. I'm no viewfs expert and so would love to hear any thoughts on this.
          2. Are there any desirable client failover mechanisms I'm forgetting about?
          3. I'm sure there are places (which I haven't fully identified yet) where host names are cached client side. Those may need to get changed as well.
          4. Federation already introduced a concept of a "cluster ID", but in Federation this is not intended to be user-facing. Should we combine this notion with the "logical cluster name" I described above?
          atm Aaron T. Myers added a comment -

          Any feedback on the above proposal?

          Sanjay/Suresh - I'd particularly like your feedback since you guys have a lot of viewfs expertise, and some of this is similar/related.

          sureshms Suresh Srinivas added a comment -

          Sorry for the late comment. I had been traveling.

          Before "Cases to support", could we add a section like this:

          >>
          On failover, clients need the address of the new active. This could be done by:

          1. Contacting zookeeper to get the current active NN.
          2. Alternatively, the client gets the addresses of both namenodes and tries them one at a time until it connects to the new active.
          3. For setups using IP failover, clients always use the same VIP/failover address, which moves to the active.
          >>

          Given this, I am not sure about the "Cases to support":
          Proxy-based client failover is an implementation detail. It still needs to figure out the new active based on one of the schemes above. I am not very clear on configuration-based support. Do you mean here that the client config will be changed to point to the new active? DNS SRV records are also unnecessary, given that our config would have both namenode addresses.

          +1 for logical URI. We could consider merging this requirement with HDFS-2231 to do this.

          Logical URI is needed for identifying a nameservice and not a cluster, since federation supports multiple namenodes within a cluster. We could use the concept of a nameservice, introduced in federation, for that? So the URI would be nameservice1.foo.com; nameservice1 maps to nn1, nn2.

          As regards viewfs, I think this scheme will work. The viewfs mount tables will point to the logical URI, which in turn will use the mechanism you are proposing.

          Why should the failover method be based on the cluster part of the URI? Can it be a single mechanism across all the nameservices, and hence the parameter changed to dfs.client.ha.failover.method?

          These are my early thoughts. Some questions I am left with are:

          1. The scheme you have defined works only for RPC protocols. How about HTTP?
          2. I am not sure why logical URI is required for VIP/failover based setup.

          We could continue to add more details.

          atm Aaron T. Myers added a comment -

          Alternatively, the client gets the addresses of both namenodes and tries them one at a time until it connects to the new active.

          This is what I was trying to communicate with "Configuration-based client failover. Clients are configured with a set of NN addresses to try until an operation succeeds." I think we're in agreement on this point, just talking past each other a little bit.

          Proxy-based client failover is an implementation detail. It still needs to figure out the new active based on one of the schemes above.

          The proxy process would need to figure out the address of the new active, but clients wouldn't - the clients would just have the address of the proxy. The only thing for the client to do, then, would be to retry the RPC to the same address (the address of the proxy.)

          +1 for logical URI. We could consider merging this requirement with HDFS-2231 to do this.

          Good point. I'll comment there.

          Logical URI is needed for identifying a nameservice and not a cluster, since federation supports multiple namenodes within a cluster.

          Good point. In the above design document: s/cluster/nameservice/g.

          Why should the failover method be based on the cluster part of the URI? Can it be a single mechanism across all the nameservices, and hence the parameter changed to dfs.client.ha.failover.method?

          Imagine that one writes a program which uses absolute URIs to connect to two distinct clusters, one of which is HA-enabled using ZK to resolve the address, and the other is not. In this case we should use some ZK-based FailoverProxyProvider for the first, and just the normal RPC connection for the second. Thus, the configuration should be per-nameservice. I suppose we could do something like introduce dfs.client.ha.failover.method.<nameservice identifier>, but that seems more annoying to configure to me.

          The scheme you have defined works only for RPC protocols. How about HTTP?

          Yes, that's certainly true. My thinking there was that since it's generally less critical for the NN web interfaces to fail over immediately, and since we don't generally control the HTTP clients which access the NN web interface, this could be out of scope for this JIRA. To facilitate this, the operator could either run a standard HTTP proxy, use round-robin DNS, or even change DNS resolution of the NN and wait for clients to get the updated address.

          I am not sure why logical URI is required for VIP/failover based setup.

          The value of the logical URI could be the same as the actual URI of the proxy. It would then only be used to configure an appropriate FailoverProxyProvider which would retry failed RPCs to the same address.

          eli Eli Collins added a comment -

          @atm - mind filing a separate jira for HTTP fail-over? In some cases we may be able to define away the problem. E.g. now that 1073 is in, we could modify the 2NN so it doesn't have to fetch files via HTTP (at least when using shared storage).

          atm Aaron T. Myers added a comment -

          @Eli, sure I'll file a separate JIRA. It'd certainly be worth enumerating all of the places where HTTP fail-over is an issue.

          The example you provided is an interesting one. It seems you're assuming that an HA setup would have three nodes - active, standby, and 2NN, with the 2NN failing over to do checkpointing against the standby after a failure of the active. The design document in HDFS-1623 doesn't really address checkpointing. I've heard from Suresh and Todd informally that the intention is probably to make the standby node also capable of performing checkpointing. I'll file a separate JIRA to address this as well.

          atm Aaron T. Myers added a comment -

          Here's a preliminary patch (not intended for commit) to give people an idea how this will work.

          The main things this is missing before it could reasonably be committed are:

          1. It currently doesn't handle clean-up of fail-over client resources at all. The way RPC resource cleanup currently works is by looking up the appropriate RPCEngine given a protocol class, and leaving it up to that class's InvocationHandler. This implicitly assumes that there is a one-to-one mapping from protocol class -> invocation handler, which is no longer true. It's not obvious to me at the moment what's the best way to deal with this.
          2. Currently only one of the ClientProtocol methods is annotated with the @Idempotent annotation.
          3. It currently doesn't handle concurrent connections at all.
          atm Aaron T. Myers added a comment -

          Here's an updated patch which builds upon the previous patch and addresses all of the issues I mentioned above.

          The key differences between this patch and the previous:

          1. Take advantage of the changes in HADOOP-7635 and HDFS-2337 to enable cleanup of RPC connection resources.
          2. Add the @Idempotent annotation to more methods of ClientProtocol.
          3. Add some appropriate synchronization to ConfiguredFailoverProxyProvider.

          Please review this patch for potential inclusion in the HDFS-1623 branch.

          tlipcon Todd Lipcon added a comment -
          • not sure if removing the DFSClient(Configuration conf) constructor is a good idea - it was deprecated at 0.21 but since we haven't had a well-adopted release since then, seems we should keep it around. (it's trivial to keep, right?)
          • some log messages left in: LOG.info("address of nn1: " + conf.get("dfs.atm.nn1"))
          • some of the lines seem to have a lot of columns - try to wrap at 80?

          -  DFSClient(InetSocketAddress nameNodeAddr, ClientProtocol rpcNamenode,
          +  DFSClient(URI nameNodeAddr, ClientProtocol rpcNamenode,
          

          maybe rename this parameter to nameNodeUri?


          +  public static final String  DFS_CLIENT_FAILOVER_PROXY_PROVIDER_KEY_PREFIX = "dfs.client.failover.proxy.provider";
          

          Shouldn't that have a "." at the end? It seems authorities are appended without a "." in the middle.


          +    } catch (RuntimeException e) {
          +      if (e.getCause() instanceof ClassNotFoundException) {
          +        return null;
          

          why? It seems we should at least be logging a WARN if this is configured to point to a class that can't be loaded, if not re-throwing as IOE.


          • I'm not sure that o.a.h.h.protocol is the right package for ConfiguredFailoverProxyProvider - maybe it belongs in the HA package?
          • missing license headers on many of the new files

          +      InetSocketAddress first = NameNode.getAddress(
          +          new URI(conf.get(CONFIGURED_NAMENODE_ADDRESS_FIRST)).getAuthority());
          +      InetSocketAddress second = NameNode.getAddress(
          +          new URI(conf.get(CONFIGURED_NAMENODE_ADDRESS_SECOND)).getAuthority());
          

          these will throw NPE in the case of misconfiguration. Should at least throw an exception indicating what the mistaken config is.

          Also, it seems more sensible to have a single config like "dfs.ha.namenode.addresses" with comma-separated URIs, since there's nothing explicitly tied to having exactly 2 NNs here, right?

          A general note: this failover proxy provider doesn't get passed enough info to choose different configs based on the "logical URI". For example, if I have two clusters, foo and bar I'd expect to be able to configure dfs.ha.namenodes.foo separately from dfs.ha.namenodes.bar after setting both of dfs.client.failover.proxy.provider.foo|bar.

          • Can we separate the addition of Idempotent annotations to a different patch?

          +  // This test should probably be made to fail if a client fails over to talk to
          +  // an NN with a different block pool id.
          

          file a followup JIRA or use "TODO" so we don't forget about this case?

          atm Aaron T. Myers added a comment -

          Thanks a lot for the review, Todd. I'll attach an updated patch addressing most of your comments in just a moment. Comments below.

          not sure if removing the DFSClient(Configuration conf) constructor is a good idea - it was deprecated at 0.21 but since we haven't had a well-adopted release since then, seems we should keep it around. (it's trivial to keep, right?)

          Sure, but the whole of DFSClient is annotated @InterfaceAudience.Private. You still think we should keep it around?

          some log messages left in: LOG.info("address of nn1: " + conf.get("dfs.atm.nn1"))

          Whoops! Fixed. Any more that you noticed? Or just that one?

          some of the lines seem to have a lot of columns - try to wrap at 80?

          Like which? I wasn't super strict about wrapping at 80, but I'm pretty sure none go over 90.

          maybe rename this parameter to nameNodeUri?

          Done.

          Shouldn't that have a "." at the end? It seems authorities are appended without a "." in the middle.

          Yep, great catch. Fixed.

          why? It seems we should at least be logging a WARN if this is configured to point to a class that can't be loaded, if not re-throwing as IOE.

          Good point. I've changed it to throw an IOE with a descriptive error message.

          I'm not sure that o.a.h.h.protocol is the right package for ConfiguredFailoverProxyProvider - maybe it belongs in the HA package?

          Good point. I've moved it to o.a.h.server.namenode.ha.

          missing license headers on many of the new files

          Fixed. It's just two new files.

          these will throw NPE in the case of misconfiguration. Should at least throw an exception indicating what the mistaken config is.

          Fixed.

          Also, it seems more sensible to have a single config like "dfs.ha.namenode.addresses" with comma-separated URIs, since there's nothing explicitly tied to having exactly 2 NNs here, right?

          Good point. Fixed.

          A general note: this failover proxy provider doesn't get passed enough info to choose different configs based on the "logical URI". For example, if I have two clusters, foo and bar I'd expect to be able to configure dfs.ha.namenodes.foo separately from dfs.ha.namenodes.bar after setting both of dfs.client.failover.proxy.provider.foo|bar.

          Good point. Mind if we address this in a follow-up JIRA? It seems like the solution would be to replace "dfs.ha.namenode.addresses" with something like "dfs.ha.namenode.addresses.<logical-uri>", so as to be able to support configuring the addresses of multiple HA clusters.

          Can we separate the addition of Idempotent annotations to a different patch?

          Certainly. I've removed all of them from this patch except for getBlockLocations, which is necessary for the test to work.

          file a followup JIRA or use "TODO" so we don't forget about this case?

          Changed to:

          // TODO(HA): This test should probably be made to fail if a client fails over
          // to talk to an NN with a different block pool id. Once failover between
          // active/standby in a single block pool is implemented, this test should be
          // changed to exercise that.
          
          atm Aaron T. Myers added a comment -

          A general note: this failover proxy provider doesn't get passed enough info to choose different configs based on the "logical URI". For example, if I have two clusters, foo and bar I'd expect to be able to configure dfs.ha.namenodes.foo separately from dfs.ha.namenodes.bar after setting both of dfs.client.failover.proxy.provider.foo|bar.

          I've filed HDFS-2367 to address this issue.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12496201/HDFS-1973-HDFS-1623.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hdfs.TestDfsOverAvroRpc
          org.apache.hadoop.hdfs.server.blockmanagement.TestHost2NodesMap

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1291//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1291//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1291//console

          This message is automatically generated.

          tlipcon Todd Lipcon added a comment -

          Sure, but the whole of DFSClient is annotated @InterfaceAudience.Private. You still think we should keep it around?

          Yes, because unfortunately the Private annotation was added after the 0.20 release. If it's problematic to keep around, we don't have to, but it seemed easy enough to maintain for now.


          +  @Override
          +  public synchronized void performFailover(Object currentProxy) {
          +    if (proxies.get(currentProxyIndex) != currentProxy) {
          +      currentProxyIndex = (currentProxyIndex + 1) % proxies.size();
          +    }
          +  }
          

          This code confuses me – isn't currentProxy a proxy object, whereas proxies.get(...) is an AddressRpcProxyPair? Which is to say they're always un-equal?

          atm Aaron T. Myers added a comment -

          Thanks a lot for the review, Todd. Here's a patch which addresses your latest comments.

          Yes, because unfortunately the Private annotation was added after the 0.20 release. If it's problematic to keep around, we don't have to, but it seemed easy enough to maintain for now.

          That's fair. I've put it back.

          This code confuses me – isn't currentProxy a proxy object, whereas proxies.get(...) is an AddressRpcProxyPair? Which is to say they're always un-equal?

          Yep, good catch. Fixed.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12497230/HDFS-1973-HDFS-1623.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1324//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1324//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1324//console

          This message is automatically generated.

          tlipcon Todd Lipcon added a comment -

          Aaron and I just chatted about this a bit. Here's a summary of what we discussed:

          • the if condition in performFailover was somewhat confusing to me as to its purpose. Aaron explained that its purpose is to avoid the case where multiple outstanding RPC calls fail, and then they all call performFailover at the same time. If there were an even number of such calls, and you didn't do any such checks for "already failed over", then you'd have a case where you failover twice and end up back at the original proxy object.
          • we decided that, rather than try to handle this situation in the FailoverProvider itself, it would be better to do this at the caller. Otherwise, each failover provider implementation will have to have this same concern.

          So, Aaron is going to update the patch to include a safeguard at the call site of performFailover which checks that, before calling performFailover, another thread hasn't already failed over to a new proxy object.
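
          For context, the caller-side guard we're describing amounts to something like the sketch below (names are made up; this just illustrates the idea, not the actual patch):

          // Hypothetical sketch of the caller-side guard: each invocation remembers the
          // failover count it started with, and only the first failing thread actually
          // triggers performFailover; the others simply retry against the new proxy.
          public class RetryingInvoker {
            // Placeholder for a single RPC invocation against a given proxy object.
            public interface MethodCall {
              Object invoke(Object proxy) throws Exception;
            }

            private final FailoverProxyProvider proxyProvider;
            private long failoverCount = 0;

            public RetryingInvoker(FailoverProxyProvider proxyProvider) {
              this.proxyProvider = proxyProvider;
            }

            public Object invokeWithFailover(MethodCall call) throws Exception {
              while (true) {
                Object proxy;
                long countAtStart;
                synchronized (this) {
                  proxy = proxyProvider.getProxy();
                  countAtStart = failoverCount;
                }
                try {
                  return call.invoke(proxy);
                } catch (Exception failoverTrigger) {
                  synchronized (this) {
                    // Fail over only if no other thread has already done so since this
                    // invocation started.
                    if (countAtStart == failoverCount) {
                      proxyProvider.performFailover(proxy);
                      failoverCount++;
                    }
                  }
                }
              }
            }
          }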

          atm Aaron T. Myers added a comment -

          Thanks a lot for the review, Todd. Doing what you describe will require Common changes. I've filed HADOOP-7717 to address these. Once that's committed, I'll update the patch here to remove the synchronization from ConfiguredFailoverProxyProvider.

          atm Aaron T. Myers added a comment -

          Here's an updated patch which takes advantage of HADOOP-7717 by no longer needing to check for concurrent fail-overs in the event of concurrent method invocation failures.

          tlipcon Todd Lipcon added a comment -

          One small thing I didn't notice before (sorry):

          • In the test cases, you have new methods verifyFileContents and writeContentsToFile. These are very similar to AppendTestUtil.write and AppendTestUtil.check. Can you use those instead? If not, you should change the call to in.read to use IOUtils.readFully, since a single read isn't guaranteed to read the whole length (see the sketch below).
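
          As a side note, here is a minimal sketch of the in.read vs. IOUtils.readFully point above; the class name, file path, and expected length are hypothetical and not from the test in the patch.

          import java.io.IOException;
          import java.io.InputStream;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.io.IOUtils;

          public class ReadFullyExample {
            public static byte[] readAll(FileSystem fs, Path path, int expectedLength)
                throws IOException {
              byte[] buf = new byte[expectedLength];
              InputStream in = fs.open(path);
              try {
                // Unlike a bare in.read(buf), readFully keeps reading until the
                // buffer is filled, or throws if the stream ends early.
                IOUtils.readFully(in, buf, 0, expectedLength);
              } finally {
                in.close();
              }
              return buf;
            }

            public static void main(String[] args) throws IOException {
              FileSystem fs = FileSystem.get(new Configuration());
              // Hypothetical path and length, for demonstration only.
              byte[] data = readAll(fs, new Path("/tmp/test-file"), 1024);
              System.out.println("Read " + data.length + " bytes");
            }
          }
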
          atm Aaron T. Myers added a comment -

          Here's an updated patch which reuses AppendTestUtil.(write|check). Note that I also had to add the @Idempotent annotation to ClientProtocol.getFileInfo since AppendTestUtil.check calls FileSystem.getFileStatus.
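
          For illustration, here is a minimal sketch of marking a protocol method with the @Idempotent annotation so the client-side retry/failover layer can safely re-invoke it; ExampleProtocol and its methods are made up for this sketch and are not ClientProtocol itself.

          import java.io.IOException;

          import org.apache.hadoop.io.retry.Idempotent;

          public interface ExampleProtocol {
            // Safe to retry after a failover: repeating the call does not change
            // any state on the server.
            @Idempotent
            String getStatus(String path) throws IOException;

            // Not annotated: invoking this twice could have side effects, so the
            // retry policy must treat failures here more conservatively.
            void create(String path) throws IOException;
          }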

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12497942/HDFS-1973-HDFS-1623.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1341//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1341//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1341//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12497935/HDFS-1973-HDFS-1623.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1340//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1340//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1340//console

          This message is automatically generated.

          atm Aaron T. Myers added a comment -

          Trivial change from the last patch to remove the findbugs warning.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12497950/HDFS-1973-HDFS-1623.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1342//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1342//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1342//console

          This message is automatically generated.

          atm Aaron T. Myers added a comment -

          Trivial change from the last patch to get rid of the new findbugs warning.

          Third time's a charm?

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12498050/HDFS-1973-HDFS-1623.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1346//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1346//console

          This message is automatically generated.

          tlipcon Todd Lipcon added a comment -

          +1 lgtm

          atm Aaron T. Myers added a comment -

          Thanks a lot for the several reviews, Todd. I've just committed this to the HA branch.


            People

            • Assignee:
              atm Aaron T. Myers
            • Reporter:
              sureshms Suresh Srinivas
            • Votes:
              0
            • Watchers:
              24
