Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-9001

DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0, 2.7.0, 2.7.1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      If defaultFS differs from rpc-address, then DFSUtil.getNsServiceRpcUris() will return two entries: one for the [service] RPC address and one for the default FS. This behavior violates the expected behavior stated in the JavaDoc header.

      1. HDFS-9001.001.patch
        10 kB
        Daniel Templeton

        Activity

        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #437 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/437/)
        HDFS-9001. DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster. Contributed by Daniel Templeton. (atm: rev 071733dc69a6f83c0cdca046b31ffd4f13304e93)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #437 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/437/ ) HDFS-9001 . DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster. Contributed by Daniel Templeton. (atm: rev 071733dc69a6f83c0cdca046b31ffd4f13304e93) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #2377 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2377/)
        HDFS-9001. DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster. Contributed by Daniel Templeton. (atm: rev 071733dc69a6f83c0cdca046b31ffd4f13304e93)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #2377 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2377/ ) HDFS-9001 . DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster. Contributed by Daniel Templeton. (atm: rev 071733dc69a6f83c0cdca046b31ffd4f13304e93) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #2405 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2405/)
        HDFS-9001. DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster. Contributed by Daniel Templeton. (atm: rev 071733dc69a6f83c0cdca046b31ffd4f13304e93)

        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #2405 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2405/ ) HDFS-9001 . DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster. Contributed by Daniel Templeton. (atm: rev 071733dc69a6f83c0cdca046b31ffd4f13304e93) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #470 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/470/)
        HDFS-9001. DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster. Contributed by Daniel Templeton. (atm: rev 071733dc69a6f83c0cdca046b31ffd4f13304e93)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #470 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/470/ ) HDFS-9001 . DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster. Contributed by Daniel Templeton. (atm: rev 071733dc69a6f83c0cdca046b31ffd4f13304e93) hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #1201 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1201/)
        HDFS-9001. DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster. Contributed by Daniel Templeton. (atm: rev 071733dc69a6f83c0cdca046b31ffd4f13304e93)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #1201 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1201/ ) HDFS-9001 . DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster. Contributed by Daniel Templeton. (atm: rev 071733dc69a6f83c0cdca046b31ffd4f13304e93) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #462 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/462/)
        HDFS-9001. DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster. Contributed by Daniel Templeton. (atm: rev 071733dc69a6f83c0cdca046b31ffd4f13304e93)

        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #462 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/462/ ) HDFS-9001 . DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster. Contributed by Daniel Templeton. (atm: rev 071733dc69a6f83c0cdca046b31ffd4f13304e93) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #8545 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8545/)
        HDFS-9001. DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster. Contributed by Daniel Templeton. (atm: rev 071733dc69a6f83c0cdca046b31ffd4f13304e93)

        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #8545 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8545/ ) HDFS-9001 . DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster. Contributed by Daniel Templeton. (atm: rev 071733dc69a6f83c0cdca046b31ffd4f13304e93) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        Hide
        atm Aaron T. Myers added a comment -

        I've just committed this to trunk and branch-2.

        Thanks a lot for the contribution, Daniel.

        Show
        atm Aaron T. Myers added a comment - I've just committed this to trunk and branch-2. Thanks a lot for the contribution, Daniel.
        Hide
        atm Aaron T. Myers added a comment -

        +1, the patch looks good to me. I ran the failed tests locally both with and without the patch and the pass just fine.

        I'm going to commit this momentarily.

        Show
        atm Aaron T. Myers added a comment - +1, the patch looks good to me. I ran the failed tests locally both with and without the patch and the pass just fine. I'm going to commit this momentarily.
        Hide
        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        0 pre-patch 18m 58s Pre-patch trunk compilation is healthy.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
        +1 javac 7m 51s There were no new javac warning messages.
        +1 javadoc 10m 16s There were no new javadoc warning messages.
        +1 release audit 0m 25s The applied patch does not increase the total number of release audit warnings.
        +1 checkstyle 1m 23s There were no new checkstyle issues.
        +1 whitespace 0m 0s The patch has no lines that end in whitespace.
        +1 install 1m 30s mvn install still works.
        +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
        +1 findbugs 2m 32s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1 native 3m 18s Pre-build of native portion
        -1 hdfs tests 136m 46s Tests failed in hadoop-hdfs.
            183m 39s  



        Reason Tests
        Failed unit tests hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistLockedMemory
          hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement
        Timed out tests org.apache.hadoop.hdfs.server.namenode.TestHostsFiles



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12756086/HDFS-9001.001.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / 8c1cdb1
        hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/12458/artifact/patchprocess/testrun_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12458/testReport/
        Java 1.7.0_55
        uname Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12458/console

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 18m 58s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 1 new or modified test files. +1 javac 7m 51s There were no new javac warning messages. +1 javadoc 10m 16s There were no new javadoc warning messages. +1 release audit 0m 25s The applied patch does not increase the total number of release audit warnings. +1 checkstyle 1m 23s There were no new checkstyle issues. +1 whitespace 0m 0s The patch has no lines that end in whitespace. +1 install 1m 30s mvn install still works. +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse. +1 findbugs 2m 32s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 native 3m 18s Pre-build of native portion -1 hdfs tests 136m 46s Tests failed in hadoop-hdfs.     183m 39s   Reason Tests Failed unit tests hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistLockedMemory   hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement Timed out tests org.apache.hadoop.hdfs.server.namenode.TestHostsFiles Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12756086/HDFS-9001.001.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 8c1cdb1 hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/12458/artifact/patchprocess/testrun_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12458/testReport/ Java 1.7.0_55 uname Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12458/console This message was automatically generated.
        Hide
        templedf Daniel Templeton added a comment -

        Cool. Here's a patch.

        Show
        templedf Daniel Templeton added a comment - Cool. Here's a patch.
        Hide
        atm Aaron T. Myers added a comment -

        Hey Daniel,

        I think this makes sense, thanks very much for the lucid explanation. I think the really correct thing to do (ignoring backward compatibility concerns) would be to make fs.defaultFS a client-side config only. It's only for historical reasons that it's not, in that I believe in the NN we fall back to having the NN bind to that as its RPC address if no rpc-address address is actually configured. We can only change that behavior in a major version, though, so I don't think we should do that here.

        I think the way to go is to fix this JIRA as you propose, and then file a separate, backward-incompatible JIRA to change the NN (and all admin commands) to no longer look at fs.defaultFS at all. We'd then only commit that to trunk (and not branch-2) so it'll show up in Hadoop 3.0.

        Does that make sense?

        Show
        atm Aaron T. Myers added a comment - Hey Daniel, I think this makes sense, thanks very much for the lucid explanation. I think the really correct thing to do (ignoring backward compatibility concerns) would be to make fs.defaultFS a client-side config only. It's only for historical reasons that it's not, in that I believe in the NN we fall back to having the NN bind to that as its RPC address if no rpc-address address is actually configured. We can only change that behavior in a major version, though, so I don't think we should do that here. I think the way to go is to fix this JIRA as you propose, and then file a separate, backward-incompatible JIRA to change the NN (and all admin commands) to no longer look at fs.defaultFS at all. We'd then only commit that to trunk (and not branch-2) so it'll show up in Hadoop 3.0. Does that make sense?
        Hide
        templedf Daniel Templeton added a comment -

        I've stared at this thing for quite a while now, and I think I've finally found a way to look at it that makes sense. The way I'm looking at the NN configuration is that We can configure 1 or more name services. The first name service is an implicit name server defined by the servicerpc-address and rpc-address. All additional name services are explicitly named name services configured through dfs.nameservices et al. defaultFS must either point to a service that is already defined (i.e. rpc-address), or it must define the client service for the implicit name service.

        For each name service, the getNsServiceRpcUris() method should return exactly one URI. If I have a configuration with:

        dfs.nameservices = ns1
        dfs.namenode.rpc-address.ns1 = nn1addr:p1
        dfs.namenode.servicerpc-address = nn2addr:p2
        fs.defaultFS = nn2addr:p3
        

        then I have two distinct name services: ns1 and the implicit name service. The getNsServiceRpcUris() method should return nn1addr:p1 and nn2addr:p2, but not nn2addr:p3, because that's just another interface to the implicit name service.

        Another example:

        dfs.nameservices = ns1
        dfs.namenode.servicerpc-address.ns1 = nn1addr:p1
        fs.defaultFS = nn1addr:p2
        

        Again, we treat these as two separate name services and report nn1addr:p1 and nn1addr:p2. If defaultFS had pointed to nn1addr:p1, then we would only report that address once, because both instances are clearly the same service.

        Finally, we get to the one where this approach diverges from prior art:

        dfs.nameservices = ns1,ns2
        dfs.ha.namenodes.ns1 = nn1,nn2
        dfs.namenode.rpc-address.ns1.nn1 = nn1addr:p1
        dfs.namenode.rpc-address.ns1.nn2 = nn2addr:p1
        dfs.namenode.servicerpc-address.ns2 = nn3addr:p1
        dfs.namenode.rpc-address = nn1addr:p2
        fs.defaultFS = nn2addr:p2
        

        This example comes from TestDFSUtil. In that test class, the expectation is that we will return four URIs: ns1, nn3addr:p1, nn1addr:p2, nn2addr:p2. Using my definition, the getNsServiceRpcUris() method would return three URIs: ns1, nn3addr:p1, nn1addr:p2. The defaultFS would be ignored because a service URI is already defined for the implicit name service.

        Since I'm proposing a change to some basic configuration semantics, I'd like some external validation. The getNsServiceRpcUris() method is only used by the mover and the balancer, so I might be overthinking this a little. Nonetheless, I think it's important to do it right and then document it thoroughly so that no one else has to try to reconstruct the intention again later.

        Show
        templedf Daniel Templeton added a comment - I've stared at this thing for quite a while now, and I think I've finally found a way to look at it that makes sense. The way I'm looking at the NN configuration is that We can configure 1 or more name services. The first name service is an implicit name server defined by the servicerpc-address and rpc-address. All additional name services are explicitly named name services configured through dfs.nameservices et al. defaultFS must either point to a service that is already defined (i.e. rpc-address), or it must define the client service for the implicit name service. For each name service, the getNsServiceRpcUris() method should return exactly one URI. If I have a configuration with: dfs.nameservices = ns1 dfs.namenode.rpc-address.ns1 = nn1addr:p1 dfs.namenode.servicerpc-address = nn2addr:p2 fs.defaultFS = nn2addr:p3 then I have two distinct name services: ns1 and the implicit name service. The getNsServiceRpcUris() method should return nn1addr:p1 and nn2addr:p2, but not nn2addr:p3, because that's just another interface to the implicit name service. Another example: dfs.nameservices = ns1 dfs.namenode.servicerpc-address.ns1 = nn1addr:p1 fs.defaultFS = nn1addr:p2 Again, we treat these as two separate name services and report nn1addr:p1 and nn1addr:p2. If defaultFS had pointed to nn1addr:p1, then we would only report that address once, because both instances are clearly the same service. Finally, we get to the one where this approach diverges from prior art: dfs.nameservices = ns1,ns2 dfs.ha.namenodes.ns1 = nn1,nn2 dfs.namenode.rpc-address.ns1.nn1 = nn1addr:p1 dfs.namenode.rpc-address.ns1.nn2 = nn2addr:p1 dfs.namenode.servicerpc-address.ns2 = nn3addr:p1 dfs.namenode.rpc-address = nn1addr:p2 fs.defaultFS = nn2addr:p2 This example comes from TestDFSUtil. In that test class, the expectation is that we will return four URIs: ns1, nn3addr:p1, nn1addr:p2, nn2addr:p2. Using my definition, the getNsServiceRpcUris() method would return three URIs: ns1, nn3addr:p1, nn1addr:p2. The defaultFS would be ignored because a service URI is already defined for the implicit name service. Since I'm proposing a change to some basic configuration semantics, I'd like some external validation. The getNsServiceRpcUris() method is only used by the mover and the balancer, so I might be overthinking this a little. Nonetheless, I think it's important to do it right and then document it thoroughly so that no one else has to try to reconstruct the intention again later.

          People

          • Assignee:
            templedf Daniel Templeton
            Reporter:
            templedf Daniel Templeton
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development