Hadoop HDFS / HDFS-3070

HDFS balancer doesn't ensure that hdfs-site.xml is loaded

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 2.0.0-alpha
    • Component/s: balancer
    • Labels:
      None

      Description

      I TeraGenerated data into DataNodes styx01 and styx02. Looking at the web UI, both have over 3% disk usage.
      Attached is a screenshot of the Live Nodes web UI.

      On styx01, I run the hdfs balancer command with threshold 1% and don't see the blocks being balanced across all 4 datanodes (all blocks on styx01 and styx02 stay put).

      HA is currently enabled.

      [schu@styx01 ~]$ hdfs haadmin -getServiceState nn1
      active
      [schu@styx01 ~]$ hdfs balancer -threshold 1
      12/03/08 10:10:32 INFO balancer.Balancer: Using a threshold of 1.0
      12/03/08 10:10:32 INFO balancer.Balancer: namenodes = []
      12/03/08 10:10:32 INFO balancer.Balancer: p = Balancer.Parameters[BalancingPolicy.Node, threshold=1.0]
      Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
      Balancing took 95.0 milliseconds
      [schu@styx01 ~]$

      I believe with a threshold of 1% the balancer should trigger blocks being moved across DataNodes, right? I am curious about the "namenodes = []" from the above output.

      [schu@styx01 ~]$ hadoop version
      Hadoop 0.24.0-SNAPSHOT
      Subversion git://styx01.sf.cloudera.com/home/schu/hadoop-common/hadoop-common-project/hadoop-common -r f6a577d697bbcd04ffbc568167c97b79479ff319
      Compiled by schu on Thu Mar 8 15:32:50 PST 2012
      From source with checksum ec971a6e7316f7fbf471b617905856b8

      From http://hadoop.apache.org/hdfs/docs/r0.21.0/api/org/apache/hadoop/hdfs/server/balancer/Balancer.html:
      The threshold parameter is a fraction in the range of (0%, 100%) with a default value of 10%. The threshold sets a target for whether the cluster is balanced. A cluster is balanced if for each datanode, the utilization of the node (ratio of used space at the node to total capacity of the node) differs from the utilization of the cluster (ratio of used space in the cluster to total capacity of the cluster) by no more than the threshold value. The smaller the threshold, the more balanced a cluster will become. It takes more time to run the balancer for small threshold values. Also for a very small threshold the cluster may not be able to reach the balanced state when applications write and delete files concurrently.
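
The balance criterion quoted above can be illustrated with a small, self-contained sketch. This is not the Balancer's own code: the class name `BalanceCheck`, its method names, and the sample numbers are assumptions made for illustration only.

```java
// Illustrative sketch only: BalanceCheck and its methods are hypothetical
// names, not part of Hadoop.
class BalanceCheck {
    /** Utilization as a percentage: used space divided by capacity. */
    static double utilizationPercent(long used, long capacity) {
        return 100.0 * used / capacity;
    }

    /**
     * Returns true if every node's utilization is within thresholdPercent
     * (in percentage points) of the cluster-wide utilization.
     */
    static boolean isBalanced(long[] used, long[] capacity, double thresholdPercent) {
        long totalUsed = 0;
        long totalCapacity = 0;
        for (int i = 0; i < used.length; i++) {
            totalUsed += used[i];
            totalCapacity += capacity[i];
        }
        double clusterUtil = utilizationPercent(totalUsed, totalCapacity);
        for (int i = 0; i < used.length; i++) {
            double nodeUtil = utilizationPercent(used[i], capacity[i]);
            if (Math.abs(nodeUtil - clusterUtil) > thresholdPercent) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two nodes of capacity 500 holding 60 and 290 units:
        // node utilizations are 12% and 58%, cluster utilization is 35%.
        long[] used = {60, 290};
        long[] capacity = {500, 500};
        System.out.println(isBalanced(used, capacity, 10.0)); // false: both nodes are 23 points off
        System.out.println(isBalanced(used, capacity, 25.0)); // true: 23 <= 25
    }
}
```

Under this reading, a 1% threshold on a cluster where two of four nodes hold all the data should report the cluster as unbalanced, which is why an immediate exit with no work done looks suspicious.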

      1. HDFS-3070.patch
        1 kB
        Aaron T. Myers
      2. unbalanced_nodes_inservice.png
        59 kB
        Stephen Chu
      3. unbalanced_nodes.png
        59 kB
        Stephen Chu

        Activity

        Stephen Chu created issue -
        Stephen Chu made changes -
        Field Original Value New Value
        Attachment unbalanced_nodes.png [ 12517657 ]
        Stephen Chu added a comment -

        Woops, the first screenshot shows that 2 nodes are decommissioned. After recommissioning them and attempting to run hdfs balancer, the nodes still don't become balanced and the balancer claims to complete in ~100 ms.

        Stephen Chu made changes -
        Attachment unbalanced_nodes_inservice.png [ 12517658 ]
        Eli Collins made changes -
        Target Version/s 0.23.3 [ 12320052 ]
        Tsz Wo Nicholas Sze added a comment -

        > 12/03/08 10:10:32 INFO balancer.Balancer: namenodes = []

        The namenode list is empty. You have to set dfs.namenode.servicerpc-address.

        Eli Collins added a comment -

        Stephen,
        What are dfs.namenode.rpc-address and servicerpc-address set to in the configs?

        I suspect at least the 1st is set so it might be a bug in the method the balancer uses to determine the namenodes (eg doesn't work for a federated or HA conf).

        Stephen Chu added a comment -

        Eli, servicerpc-address was not configured in hdfs-site.xml.

        dfs.namenode.rpc-address:

          <property>
            <name>dfs.namenode.rpc-address.ha-nn-uri.nn1</name>
            <value>styx01.sf.cloudera.com:12020</value>
          </property>
          <property>
            <name>dfs.namenode.rpc-address.ha-nn-uri.nn2</name>
            <value>styx02.sf.cloudera.com:12020</value>
          </property>
        
        Aaron T. Myers added a comment -

        If the lack of having servicerpc-address configured caused this, then I would still consider that a bug. The balancer should work even if only the normal NN RPC address is configured.

        Eli Collins added a comment -

        Yeah, sounds like a bug in the method the balancer uses to determine the namenodes.

        Aaron T. Myers made changes -
        Assignee Aaron T. Myers [ atm ]
        Aaron T. Myers added a comment -

        Sigh. Looks like this problem is the classic "hdfs-site.xml happens to never get loaded because HdfsConfiguration is never statically initialized in the JVM" issue. The tests don't catch this because MiniDFSCluster sets up the configuration explicitly, without hdfs-site.xml having to get loaded.

        Here's a patch which addresses the issue. I tested this manually and confirmed that without the fix, the balancer won't run, but with the fix it runs just fine. Sample output:

        12/03/30 19:06:08 INFO balancer.Balancer: namenodes = [hdfs://ha-nn-uri]
        12/03/30 19:06:08 INFO balancer.Balancer: p         = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0]
        Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
        12/03/30 19:06:09 INFO net.NetworkTopology: Adding a new node: /default-rack/172.29.20.100:50010
        12/03/30 19:06:09 INFO balancer.Balancer: 0 over-utilized: []
        12/03/30 19:06:09 INFO balancer.Balancer: 0 underutilized: []
        The cluster is balanced. Exiting...
        Balancing took 1.255 seconds
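
The class-loading behaviour behind this bug can be reproduced without Hadoop. The sketch below is a simplified stand-in, not Hadoop code: `BaseConf`, `HdfsLikeConf`, and `LazyInitDemo` are hypothetical names, and the resource list is a plain in-memory list. It shows only the general Java rule that a class's static initializer runs on first active use of that class, so resources registered there stay unregistered until something touches the class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a base configuration class that keeps a
// process-wide list of default resource files.
class BaseConf {
    private static final List<String> DEFAULT_RESOURCES = new ArrayList<>();

    static {
        DEFAULT_RESOURCES.add("core-default.xml");
        DEFAULT_RESOURCES.add("core-site.xml");
    }

    static synchronized void addDefaultResource(String name) {
        if (!DEFAULT_RESOURCES.contains(name)) {
            DEFAULT_RESOURCES.add(name);
        }
    }

    static synchronized List<String> resources() {
        return new ArrayList<>(DEFAULT_RESOURCES);
    }
}

// Hypothetical stand-in for a subclass whose static initializer registers
// extra resources. The initializer runs only when this class is first used.
class HdfsLikeConf extends BaseConf {
    static {
        BaseConf.addDefaultResource("hdfs-default.xml");
        BaseConf.addDefaultResource("hdfs-site.xml");
    }

    /** Intentionally empty: calling it forces this class to be initialized. */
    static void init() {
    }
}

class LazyInitDemo {
    public static void main(String[] args) {
        // Nothing has touched HdfsLikeConf yet, so its static block has not run.
        System.out.println(BaseConf.resources());
        // prints [core-default.xml, core-site.xml]

        HdfsLikeConf.init(); // first active use triggers the static initializer

        System.out.println(BaseConf.resources());
        // prints [core-default.xml, core-site.xml, hdfs-default.xml, hdfs-site.xml]
    }
}
```

A tool that only ever uses the base class never sees the extra resources. In a test, any earlier code that happens to touch the subclass in the same JVM hides the problem.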
        
        Aaron T. Myers made changes -
        Attachment HDFS-3070.patch [ 12520710 ]
        Aaron T. Myers made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Affects Version/s 2.0.0 [ 12320353 ]
        Affects Version/s 0.24.0 [ 12317653 ]
        Target Version/s 0.23.3 [ 12320052 ] 2.0.0 [ 12320353 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12520710/HDFS-3070.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2136//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2136//console

        This message is automatically generated.

        Uma Maheswara Rao G added a comment -

        Aaron, you are right. We saw this yesterday and realized it.
        Before Federation, the balancer might not have needed to load properties from hdfs-site.xml; some setups might have proceeded with the default values set in the code, because Configuration can load the core-site.xml files.

        I agree with the fix of creating an HdfsConfiguration instance and passing it in.

        To catch this bug in the tests themselves, I would suggest calling runBalancerCLI (a new API exposed from Balancer with package scope) and making the run method private.

        static int runBalancerCLI(String[] args) throws Exception {
          // The null Configuration passed here is what has to be fixed.
          return ToolRunner.run(null, new Cli(), args);
        }
        

        Let the main method and all tests call this function.

        Output from tests:

        2012-03-31 12:19:47,340 INFO balancer.Balancer (Balancer.java:parse(1508)) - Using a threshold of 10.0
        2012-03-31 12:19:47,340 INFO balancer.Balancer (Balancer.java:run(1387)) - namenodes = []
        2012-03-31 12:19:47,340 INFO balancer.Balancer (Balancer.java:run(1388)) - p = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0]
        Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
        Balancing took 1.0 milliseconds
        2012-03-31 12:19:47,341 INFO balancer.Balancer (TestBalancerWithMultipleNameNodes.java:runBalancer(164)) - BALANCER 2
        2012-03-31 12:19:47,341 INFO balancer.Balancer (TestBalancerWithMultipleNameNodes.java:wait(132)) - WAIT expectedUsedSpace=350, expectedTotalSpace=1000
        2012-03-31 12:19:47,341 INFO balancer.Balancer (TestBalancerWithMultipleNameNodes.java:runBalancer(166)) - BALANCER 3
        2012-03-31 12:19:47,342 WARN balancer.Balancer (TestBalancerWithMultipleNameNodes.java:runBalancer(183)) - datanodes[0]: getDfsUsed()=60, getCapacity()=500
        2012-03-31 12:19:47,343 WARN balancer.Balancer (TestBalancerWithMultipleNameNodes.java:runBalancer(183)) - datanodes[1]: getDfsUsed()=290, getCapacity()=500
        2012-03-31 12:19:47,344 WARN balancer.Balancer (TestBalancerWithMultipleNameNodes.java:runBalancer(200)) - datanodes 1 is not yet balanced: used=290, cap=500, avg=35.0

        Remove HdfsConfiguration object creation from Balancer Tests.

        Uma Maheswara Rao G added a comment -

        BTW, could you please edit the issue title?

        Aaron T. Myers added a comment -

        Hi Uma,

        > To catch this bug in the tests themselves, I would suggest calling runBalancerCLI...

        I don't think this will actually expose the bug. The trouble isn't that the object isn't an instance of HdfsConfiguration, but rather that HdfsConfiguration never gets class-loaded, and therefore the static initializer that adds hdfs-default.xml and hdfs-site.xml as resources never gets called. Another perfectly valid solution would have been to continue to pass "null" for the configuration object, but to call HdfsConfiguration#init() somewhere (anywhere) in the Balancer. So, the only way to write a test that would catch this would be if from the tests we forked a new JVM to run the balancer and examined the effects. Doing that doesn't seem worth it to me for something that's such a simple bug.

        > BTW, could you please edit the issue title?

        Good idea. Will do.

        Aaron T. Myers made changes -
        Summary: "hdfs balancer doesn't balance blocks between datanodes" changed to "HDFS balancer doesn't ensure that hdfs-site.xml is loaded"
        Uma Maheswara Rao G added a comment -

        > So, the only way to write a test that would catch this would be if from the tests we forked a new JVM to run the balancer and examined the effects.

        As I remember, our Jenkins spawns a separate JVM for each test class, no?


        > Doing that doesn't seem worth it to me for something that's such a simple bug.

        I agree, this is a very simple fix. But there is a functional effect.

        If we had the test written in the way suggested above, it would have caught this while refactoring for Federation and introducing the dependency on RPC addresses at startup. I am not insisting on the change. If you feel it is not required, you can leave it. I won't block on a simple test change.

        +1

        Eli Collins added a comment -

        +1 to HDFS-3070.patch, looks good

        Aaron T. Myers added a comment -

        > As I remember, our Jenkins spawns a separate JVM for each test class, no?

        True, but if that test class starts a MiniDFSCluster to run the balancer against, then the test won't detect any problem with the balancer, since the MiniDFSCluster will cause HdfsConfiguration to be class-loaded.

        Uma, if I'm misunderstanding what you're proposing, perhaps you could post some code to illustrate how this would work? If you do, I'll be sure to review it promptly.

        In the meantime, I'm going to go ahead and commit this patch since everyone seems to agree that this will fix the bug.

        Aaron T. Myers added a comment -

        I've just committed this to branch-2 and trunk.

        Thanks a lot for the reviews, Uma and Eli.

        Aaron T. Myers made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Fix Version/s 2.0.0 [ 12320353 ]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2033 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2033/)
        HDFS-3070. HDFS balancer doesn't ensure that hdfs-site.xml is loaded. Contributed by Aaron T. Myers. (Revision 1307841)

        Result = SUCCESS
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1307841
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #1958 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1958/)
        HDFS-3070. HDFS balancer doesn't ensure that hdfs-site.xml is loaded. Contributed by Aaron T. Myers. (Revision 1307841)

        Result = SUCCESS
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1307841
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #1971 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1971/)
        HDFS-3070. HDFS balancer doesn't ensure that hdfs-site.xml is loaded. Contributed by Aaron T. Myers. (Revision 1307841)

        Result = ABORTED
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1307841
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
        Uma Maheswara Rao G added a comment -

        > True, but if that test class starts a MiniDFSCluster to run the balancer against, then the test won't detect any problem with the balancer, since the MiniDFSCluster will cause HdfsConfiguration to be class-loaded.

        Yes, true. I agree with you. It's my mistake. I totally forgot about MiniDFSCluster conf loading.

        Another question: it looks like the Balancer depends completely on dfs.federation.nameservices, right?

        Aaron T. Myers added a comment -

        > Another question: it looks like the Balancer depends completely on dfs.federation.nameservices, right?

        Nope. Note that DFSUtil#getNameServiceUris also adds URIs for just the straight conf keys, even if they're not suffixed with a nameservice ID.
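
The idea of consulting both nameservice-suffixed keys and the plain keys can be sketched as below. This is a deliberately simplified illustration and not the implementation of DFSUtil#getNameServiceUris: the class `NameServiceUriSketch`, the method `collectUris`, and the host name `nn.example.com` are assumptions. The sketch ignores details such as HA logical URIs and any preference ordering between keys.

```java
import java.net.URI;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: collects NameNode URIs from address keys, looking at
// both "<key>.<nameserviceId>" and the plain "<key>".
class NameServiceUriSketch {
    static Set<URI> collectUris(Map<String, String> conf,
                                List<String> nameserviceIds,
                                String... addressKeys) {
        Set<URI> uris = new LinkedHashSet<>();
        for (String key : addressKeys) {
            // Keys suffixed with a nameservice ID.
            for (String nsId : nameserviceIds) {
                addIfPresent(uris, conf.get(key + "." + nsId));
            }
            // The plain, unsuffixed key is consulted as well.
            addIfPresent(uris, conf.get(key));
        }
        return uris;
    }

    private static void addIfPresent(Set<URI> uris, String hostPort) {
        if (hostPort != null && !hostPort.isEmpty()) {
            uris.add(URI.create("hdfs://" + hostPort));
        }
    }

    public static void main(String[] args) {
        // Only the plain rpc-address key is set; no nameservices are configured.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("dfs.namenode.rpc-address", "nn.example.com:8020");

        Set<URI> uris = collectUris(conf, Collections.<String>emptyList(),
                "dfs.namenode.servicerpc-address", "dfs.namenode.rpc-address");
        System.out.println(uris); // prints [hdfs://nn.example.com:8020]
    }
}
```

With a lookup of this shape, a configuration that sets only the unsuffixed address key still yields a non-empty namenode list, which matches the point made in the reply above.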

        Uma Maheswara Rao G added a comment -

        Thanks Aaron,
        Recalling only the API name (getNameServiceUris), I had this doubt while I was away and could not look at the code. Never mind the silly question.

        I have now seen in the code that the getNameServiceUris implementation adds URIs for the straight conf keys as well. It's loading keys of all variable lengths. No issues.

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1002 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1002/)
        HDFS-3070. HDFS balancer doesn't ensure that hdfs-site.xml is loaded. Contributed by Aaron T. Myers. (Revision 1307841)

        Result = FAILURE
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1307841
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1037 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1037/)
        HDFS-3070. HDFS balancer doesn't ensure that hdfs-site.xml is loaded. Contributed by Aaron T. Myers. (Revision 1307841)

        Result = FAILURE
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1307841
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
        amith added a comment -

        Hi Aaron,

        Will the balancer work if we configure only fs.default.name? I doubt it.

        Configuring fs.default.name is required to ensure backward compatibility with old configurations.

        Aaron T. Myers added a comment -

        Hi amith, that does seem to be true, though I'm not sure that it's a strict requirement that we support only setting fs.default.name at this point in time, since the ability to set the NN address via other configuration settings has existed for several releases. My personal opinion is that we should make the NN (and balancer, etc) not ever use fs.default.name as an indicator of the NN service bind address, but rather only as a client-side URI to use when a full FS URI is not given. Ideally we would have a system of deprecation which signals that fs.default.name is being used as the desired bind address when the NN address is configured in no other way, but as it stands our config deprecation system is only able to show warnings deprecating a named key in favor of another key.

        Regardless, would you like to open a new JIRA to address this issue, amith?

        amith added a comment -

        Thanks Aaron,

        I agree with you.


          People

          • Assignee:
            Aaron T. Myers
            Reporter:
            Stephen Chu
          • Votes:
            0
            Watchers:
            6
