Hadoop Common
HADOOP-5891

If dfs.http.address is default, SecondaryNameNode can't find NameNode

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: fs
    • Labels: None

      Description

      As detailed in this blog post:
      http://www.cloudera.com/blog/2009/02/10/multi-host-secondarynamenode-configuration/
      if dfs.http.address is not configured and the 2NN runs on a different machine from the NN, the 2NN fails to connect.

      In SecondaryNameNode.getInfoServer, the 2NN should notice a "0.0.0.0" dfs.http.address and, in that case, pull the hostname out of fs.default.name. This would make the default configuration work properly for most users.
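      A minimal sketch of that check, assuming Hadoop's Configuration API; the method body, class name, and defaults below are illustrative, not the committed patch:

          import java.io.IOException;
          import java.net.URI;
          import org.apache.hadoop.conf.Configuration;

          class InfoServerResolver {
            // If dfs.http.address is the 0.0.0.0 wildcard, take the NameNode
            // host from fs.default.name and keep the configured HTTP port, so
            // the 2NN contacts the right machine by default.
            static String getInfoServer(Configuration conf) throws IOException {
              String httpAddr = conf.get("dfs.http.address", "0.0.0.0:50070");
              if (httpAddr.startsWith("0.0.0.0")) {
                URI fsName = URI.create(conf.get("fs.default.name", "file:///"));
                if (!"hdfs".equals(fsName.getScheme())) {
                  throw new IOException("fs.default.name is not an hdfs:// URI");
                }
                // Reuse the port portion of dfs.http.address (":50070" here).
                return fsName.getHost() + httpAddr.substring(httpAddr.indexOf(':'));
              }
              return httpAddr;
            }
          }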

      Attachments

      1. hadoop-5891.txt
        0.9 kB
        Todd Lipcon

        Activity

        gary murry added a comment -

        From email exchange: this code change is already exercised by TestCheckpoint.

        dhruba borthakur added a comment -

        I just committed this. Thanks Todd!

        dhruba borthakur added a comment -

        +1. Code looks good.

        Todd Lipcon added a comment -

        As usual, the failing test is unrelated (capacity scheduler).

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12408775/hadoop-5891.txt
        against trunk revision 778289.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/396/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/396/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/396/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/396/console

        This message is automatically generated.

        Todd Lipcon added a comment -

        Steve: I agree that service location can be improved across the board. However, I don't think it's necessarily a good idea to overload the NameNode as a naming-service daemon. Personally, I'd prefer to use something like ZooKeeper here. Obviously there needs to be at least one host in a "well-known" location, which could be configured by default as a "hadoop-zk" hostname with multiple A records pointing to all of the ZK nodes (see the sketch after this comment).

        Anyway, I agree that we should work towards the ideal goal, but I'd like to have that discussion in a new JIRA. This one is a very simple fix, whereas that one could be pretty significant.
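        A hypothetical illustration of the multiple-A-record idea; the "hadoop-zk" name and port 2181 are assumptions for the sketch, not anything this patch configures:

            import java.net.InetAddress;

            public class ZkDiscovery {
              public static void main(String[] args) throws Exception {
                // One well-known DNS name resolves to every ZK node, so a
                // client can discover the whole ensemble from one setting.
                InetAddress[] zkNodes = InetAddress.getAllByName("hadoop-zk");
                StringBuilder connect = new StringBuilder();
                for (InetAddress node : zkNodes) {
                  if (connect.length() > 0) connect.append(',');
                  connect.append(node.getHostAddress()).append(":2181");
                }
                // e.g. "10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181"
                System.out.println("ZooKeeper ensemble: " + connect);
              }
            }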

        Steve Loughran added a comment -

        This whole problem of bootstrapping a cluster where machines don't know who they are is pretty brittle right now. In an ideal world, even the NN would be able to work out its name/address and share it with the rest; failing that, having everything else work out the details by asking the NN would be handy. It would also be good if everything provided (in the same process and via JMX) a list of (service, address, port) for all the different things the node runs (a sketch of such a listing follows this comment). I try to reverse-engineer that, but it adds more scheduling problems (don't start the downstream nodes until the NN and JT are live), and for some reason Jetty comes up bound to 0:0:0:0:0:1 on one machine, which is particularly irritating.

        So: +1 to this. I can see the BackupNode having the same problem at scale, as it needs to know both the NN and 2NN addresses (note: addresses, not hostnames).

        Maybe we should open this up to a general "nodes should come up better on an under-configured network" bug report, which those of us who do under-configure our networks can deal with.
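        A hypothetical sketch of that (service, address, port) listing exposed as a standard JMX MBean; every name here is made up for illustration, and nothing like it exists in the patch under review:

            // ServiceEndpointsMBean.java
            public interface ServiceEndpointsMBean {
              String[] getEndpoints();
            }

            // ServiceEndpoints.java
            import java.lang.management.ManagementFactory;
            import java.util.ArrayList;
            import java.util.List;
            import javax.management.MBeanServer;
            import javax.management.ObjectName;

            public class ServiceEndpoints implements ServiceEndpointsMBean {
              private final List<String> endpoints = new ArrayList<String>();

              // Each daemon records every endpoint it serves.
              public void add(String service, String host, int port) {
                endpoints.add(service + "=" + host + ":" + port);
              }

              public String[] getEndpoints() {
                return endpoints.toArray(new String[0]);
              }

              public static void main(String[] args) throws Exception {
                ServiceEndpoints eps = new ServiceEndpoints();
                eps.add("namenode.rpc", "nn.example.com", 8020);
                eps.add("namenode.http", "nn.example.com", 50070);
                // Register so monitoring tools (or other daemons) can read
                // the endpoint list over JMX.
                MBeanServer server = ManagementFactory.getPlatformMBeanServer();
                server.registerMBean(eps, new ObjectName("hadoop:service=Endpoints"));
              }
            }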


          People

          • Assignee: Todd Lipcon
          • Reporter: Todd Lipcon
          • Votes: 0
          • Watchers: 4
