Hadoop HDFS / HDFS-894

DatanodeID.ipcPort is not updated when existing node re-registers

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.20.1, 0.21.0, 0.22.0
    • Fix Version/s: 0.21.0
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      FSNamesystem.registerDatanode checks whether a registering node is a re-registration of an existing one, based on its storage ID. If so, it simply updates the existing entry with the new registration info. However, the new ipcPort is lost when this happens.

      I reproduced this manually by setting up a DN with its IPC port set to 0 (so that it picks an ephemeral port) and then restarting the DN. At this point, the NN's view of the ipcPort is stale, and clients cannot perform pipeline recovery.

      This should be easy to fix and unit test, but I'm not sure when I'll get to it, so anyone else should feel free to grab it if they get to it first.
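
      The behaviour described above can be modelled with a minimal sketch. This is not the Hadoop source or the attached patch: the NodeInfo and Registry classes, the registerBuggy and registerFixed methods, the "DS-1" storage ID, and the port numbers 41001 and 41002 are all hypothetical names and values chosen for illustration. The sketch only assumes what the description states, namely that re-registration is detected by storage ID and that the existing record is updated in place.

```java
import java.util.HashMap;
import java.util.Map;

class IpcPortReregistrationSketch {

    /** Hypothetical stand-in for the NameNode's per-datanode record. */
    static final class NodeInfo {
        String name;            // host:port used for data transfer
        final String storageId; // stable across datanode restarts
        int ipcPort;            // changes on restart when configured as 0

        NodeInfo(String name, String storageId, int ipcPort) {
            this.name = name;
            this.storageId = storageId;
            this.ipcPort = ipcPort;
        }
    }

    /** Hypothetical registry keyed by storage ID. */
    static final class Registry {
        private final Map<String, NodeInfo> byStorageId = new HashMap<>();

        /** Models the reported behaviour: the existing record keeps its old ipcPort. */
        NodeInfo registerBuggy(NodeInfo reg) {
            NodeInfo existing = byStorageId.get(reg.storageId);
            if (existing == null) {
                byStorageId.put(reg.storageId, reg);
                return reg;
            }
            existing.name = reg.name; // refreshed from the new registration
            // ipcPort is not copied, so the stale value survives
            return existing;
        }

        /** Models one possible fix: copy ipcPort along with the other fields. */
        NodeInfo registerFixed(NodeInfo reg) {
            NodeInfo existing = byStorageId.get(reg.storageId);
            if (existing == null) {
                byStorageId.put(reg.storageId, reg);
                return reg;
            }
            existing.name = reg.name;
            existing.ipcPort = reg.ipcPort;
            return existing;
        }
    }

    public static void main(String[] args) {
        // Same storage ID registers twice, as after a datanode restart
        // with a different ephemeral IPC port.
        Registry buggy = new Registry();
        buggy.registerBuggy(new NodeInfo("dn1:50010", "DS-1", 41001));
        NodeInfo stale = buggy.registerBuggy(new NodeInfo("dn1:50010", "DS-1", 41002));
        System.out.println("buggy ipcPort after restart: " + stale.ipcPort);

        Registry fixed = new Registry();
        fixed.registerFixed(new NodeInfo("dn1:50010", "DS-1", 41001));
        NodeInfo fresh = fixed.registerFixed(new NodeInfo("dn1:50010", "DS-1", 41002));
        System.out.println("fixed ipcPort after restart: " + fresh.ipcPort);
    }
}
```

      In this model the buggy path still reports the first port after the second registration, which is the stale view the NN would hand to clients. With a fixed IPC port both registrations carry the same value, so the difference only becomes visible when the port can change between restarts.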

      1. hdfs-894.txt
        4 kB
        Todd Lipcon

        Activity

        Todd Lipcon added a comment -

        This should be fixed in all three current branches. As mentioned in the description, it can prevent the write pipeline from recovering since ClientDatanodeProtocol and InterDatanodeProtocol won't be able to connect.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12434614/hdfs-894.txt
        against trunk revision 905760.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/111/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/111/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/111/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/111/console

        This message is automatically generated.

        Todd Lipcon added a comment -

        Failed test seems unrelated.

        Todd Lipcon added a comment -

        Filed HDFS-953 for the testpatch failure seen above.

        I think this is ready to commit.

        Tom White added a comment -

        +1

        dhruba borthakur added a comment -

        The code looks good. But since this is not a regression (and datanodes typically re-register with the same ipcPort) can we put this patch only in trunk?

        Todd Lipcon added a comment -

        "datanodes typically re-register with the same ipcPort"

        Unless you've configured the datanode IPC port to 0. I do this on my test clusters on shared hardware, for example.

        "this is not a regression"

        True enough. I find it an obvious enough bug, and one that causes big problems when binding to port 0, that we should put it in all branches. But if you disagree, trunk's fine.

        Tom White added a comment -

        I've just committed this. Thanks Todd!

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #193 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/193/)

        Hudson added a comment -

        Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #146 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/146/)

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #275 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/275/)


          People

          • Assignee: Todd Lipcon
          • Reporter: Todd Lipcon
          • Votes: 0
          • Watchers: 4
