Hadoop HDFS / HDFS-894

DatanodeID.ipcPort is not updated when existing node re-registers

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.20.1, 0.21.0, 0.22.0
    • Fix Version/s: 0.21.0
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      FSNamesystem.registerDatanode checks whether a registering node is a re-registration of an existing one, based on its storage ID. If so, it simply updates the existing entry with the new registration info. However, the new ipcPort is lost when this happens.
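The re-registration path described above can be modeled roughly as follows. This is a simplified sketch, not the Hadoop source or the attached patch: `NodeId` is a hypothetical stand-in for `DatanodeID`, `Registry` is a hypothetical stand-in for the namenode's datanode map keyed by storage ID, and the port numbers are arbitrary. The "buggy" update copies the name and info port but not the IPC port; the "fixed" update also copies `ipcPort`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for DatanodeID (not the real Hadoop class). */
class NodeId {
    String name;       // host:port string
    String storageId;  // stays the same across datanode restarts
    int infoPort;
    int ipcPort;

    NodeId(String name, String storageId, int infoPort, int ipcPort) {
        this.name = name;
        this.storageId = storageId;
        this.infoPort = infoPort;
        this.ipcPort = ipcPort;
    }

    /** Models the reported behavior: ipcPort is NOT copied from the new registration. */
    void updateRegInfoBuggy(NodeId reg) {
        name = reg.name;
        infoPort = reg.infoPort;
    }

    /** Models the fix: everything above, plus the new IPC port. */
    void updateRegInfoFixed(NodeId reg) {
        updateRegInfoBuggy(reg);
        ipcPort = reg.ipcPort;
    }
}

/** Hypothetical registry keyed by storage ID, standing in for the namenode's view. */
class Registry {
    final Map<String, NodeId> byStorageId = new HashMap<>();
    final boolean applyFix;

    Registry(boolean applyFix) { this.applyFix = applyFix; }

    void register(NodeId reg) {
        NodeId existing = byStorageId.get(reg.storageId);
        if (existing == null) {
            // First registration: store a copy of the registration info.
            byStorageId.put(reg.storageId,
                new NodeId(reg.name, reg.storageId, reg.infoPort, reg.ipcPort));
        } else if (applyFix) {
            existing.updateRegInfoFixed(reg);   // re-registration, fixed update
        } else {
            existing.updateRegInfoBuggy(reg);   // re-registration, buggy update
        }
    }

    int ipcPortOf(String storageId) { return byStorageId.get(storageId).ipcPort; }
}

public class ReRegistrationDemo {
    /** Registers a node, "restarts" it on a new IPC port, and returns the port the registry holds. */
    static int ipcPortAfterRestart(boolean applyFix) {
        Registry nn = new Registry(applyFix);
        nn.register(new NodeId("dn1:50010", "DS-1", 50075, 41001)); // initial registration
        nn.register(new NodeId("dn1:50010", "DS-1", 50075, 42002)); // same storage ID, new IPC port
        return nn.ipcPortOf("DS-1");
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + ipcPortAfterRestart(false)); // prints "buggy: 41001" (stale)
        System.out.println("fixed: " + ipcPortAfterRestart(true));  // prints "fixed: 42002"
    }
}
```

Because the entry is looked up by storage ID, the second registration never creates a new record; whatever fields the update method skips simply keep their old values.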

      I reproduced this manually by setting up a DN with its IPC port set to 0 (so it picks an ephemeral port) and then restarting the DN. At this point, the NN's view of the ipcPort is stale, and clients cannot perform pipeline recovery.
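Why port 0 exposes the bug: binding to port 0 asks the operating system for an ephemeral port, so a restarted process will usually come up on a different port than before. A minimal sketch of that behavior using only `java.net.ServerSocket` (this is plain Java, not datanode code):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class EphemeralPortDemo {
    /** Binds to port 0 on loopback, returns the OS-assigned port, then releases the socket. */
    static int bindEphemeral() {
        try (ServerSocket s = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            return s.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int first = bindEphemeral();   // port chosen on the "first start"
        int second = bindEphemeral();  // port chosen after a "restart": usually different, not guaranteed
        System.out.println("first=" + first + ", second=" + second);
    }
}
```

With a fixed, configured port the two values would always match, which is why the stale ipcPort goes unnoticed in most deployments.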

      This should be easy to fix and unit test, but I'm not sure when I'll get to it, so anyone else should feel free to grab it first.

      1. hdfs-894.txt
        4 kB
        Todd Lipcon

        Activity

        Todd Lipcon created issue -
        Todd Lipcon made changes -
        Field Original Value New Value
        Attachment hdfs-894.txt [ 12434614 ]
        Todd Lipcon made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Todd Lipcon added a comment -

        This should be fixed in all three current branches. As mentioned in the description, it can prevent the write pipeline from recovering since ClientDatanodeProtocol and InterDatanodeProtocol won't be able to connect.
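To illustrate the connection failure: if the namenode hands out a datanode's old IPC port, a client or peer datanode dialing that port will typically be refused. The sketch below simulates this with two loopback listeners and plain `java.net` sockets; it does not use Hadoop's IPC classes, and the one-second connect timeout is arbitrary. In rare cases another process could grab the freed port, so the stale check is illustrative rather than guaranteed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class StalePortDemo {
    /** Returns true if a TCP connection to loopback:port succeeds within one second. */
    static boolean canConnect(int port) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 1000);
            return true;
        } catch (IOException e) { // connection refused or timed out
            return false;
        }
    }

    /** Returns {current port reachable, stale port reachable}. */
    static boolean[] simulateRestart() {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket current = new ServerSocket(0, 50, lo)) { // the restarted IPC server
            ServerSocket old = new ServerSocket(0, 50, lo);         // the pre-restart IPC server
            int stalePort = old.getLocalPort();                     // what a stale registry still holds
            old.close();                                            // the old process goes away
            return new boolean[] { canConnect(current.getLocalPort()), canConnect(stalePort) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = simulateRestart();
        System.out.println("current port reachable: " + r[0]);
        System.out.println("stale port reachable: " + r[1]);
    }
}
```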

        Todd Lipcon made changes -
        Component/s name-node [ 12312926 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12434614/hdfs-894.txt
        against trunk revision 905760.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/111/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/111/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/111/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/111/console

        This message is automatically generated.

        Todd Lipcon added a comment -

        Failed test seems unrelated.

        Todd Lipcon added a comment -

        Filed HDFS-953 for the testpatch failure seen above.

        I think this is ready to commit.

        Todd Lipcon made changes -
        Assignee Todd Lipcon [ tlipcon ]
        Tom White added a comment -

        +1

        dhruba borthakur added a comment -

        The code looks good. But since this is not a regression (and datanodes typically re-register with the same ipcPort), can we put this patch only in trunk?

        Todd Lipcon added a comment -

        "datanodes typically re-register with the same ipcPort"

        Not if you've configured the datanode IPC port to 0; I do this on my test clusters on shared hardware, for example.

        "this is not a regression"

        True enough. But it's an obvious bug that causes big problems when binding to port 0, so I think we should put it in all branches. If you disagree, though, trunk's fine.

        Tom White added a comment -

        I've just committed this. Thanks Todd!

        Tom White made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Tom White made changes -
        Fix Version/s 0.22.0 [ 12314241 ]
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #193 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/193/)

        Hudson added a comment -

        Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #146 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/146/)

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #275 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/275/)

        Tom White made changes -
        Fix Version/s 0.21.0 [ 12314046 ]
        Fix Version/s 0.22.0 [ 12314241 ]
        Tom White made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition | Time In Source Status | Execution Times | Last Executer | Last Execution Date
        Open → Patch Available | 20d 20h 16m | 1 | Todd Lipcon | 02/Feb/10 23:32
        Patch Available → Resolved | 13d 23h 42m | 1 | Tom White | 16/Feb/10 23:14
        Resolved → Closed | 188d 21h 36m | 1 | Tom White | 24/Aug/10 21:51

          People

          • Assignee:
            Todd Lipcon
            Reporter:
            Todd Lipcon
          • Votes:
            0
            Watchers:
            4
