Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-3224

Bug in check for DN re-registration with different storage ID

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.3-alpha, 0.23.5
    • Component/s: None
    • Labels:
      None

      Description

      DatanodeManager#registerDatanode checks the host to node map using an IP:port key, however the map is keyed on IP, so this check will always fail. It's performing the check to determine if a DN with the same IP and storage ID has already registered, and if so to remove this DN from the map and indicate that eg it's no longer hosting these blocks. This bug has been here forever.

      1. HDFS-3224-branch0.23.patch
        4 kB
        Jason Lowe
      2. HDFS-3224.patch
        6 kB
        Jason Lowe
      3. HDFS-3224.patch
        6 kB
        Jason Lowe
      4. HDFS-3224.patch
        6 kB
        Jason Lowe
      5. HDFS-3224.patch
        2 kB
        Jason Lowe

        Issue Links

          Activity

          Hide
          Eli Collins added a comment -

          Btw here was the code before HDFS-3144. Host2NodesMap was using getHost which is the "name" w/o port

          -    String host = node.getHost();
          +    String ipAddr = node.getIpAddr();
          

          and DatanodeManager was doing lookups with the "name"

          -    DatanodeDescriptor nodeN = getDatanodeByHost(nodeReg.getName());
          +    DatanodeDescriptor nodeN = getDatanodeByHost(nodeReg.getXferAddr());
          
          Show
          Eli Collins added a comment - Btw here was the code before HDFS-3144 . Host2NodesMap was using getHost which is the "name" w/o port - String host = node.getHost(); + String ipAddr = node.getIpAddr(); and DatanodeManager was doing lookups with the "name" - DatanodeDescriptor nodeN = getDatanodeByHost(nodeReg.getName()); + DatanodeDescriptor nodeN = getDatanodeByHost(nodeReg.getXferAddr());
          Hide
          Jason Lowe added a comment -

          This bug seems benign but is causing issues with ops monitoring scripts because it allows a node to be reported as simultaneously live and dead by the NN web UI and JMX. Here's one scenario:

          • Node is registered and appears as a live node
          • Node fails badly, starts showing up as a dead node
          • Node is re-imaged by ops as a fresh node
          • Node rejoins the cluster, and now the same host is reported as both live and dead

          Since re-imaging the node causes it to get a new storage ID, the failure to recognized it by name means the NN thinks it's a totally different node and therefore the node is placed in the datanode map twice for the two storage IDs.

          In this case I think we should be calling getDatanodeByName (i.e.: where we include the port). This would help us properly distinguish datanodes that are using ephemeral ports (e.g.: miniclusters).

          Show
          Jason Lowe added a comment - This bug seems benign but is causing issues with ops monitoring scripts because it allows a node to be reported as simultaneously live and dead by the NN web UI and JMX. Here's one scenario: Node is registered and appears as a live node Node fails badly, starts showing up as a dead node Node is re-imaged by ops as a fresh node Node rejoins the cluster, and now the same host is reported as both live and dead Since re-imaging the node causes it to get a new storage ID, the failure to recognized it by name means the NN thinks it's a totally different node and therefore the node is placed in the datanode map twice for the two storage IDs. In this case I think we should be calling getDatanodeByName (i.e.: where we include the port). This would help us properly distinguish datanodes that are using ephemeral ports (e.g.: miniclusters).
          Hide
          Jason Lowe added a comment -

          Here's a quick-n-dirty patch that fixes the issue in my limited testing. Host2NodeMap.getDatanodeByName was recently removed, so I added it back in. It seems appropriate here although the name is unfortunate re: confusion about host vs. name.

          Still needs a unit test, but posting now for feedback on whether we really should be using ip (host) vs ip:port (name).

          Show
          Jason Lowe added a comment - Here's a quick-n-dirty patch that fixes the issue in my limited testing. Host2NodeMap.getDatanodeByName was recently removed, so I added it back in. It seems appropriate here although the name is unfortunate re: confusion about host vs. name. Still needs a unit test, but posting now for feedback on whether we really should be using ip (host) vs ip:port (name).
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12548049/HDFS-3224.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.TestCheckpoint

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3278//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3278//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12548049/HDFS-3224.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestCheckpoint +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3278//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3278//console This message is automatically generated.
          Hide
          Eli Collins added a comment -

          Thanks for looking at this Jason. Why not just fix the lookup to use IP rather than IP:port?

          Show
          Eli Collins added a comment - Thanks for looking at this Jason. Why not just fix the lookup to use IP rather than IP:port?
          Hide
          Jason Lowe added a comment -

          Why not just fix the lookup to use IP rather than IP:port?

          I think if we just have the lookup use IP then we're going to create problems with setups that launch multiple datanodes per host like miniclusters. If we lookup only by IP then as soon as the second datanode from the same IP registers we'll accidentally think it's replacing the first datanode which isn't the intent.

          TestCheckpoint failure appears to be unrelated, see HDFS-4006. I'll update the patch to cleanup the names and add a test case.

          Show
          Jason Lowe added a comment - Why not just fix the lookup to use IP rather than IP:port? I think if we just have the lookup use IP then we're going to create problems with setups that launch multiple datanodes per host like miniclusters. If we lookup only by IP then as soon as the second datanode from the same IP registers we'll accidentally think it's replacing the first datanode which isn't the intent. TestCheckpoint failure appears to be unrelated, see HDFS-4006 . I'll update the patch to cleanup the names and add a test case.
          Hide
          Daryn Sharp added a comment -

          I don't think it matters whether host or ip is used, it all about adding the port. Since the rest of the code is expecting the nodes to be keyed on host, using host:port is the minimal and least risky change.

          Show
          Daryn Sharp added a comment - I don't think it matters whether host or ip is used, it all about adding the port. Since the rest of the code is expecting the nodes to be keyed on host, using host:port is the minimal and least risky change.
          Hide
          Rajiv Chittajallu added a comment -

          We haven't noticed a node being listed in live and dead list in 0.20.

          Show
          Rajiv Chittajallu added a comment - We haven't noticed a node being listed in live and dead list in 0.20.
          Hide
          Jason Lowe added a comment -

          Updated patch with refactored interface in Host2NodesMap and a test case.

          I thought it would be good to have the old getDatanodeByName interface explicitly take the IP address and xfer port arguments. It's clearer what is expected, and the old interface would silently fail if the :port portion was missing from the address string.

          Show
          Jason Lowe added a comment - Updated patch with refactored interface in Host2NodesMap and a test case. I thought it would be good to have the old getDatanodeByName interface explicitly take the IP address and xfer port arguments. It's clearer what is expected, and the old interface would silently fail if the :port portion was missing from the address string.
          Hide
          Jason Lowe added a comment -

          Patch for 0.23 will need to be slightly different. I'll update the patch for 0.23 after we arrive at a consensus for the trunk patch.

          Show
          Jason Lowe added a comment - Patch for 0.23 will need to be slightly different. I'll update the patch for 0.23 after we arrive at a consensus for the trunk patch.
          Hide
          Daryn Sharp added a comment -

          I'm a bit squeamish that Host2NodesMap.getDatanodeByXferAddr(String, int) requires the caller to have the intimate knowledge of how Host2NodesMap#add(DatanodeDescriptor) keyed the node in the map. It's creating a landmine if someone doesn't update both the add the caller's lookups in unison - and we're right back to this bug.

          Your thoughts Eli?

          Show
          Daryn Sharp added a comment - I'm a bit squeamish that Host2NodesMap.getDatanodeByXferAddr(String, int) requires the caller to have the intimate knowledge of how Host2NodesMap#add(DatanodeDescriptor) keyed the node in the map. It's creating a landmine if someone doesn't update both the add the caller's lookups in unison - and we're right back to this bug. Your thoughts Eli?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12548252/HDFS-3224.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestHDFSFileSystemContract

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3291//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3291//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12548252/HDFS-3224.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHDFSFileSystemContract +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3291//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3291//console This message is automatically generated.
          Hide
          Jason Lowe added a comment -

          The TestHDFSFileSystemContract failure appears to be unrelated. The test passes for me locally, and I see no evidence in the logs that it decided to replace one datanode registration with another, which is the essence of this change.

          Show
          Jason Lowe added a comment - The TestHDFSFileSystemContract failure appears to be unrelated. The test passes for me locally, and I see no evidence in the logs that it decided to replace one datanode registration with another, which is the essence of this change.
          Hide
          Suresh Srinivas added a comment -

          I'm a bit squeamish that Host2NodesMap.getDatanodeByXferAddr(String, int) requires the caller to have the intimate knowledge of how Host2NodesMap#add(DatanodeDescriptor) keyed the node in the map. It's creating a landmine if someone doesn't update both the add the caller's lookups in unison - and we're right back to this bug.

          Daryn I do not understand the part "update both the add the caller's lookups". We could continue that discussion and if it warrants we could create a separate jira to address your concerns.

          +1. This patch looks good to me.

          Show
          Suresh Srinivas added a comment - I'm a bit squeamish that Host2NodesMap.getDatanodeByXferAddr(String, int) requires the caller to have the intimate knowledge of how Host2NodesMap#add(DatanodeDescriptor) keyed the node in the map. It's creating a landmine if someone doesn't update both the add the caller's lookups in unison - and we're right back to this bug. Daryn I do not understand the part "update both the add the caller's lookups". We could continue that discussion and if it warrants we could create a separate jira to address your concerns. +1. This patch looks good to me.
          Hide
          Eli Collins added a comment -

          I don't think it matters whether host or ip is used, it all about adding the port. Since the rest of the code is expecting the nodes to be keyed on host, using host:port is the minimal and least risky change.

          Just using the IP is the minimal change. Checking the port supports mini clusters but doesn't work if you use an ephemeral port (eg "0.0.0.0:0" for dfs.datanode.address) because the DN will re-register with a different port and getDatanodeByXferAddr won't find it. How about adding a warn if we find an entry for the IP but do not match any of the transfer port?

          @Daryn, since getDatanodeByXferAddr lives in Host2NodesMap it seems OK that it use its implementation details. If we decide DNs are uniquely identified by xferAddr:xferPort then we could key Host2NodesMap on this rather than just the IP, which would be cleaner.

          Show
          Eli Collins added a comment - I don't think it matters whether host or ip is used, it all about adding the port. Since the rest of the code is expecting the nodes to be keyed on host, using host:port is the minimal and least risky change. Just using the IP is the minimal change. Checking the port supports mini clusters but doesn't work if you use an ephemeral port (eg "0.0.0.0:0" for dfs.datanode.address) because the DN will re-register with a different port and getDatanodeByXferAddr won't find it. How about adding a warn if we find an entry for the IP but do not match any of the transfer port? @Daryn, since getDatanodeByXferAddr lives in Host2NodesMap it seems OK that it use its implementation details. If we decide DNs are uniquely identified by xferAddr:xferPort then we could key Host2NodesMap on this rather than just the IP, which would be cleaner.
          Hide
          Daryn Sharp added a comment -

          Suresh, it's that Hosts2NodesMap is internally decided on how it will key the node in the map. Now the caller assumes that it knows how the node is keyed when fetching the node. Two possible approaches to decouple them are: 1) the caller specifies the map key for the node, and again specifies the same key when fetching a node. Or the node is passed to the fetch so Hosts2NodesMap can extract the same key as it did when adding the node to the map.

          It's kind of the same situation where the TokenCache assumed it knew how the fs token would be keyed in the Credentials. When the assumption went out of sync in some cases, it caused extra tokens to be fetched. In this case it would confuse the node manager as to whether the node is new or a replacement.

          Show
          Daryn Sharp added a comment - Suresh, it's that Hosts2NodesMap is internally decided on how it will key the node in the map. Now the caller assumes that it knows how the node is keyed when fetching the node. Two possible approaches to decouple them are: 1) the caller specifies the map key for the node, and again specifies the same key when fetching a node. Or the node is passed to the fetch so Hosts2NodesMap can extract the same key as it did when adding the node to the map. It's kind of the same situation where the TokenCache assumed it knew how the fs token would be keyed in the Credentials . When the assumption went out of sync in some cases, it caused extra tokens to be fetched. In this case it would confuse the node manager as to whether the node is new or a replacement.
          Hide
          Jason Lowe added a comment -

          Updated patch with a warning if we don't get a hit on IP:port but do get a hit on IP (i.e.: we detect multiple datanodes per host).

          Show
          Jason Lowe added a comment - Updated patch with a warning if we don't get a hit on IP:port but do get a hit on IP (i.e.: we detect multiple datanodes per host).
          Hide
          Suresh Srinivas added a comment -

          There are three elements in DN identity - Storage ID, host address and transfer port. When storage ID does not change and other two elements change, it is correctly handled by the namenode.

          When storage ID changes:

          1. Old behavior: remove any DN with matching host address (no transfer address)
            1. In case of single DN on the same host, correct DN is removed
            2. In case of multiple DNs on the same host, a random DN is removed
          2. Current behavior: remove any DN with matching host address + transfer address
            1. In case of single DN on the same host, no DN is removed
            2. In case of multiple DNs on the same host, no DN is removed
          3. Behavior with this patch: remove any DN with matching host address + transfer address
            1. In case of single DN on the same host, correct DN is removed
            2. In case of multiple DNs on the same host, correct DN is removed

          When Storage ID changes with IP address and/or transfer address, we will end up having two entries. I do not think this can be avoided.

          I have seen cases in many setups where multiple DNs are being run on the same machine. Given this, I prefer the new behavior with this patch.

          Show
          Suresh Srinivas added a comment - There are three elements in DN identity - Storage ID, host address and transfer port. When storage ID does not change and other two elements change, it is correctly handled by the namenode. When storage ID changes: Old behavior: remove any DN with matching host address (no transfer address) In case of single DN on the same host, correct DN is removed In case of multiple DNs on the same host, a random DN is removed Current behavior: remove any DN with matching host address + transfer address In case of single DN on the same host, no DN is removed In case of multiple DNs on the same host, no DN is removed Behavior with this patch: remove any DN with matching host address + transfer address In case of single DN on the same host, correct DN is removed In case of multiple DNs on the same host, correct DN is removed When Storage ID changes with IP address and/or transfer address, we will end up having two entries. I do not think this can be avoided. I have seen cases in many setups where multiple DNs are being run on the same machine. Given this, I prefer the new behavior with this patch.
          Hide
          Suresh Srinivas added a comment - - edited

          When Storage ID changes with IP address and/or transfer address, we will end up having two entries. I do not think this can be avoided.

          Given this, I am not sure if printing the warning is useful.

          Show
          Suresh Srinivas added a comment - - edited When Storage ID changes with IP address and/or transfer address, we will end up having two entries. I do not think this can be avoided. Given this, I am not sure if printing the warning is useful.
          Hide
          Jason Lowe added a comment -

          I agree that we cannot resolve the ambiguity brought about by datanodes using ephemeral ports – it could be a new datanode rejoining the cluster or an existing one that was re-imaged (and lost its storage ID). And I'd rather not emit warnings for normal behavior when multiple datanodes per host are being used, so I'm OK with going back to the previous patch without it.

          Show
          Jason Lowe added a comment - I agree that we cannot resolve the ambiguity brought about by datanodes using ephemeral ports – it could be a new datanode rejoining the cluster or an existing one that was re-imaged (and lost its storage ID). And I'd rather not emit warnings for normal behavior when multiple datanodes per host are being used, so I'm OK with going back to the previous patch without it.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12548481/HDFS-3224.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3297//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3297//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12548481/HDFS-3224.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3297//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3297//console This message is automatically generated.
          Hide
          Jason Lowe added a comment -

          Re-submitting the patch with the multiple datanode per host warning removed per the discussion with Suresh.

          Eli, let me know if you're OK with the patch or if we need something further. Thanks!

          Show
          Jason Lowe added a comment - Re-submitting the patch with the multiple datanode per host warning removed per the discussion with Suresh. Eli, let me know if you're OK with the patch or if we need something further. Thanks!
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12548501/HDFS-3224.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3298//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3298//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12548501/HDFS-3224.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3298//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3298//console This message is automatically generated.
          Hide
          Suresh Srinivas added a comment -

          I had already posted my +1 on this patch. Eli let me know if you agree with the rationale I have posted. We could perhaps commit this sometime tomorrow.

          Show
          Suresh Srinivas added a comment - I had already posted my +1 on this patch. Eli let me know if you agree with the rationale I have posted. We could perhaps commit this sometime tomorrow.
          Hide
          Eli Collins added a comment -

          +1 on the rationale and latest patch. I agree that given there are cases outside the tests where multiple DNs/host are used warning doesn't make sense.

          Show
          Eli Collins added a comment - +1 on the rationale and latest patch. I agree that given there are cases outside the tests where multiple DNs/host are used warning doesn't make sense.
          Hide
          Jason Lowe added a comment -

          Thanks Suresh and Eli for the reviews. The TestStandbyCheckpoints failure is a known issue, HDFS-3806. I'll commit this shortly.

          Show
          Jason Lowe added a comment - Thanks Suresh and Eli for the reviews. The TestStandbyCheckpoints failure is a known issue, HDFS-3806 . I'll commit this shortly.
          Hide
          Jason Lowe added a comment -

          I committed this to trunk and branch-2. Will post patch for branch-0.23 shortly.

          Show
          Jason Lowe added a comment - I committed this to trunk and branch-2. Will post patch for branch-0.23 shortly.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2904 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2904/)
          HDFS-3224. Bug in check for DN re-registration with different storage ID. Contributed by Jason Lowe (Revision 1396798)

          Result = SUCCESS
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1396798
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/Host2NodesMap.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeRegistration.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #2904 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2904/ ) HDFS-3224 . Bug in check for DN re-registration with different storage ID. Contributed by Jason Lowe (Revision 1396798) Result = SUCCESS jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1396798 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/Host2NodesMap.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeRegistration.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2842 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2842/)
          HDFS-3224. Bug in check for DN re-registration with different storage ID. Contributed by Jason Lowe (Revision 1396798)

          Result = SUCCESS
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1396798
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/Host2NodesMap.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeRegistration.java
          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #2842 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2842/ ) HDFS-3224 . Bug in check for DN re-registration with different storage ID. Contributed by Jason Lowe (Revision 1396798) Result = SUCCESS jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1396798 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/Host2NodesMap.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeRegistration.java
          Hide
          Jason Lowe added a comment -

          Patch for branch-0.23.

          Show
          Jason Lowe added a comment - Patch for branch-0.23.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2865 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2865/)
          HDFS-3224. Bug in check for DN re-registration with different storage ID. Contributed by Jason Lowe (Revision 1396798)

          Result = FAILURE
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1396798
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/Host2NodesMap.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeRegistration.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #2865 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2865/ ) HDFS-3224 . Bug in check for DN re-registration with different storage ID. Contributed by Jason Lowe (Revision 1396798) Result = FAILURE jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1396798 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/Host2NodesMap.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeRegistration.java
          Hide
          Jason Lowe added a comment -

          Manually ran test-patch.sh on the branch-0.23 patch:

          -1 overall.  
          
              +1 @author.  The patch does not contain any @author tags.
          
              +1 tests included.  The patch appears to include 3 new or modified tests.
          
              +1 javadoc.  The javadoc tool did not generate any warning messages.
          
              +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
          
              +1 eclipse:eclipse.  The patch built with eclipse:eclipse.
          
              +1 findbugs.  The patch does not introduce any new Findbugs (version ) warnings.
          
              +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          
              -1 core tests.  The patch failed these unit tests:
                            org.apache.hadoop.hdfs.TestDatanodeBlockScanner
          
              +1 contrib tests.  The patch passed contrib unit tests.
          

          The TestDatanodeBlockScanner failure appears to be unrelated, and it passed when I ran the test again. I also manually tested a re-imaged datanode rejoining the cluster and the namenode correctly recognized it as a previously registered datanode after this patch.

          Show
          Jason Lowe added a comment - Manually ran test-patch.sh on the branch-0.23 patch: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.TestDatanodeBlockScanner +1 contrib tests. The patch passed contrib unit tests. The TestDatanodeBlockScanner failure appears to be unrelated, and it passed when I ran the test again. I also manually tested a re-imaged datanode rejoining the cluster and the namenode correctly recognized it as a previously registered datanode after this patch.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1192 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1192/)
          HDFS-3224. Bug in check for DN re-registration with different storage ID. Contributed by Jason Lowe (Revision 1396798)

          Result = SUCCESS
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1396798
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/Host2NodesMap.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeRegistration.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1192 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1192/ ) HDFS-3224 . Bug in check for DN re-registration with different storage ID. Contributed by Jason Lowe (Revision 1396798) Result = SUCCESS jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1396798 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/Host2NodesMap.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeRegistration.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1223 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1223/)
          HDFS-3224. Bug in check for DN re-registration with different storage ID. Contributed by Jason Lowe (Revision 1396798)

          Result = FAILURE
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1396798
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/Host2NodesMap.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeRegistration.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1223 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1223/ ) HDFS-3224 . Bug in check for DN re-registration with different storage ID. Contributed by Jason Lowe (Revision 1396798) Result = FAILURE jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1396798 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/Host2NodesMap.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeRegistration.java
          Hide
          Robert Joseph Evans added a comment -

          It looks like a clean port to 0.23. +1 feel free to check it in.

          Show
          Robert Joseph Evans added a comment - It looks like a clean port to 0.23. +1 feel free to check it in.
          Hide
          Jason Lowe added a comment -

          Thanks, Bobby. I committed this to branch-0.23.

          Show
          Jason Lowe added a comment - Thanks, Bobby. I committed this to branch-0.23.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #402 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/402/)
          HDFS-3224. Bug in check for DN re-registration with different storage ID. Contributed by Jason Lowe (Revision 1397100)

          Result = SUCCESS
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1397100
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeRegistration.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Build #402 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/402/ ) HDFS-3224 . Bug in check for DN re-registration with different storage ID. Contributed by Jason Lowe (Revision 1397100) Result = SUCCESS jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1397100 Files : /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeRegistration.java

            People

            • Assignee:
              Jason Lowe
              Reporter:
              Eli Collins
            • Votes:
              0 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development