  1. Hadoop HDFS
  2. HDFS-5001

Branch-1-Win TestAzureBlockPlacementPolicy and TestReplicationPolicyWithNodeGroup fail because of 1) old APIs and 2) an incorrect value of depthOfAllLeaves


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1-win
    • Fix Version/s: 1-win
    • Component/s: namenode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      After the backport patch of HDFS-4975 was committed, TestAzureBlockPlacementPolicy and TestReplicationPolicyWithNodeGroup failed.
      TestReplicationPolicyWithNodeGroup fails because part of the HDFS-3941 patch is missing. Our patch for HADOOP-495 causes methods in the superclass to be called incorrectly. More specifically, HDFS-4975 backported HDFS-4350, HDFS-4351, and HDFS-3912 to enable the method parameter "boolean avoidStaleNodes", and updated the APIs in BlockPlacementPolicyDefault. However, the overriding methods in ReplicationPolicyWithNodeGroup weren't updated, so the superclass versions are called instead.

      The cause for the failure of TestAzureBlockPlacementPolicy is similar.
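
      The override mismatch described above can be illustrated with a minimal sketch. The class and method names below (DefaultPolicy, NodeGroupPolicy, FixedNodeGroupPolicy, chooseLocalNode) are hypothetical, simplified stand-ins and not the real Hadoop signatures; the sketch only assumes that the superclass gained a "boolean avoidStaleNodes" parameter while the subclass kept the old parameter list.

```java
// Hypothetical, simplified stand-ins; these are NOT the real Hadoop classes.
class DefaultPolicy {
    // New-style API: the entry point passes the avoidStaleNodes flag down.
    String chooseTarget(int replicas, boolean avoidStaleNodes) {
        return chooseLocalNode(avoidStaleNodes);
    }

    // Signature after the (assumed) API update: one extra boolean parameter.
    String chooseLocalNode(boolean avoidStaleNodes) {
        return "default";
    }
}

class NodeGroupPolicy extends DefaultPolicy {
    // Old signature. This is now only an overload, not an override, so the
    // superclass never dispatches to it. Annotating it with @Override would
    // turn the silent behaviour change into a compile-time error.
    String chooseLocalNode() {
        return "node-group";
    }
}

class FixedNodeGroupPolicy extends DefaultPolicy {
    // Signature updated to match the superclass, so dynamic dispatch works.
    @Override
    String chooseLocalNode(boolean avoidStaleNodes) {
        return "node-group";
    }
}

public class OverrideMismatchDemo {
    public static void main(String[] args) {
        // Stale subclass: the superclass implementation runs.
        System.out.println(new NodeGroupPolicy().chooseTarget(3, true));      // default
        // Updated subclass: the node-group implementation runs.
        System.out.println(new FixedNodeGroupPolicy().chooseTarget(3, true)); // node-group
    }
}
```

      Marking every overriding method with @Override is one way to catch this kind of drift at compile time when a backport changes superclass signatures.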

      In addition, TestAzureBlockPlacementPolicy fails with the following error:

      Testcase: testPolicyWithDefaultRacks took 0.005 sec
      Caused an ERROR
      Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
      org.apache.hadoop.net.NetworkTopology$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
      at org.apache.hadoop.net.NetworkTopology.add(NetworkTopology.java:396)
      at org.apache.hadoop.hdfs.server.namenode.TestAzureBlockPlacementPolicy.testPolicyWithDefaultRacks(TestAzureBlockPlacementPolicy.java:779)

      The error is caused by a check in NetworkTopology#add(Node node):

      if (depthOfAllLeaves != node.getLevel()) {
        LOG.error("Error: can't add leaf node at depth " +
            node.getLevel() + " to topology:\n" + oldTopoStr);
        throw new InvalidTopologyException("Invalid network topology. " +
            "You cannot have a rack and a non-rack node at the same " +
            "level of the network topology.");
      }
      

      The problem with this check is that when we use NetworkTopology#remove(Node node) to remove a node from the cluster, depthOfAllLeaves doesn't change. As a result, we can't reset NetworkTopology#depthOfAllLeaves for a cluster's old topology just by removing all of its DataNodes. See TestAzureBlockPlacementPolicy#testPolicyWithDefaultRacks():

      // clear the old topology
      for (Node node : dataNodes) {
        cluster.remove(node);
      }
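
      The stale-depth behaviour can be reproduced with a small model. TinyTopology below is a hypothetical, heavily simplified stand-in for NetworkTopology (string paths instead of Node objects, IllegalStateException instead of InvalidTopologyException). The resetWhenEmpty flag is an assumption that shows one possible remedy; it is not necessarily what the attached patch does.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model; this is NOT the real NetworkTopology.
class TinyTopology {
    private final Set<String> leaves = new HashSet<>();
    private int depthOfAllLeaves = -1;    // -1 means "no leaf recorded yet"
    private final boolean resetWhenEmpty; // assumed remedy, for illustration

    TinyTopology(boolean resetWhenEmpty) {
        this.resetWhenEmpty = resetWhenEmpty;
    }

    // "/rack1/node1" -> 2, "/dc1/rack1/node1" -> 3
    private static int level(String path) {
        return path.split("/").length - 1;
    }

    void add(String path) {
        int level = level(path);
        if (depthOfAllLeaves == -1) {
            depthOfAllLeaves = level;     // first leaf fixes the depth
        } else if (depthOfAllLeaves != level) {
            throw new IllegalStateException(
                "can't add leaf node at depth " + level);
        }
        leaves.add(path);
    }

    void remove(String path) {
        leaves.remove(path);
        // Without this reset, the recorded depth outlives the last leaf.
        if (resetWhenEmpty && leaves.isEmpty()) {
            depthOfAllLeaves = -1;
        }
    }
}

public class StaleDepthDemo {
    // Clears a two-level topology, then tries to build a three-level one.
    static boolean canRebuild(TinyTopology topology) {
        topology.add("/rack1/node1");
        topology.remove("/rack1/node1");
        try {
            topology.add("/dc1/rack1/node1");
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canRebuild(new TinyTopology(false))); // false
        System.out.println(canRebuild(new TinyTopology(true)));  // true
    }
}
```

      In a test, an alternative to resetting the field is to build a fresh topology object instead of emptying the old one.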
      

        Attachments

        1. HDFS-5001.patch
          11 kB
          Douma Fang

              People

              • Assignee:
                xifang Douma Fang
                Reporter:
                xifang Douma Fang
              • Votes:
                0
                Watchers:
                3
