Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-16028

Fix NetworkTopology chooseRandom function to support excluded nodes

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.9.2
    • 3.3.0, 3.2.1, 3.1.3
    • None
    • None
    • Reviewed

    Description

      During reading the hadoop NetworkTopology.java, I suspect there is a bug in function 

      chooseRandom (line 498, hadoop version 2.9.2-RC0), 

       I think there is a bug in code, ~excludedScope doesn't mean availableNodes under Scope node, and I also add unit test for this and get an exception.

      bug code in the else.

      // code placeholder
      
       if (excludedScope == null) {
          availableNodes = countNumOfAvailableNodes(scope, excludedNodes);
        } else {
          availableNodes =
              countNumOfAvailableNodes("~" + excludedScope, excludedNodes);
        }

      Source code:

      // code placeholder
      protected Node chooseRandom(final String scope, String excludedScope,
          final Collection<Node> excludedNodes) {
        if (excludedScope != null) {
          if (scope.startsWith(excludedScope)) {
            return null;
          }
          if (!excludedScope.startsWith(scope)) {
            excludedScope = null;
          }
        }
        Node node = getNode(scope);
        if (!(node instanceof InnerNode)) {
          return excludedNodes != null && excludedNodes.contains(node) ?
              null : node;
        }
        InnerNode innerNode = (InnerNode)node;
        int numOfDatanodes = innerNode.getNumOfLeaves();
        if (excludedScope == null) {
          node = null;
        } else {
          node = getNode(excludedScope);
          if (!(node instanceof InnerNode)) {
            numOfDatanodes -= 1;
          } else {
            numOfDatanodes -= ((InnerNode)node).getNumOfLeaves();
          }
        }
        if (numOfDatanodes <= 0) {
          LOG.debug("Failed to find datanode (scope=\"{}\" excludedScope=\"{}\")."
                  + " numOfDatanodes={}",
              scope, excludedScope, numOfDatanodes);
          return null;
        }
        final int availableNodes;
        if (excludedScope == null) {
          availableNodes = countNumOfAvailableNodes(scope, excludedNodes);
        } else {
          availableNodes =
              countNumOfAvailableNodes("~" + excludedScope, excludedNodes);
        }
        LOG.debug("Choosing random from {} available nodes on node {},"
            + " scope={}, excludedScope={}, excludeNodes={}. numOfDatanodes={}.",
            availableNodes, innerNode, scope, excludedScope, excludedNodes,
            numOfDatanodes);
        Node ret = null;
        if (availableNodes > 0) {
          ret = chooseRandom(innerNode, node, excludedNodes, numOfDatanodes,
              availableNodes);
        }
        LOG.debug("chooseRandom returning {}", ret);
        return ret;
      }
      

       

       

      Add Unit Test in TestClusterTopology.java, but get exception.

       

      // code placeholder
      
      @Test
      public void testChooseRandom1() {
        // create the topology
        NetworkTopology cluster = NetworkTopology.getInstance(new Configuration());
        NodeElement node1 = getNewNode("node1", "/a1/b1/c1");
        cluster.add(node1);
        NodeElement node2 = getNewNode("node2", "/a1/b1/c1");
        cluster.add(node2);
        NodeElement node3 = getNewNode("node3", "/a1/b1/c2");
        cluster.add(node3);
        NodeElement node4 = getNewNode("node4", "/a1/b2/c3");
        cluster.add(node4);
      
        Node node = cluster.chooseRandom("/a1/b1", "/a1/b1/c1", null);
        assertSame(node.getName(), "node3");
      }
      

       

      Exception:

      // code placeholder
      
      java.lang.IllegalArgumentException: 1 should >= 2, and both should be positive. 
      at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) 
      at org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:567) 
      at org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:544) 
      atorg.apache.hadoop.net.TestClusterTopology.testChooseRandom1(TestClusterTopology.java:198)
      

       

       

       

      vagarychen this change is imported in PR HDFS-11577, could you help to check whether this is a bug ?

       

      Attachments

        1. image-2018-12-29-15-02-19-415.png
          20 kB
          Sihai Ke
        2. 0001-add-UT-for-NetworkTopology.patch
          2 kB
          Sihai Ke
        3. 0001-fix-NetworkTopology.java-chooseRandom-bug.patch
          1 kB
          Sihai Ke
        4. HDFS-14181.01.patch
          2 kB
          Sihai Ke
        5. HDFS-14181.02.patch
          1 kB
          Sihai Ke
        6. HDFS-14181.03.patch
          3 kB
          Sihai Ke
        7. HDFS-14181.04.patch
          4 kB
          Sihai Ke
        8. HDFS-14181.05.patch
          3 kB
          Sihai Ke
        9. HDFS-14181.06.patch
          3 kB
          Sihai Ke
        10. HDFS-14181.07.patch
          3 kB
          Sihai Ke
        11. HDFS-14181.08.patch
          3 kB
          Sihai Ke
        12. HDFS-14181.09.patch
          3 kB
          Sihai Ke

        Issue Links

          Activity

            People

              sihai Sihai Ke
              sihai Sihai Ke
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: