  1. Hadoop HDFS
  2. HDFS-1480

All replicas of a block can end up on the same rack when some datanodes are decommissioning.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.2
    • Fix Version/s: 0.23.0
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      It appears that all replicas of a block can end up on the same rack. The likelihood of this seems to be directly related to the decommissioning of nodes.

      After a rolling OS upgrade of a running cluster (decommission 3-10% of the nodes, re-install, then add them back), all replicas of about 0.16% of blocks ended up on the same rack.

      The Hadoop Namenode UI etc. doesn't seem to be aware of such incorrectly replicated blocks, although "hadoop fsck .." does report that the blocks must be replicated on additional racks.
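
      The symptom can be illustrated with a small, self-contained sketch. This is not HDFS code: the Replica class, liveRackCount and needsAnotherRack are hypothetical names used only to show the condition being described, under the assumption that a replica on a decommissioning node should not count toward rack diversity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RackSpreadCheck {
    /** Hypothetical stand-in for a replica location; not an HDFS class. */
    static final class Replica {
        final String rack;
        final boolean decommissioning;

        Replica(String rack, boolean decommissioning) {
            this.rack = rack;
            this.decommissioning = decommissioning;
        }
    }

    /** Counts distinct racks that hold a replica on a node that is not decommissioning. */
    static int liveRackCount(List<Replica> replicas) {
        Set<String> racks = new HashSet<>();
        for (Replica r : replicas) {
            if (!r.decommissioning) {
                racks.add(r.rack);
            }
        }
        return racks.size();
    }

    /** True when the cluster has several racks but the live replicas all sit on one. */
    static boolean needsAnotherRack(List<Replica> replicas, int numOfRacks) {
        return numOfRacks > 1 && liveRackCount(replicas) == 1;
    }

    public static void main(String[] args) {
        // Hypothetical block: the only off-rack replica is on a decommissioning node.
        List<Replica> replicas = Arrays.asList(
            new Replica("/rack1", false),
            new Replica("/rack1", false),
            new Replica("/rack1", false),
            new Replica("/rack2", true));
        System.out.println("live racks: " + liveRackCount(replicas));
        System.out.println("needs another rack: " + needsAnotherRack(replicas, 2));
    }
}
```

      In this made-up scenario the block still spans two racks while the /rack2 node is decommissioning, but once that node leaves, every remaining replica is on /rack1.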

      Looking at ReplicationTargetChooser.java, the following seem suspect:

      snippet-01:

          int maxNodesPerRack =
            (totalNumOfReplicas-1)/clusterMap.getNumOfRacks()+2;
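
      To see what the expression in snippet-01 evaluates to, here is a minimal worked example. It only reproduces the integer arithmetic; the class name and the sample replica and rack counts are assumptions chosen for illustration, not values taken from the affected cluster.

```java
public class MaxNodesPerRackDemo {
    /** Same integer arithmetic as snippet-01, with the rack count passed in directly. */
    static int maxNodesPerRack(int totalNumOfReplicas, int numOfRacks) {
        return (totalNumOfReplicas - 1) / numOfRacks + 2;
    }

    public static void main(String[] args) {
        int totalNumOfReplicas = 3; // assumed replication factor
        int[] rackCounts = {1, 2, 3, 10}; // assumed cluster sizes, in racks
        for (int racks : rackCounts) {
            System.out.println("replicas=" + totalNumOfReplicas
                + " racks=" + racks
                + " maxNodesPerRack=" + maxNodesPerRack(totalNumOfReplicas, racks));
        }
    }
}
```

      With 3 replicas the cap is 2 on a cluster of 3 or more racks, but 3 on a 2-rack cluster, so on its own this limit does not always force replicas onto more than one rack.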
      

      snippet-02:

            case 2:
              if (clusterMap.isOnSameRack(results.get(0), results.get(1))) {
                chooseRemoteRack(1, results.get(0), excludedNodes,
                                 blocksize, maxNodesPerRack, results);
              } else if (newBlock){
                chooseLocalRack(results.get(1), excludedNodes, blocksize,
                                maxNodesPerRack, results);
              } else {
                chooseLocalRack(writer, excludedNodes, blocksize,
                                maxNodesPerRack, results);
              }
              if (--numOfReplicas == 0) {
                break;
              }
      

      snippet-03:

          do {
            DatanodeDescriptor[] selectedNodes =
              chooseRandom(1, nodes, excludedNodes);
            if (selectedNodes.length == 0) {
              throw new NotEnoughReplicasException(
                                                   "Not able to place enough replicas");
            }
            result = (DatanodeDescriptor)(selectedNodes[0]);
          } while(!isGoodTarget(result, blocksize, maxNodesPerRack, results));
      

        Attachments

        1. hdfs-1480-test.txt
          3 kB
          Todd Lipcon
        2. hdfs-1480.txt
          22 kB
          Todd Lipcon
        3. hdfs-1480.txt
          23 kB
          Todd Lipcon
        4. hdfs-1480.txt
          22 kB
          Todd Lipcon

              People

              • Assignee:
                tlipcon Todd Lipcon
              • Reporter:
                mary T Meyarivan