  1. Hadoop HDFS
  2. HDFS-16456

EC: Decommissioning a rack with only one DN fails when the number of racks equals the replication number


Details

    Description

      In the scenario below, decommission fails with the reason TOO_MANY_NODES_ON_RACK:

      1. Enable an EC policy, such as RS-6-3-1024k.
      2. The number of racks in the cluster is equal to or less than the replication number (9).
      3. A rack has only one DN, and this DN is decommissioned.

      The root cause is in BlockPlacementPolicyRackFaultTolerant::getMaxNodesPerRack(), which returns the limit maxNodesPerRack used when choosing targets. In this scenario maxNodesPerRack is 1, which means at most one datanode can be chosen from each rack.

        protected int[] getMaxNodesPerRack(int numOfChosen, int numOfReplicas) {
         ...
          // If more replicas than racks, evenly spread the replicas.
          // This calculation rounds up.
          int maxNodesPerRack = (totalNumOfReplicas - 1) / numOfRacks + 1;
          return new int[] {numOfReplicas, maxNodesPerRack};
        } 

      The statement int maxNodesPerRack = (totalNumOfReplicas - 1) / numOfRacks + 1 is the one that runs here, with totalNumOfReplicas=9 and numOfRacks=9, so maxNodesPerRack = (9 - 1) / 9 + 1 = 1.
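The arithmetic can be checked in isolation. The following is a minimal standalone sketch, not Hadoop code: the class name MaxNodesPerRackSketch and its helper are hypothetical, and only the formula is taken from the snippet quoted above. The second call assumes the decommissioning rack were left out of the rack count, which is not what the current code does.

```java
// Standalone sketch; the class and method names are hypothetical.
// Only the formula comes from getMaxNodesPerRack() as quoted in this issue.
final class MaxNodesPerRackSketch {

  /** Rounded-up per-rack limit, using Java integer division. */
  static int maxNodesPerRack(int totalNumOfReplicas, int numOfRacks) {
    return (totalNumOfReplicas - 1) / numOfRacks + 1;
  }

  public static void main(String[] args) {
    // RS-6-3 has 9 internal blocks; the rack being decommissioned is still counted.
    System.out.println(maxNodesPerRack(9, 9)); // prints 1

    // Assumption: only the 8 racks that can still accept blocks are counted.
    System.out.println(maxNodesPerRack(9, 8)); // prints 2
  }
}
```

With a limit of 1 and only 8 usable racks, 9 internal blocks cannot all be placed, which matches the failure described here.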

      When we decommission a DN that is the only node in its rack, chooseOnce() in BlockPlacementPolicyRackFaultTolerant::chooseTargetInOrder() throws NotEnoughReplicasException. The exception is not caught there, so the code never falls back to chooseEvenlyFromRemainingRacks().

      During decommission, after targets are chosen, verifyBlockPlacement() reports a total rack count that still includes the invalid rack, so BlockPlacementStatusDefault::isPlacementPolicySatisfied() returns false, which also causes the decommission to fail.

        public BlockPlacementStatus verifyBlockPlacement(DatanodeInfo[] locs,
            int numberOfReplicas) {
          if (locs == null)
            locs = DatanodeDescriptor.EMPTY_ARRAY;
          if (!clusterMap.hasClusterEverBeenMultiRack()) {
            // only one rack
            return new BlockPlacementStatusDefault(1, 1, 1);
          }
          // Count locations on different racks.
          Set<String> racks = new HashSet<>();
          for (DatanodeInfo dn : locs) {
            racks.add(dn.getNetworkLocation());
          }
          return new BlockPlacementStatusDefault(racks.size(), numberOfReplicas,
              clusterMap.getNumOfRacks());
        } 
        public boolean isPlacementPolicySatisfied() {
          return requiredRacks <= currentRacks || currentRacks >= totalRacks;
        }
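To see why the check fails, the predicate above can be evaluated with the numbers from this scenario. This is a standalone sketch with a hypothetical class name (PlacementCheckSketch). The parameter order follows the BlockPlacementStatusDefault constructor call shown above (current racks, required racks, total racks), and the value 8 for usable racks is an assumption about this scenario.

```java
// Standalone sketch; PlacementCheckSketch is a hypothetical name.
// The boolean expression is copied from isPlacementPolicySatisfied() above.
final class PlacementCheckSketch {

  static boolean isPlacementPolicySatisfied(
      int currentRacks, int requiredRacks, int totalRacks) {
    return requiredRacks <= currentRacks || currentRacks >= totalRacks;
  }

  public static void main(String[] args) {
    // 9 internal blocks land on the 8 usable racks, but the total rack count
    // still includes the rack being decommissioned.
    System.out.println(isPlacementPolicySatisfied(8, 9, 9)); // prints false

    // Assumption: the invalid rack is excluded from the total.
    System.out.println(isPlacementPolicySatisfied(8, 9, 8)); // prints true
  }
}
```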

      Based on the above, the following changes are needed to fix it:

      1. When startDecommission() or stopDecommission() is called, numOfRacks in class NetworkTopology should be updated as well. Otherwise choosing targets may fail because maxNodesPerRack is too small, and even if choosing targets succeeds, isPlacementPolicySatisfied() still returns false and causes the decommission to fail.
      2. In BlockPlacementPolicyRackFaultTolerant::chooseTargetInOrder(), the first chooseOnce() call should also be wrapped in try/catch; otherwise it does not fall back to chooseEvenlyFromRemainingRacks() when the exception is thrown.
      3. In verifyBlockPlacement(), invalid racks should be excluded from the total numOfRacks; otherwise isPlacementPolicySatisfied() returns false and data reconstruction fails.
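The try/catch fallback in item 2 can be illustrated with a toy model. This is a sketch under stated assumptions, not the patch itself: FallbackSketch, NotEnoughTargetsException, and both methods are hypothetical stand-ins, racks are modelled only as counts of free nodes, all 9 blocks are placed from scratch, and the fallback simply relaxes the per-rack limit by one in place of chooseEvenlyFromRemainingRacks().

```java
// Toy model of "try the strict limit first, fall back when it throws".
// All names here are hypothetical; none of this is Hadoop code.
final class FallbackSketch {

  /** Hypothetical stand-in for NotEnoughReplicasException. */
  static final class NotEnoughTargetsException extends Exception {
    NotEnoughTargetsException(String message) {
      super(message);
    }
  }

  /**
   * Hypothetical stand-in for chooseOnce(): places `needed` blocks round-robin
   * over the racks, at most maxPerRack per rack and never more than a rack's
   * free nodes. Returns the number of blocks placed on each rack.
   */
  static int[] chooseOnce(int needed, int[] freeNodesPerRack, int maxPerRack)
      throws NotEnoughTargetsException {
    int[] placed = new int[freeNodesPerRack.length];
    int remaining = needed;
    // Each pass adds at most one block per rack, so maxPerRack passes
    // enforce the per-rack limit.
    for (int pass = 0; pass < maxPerRack && remaining > 0; pass++) {
      for (int r = 0; r < freeNodesPerRack.length && remaining > 0; r++) {
        if (placed[r] < freeNodesPerRack[r]) {
          placed[r]++;
          remaining--;
        }
      }
    }
    if (remaining > 0) {
      throw new NotEnoughTargetsException(remaining + " block(s) could not be placed");
    }
    return placed;
  }

  /** Strict attempt first; on failure, retry with the limit relaxed by one. */
  static int[] chooseWithFallback(int needed, int[] freeNodesPerRack, int maxPerRack)
      throws NotEnoughTargetsException {
    try {
      return chooseOnce(needed, freeNodesPerRack, maxPerRack);
    } catch (NotEnoughTargetsException e) {
      // Assumption: relaxing the limit stands in for the real fallback path.
      return chooseOnce(needed, freeNodesPerRack, maxPerRack + 1);
    }
  }

  public static void main(String[] args) throws NotEnoughTargetsException {
    // 9 racks; rack 0 holds only the decommissioning DN, so it has no free node.
    int[] free = {0, 3, 3, 3, 3, 3, 3, 3, 3};
    int[] placed = chooseWithFallback(9, free, 1);
    System.out.println(java.util.Arrays.toString(placed));
  }
}
```

In this model the strict attempt with a limit of 1 places only 8 blocks and throws; catching the exception lets the second attempt place the ninth block on a rack that already holds one.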

       

       

      Attachments

        1. HDFS-16456.010.patch
          21 kB
          caozhiqiang
        2. HDFS-16456.009.patch
          26 kB
          caozhiqiang
        3. HDFS-16456.008.patch
          27 kB
          caozhiqiang
        4. HDFS-16456.007.patch
          21 kB
          caozhiqiang
        5. HDFS-16456.006.patch
          21 kB
          caozhiqiang
        6. HDFS-16456.005.patch
          22 kB
          caozhiqiang
        7. HDFS-16456.004.patch
          22 kB
          caozhiqiang
        8. HDFS-16456.003.patch
          19 kB
          caozhiqiang
        9. HDFS-16456.002.patch
          19 kB
          caozhiqiang
        10. HDFS-16456.001.patch
          12 kB
          caozhiqiang

            People

              caozhiqiang
              Votes: 0
              Watchers: 8

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 40m