  1. Hadoop HDFS
  2. HDFS-7285 Erasure Coding Support inside HDFS
  3. HDFS-8461

Erasure coding: fix priority level of UnderReplicatedBlocks for striped block


Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • HDFS-7285
    • None
    • None
    • Reviewed

    Description

      Issue 1: correctly mark corrupted blocks.
      Issue 2: distinguish highest-risk priority from normal-risk priority.

      UnderReplicatedBlocks.java
        private int getPriority(int curReplicas,
        ...
          } else if (curReplicas == 1) {
            // only one replica - risk of loss
            // highest priority
            return QUEUE_HIGHEST_PRIORITY;
        ...
      

      For striped blocks, we should return QUEUE_HIGHEST_PRIORITY when curReplicas == 6 (assuming a 6+3 schema), because losing one more internal block would make the block group unrecoverable.
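
A minimal sketch of the priority rule described above, not the attached patch. The method name getPriorityStriped, the dataBlkNum parameter, and the numeric queue values are assumptions made for illustration; only QUEUE_HIGHEST_PRIORITY appears in the excerpt above. It assumes the block is already known to be missing at least one internal block.

```java
public class StripedPrioritySketch {
    // Queue levels; the numeric values are illustrative assumptions.
    static final int QUEUE_HIGHEST_PRIORITY = 0;
    static final int QUEUE_UNDER_REPLICATED = 2;
    static final int QUEUE_WITH_CORRUPT_BLOCKS = 4;

    /**
     * Hypothetical priority rule for a striped block group.
     *
     * @param curReplicas number of live internal blocks
     * @param dataBlkNum  number of data blocks in the schema (6 for 6+3)
     */
    static int getPriorityStriped(int curReplicas, int dataBlkNum) {
        if (curReplicas < dataBlkNum) {
            // Issue 1: fewer live blocks than data blocks, so the group
            // cannot be decoded and is treated as corrupt.
            return QUEUE_WITH_CORRUPT_BLOCKS;
        } else if (curReplicas == dataBlkNum) {
            // Issue 2: one more loss makes the group unrecoverable.
            return QUEUE_HIGHEST_PRIORITY;
        }
        // Some parity redundancy remains: normal risk.
        return QUEUE_UNDER_REPLICATED;
    }

    public static void main(String[] args) {
        // 6+3 schema: exactly 6 live internal blocks is the highest-risk case.
        System.out.println(getPriorityStriped(6, 6)); // prints 0
    }
}
```

With a 6+3 schema, 5 live internal blocks map to the corrupt queue, 6 to the highest-priority queue, and 7 or 8 to the normal under-replicated queue in this sketch.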

      This matters because of how source datanodes are chosen:

      BlockManager.java
      DatanodeDescriptor[] chooseSourceDatanodes(BlockInfo block,
        ...
           if(priority != UnderReplicatedBlocks.QUEUE_HIGHEST_PRIORITY 
                && !node.isDecommissionInProgress() 
                && node.getNumberOfBlocksToBeReplicated() >= maxReplicationStreams)
            {
              continue; // already reached replication limit
            }
        ...
      

      It may return too few source DNs (perhaps 5 when 6 are needed), and recovery then fails.
      A busy node should not be skipped if a block has the highest risk/priority. The problem is that a striped block is never assigned the highest priority.
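
The effect can be illustrated with a small self-contained sketch that mirrors the condition in the chooseSourceDatanodes excerpt. The Node class, chooseSources, sixNodesOneBusy, and the numeric queue values are hypothetical stand-ins, not the Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

public class SourceSelectionSketch {
    // Queue levels; the numeric values are illustrative assumptions.
    static final int QUEUE_HIGHEST_PRIORITY = 0;
    static final int QUEUE_UNDER_REPLICATED = 2;

    /** Hypothetical stand-in for a datanode descriptor. */
    static final class Node {
        final String name;
        final boolean decommissionInProgress;
        final int blocksToBeReplicated;

        Node(String name, boolean decommissionInProgress, int blocksToBeReplicated) {
            this.name = name;
            this.decommissionInProgress = decommissionInProgress;
            this.blocksToBeReplicated = blocksToBeReplicated;
        }
    }

    /** Applies the same skip condition as the excerpt above. */
    static List<Node> chooseSources(List<Node> candidates, int priority,
                                    int maxReplicationStreams) {
        List<Node> chosen = new ArrayList<>();
        for (Node node : candidates) {
            if (priority != QUEUE_HIGHEST_PRIORITY
                    && !node.decommissionInProgress
                    && node.blocksToBeReplicated >= maxReplicationStreams) {
                continue; // already reached replication limit
            }
            chosen.add(node);
        }
        return chosen;
    }

    /** Six surviving nodes, the last of which is at its stream limit. */
    static List<Node> sixNodesOneBusy(int maxReplicationStreams) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            nodes.add(new Node("dn" + i, false, 0));
        }
        nodes.add(new Node("dn5", false, maxReplicationStreams));
        return nodes;
    }

    public static void main(String[] args) {
        List<Node> nodes = sixNodesOneBusy(2);
        // Without highest priority the busy node is skipped: 5 sources.
        System.out.println(chooseSources(nodes, QUEUE_UNDER_REPLICATED, 2).size());
        // With highest priority the busy node is kept: 6 sources.
        System.out.println(chooseSources(nodes, QUEUE_HIGHEST_PRIORITY, 2).size());
    }
}
```

In this sketch a 6+3 block group with exactly six surviving internal blocks gets only five sources unless its priority is QUEUE_HIGHEST_PRIORITY, which matches the failure described above.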

      Attachments

        1. HDFS-8461-HDFS-7285.001.patch
          19 kB
          Walter Su
        2. HDFS-8461-HDFS-7285.002.patch
          19 kB
          Walter Su

            People

              walter.k.su Walter Su
              walter.k.su Walter Su