Hadoop HDFS / HDFS-13770

dfsadmin -report does not always decrease "missing blocks (with replication factor 1)" metrics when file is deleted


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.7
    • Fix Version/s: 2.10.0, 2.8.6, 2.9.3
    • Component/s: hdfs
    • Labels: None

    Description

      The "Missing blocks (with replication factor 1)" metric is not always decreased when a file is deleted.

      When a file is deleted, the remove function of UnderReplicatedBlocks can be called with the wrong priority (UnderReplicatedBlocks.LEVEL). In that case the corruptReplOneBlocks metric is not decreased, even though the block is still removed from the priority queue that contains it.

      The corresponding code:

      /** remove a block from a under replication queue */
      synchronized boolean remove(BlockInfo block,
          int oldReplicas,
          int oldReadOnlyReplicas,
          int decommissionedReplicas,
          int oldExpectedReplicas) {
        final int priLevel = getPriority(oldReplicas, oldReadOnlyReplicas,
            decommissionedReplicas, oldExpectedReplicas);
        boolean removedBlock = remove(block, priLevel);
        if (priLevel == QUEUE_WITH_CORRUPT_BLOCKS &&
            oldExpectedReplicas == 1 &&
            removedBlock) {
          corruptReplOneBlocks--;
          assert corruptReplOneBlocks >= 0 :
              "Number of corrupt blocks with replication factor 1 " +
                  "should be non-negative";
        }
        return removedBlock;
      }

      /**
       * Remove a block from the under replication queues.
       *
       * The priLevel parameter is a hint of which queue to query
       * first: if negative or >= {@link #LEVEL} this shortcutting
       * is not attempted.
       *
       * If the block is not found in the nominated queue, an attempt is made to
       * remove it from all queues.
       *
       * <i>Warning:</i> This is not a synchronized method.
       * @param block block to remove
       * @param priLevel expected privilege level
       * @return true if the block was found and removed from one of the priority queues
       */
      boolean remove(BlockInfo block, int priLevel) {
        if (priLevel >= 0 && priLevel < LEVEL
            && priorityQueues.get(priLevel).remove(block)) {
          NameNode.blockStateChangeLog.debug(
              "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block {}" +
                  " from priority queue {}", block, priLevel);
          return true;
        } else {
          // Try to remove the block from all queues if the block was
          // not found in the queue for the given priority level.
          for (int i = 0; i < LEVEL; i++) {
            if (i != priLevel && priorityQueues.get(i).remove(block)) {
              NameNode.blockStateChangeLog.debug(
                  "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block" +
                      " {} from priority queue {}", block, i);
              return true;
            }
          }
        }
        return false;
      }
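
      The effect can be reproduced on a toy model of the queues. The sketch below is not the real UnderReplicatedBlocks class: blocks are plain strings, the class name CorruptReplOneDemo, the helpers addCorruptReplOne and removeWithCounter, and the constant values (LEVEL = 5, QUEUE_WITH_CORRUPT_BLOCKS = 4) are assumptions made for illustration. It assumes the deletion path reaches the two-argument remove with LEVEL as the hint, which is how the description above reads.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the queue bookkeeping; NOT the real UnderReplicatedBlocks. */
class CorruptReplOneDemo {
    // Assumed values for this model only.
    static final int LEVEL = 5;
    static final int QUEUE_WITH_CORRUPT_BLOCKS = 4;

    final List<Set<String>> priorityQueues = new ArrayList<>();
    int corruptReplOneBlocks = 0;

    CorruptReplOneDemo() {
        for (int i = 0; i < LEVEL; i++) {
            priorityQueues.add(new HashSet<>());
        }
    }

    /** Hypothetical helper: queue a corrupt block whose expected replication is 1. */
    void addCorruptReplOne(String block) {
        if (priorityQueues.get(QUEUE_WITH_CORRUPT_BLOCKS).add(block)) {
            corruptReplOneBlocks++;
        }
    }

    /** Counter-aware removal, simplified: the priority is passed in directly. */
    boolean removeWithCounter(String block, int priLevel, int oldExpectedReplicas) {
        boolean removedBlock = remove(block, priLevel);
        if (priLevel == QUEUE_WITH_CORRUPT_BLOCKS
                && oldExpectedReplicas == 1 && removedBlock) {
            corruptReplOneBlocks--;
        }
        return removedBlock;
    }

    /** Hint-based removal: try the hinted queue, then every other queue. */
    boolean remove(String block, int priLevel) {
        if (priLevel >= 0 && priLevel < LEVEL
                && priorityQueues.get(priLevel).remove(block)) {
            return true;
        }
        for (int i = 0; i < LEVEL; i++) {
            if (i != priLevel && priorityQueues.get(i).remove(block)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        CorruptReplOneDemo q = new CorruptReplOneDemo();

        // Correct priority through the counter-aware path: counter drops to 0.
        q.addCorruptReplOne("blk_1");
        q.removeWithCounter("blk_1", QUEUE_WITH_CORRUPT_BLOCKS, 1);
        System.out.println("after blk_1: " + q.corruptReplOneBlocks);

        // LEVEL as the hint: the block is found by the fallback loop and
        // removed, but nothing decrements the counter, so it stays at 1.
        q.addCorruptReplOne("blk_2");
        boolean removed = q.remove("blk_2", LEVEL);
        System.out.println("after blk_2: removed=" + removed
                + " counter=" + q.corruptReplOneBlocks);
    }
}
```

      In this model the second removal succeeds while the counter stays at 1, which matches the stale value reported by dfsadmin -report.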
      

      This is already fixed on trunk by HDFS-10999, but that ticket introduces new metrics, which I think shouldn't be backported to branch-2.
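
      One possible direction for a branch-2 fix that adds no new metrics is to decrement the counter based on the queue the block was actually found in, rather than on the priority hint. The sketch below illustrates that idea on a toy model only. It is not taken from the attached patches or from HDFS-10999; the class CorruptReplOneFixSketch, the helpers addCorruptReplOne and decrementIfCorruptReplOne, the three-argument remove signature, and the constant values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical fix sketch on a toy model; NOT the attached patch. */
class CorruptReplOneFixSketch {
    // Assumed values for this model only.
    static final int LEVEL = 5;
    static final int QUEUE_WITH_CORRUPT_BLOCKS = 4;

    final List<Set<String>> priorityQueues = new ArrayList<>();
    int corruptReplOneBlocks = 0;

    CorruptReplOneFixSketch() {
        for (int i = 0; i < LEVEL; i++) {
            priorityQueues.add(new HashSet<>());
        }
    }

    /** Hypothetical helper: queue a corrupt block whose expected replication is 1. */
    void addCorruptReplOne(String block) {
        if (priorityQueues.get(QUEUE_WITH_CORRUPT_BLOCKS).add(block)) {
            corruptReplOneBlocks++;
        }
    }

    /**
     * Removes the block and updates the counter according to the queue
     * in which the block was found, so a wrong hint cannot skip the update.
     */
    boolean remove(String block, int priLevel, int oldExpectedReplicas) {
        if (priLevel >= 0 && priLevel < LEVEL
                && priorityQueues.get(priLevel).remove(block)) {
            decrementIfCorruptReplOne(priLevel, oldExpectedReplicas);
            return true;
        }
        for (int i = 0; i < LEVEL; i++) {
            if (i != priLevel && priorityQueues.get(i).remove(block)) {
                decrementIfCorruptReplOne(i, oldExpectedReplicas);
                return true;
            }
        }
        return false;
    }

    private void decrementIfCorruptReplOne(int foundLevel, int oldExpectedReplicas) {
        if (foundLevel == QUEUE_WITH_CORRUPT_BLOCKS && oldExpectedReplicas == 1) {
            corruptReplOneBlocks--;
        }
    }

    public static void main(String[] args) {
        CorruptReplOneFixSketch q = new CorruptReplOneFixSketch();
        q.addCorruptReplOne("blk_1");
        // Even with LEVEL as the hint, the fallback loop finds the block in
        // the corrupt queue and the counter returns to 0.
        boolean removed = q.remove("blk_1", LEVEL, 1);
        System.out.println("removed=" + removed
                + " counter=" + q.corruptReplOneBlocks);
    }
}
```

      The trade-off in this sketch is that every caller of the low-level remove has to supply the expected replication of the block.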

       

      Attachments

        1. HDFS-13770-branch-2.001.patch
          7 kB
          Kitti Nanasi
        2. HDFS-13770-branch-2.002.patch
          9 kB
          Kitti Nanasi
        3. HDFS-13770-branch-2.003.patch
          9 kB
          Kitti Nanasi
        4. HDFS-13770-branch-2.004.patch
          9 kB
          Wei-Chiu Chuang
        5. HDFS-13770-branch-2-005.patch
          9 kB
          Wei-Chiu Chuang


          People

            Assignee: Kitti Nanasi (knanasi)
            Reporter: Kitti Nanasi (knanasi)
            Votes: 0
            Watchers: 6
