  1. Hadoop HDFS
  2. HDFS-15200

Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage


Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • None
    • 3.3.0, 3.2.3
    • None
    • None
    • Reviewed

    Description

      Presently, before adding a replica to invalidates, invalidateBlock(..) checks whether any replica of the block is on stale storage. If any replica is on stale storage, it postpones deletion of the replica.
      The relevant code:

          // Check how many copies we have of the block
          if (nr.replicasOnStaleNodes() > 0) {
            blockLog.debug("BLOCK* invalidateBlocks: postponing " +
                "invalidation of {} on {} because {} replica(s) are located on " +
                "nodes with potentially out-of-date block reports", b, dn,
                nr.replicasOnStaleNodes());
            postponeBlock(b.getCorrupted());
            return false;
          }

      For a corrupt replica, we can skip this check and delete the replica immediately, because a corrupt replica cannot be repaired.
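
The proposal above can be illustrated with a minimal, self-contained sketch. It is a simplified model and not the actual BlockManager code: the class `InvalidateDecisionSketch`, the enum `Outcome`, and the flag `deleteCorruptImmediately` are hypothetical names introduced here only for illustration.

```java
// Illustrative model only; it does not reproduce the real BlockManager logic.
final class InvalidateDecisionSketch {

    /** Possible results of an invalidation attempt in this simplified model. */
    enum Outcome { POSTPONED, QUEUED_FOR_DELETION }

    /**
     * Decides what happens to a corrupt replica.
     *
     * @param replicasOnStaleNodes     number of replicas on storages whose block
     *                                 reports may be out of date
     * @param deleteCorruptImmediately hypothetical switch; when true, the
     *                                 stale-storage check is skipped
     */
    static Outcome invalidateCorruptReplica(int replicasOnStaleNodes,
                                            boolean deleteCorruptImmediately) {
        // Behavior described in the issue: any stale replica postpones deletion.
        if (replicasOnStaleNodes > 0 && !deleteCorruptImmediately) {
            return Outcome.POSTPONED;
        }
        // Proposed behavior: a corrupt replica is queued for deletion regardless.
        return Outcome.QUEUED_FOR_DELETION;
    }

    public static void main(String[] args) {
        System.out.println(invalidateCorruptReplica(1, false));
        System.out.println(invalidateCorruptReplica(1, true));
        System.out.println(invalidateCorruptReplica(0, false));
    }
}
```

With the switch off, a single replica on stale storage is enough to postpone the deletion; with it on, the corrupt replica is queued for deletion in every case.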

      One consequence of the current behavior is that the NameNodes can show different block states after a failover:
      When a replica is reported corrupt, the Active NameNode marks it as corrupt, schedules it for deletion, and removes it from corruptReplicas and excessRedundancyMap.
      Suppose a failover happens before the replica is actually deleted.
      The Standby NameNode then marks all storages as stale.
      It then starts processing IBRs (incremental block reports). Because the replicas are now on stale storage, it skips the deletion and the removal from corruptReplicas.
      As a result, the two NameNodes show different counts and different sets of corrupt replicas.
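
The failover scenario above can be modeled in a few lines. This is a rough sketch under simplifying assumptions: staleness is one flag per NameNode instead of per storage, and `FailoverDivergenceSketch`, `NameNodeModel`, `reportCorruptReplica`, and `deleteCorruptImmediately` are hypothetical names, not Hadoop APIs.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of two NameNodes handling the same corrupt-replica report.
final class FailoverDivergenceSketch {

    static final class NameNodeModel {
        final Set<String> corruptReplicas = new HashSet<>();
        final Set<String> invalidateQueue = new HashSet<>();
        final Set<String> postponed = new HashSet<>();
        final boolean storagesStale;            // assumption: one flag for all storages
        final boolean deleteCorruptImmediately; // hypothetical switch for the proposed behavior

        NameNodeModel(boolean storagesStale, boolean deleteCorruptImmediately) {
            this.storagesStale = storagesStale;
            this.deleteCorruptImmediately = deleteCorruptImmediately;
        }

        /** Records a corrupt replica and tries to schedule its deletion. */
        void reportCorruptReplica(String replica) {
            corruptReplicas.add(replica);
            if (storagesStale && !deleteCorruptImmediately) {
                // Deletion is postponed, so the replica stays in corruptReplicas.
                postponed.add(replica);
                return;
            }
            invalidateQueue.add(replica);
            corruptReplicas.remove(replica);
        }
    }

    /** Returns true if the two NameNodes end up with different corrupt sets. */
    static boolean diverges(boolean deleteCorruptImmediately) {
        // The old active NameNode has fresh storages.
        NameNodeModel oldActive = new NameNodeModel(false, deleteCorruptImmediately);
        // After failover, the other NameNode treats all storages as stale.
        NameNodeModel newActive = new NameNodeModel(true, deleteCorruptImmediately);
        oldActive.reportCorruptReplica("blk_1@dn1");
        newActive.reportCorruptReplica("blk_1@dn1");
        return !oldActive.corruptReplicas.equals(newActive.corruptReplicas);
    }

    public static void main(String[] args) {
        System.out.println("diverges without immediate delete: " + diverges(false));
        System.out.println("diverges with immediate delete: " + diverges(true));
    }
}
```

In this model the corrupt sets differ only when the stale-storage check is applied to corrupt replicas, which matches the mismatch described in the issue.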

      Attachments

        1. HDFS-15200-01.patch
          6 kB
          Ayush Saxena
        2. HDFS-15200-02.patch
          9 kB
          Ayush Saxena
        3. HDFS-15200-03.patch
          9 kB
          Ayush Saxena
        4. HDFS-15200-04.patch
          9 kB
          Ayush Saxena
        5. HDFS-15200-05.patch
          9 kB
          Ayush Saxena

        Activity


          People

            ayushtkn Ayush Saxena
            ayushtkn Ayush Saxena
            Votes: 0
            Watchers: 7

            Dates

