Details

    • Type: Sub-task Sub-task
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: Append Branch
    • Fix Version/s: Append Branch
    • Component/s: datanode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This is a followed up issue of HDFS-619. We are going to implement step 4c described in the block recovery algorithm.

      1. h627_20090917.patch
        4 kB
        Tsz Wo Nicholas Sze
      2. h627_20090921.patch
        7 kB
        Tsz Wo Nicholas Sze
      3. h627_20090921b.patch
        10 kB
        Tsz Wo Nicholas Sze
      4. h627_20090922.patch
        10 kB
        Tsz Wo Nicholas Sze
      5. h627_20090923.patch
        10 kB
        Tsz Wo Nicholas Sze
      6. h627_20090923b.patch
        12 kB
        Tsz Wo Nicholas Sze
      7. h627_20090924.patch
        12 kB
        Tsz Wo Nicholas Sze
      8. updateReplica.patch
        12 kB
        Hairong Kuang
      9. updateReplica1.patch
        9 kB
        Hairong Kuang
      10. updateReplica2.patch
        9 kB
        Konstantin Shvachko

        Issue Links

          Activity

          Tsz Wo Nicholas Sze created issue -
          Hide
          Tsz Wo Nicholas Sze added a comment -

          h627_20090917.patch: patch for review. Will add unit tests.

          Show
          Tsz Wo Nicholas Sze added a comment - h627_20090917.patch: patch for review. Will add unit tests.
          Tsz Wo Nicholas Sze made changes -
          Field Original Value New Value
          Attachment h627_20090917.patch [ 12419956 ]
          Hide
          Tsz Wo Nicholas Sze added a comment -

          h627_20090921.patch: latest patch. It is not easy to create new unit tests. Still working on it.

          Show
          Tsz Wo Nicholas Sze added a comment - h627_20090921.patch: latest patch. It is not easy to create new unit tests. Still working on it.
          Tsz Wo Nicholas Sze made changes -
          Attachment h627_20090921.patch [ 12420234 ]
          Hide
          Tsz Wo Nicholas Sze added a comment -

          h627_20090921b.patch: added a test (finally).

          Show
          Tsz Wo Nicholas Sze added a comment - h627_20090921b.patch: added a test (finally).
          Tsz Wo Nicholas Sze made changes -
          Attachment h627_20090921b.patch [ 12420241 ]
          Hide
          Hairong Kuang added a comment -

          1. A replica needs to be "detached" before it gets truncated so truncation won't modify snapshot. Sorry this is not described in the design document.
          2. When finalizing a RUR, we need to make sure that the finalized data/meta files are located in the "finalized" directory. Check FSDataset#finalizeBlock to see how a replica is finalized.
          3. Looks that we need to make sure that rbw#getBytesOnDiks() should be equal to rbw#getNumOfBytes() when stopping a writer..

          Show
          Hairong Kuang added a comment - 1. A replica needs to be "detached" before it gets truncated so truncation won't modify snapshot. Sorry this is not described in the design document. 2. When finalizing a RUR, we need to make sure that the finalized data/meta files are located in the "finalized" directory. Check FSDataset#finalizeBlock to see how a replica is finalized. 3. Looks that we need to make sure that rbw#getBytesOnDiks() should be equal to rbw#getNumOfBytes() when stopping a writer..
          Hide
          Tsz Wo Nicholas Sze added a comment -

          h627_20090922.patch: only able to fix (1).

          > 2. When finalizing a RUR, we need to make sure that the finalized data/meta files are located in the "finalized" directory. Check FSDataset#finalizeBlock to see how a replica is finalized.

          I cannot see any codes in FSDataset#finalizeBlock checking "finalized" directory. Could you give me more hints?

          3. Looks that we need to make sure that rbw#getBytesOnDiks() should be equal to rbw#getNumOfBytes() when stopping a writer..

          Then, it should be fixed in initReplicaRecovery(..), right?

          Show
          Tsz Wo Nicholas Sze added a comment - h627_20090922.patch: only able to fix (1). > 2. When finalizing a RUR, we need to make sure that the finalized data/meta files are located in the "finalized" directory. Check FSDataset#finalizeBlock to see how a replica is finalized. I cannot see any codes in FSDataset#finalizeBlock checking "finalized" directory. Could you give me more hints? 3. Looks that we need to make sure that rbw#getBytesOnDiks() should be equal to rbw#getNumOfBytes() when stopping a writer.. Then, it should be fixed in initReplicaRecovery(..), right?
          Tsz Wo Nicholas Sze made changes -
          Attachment h627_20090922.patch [ 12420328 ]
          Hide
          Hairong Kuang added a comment -

          > I cannot see any codes in FSDataset#finalizeBlock checking "finalized" directory. Could you give me more hints?
          You need to move the replica from directory "rbw" to directory "finalized". This is done by FSVolume#addBlock(Block, File).

          Show
          Hairong Kuang added a comment - > I cannot see any codes in FSDataset#finalizeBlock checking "finalized" directory. Could you give me more hints? You need to move the replica from directory "rbw" to directory "finalized". This is done by FSVolume#addBlock(Block, File).
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Thanks, Hairong.

          h627_20090923.patch: calling finalizeBlock(..).

          Show
          Tsz Wo Nicholas Sze added a comment - Thanks, Hairong. h627_20090923.patch: calling finalizeBlock(..).
          Tsz Wo Nicholas Sze made changes -
          Attachment h627_20090923.patch [ 12420390 ]
          Hide
          Tsz Wo Nicholas Sze added a comment -

          h627_20090923b.patch: use detachFile(..) instead of detachBlock(..)

          Show
          Tsz Wo Nicholas Sze added a comment - h627_20090923b.patch: use detachFile(..) instead of detachBlock(..)
          Tsz Wo Nicholas Sze made changes -
          Attachment h627_20090923b.patch [ 12420420 ]
          Hide
          Hairong Kuang added a comment -

          Calling finalizeBlock in the last may not work because the replicaInfo in the replicasMap is already a finalized one.

          Show
          Hairong Kuang added a comment - Calling finalizeBlock in the last may not work because the replicaInfo in the replicasMap is already a finalized one.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          h627_20090924.patch: check "finalized" in ReplicaUnderRecovery.toFinalizedReplica(..).

          Show
          Tsz Wo Nicholas Sze added a comment - h627_20090924.patch: check "finalized" in ReplicaUnderRecovery.toFinalizedReplica(..).
          Tsz Wo Nicholas Sze made changes -
          Attachment h627_20090924.patch [ 12420489 ]
          Hide
          Hairong Kuang added a comment -

          It would be nice if the logic for finalizing a block is at FSDataset#finalizeBlock.

          Show
          Hairong Kuang added a comment - It would be nice if the logic for finalizing a block is at FSDataset#finalizeBlock.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          > It would be nice if the logic for finalizing a block is at FSDataset#finalizeBlock.

          Yes, but I don't see any easy way to use it. Could you point out how to use finalizeBlock(..) in the patch?

          Show
          Tsz Wo Nicholas Sze added a comment - > It would be nice if the logic for finalizing a block is at FSDataset#finalizeBlock. Yes, but I don't see any easy way to use it. Could you point out how to use finalizeBlock(..) in the patch?
          Hide
          Hairong Kuang added a comment -

          OK, let me give a shot. With so many code changes, I agree it is not easy to figure it out.

          This patch has a few changes to Nicholas' the most recent patch. His most recent patch has the right logic. My new patch simply makes it simpler by reusing existing code.
          1. use bumpReplicaGS to bump RUR's GS to be recovery id;
          2. refactor finalizeReplica that takes a ReplicaInfo to finalize a replica, and use finalizeReplica to finalize the rur;
          2. remove the static method updateReplica and change updateReplicaUnderRecovery to non-static.

          Show
          Hairong Kuang added a comment - OK, let me give a shot. With so many code changes, I agree it is not easy to figure it out. This patch has a few changes to Nicholas' the most recent patch. His most recent patch has the right logic. My new patch simply makes it simpler by reusing existing code. 1. use bumpReplicaGS to bump RUR's GS to be recovery id; 2. refactor finalizeReplica that takes a ReplicaInfo to finalize a replica, and use finalizeReplica to finalize the rur; 2. remove the static method updateReplica and change updateReplicaUnderRecovery to non-static.
          Hairong Kuang made changes -
          Attachment updateReplica.patch [ 12420616 ]
          Hide
          Hairong Kuang added a comment -

          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Show
          Hairong Kuang added a comment - [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Hi Hairong, you may revert ReplicaUnderRecovery.java and ReplicaInfo.java. It seems that the changes are not needed in your patch.

          Show
          Tsz Wo Nicholas Sze added a comment - Hi Hairong, you may revert ReplicaUnderRecovery.java and ReplicaInfo.java. It seems that the changes are not needed in your patch.
          Hide
          Hairong Kuang added a comment -

          > you may revert ReplicaUnderRecovery.java and ReplicaInfo.java. It seems that the changes are not needed in your patch.

          Done. Thanks, Nicholas!

          Show
          Hairong Kuang added a comment - > you may revert ReplicaUnderRecovery.java and ReplicaInfo.java. It seems that the changes are not needed in your patch. Done. Thanks, Nicholas!
          Hairong Kuang made changes -
          Attachment updateReplica1.patch [ 12420624 ]
          Hide
          Hairong Kuang added a comment -

          Passed all unit tests except for TestBackupNode, the ones in HDFS-647, and TestBlockUnderConstruction. But the last seemed not related to this patch.

          Show
          Hairong Kuang added a comment - Passed all unit tests except for TestBackupNode, the ones in HDFS-647 , and TestBlockUnderConstruction. But the last seemed not related to this patch.
          Hide
          Konstantin Shvachko added a comment -

          Merged the patch with trunk. TestBlockUnderConstruction does not fail.

          Show
          Konstantin Shvachko added a comment - Merged the patch with trunk. TestBlockUnderConstruction does not fail.
          Konstantin Shvachko made changes -
          Attachment updateReplica2.patch [ 12420638 ]
          Hide
          Konstantin Shvachko added a comment -

          I just committed this. Thank you Nicholas and Hairong.

          Show
          Konstantin Shvachko added a comment - I just committed this. Thank you Nicholas and Hairong.
          Konstantin Shvachko made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Tsz Wo Nicholas Sze made changes -
          Link This issue relates to HDFS-676 [ HDFS-676 ]
          Hide
          Tsz Wo Nicholas Sze added a comment -

          The patch committed has a NPE problem. See HDFS-676.

          What was the reason to remove the following codes from h627_20090924.patch?

          +  /** Static version of {@link #updateReplica(Block, long, long)} */
          +  static void updateReplica(final ReplicasMap map, final Block block,
          +      final long recoveryId, final long length) throws IOException {
          +    //get replica
          +    final ReplicaInfo replica = map.get(block.getBlockId());
          +    DataNode.LOG.info("updateReplica: block=" + block
          +        + ", recoveryId=" + recoveryId
          +        + ", length=" + length
          +        + ", replica=" + replica);
          +
          +    //check replica
          +    if (replica == null) {
          +      throw new ReplicaNotFoundException(block);
          +    }
          
          Show
          Tsz Wo Nicholas Sze added a comment - The patch committed has a NPE problem. See HDFS-676 . What was the reason to remove the following codes from h627_20090924.patch? + /** Static version of {@link #updateReplica(Block, long , long )} */ + static void updateReplica( final ReplicasMap map, final Block block, + final long recoveryId, final long length) throws IOException { + //get replica + final ReplicaInfo replica = map.get(block.getBlockId()); + DataNode.LOG.info( "updateReplica: block=" + block + + ", recoveryId=" + recoveryId + + ", length=" + length + + ", replica=" + replica); + + //check replica + if (replica == null ) { + throw new ReplicaNotFoundException(block); + }
          Hide
          Tsz Wo Nicholas Sze added a comment -

          > What was the reason to remove the following codes from h627_20090924.patch?

          Oops, the codes are indeed committed but they are not used by HDFS-658.

          Show
          Tsz Wo Nicholas Sze added a comment - > What was the reason to remove the following codes from h627_20090924.patch? Oops, the codes are indeed committed but they are not used by HDFS-658 .
          Hairong Kuang made changes -
          Assignee Tsz Wo (Nicholas), SZE [ szetszwo ]

            People

            • Assignee:
              Tsz Wo Nicholas Sze
              Reporter:
              Tsz Wo Nicholas Sze
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development