Details

    • Type: Sub-task Sub-task
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: Append Branch
    • Fix Version/s: Append Branch
    • Component/s: datanode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This is a followed up issue of HDFS-619. We are going to implement step 4c described in the block recovery algorithm.

      1. updateReplica2.patch
        9 kB
        Konstantin Shvachko
      2. updateReplica1.patch
        9 kB
        Hairong Kuang
      3. updateReplica.patch
        12 kB
        Hairong Kuang
      4. h627_20090924.patch
        12 kB
        Tsz Wo Nicholas Sze
      5. h627_20090923b.patch
        12 kB
        Tsz Wo Nicholas Sze
      6. h627_20090923.patch
        10 kB
        Tsz Wo Nicholas Sze
      7. h627_20090922.patch
        10 kB
        Tsz Wo Nicholas Sze
      8. h627_20090921b.patch
        10 kB
        Tsz Wo Nicholas Sze
      9. h627_20090921.patch
        7 kB
        Tsz Wo Nicholas Sze
      10. h627_20090917.patch
        4 kB
        Tsz Wo Nicholas Sze

        Issue Links

          Activity

          Hide
          Tsz Wo Nicholas Sze added a comment -

          h627_20090917.patch: patch for review. Will add unit tests.

          Show
          Tsz Wo Nicholas Sze added a comment - h627_20090917.patch: patch for review. Will add unit tests.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          h627_20090921.patch: latest patch. It is not easy to create new unit tests. Still working on it.

          Show
          Tsz Wo Nicholas Sze added a comment - h627_20090921.patch: latest patch. It is not easy to create new unit tests. Still working on it.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          h627_20090921b.patch: added a test (finally).

          Show
          Tsz Wo Nicholas Sze added a comment - h627_20090921b.patch: added a test (finally).
          Hide
          Hairong Kuang added a comment -

          1. A replica needs to be "detached" before it gets truncated so truncation won't modify snapshot. Sorry this is not described in the design document.
          2. When finalizing a RUR, we need to make sure that the finalized data/meta files are located in the "finalized" directory. Check FSDataset#finalizeBlock to see how a replica is finalized.
          3. Looks that we need to make sure that rbw#getBytesOnDiks() should be equal to rbw#getNumOfBytes() when stopping a writer..

          Show
          Hairong Kuang added a comment - 1. A replica needs to be "detached" before it gets truncated so truncation won't modify snapshot. Sorry this is not described in the design document. 2. When finalizing a RUR, we need to make sure that the finalized data/meta files are located in the "finalized" directory. Check FSDataset#finalizeBlock to see how a replica is finalized. 3. Looks that we need to make sure that rbw#getBytesOnDiks() should be equal to rbw#getNumOfBytes() when stopping a writer..
          Hide
          Tsz Wo Nicholas Sze added a comment -

          h627_20090922.patch: only able to fix (1).

          > 2. When finalizing a RUR, we need to make sure that the finalized data/meta files are located in the "finalized" directory. Check FSDataset#finalizeBlock to see how a replica is finalized.

          I cannot see any codes in FSDataset#finalizeBlock checking "finalized" directory. Could you give me more hints?

          3. Looks that we need to make sure that rbw#getBytesOnDiks() should be equal to rbw#getNumOfBytes() when stopping a writer..

          Then, it should be fixed in initReplicaRecovery(..), right?

          Show
          Tsz Wo Nicholas Sze added a comment - h627_20090922.patch: only able to fix (1). > 2. When finalizing a RUR, we need to make sure that the finalized data/meta files are located in the "finalized" directory. Check FSDataset#finalizeBlock to see how a replica is finalized. I cannot see any codes in FSDataset#finalizeBlock checking "finalized" directory. Could you give me more hints? 3. Looks that we need to make sure that rbw#getBytesOnDiks() should be equal to rbw#getNumOfBytes() when stopping a writer.. Then, it should be fixed in initReplicaRecovery(..), right?
          Hide
          Hairong Kuang added a comment -

          > I cannot see any codes in FSDataset#finalizeBlock checking "finalized" directory. Could you give me more hints?
          You need to move the replica from directory "rbw" to directory "finalized". This is done by FSVolume#addBlock(Block, File).

          Show
          Hairong Kuang added a comment - > I cannot see any codes in FSDataset#finalizeBlock checking "finalized" directory. Could you give me more hints? You need to move the replica from directory "rbw" to directory "finalized". This is done by FSVolume#addBlock(Block, File).
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Thanks, Hairong.

          h627_20090923.patch: calling finalizeBlock(..).

          Show
          Tsz Wo Nicholas Sze added a comment - Thanks, Hairong. h627_20090923.patch: calling finalizeBlock(..).
          Hide
          Tsz Wo Nicholas Sze added a comment -

          h627_20090923b.patch: use detachFile(..) instead of detachBlock(..)

          Show
          Tsz Wo Nicholas Sze added a comment - h627_20090923b.patch: use detachFile(..) instead of detachBlock(..)
          Hide
          Hairong Kuang added a comment -

          Calling finalizeBlock in the last may not work because the replicaInfo in the replicasMap is already a finalized one.

          Show
          Hairong Kuang added a comment - Calling finalizeBlock in the last may not work because the replicaInfo in the replicasMap is already a finalized one.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          h627_20090924.patch: check "finalized" in ReplicaUnderRecovery.toFinalizedReplica(..).

          Show
          Tsz Wo Nicholas Sze added a comment - h627_20090924.patch: check "finalized" in ReplicaUnderRecovery.toFinalizedReplica(..).
          Hide
          Hairong Kuang added a comment -

          It would be nice if the logic for finalizing a block is at FSDataset#finalizeBlock.

          Show
          Hairong Kuang added a comment - It would be nice if the logic for finalizing a block is at FSDataset#finalizeBlock.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          > It would be nice if the logic for finalizing a block is at FSDataset#finalizeBlock.

          Yes, but I don't see any easy way to use it. Could you point out how to use finalizeBlock(..) in the patch?

          Show
          Tsz Wo Nicholas Sze added a comment - > It would be nice if the logic for finalizing a block is at FSDataset#finalizeBlock. Yes, but I don't see any easy way to use it. Could you point out how to use finalizeBlock(..) in the patch?
          Hide
          Hairong Kuang added a comment -

          OK, let me give a shot. With so many code changes, I agree it is not easy to figure it out.

          This patch has a few changes to Nicholas' the most recent patch. His most recent patch has the right logic. My new patch simply makes it simpler by reusing existing code.
          1. use bumpReplicaGS to bump RUR's GS to be recovery id;
          2. refactor finalizeReplica that takes a ReplicaInfo to finalize a replica, and use finalizeReplica to finalize the rur;
          2. remove the static method updateReplica and change updateReplicaUnderRecovery to non-static.

          Show
          Hairong Kuang added a comment - OK, let me give a shot. With so many code changes, I agree it is not easy to figure it out. This patch has a few changes to Nicholas' the most recent patch. His most recent patch has the right logic. My new patch simply makes it simpler by reusing existing code. 1. use bumpReplicaGS to bump RUR's GS to be recovery id; 2. refactor finalizeReplica that takes a ReplicaInfo to finalize a replica, and use finalizeReplica to finalize the rur; 2. remove the static method updateReplica and change updateReplicaUnderRecovery to non-static.
          Hide
          Hairong Kuang added a comment -

          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Show
          Hairong Kuang added a comment - [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Hi Hairong, you may revert ReplicaUnderRecovery.java and ReplicaInfo.java. It seems that the changes are not needed in your patch.

          Show
          Tsz Wo Nicholas Sze added a comment - Hi Hairong, you may revert ReplicaUnderRecovery.java and ReplicaInfo.java. It seems that the changes are not needed in your patch.
          Hide
          Hairong Kuang added a comment -

          > you may revert ReplicaUnderRecovery.java and ReplicaInfo.java. It seems that the changes are not needed in your patch.

          Done. Thanks, Nicholas!

          Show
          Hairong Kuang added a comment - > you may revert ReplicaUnderRecovery.java and ReplicaInfo.java. It seems that the changes are not needed in your patch. Done. Thanks, Nicholas!
          Hide
          Hairong Kuang added a comment -

          Passed all unit tests except for TestBackupNode, the ones in HDFS-647, and TestBlockUnderConstruction. But the last seemed not related to this patch.

          Show
          Hairong Kuang added a comment - Passed all unit tests except for TestBackupNode, the ones in HDFS-647 , and TestBlockUnderConstruction. But the last seemed not related to this patch.
          Hide
          Konstantin Shvachko added a comment -

          Merged the patch with trunk. TestBlockUnderConstruction does not fail.

          Show
          Konstantin Shvachko added a comment - Merged the patch with trunk. TestBlockUnderConstruction does not fail.
          Hide
          Konstantin Shvachko added a comment -

          I just committed this. Thank you Nicholas and Hairong.

          Show
          Konstantin Shvachko added a comment - I just committed this. Thank you Nicholas and Hairong.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          The patch committed has a NPE problem. See HDFS-676.

          What was the reason to remove the following codes from h627_20090924.patch?

          +  /** Static version of {@link #updateReplica(Block, long, long)} */
          +  static void updateReplica(final ReplicasMap map, final Block block,
          +      final long recoveryId, final long length) throws IOException {
          +    //get replica
          +    final ReplicaInfo replica = map.get(block.getBlockId());
          +    DataNode.LOG.info("updateReplica: block=" + block
          +        + ", recoveryId=" + recoveryId
          +        + ", length=" + length
          +        + ", replica=" + replica);
          +
          +    //check replica
          +    if (replica == null) {
          +      throw new ReplicaNotFoundException(block);
          +    }
          
          Show
          Tsz Wo Nicholas Sze added a comment - The patch committed has a NPE problem. See HDFS-676 . What was the reason to remove the following codes from h627_20090924.patch? + /** Static version of {@link #updateReplica(Block, long , long )} */ + static void updateReplica( final ReplicasMap map, final Block block, + final long recoveryId, final long length) throws IOException { + //get replica + final ReplicaInfo replica = map.get(block.getBlockId()); + DataNode.LOG.info( "updateReplica: block=" + block + + ", recoveryId=" + recoveryId + + ", length=" + length + + ", replica=" + replica); + + //check replica + if (replica == null ) { + throw new ReplicaNotFoundException(block); + }
          Hide
          Tsz Wo Nicholas Sze added a comment -

          > What was the reason to remove the following codes from h627_20090924.patch?

          Oops, the codes are indeed committed but they are not used by HDFS-658.

          Show
          Tsz Wo Nicholas Sze added a comment - > What was the reason to remove the following codes from h627_20090924.patch? Oops, the codes are indeed committed but they are not used by HDFS-658 .

            People

            • Assignee:
              Tsz Wo Nicholas Sze
              Reporter:
              Tsz Wo Nicholas Sze
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development