Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Implemented Lease Recovery to sync the last block of a file. Added ClientDatanodeProtocol for clients to trigger block recovery. Changed DatanodeProtocol to support block synchronization. Changed InterDatanodeProtocol to support block update.

      Description

      In order to support file append, a GenerationStamp is associated with each block. Lease recovery will be performed when there is a possibility that the replicas of a block in a lease may have different GenerationStamp values.

      For more details, see the documentation in HADOOP-1700.
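      As a rough illustration of the condition that makes recovery necessary (not the actual HDFS code), the sketch below has each replica of a file's last block report the generation stamp it holds and flags recovery when the reported values disagree. The ReplicaInfo type and its fields are hypothetical placeholders.

        import java.util.List;

        /** Hypothetical per-replica metadata; not an actual HDFS class. */
        class ReplicaInfo {
          final long blockId;
          final long generationStamp;
          final long numBytes;

          ReplicaInfo(long blockId, long generationStamp, long numBytes) {
            this.blockId = blockId;
            this.generationStamp = generationStamp;
            this.numBytes = numBytes;
          }
        }

        class LeaseRecoveryCheck {
          /**
           * Recovery is needed for the last block of a file under a lease when its
           * replicas may disagree on the generation stamp (for example, after a
           * write pipeline failed part-way through).
           */
          static boolean needsRecovery(List<ReplicaInfo> lastBlockReplicas) {
            if (lastBlockReplicas.isEmpty()) {
              return false;
            }
            long first = lastBlockReplicas.get(0).generationStamp;
            for (ReplicaInfo r : lastBlockReplicas) {
              if (r.generationStamp != first) {
                return true;
              }
            }
            return false;
          }
        }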

      Attachments

      1. 3310_20080514.patch
        38 kB
        Tsz Wo Nicholas Sze
      2. 3310_20080516b.patch
        49 kB
        Tsz Wo Nicholas Sze
      3. 3310_20080516c.patch
        47 kB
        Tsz Wo Nicholas Sze
      4. 3310_20080519.patch
        51 kB
        Tsz Wo Nicholas Sze
      5. 3310_20080519b.patch
        62 kB
        Tsz Wo Nicholas Sze
      6. 3310_20080520.patch
        76 kB
        Tsz Wo Nicholas Sze
      7. 3310_20080521.patch
        78 kB
        Tsz Wo Nicholas Sze
      8. 3310_20080522b.patch
        81 kB
        Tsz Wo Nicholas Sze
      9. 3310_20080522c.patch
        82 kB
        Tsz Wo Nicholas Sze
      10. 3310_20080523.patch
        85 kB
        Tsz Wo Nicholas Sze
      11. 3310_20080524_dhruba.patch
        88 kB
        dhruba borthakur
      12. 3310_20080527.patch
        100 kB
        Tsz Wo Nicholas Sze
      13. 3310_20080528.patch
        102 kB
        Tsz Wo Nicholas Sze
      14. 3310_20080528b.patch
        101 kB
        Tsz Wo Nicholas Sze
      15. 3310_20080528c.patch
        104 kB
        Tsz Wo Nicholas Sze
      16. 3310_20080529.patch
        96 kB
        dhruba borthakur
      17. 3310_20080529b.patch
        105 kB
        dhruba borthakur
      18. 3310_20080529c.patch
        108 kB
        Tsz Wo Nicholas Sze


          Activity

          Hudson added a comment -

          Integrated in Hadoop-trunk #511 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/511/ )

          dhruba borthakur added a comment -

          I just committed it. Thanks Nicholas!

          dhruba borthakur added a comment -

          Ok, 100 nodes sound good. I will commit it.

          Mukund Madhugiri added a comment -

          I did not have resources to do a 500 node run. Here are the results for a 100 node run. Please let me know if that works.

          Sort on 100 nodes with trunk: time in minutes

          • Random Writer: 15.96
          • Sort: 52.92
          • Validation: 11.94

          Sort on 100 nodes with trunk + patch: time in minutes

          • Random Writer: 14.76
          • Sort: 53.76
          • Validation: 11.46
          dhruba borthakur added a comment -

          If this patch passes random-writer/sort on a reasonable size cluster (e.g. 500 nodes), it will be ready for "commit".

          Tsz Wo Nicholas Sze added a comment (edited) -

          The failed TestIndexedSort is not related to this issue, see HADOOP-3471.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12383056/3310_20080529c.patch
          against trunk revision 661771.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2523/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2523/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2523/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2523/console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          3310_20080529c.patch: fixed findbugs warning.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12383040/3310_20080529b.patch
          against trunk revision 661462.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 2 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2521/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2521/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2521/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2521/console

          This message is automatically generated.

          dhruba borthakur added a comment -

          This patch invokes the lease recovery code from the dfs client. It passes all unit tests.

          Tsz Wo Nicholas Sze added a comment -

          Passed all tests locally. Try Hudson.

          Tsz Wo Nicholas Sze added a comment -

          3310_20080528c.patch: fixed a problem when the last block is empty.

          Tsz Wo Nicholas Sze added a comment -

          3310_20080528b.patch: fixed a bug and it passed all tests on my machine.

          Tsz Wo Nicholas Sze added a comment -

          3310_20080528.patch: passes all tests in TestFileCreation (with a workaround for HADOOP-3453)

          Tsz Wo Nicholas Sze added a comment -

          3310_20080527.patch:

          • FSDataset.updateBlock finds the block file from both volumeMap and ongoingCreates
          • TestFileCreation2 is a temporary test for running testLeaseExpireHardLimit() alone.
          Tsz Wo Nicholas Sze added a comment -

          The failing TestFileCreation may be caused by HADOOP-3453.

          Tsz Wo Nicholas Sze added a comment -

          Hi Dhruba, I will fix TestFileCreation and figure out how to update a tmp file. Thank you for your comments.

          dhruba borthakur added a comment -

          Hi Nicholas, the more I think of this, the more it sounds logical to make FSDataset.updateBlock work correctly if the block is either in the volumeMap or in the ongoingCreates.

          Even when "append" is supported, it makes sense to keep the blocks that are currently being written to in the tmpdir. This ensures that a block report will not report these blocks. It also ensures that the periodic block scanner will not operate on these blocks. It is also an indirect persistent representation of blocks that need recovery if the datanode restarts. Can this be done?

          dhruba borthakur added a comment -

          Hi Nicholas, I took your latest patch and made changes to it so that the same lease recovery code is called from the client. It passes all unit tests except TestFileCreation. Maybe we can use this patch for further development and debugging. Also, please feel free to make any changes to the code I added.

          Tsz Wo Nicholas Sze added a comment -

          3310_20080523.patch: latest code, but it fails on TestFileCreation.testFileCreationNamenodeRestart()

          Tsz Wo Nicholas Sze added a comment -

          I tried to test the patch for lease expiry. It does not work yet since we still write the block to a tmp file first. FSDataset.validateBlockFile() will fail during lease recovery.

          Also, FSDataset.volumeMap should use the block id as the key, instead of the block (which compares both id and generation stamp), since the generation stamp may not be known.
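
          A tiny sketch of that keying choice, with hypothetical types: keying the map by the numeric block id alone lets a lookup succeed even when the caller's generation stamp is stale or unknown.

            import java.util.HashMap;
            import java.util.Map;

            class VolumeMapSketch {
              /** Hypothetical stand-in for the per-block metadata kept by the datanode. */
              static class ReplicaRecord {
                long generationStamp;
                String blockFilePath;
              }

              // Keyed by block id only, so a replica can still be found when the caller
              // does not yet know the current generation stamp.
              private final Map<Long, ReplicaRecord> volumeMap = new HashMap<>();

              ReplicaRecord get(long blockId) {
                return volumeMap.get(blockId);
              }

              void put(long blockId, ReplicaRecord record) {
                volumeMap.put(blockId, record);
              }
            }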

          Tsz Wo Nicholas Sze added a comment -

          3310_20080522c.patch: updated javadoc

          Tsz Wo Nicholas Sze added a comment -

          3310_20080522b.patch:

          • fixed the typo and items 2 and 3 mentioned above
          • added targets in INodeFileUnderConstruction
          • renew the lease even if all targets are not available
          • In FSDataset.updateBlock, update oldblock.generationStamp before using it.
          • In LeaseManager.syncBlock, if successList.isEmpty(), don't commit the block.
          • In FSNamesystem.commitBlockSynchronization, don't write to editLog since the finalizeINodeFileUnderConstruction(...) has already done it.
          Tsz Wo Nicholas Sze added a comment -

          > 2. internalReleaseLease invokes lease.renew(). Instead, LeaseManager.removeExpiredLease() should invoke lease.renew(). The reason being that a lease actually corresponds to multiple files.
          >
          > 3. removeExpiredLease is also invoked from startFileInternal. In this case, only one file in the lease should be recovered. The current code recovers all the files in the lease.

          Then, removeExpiredLease is not useful anymore since it is used differently in startFileInternal and LeaseManager.Monitor.run(). I will remove it and fix each caller's code individually.

          Tsz Wo Nicholas Sze added a comment -

          > I see a compilation error while using the latest patch.

          yeah, there is a typo: TestFileCreation2 => TestFileCreation

          >... these two RPCs should also be available thru the ClientProtocol. ...

          But ClientProtocol is for client-namenode communication. I think we need a new RPC recoverBlock(...) in either ClientProtocol or a new client-datanode protocol.
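
          For context, the committed change (per the release note above) did add a ClientDatanodeProtocol so that the client can ask a datanode to run block recovery. A minimal sketch of what such an interface could look like follows; the method signature and the placeholder types are assumptions, not the actual 0.18 API.

            import java.io.IOException;

            /** Placeholder types standing in for the real Block and DatanodeInfo classes. */
            class BlockStub { long blockId; long numBytes; long generationStamp; }
            class DatanodeInfoStub { String name; }

            /**
             * Sketch of a client-to-datanode RPC interface for triggering block recovery.
             * The name echoes the ClientDatanodeProtocol mentioned in the release note,
             * but the method signature here is an assumption, not the committed API.
             */
            interface ClientDatanodeProtocolSketch {
              /**
               * Ask this datanode (acting as the primary) to recover the given block:
               * obtain a new generation stamp from the namenode, synchronize the replicas
               * on the listed datanodes, and report the result back to the namenode.
               *
               * @return the recovered block with its new generation stamp and length
               */
              BlockStub recoverBlock(BlockStub block, DatanodeInfoStub[] targets) throws IOException;
            }

          In this shape the client picks a primary datanode and makes a single RPC; the primary then drives the namenode and inter-datanode calls itself, which would avoid exposing getNextGenerationStamp and commitBlockSynchronization through ClientProtocol.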

          dhruba borthakur added a comment -

          Other issues that came to my mind:

          1. I am making changes to the DFSClient. When the DFSClient encounters an error in the pipeline, it eliminates the bad node from the pipeline and needs to stamp all known good replicas with the new generation stamp. The DFSClient will invoke LeaseManager.recoverBlock. This method makes two RPC calls to the namenode: getNextGenerationStamp and commitBlockSynchronization. These two methods are part of the DatanodeProtocol. The problem is that when this is invoked by the DFSClient, these two RPCs should also be available through the ClientProtocol. Can this be arranged?

          2. internalReleaseLease invokes lease.renew(). Instead, LeaseManager.removeExpiredLease() should invoke lease.renew(). The reason being that a lease actually corresponds to multiple files.

          3. removeExpiredLease is also invoked from startFileInternal. In this case, only one file in the lease should be recovered. The current code recovers all the files in the lease.

          dhruba borthakur added a comment -

          I see a compilation error while using the latest patch.

          [javac] /export/home/dhruba/snow/src/test/org/apache/hadoop/dfs/TestFileCreation.java:42: cannot find symbol
          [javac] symbol : class TestFileCreation2
          [javac] location: class org.apache.hadoop.dfs.TestFileCreation
          [javac] static final String DIR = "/" + TestFileCreation2.class.getSimpleName() + "/";

          Tsz Wo Nicholas Sze added a comment -

          3310_20080521.patch: improved javadoc

          Tsz Wo Nicholas Sze added a comment -

          3310_20080520.patch: Thanks, Dhruba.

          • added FSDataset.interruptOngoingCreates which is invoked in FSDataset.updateBlock.
          • The patch passed all unit tests
          dhruba borthakur added a comment -

          One comment: The primary datanode makes an RPC call to the secondary datanode(s) to stamp the generationStamp for a block. As part of processing this request, the secondary datanode(s) should first terminate any threads that are currently writing to that block before returning "success" to this RPC. The threads that are currently writing to a block can be found in FSDataset.ActiveFile.threads.
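
          A minimal sketch of that ordering, with hypothetical names (the real code keeps its writers in FSDataset.ActiveFile.threads): interrupt any in-flight writers for the block, wait for them to exit, and only then apply the new generation stamp and length.

            import java.util.List;

            class BlockUpdateSketch {
              /** Hypothetical view of an in-progress block write. */
              static class ActiveWrite {
                final List<Thread> writerThreads;
                ActiveWrite(List<Thread> writerThreads) { this.writerThreads = writerThreads; }
              }

              /**
               * Interrupt and join any threads still writing to the block, then update its
               * metadata. Returning "success" to the primary datanode before the writers
               * have stopped would risk the replica changing after it was stamped.
               */
              static void updateBlock(ActiveWrite active, long newGenerationStamp, long newLength)
                  throws InterruptedException {
                if (active != null) {
                  for (Thread t : active.writerThreads) {
                    t.interrupt();
                  }
                  for (Thread t : active.writerThreads) {
                    t.join();   // wait until the writer has actually exited
                  }
                }
                // ... now it is safe to truncate the replica to newLength and rename the
                // meta file to carry newGenerationStamp (details omitted in this sketch).
              }
            }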

          Tsz Wo Nicholas Sze added a comment -

          3310_20080519b.patch: added a test and a few append methods in ClientProtocol, NameNode, FSNamesystem for testing. Still need more tests.

          Tsz Wo Nicholas Sze added a comment -

          3310_20080519.patch: a completed version for reviewing. Still need more tests.

          Tsz Wo Nicholas Sze added a comment -

          > how do we tell whether a block is being updated?

          When updating a block, the meta file is first renamed to a tmp file. After the update is done, the tmp file will be renamed to the new meta file (with the new generation stamp). It should work.
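
          A sketch of that two-step rename; the directory layout, file naming, and helper are illustrative, not the exact FSDataset code.

            import java.io.File;
            import java.io.IOException;

            class MetaFileUpdateSketch {
              /**
               * Two-step rename: move the current meta file aside to a tmp name, perform
               * the update, then rename it to the name that carries the new generation
               * stamp. A reader that looks up the old meta file name during the update
               * fails to find it and gets an exception, which signals "being updated".
               */
              static void updateMetaFile(File dir, long blockId, long oldGS, long newGS) throws IOException {
                File oldMeta = new File(dir, "blk_" + blockId + "_" + oldGS + ".meta"); // illustrative naming
                File tmpMeta = new File(dir, "blk_" + blockId + ".meta.tmp");
                File newMeta = new File(dir, "blk_" + blockId + "_" + newGS + ".meta");

                if (!oldMeta.renameTo(tmpMeta)) {
                  throw new IOException("Cannot rename " + oldMeta + " to " + tmpMeta);
                }
                // ... update the contents of tmpMeta here if needed ...
                if (!tmpMeta.renameTo(newMeta)) {
                  throw new IOException("Cannot rename " + tmpMeta + " to " + newMeta);
                }
              }
            }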

          dhruba borthakur added a comment -

          Every read request has the block id and the generation stamp. If the datanode cannot find the block (because the generation stamp has changed), then it will return an exception.

          The current code also behaves as follows: when a client gets an exception, it retries other replicas. If all these replicas fail, then it goes back to the namenode to re-retrieve block locations. Now, it should get the correct generation stamp of the block. Then, the client will retry the read request to the datanode and this one should succeed.

          Do you think that this will work?

          Tsz Wo Nicholas Sze added a comment -

          3310_20080516c.patch: cleaned up some code

          Tsz Wo Nicholas Sze added a comment -

          3310_20080516b.patch: my latest code

          Question: When updating a block (i.e. updating the generation stamp and block length), what happens if a reader tries to read the block? I guess the reader should get an exception. However, how do we tell whether a block is being updated?

          dhruba borthakur added a comment -

          The processing of DNA_RECOVERBLOCK would entail making RPCs to other datanode(s), right? This should be done in a thread that is separate from the offerService thread.
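
          A minimal sketch of that pattern, with hypothetical names: when the heartbeat response carries a recover-block command, hand the work to a separate daemon thread so the offerService loop is not blocked by inter-datanode RPCs.

            class RecoverBlockDispatchSketch {
              /** Hypothetical stand-in for the recovery work triggered by DNA_RECOVERBLOCK. */
              interface BlockRecoveryTask extends Runnable {}

              /**
               * Run the recovery in its own daemon thread rather than inside the
               * heartbeat (offerService) loop, since recovery makes RPCs to other
               * datanodes and may block for a while.
               */
              static void dispatch(BlockRecoveryTask task, long blockId) {
                Thread t = new Thread(task, "blockRecovery-" + blockId);
                t.setDaemon(true);
                t.start();
              }
            }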

          Tsz Wo Nicholas Sze added a comment -

          Should we start a new thread to recover the block in the DataNode (i.e. the DatanodeProtocol.DNA_RECOVERBLOCK case)?

          Tsz Wo Nicholas Sze added a comment -

          3310_20080514.patch: Implementing the Lease Recovery Algorithm

          dhruba borthakur added a comment -

          Now that HADOOP-2656 has been committed, this issue is the next in line that is required for appends. Nicholas: if you are working on this, feel free to upload a very early version of the patch so that we can review it earlier. Thanks.

          Tsz Wo Nicholas Sze added a comment (edited) -

          Lease Recovery Algorithm

            /*
             * 1) Namenode retrieves lease information
             * 2) For each file f in the lease, consider the last block b of f
             * 2.1) Get the datanodes which contain b
             * 2.2) Assign one of the datanodes as the primary datanode p

             * 2.3) p obtains a new generation stamp from the namenode
             * 2.4) p gets the block info from each datanode
             * 2.5) p computes the minimum block length
             * 2.6) p updates the datanodes, which have a valid generation stamp,
             *      with the new generation stamp and the minimum block length
             * 2.7) p reports the update results back to the namenode

             * 2.8) Namenode updates the BlockInfo
             * 2.9) Namenode removes f from the lease
             *      and removes the lease once all files have been removed
             * 2.10) Namenode commits changes to the edit log
             */
          
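
          To make steps 2.3 through 2.7 concrete, here is a hedged sketch of the primary datanode's part of the algorithm. All type and method names are illustrative placeholders rather than the committed DatanodeProtocol/InterDatanodeProtocol changes; the "skip the commit when nothing was synchronized" detail comes from the 3310_20080522b.patch comment above.

            import java.io.IOException;
            import java.util.ArrayList;
            import java.util.List;

            class BlockSynchronizationSketch {

              /** Hypothetical per-replica state reported by a datanode in step 2.4. */
              static class ReplicaState {
                final long length;
                final long generationStamp;
                ReplicaState(long length, long generationStamp) {
                  this.length = length;
                  this.generationStamp = generationStamp;
                }
              }

              /** Hypothetical RPC stubs; not the real namenode/datanode protocols. */
              interface RemoteNamenode {
                long nextGenerationStamp(long blockId) throws IOException;                  // step 2.3
                void commitBlockSynchronization(long blockId, long newGenerationStamp,
                    long newLength, List<RemoteDatanode> syncedNodes) throws IOException;   // step 2.7
              }
              interface RemoteDatanode {
                ReplicaState getBlockInfo(long blockId) throws IOException;                 // step 2.4
                void updateBlock(long blockId, long newGenerationStamp, long newLength)
                    throws IOException;                                                     // step 2.6
              }

              /** What the primary datanode p does for one block (steps 2.3 through 2.7). */
              static void syncBlock(long blockId, long minValidGenerationStamp,
                  List<RemoteDatanode> replicas, RemoteNamenode namenode) throws IOException {
                long newGS = namenode.nextGenerationStamp(blockId);                         // 2.3

                List<RemoteDatanode> validNodes = new ArrayList<>();
                long minLength = Long.MAX_VALUE;
                for (RemoteDatanode d : replicas) {                                         // 2.4 + 2.5
                  ReplicaState s = d.getBlockInfo(blockId);
                  if (s != null && s.generationStamp >= minValidGenerationStamp) {
                    validNodes.add(d);
                    minLength = Math.min(minLength, s.length);
                  }
                }

                List<RemoteDatanode> synced = new ArrayList<>();
                for (RemoteDatanode d : validNodes) {                                       // 2.6
                  d.updateBlock(blockId, newGS, minLength);
                  synced.add(d);
                }

                // As noted in a comment above: if nothing was synchronized, do not commit.
                if (!synced.isEmpty()) {
                  namenode.commitBlockSynchronization(blockId, newGS, minLength, synced);   // 2.7
                }
              }
            }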

            People

            • Assignee: Tsz Wo Nicholas Sze
            • Reporter: Tsz Wo Nicholas Sze
            • Votes: 0
            • Watchers: 1
