Issue Details (XML | Word | Printable)

Key: HADOOP-2655
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: dhruba borthakur
Reporter: dhruba borthakur
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Copy on write for data and metadata files in the presence of snapshots

Created: 18/Jan/08 05:50 PM   Updated: 08/Jul/09 04:42 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 0.17.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works copyOnWrite.patch 2008-02-14 05:42 AM dhruba borthakur 27 kB
Text File Licensed for inclusion in ASF works copyOnWrite.patch 2008-02-11 08:57 AM dhruba borthakur 27 kB
Text File Licensed for inclusion in ASF works copyOnWrite2.patch 2008-02-21 01:10 AM dhruba borthakur 25 kB
Text File Licensed for inclusion in ASF works copyOnWrite3.patch 2008-02-21 09:20 PM dhruba borthakur 27 kB
Issue Links:
Blocker
 

Resolution Date: 25/Feb/08 10:21 PM


 Description  « Hide
If a DFS Client wants to append data to an existing file (appends, HADOOP-1700) and a snapshot is present, the Datanoed has to implement some form of a copy-on-write for writes to data and meta data files.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
dhruba borthakur added a comment - 03/Feb/08 09:54 AM

dhruba borthakur added a comment - 05/Feb/08 01:10 AM
The datanode has a new directory per volume called "detachDir". This directory is used to do temporary copy-on-write for data blocks that are part of a snapshot.

When a client writes a block that is linked to a snapshot, it does the following:

1. Create a copy of the original file into a file in the detachDir.
2. Rename the newly created file in detachDir into the original file. This breaks the hardlink and creates two copies of the block atomically.

Point 2 works perfectly on Linux platform. The following are some caveats on Windows platform.

On Windows platform, the rename fails because the target file already exists. Thus, the code issues a delete followed by a rename. This means that there is a window of opportunity (on Windows) when the block does not exist in the right place. If a read request for the block occurs precisely in that window, then the client will get an exception and will try to read that block from an alternate location. (When a datanode restarts, it recovers blocks that are exist in detachDir but do not exist in the original data directory.) I am proposing that this is an acceptable solution.


Konstantin Shvachko added a comment - 07/Feb/08 10:41 PM
  • I think it is ok to have a short window during "unlinking" a block, when it is not visible to the data-node.
    I believe it should be a pretty rare event.
  • Relying on mod time is not such a good idea. It's not the time itself, but the way you will have to find another hard link.
    You will need rely on that the other link to the blockFile is in ".../previous/blockFile".
    And if an administrator renamed previous directory in order to preserve the snapshot for longer time storage then
    your approach will not recognize that the blockFile has another link.
  • We should probably use stat instead
    stat -c%h blockFile

    I checked it works both on Linux and CygWin.


dhruba borthakur added a comment - 11/Feb/08 08:57 AM
This patch does a copy-on-write for blocks that need modification but were linked to an existing snapshot. As part of this patch, the two maps in the datanode (volumeMap and blockMap) are combined into one single map.

dhruba borthakur added a comment - 14/Feb/08 05:42 AM
Merged patch with latest trunk.

Konstantin Shvachko added a comment - 20/Feb/08 10:24 PM
  1. DatanodeBlockInfo.getVol() rename to getVolume().
  2. org.apache.hadoop.fs.* is imported twice.
  3. FSDataset.detachFile(v, f, b)
    • It is better to use /** JavaDoc comments */
    • I would place this method into DatanodeBlockInfo.detachFile(b)
  4. FSDataset.detachBlock().
    • Should @inheritDoc from FSDatasetInterface.
    • When do we need numLinks parameter?
    • The whole part below the comment starting with
      // If this block does not have a linked snapshot,

      should be a method of DatanodeBlockInfo.detachBlock()

  5. In general I'd reorganize methods and place most of the functionality related to copy-on-write
    into DatanodeBlockInfo rather than exposing it in the high level class like FSDataset.
  6. I have a doubt that detachBlock() should be a public interface of FSDatasetInterface.
    My understanding is that detachBlock() is intended to be used in append(), and is going to be specific
    for real data-nodes, because simulated data-nodes do not have any data that require to be copied on write.
    I think it should be just a method of FSDataset and TestAppend should directly call it because
    it is testing the real data-nodes rather than the simulated ones.
  7. FSVolume.createTmpFile()
    • should probably be a static method.
    • There is an old System.out in it. Could you please remove it.
    • The whole try catch statement seams to be redundant here.
  8. FSDataset.replaceFile().
    • We are using the same code for renames because of the Windows semantics.
      And this is yet another variant of that code. I think we should introduce
      a FSUtil.replaceFile() method and call it where appropriate.
    • Are you sure we should wait 5 seconds before failing to replace?
      Do we expect something to change during that period?
  9. HardLink.getLinkCount -> getLinkCountCommand;
    and you can set it once outside the switch{} because it is the same for both OSs.
  10. TestFileAppend has 3 warnings:
    • import org.apache.hadoop.dfs.FSConstants.DatanodeReportType;
    • private void checkFile() is never used.
    • long len = fs.getFileStatus(file1).getLen(); is never used.
    • We should systematically use /** create file ... */ instead of // create file ... for method title comments.

dhruba borthakur added a comment - 21/Feb/08 01:10 AM
1. Moved most of detachBlock to DatanodeBlockInfo.java.
2. It is not a public interface anymore. Removed it from FSDatasetInterface.java
3. numLinks is needed to drive the unit tests. It is also needed to make the method handle the case when there are possibly multiple snapshots in the future.
4. I did not make createDetachFile() static, especially because it refers to the detachDir variable.
5. Moved replaceFile to FileUtil. It is intended to handle OS specific cases when the rename will fail if somebody else has a handle to the target file. The current setting of the retry time-limit is adhoc. During this period, it is likely that another thread that was reading that block file will be done reading it.

dhruba borthakur added a comment - 21/Feb/08 01:11 AM
Triggering Hudson tests.

Hadoop QA added a comment - 21/Feb/08 02:20 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376078/copyOnWrite2.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 7 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs -1. The patch appears to introduce 3 new Findbugs warnings.

core tests -1. The patch failed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1822/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1822/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1822/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1822/console

This message is automatically generated.


dhruba borthakur added a comment - 21/Feb/08 09:19 PM
Findbugs warnings.

dhruba borthakur added a comment - 21/Feb/08 09:20 PM
The "stat" command is not available on solaris. Use "ls -l" on solaris.
Fixed findbugs warnings.

Hadoop QA added a comment - 21/Feb/08 11:10 PM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376169/copyOnWrite3.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 7 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1831/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1831/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1831/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1831/console

This message is automatically generated.


Konstantin Shvachko added a comment - 25/Feb/08 09:17 PM - edited
  • FSDataset.detachBlock()
    Variables "file" and "vol" are not used anymore.
  • TestFileAppend: redundant import DSDataImputStream
  • +1 other than that.

dhruba borthakur added a comment - 25/Feb/08 10:21 PM
I just committed this. Thanks Konstantin.

Hudson added a comment - 26/Feb/08 12:04 PM