Hadoop Common
  1. Hadoop Common
  2. HADOOP-2345

new transactions to support HDFS Appends

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.17.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change
    • Release Note:
      Introduce new namenode transactions to support appending to HDFS files.

      Description

      This JIRA adresses the changes needed to the transaction mechanism to support appending data to existing HDFS files. The details of this design is documented in HADOOP-1700.

      1. AppendTransactions.patch
        45 kB
        dhruba borthakur
      2. AppendTransactions2.patch
        46 kB
        dhruba borthakur
      3. AppendTransactions3.patch
        46 kB
        dhruba borthakur
      4. AppendTransactions4.patch
        48 kB
        dhruba borthakur
      5. AppendTransactions5.patch
        48 kB
        dhruba borthakur

        Issue Links

          Activity

          Hide
          dhruba borthakur added a comment -

          This patch implements the following two functionality:

          1. A client can continue to write to a DFSOutputStream even if the namenode restarts! This patch persist leases in the transaction log. Persisting leases is a pre-requisite for correct error recovery for the HDFS file-append feature.

          2. A new method NameNode.fsync() that lets a client persist metadata for a file into transaction log.

          Show
          dhruba borthakur added a comment - This patch implements the following two functionality: 1. A client can continue to write to a DFSOutputStream even if the namenode restarts! This patch persist leases in the transaction log. Persisting leases is a pre-requisite for correct error recovery for the HDFS file-append feature. 2. A new method NameNode.fsync() that lets a client persist metadata for a file into transaction log.
          Hide
          dhruba borthakur added a comment -

          Changed names of Transactions from Openlease/CloseLease to Open/Close. Removed DatanodeDescriptors from fsimage.

          Show
          dhruba borthakur added a comment - Changed names of Transactions from Openlease/CloseLease to Open/Close. Removed DatanodeDescriptors from fsimage.
          Hide
          Konstantin Shvachko added a comment - - edited
          1. ClientProtocol.versionID should be incremented.
          2. NameNode.fsync() should use @inheritDoc.
          3. FSDirectory.delete(String src, INode[] old) needs documentation on the parameter "old".
          4. Since you return only one INode I'd change prototype of unprotectedDelete
            Block[] unprotectedDelete(String src, long modificationTime, INode[] old)
            

            to

            INode unprotectedDelete(String src, long modificationTime, Collection<Block> deletedBlocks)
            

            Replacing Block[] to Collection<Block> will probably be required, since we do not know in
            advance how many blocks will be returned. It's a good thing anyway, since collections
            are more flexible than arrays.

          5. getFileINode() I am not sure about the consequences of removing waitForReady() from it.
            I understand you need to look for files under construction before the entire image is loaded,
            but doesn't it break behavior of other methods. E.g. one will be able to call
            getBlockLocations() or startFile() before the image is loaded.
          6. OP_ADD sounds good to me. It can mean adding a file or adding a block.
            OP_OPEN = 9;    // create for write
            

            sounds controversial, doesn't it?
            So may be you should retain OP_ADD in favor replacing it with OP_OPEN.

          7. OP_GENSTAMP: I'd propose OP_SET_GENERATION instead.
          8. You use different comparisons, which are equivalent but confusing in detecting the version they refer to.
            logVersion <= -12
            logVersion < -11
            

            I'd just use logVersion <= -12 everywhere.

          9. Why generation stamp needs to be serialized as a one element array?
            Arrays are appropriate if several members need to be serialized at once.
            In the case you can simply in.readLong() / writeLong().
            The same as in FSImage.
          10. Do we want to introduce something like
            ClientID {
              protected StringBytesWritable clientName;         // lease holder
              protected StringBytesWritable clientMachine;
            }
            

            Is there a use case for this?

          11. FSImage.loadFSImage(File)
            You want to "read in the last generation stamp" after imgVersion and numFile are read and sorted out.
            Somewhere after
            this.layoutVersion = imgVersion;
            

            Otherwise loading of pre-version images will fail. It's unlikely somebody will need that, but still.

          12. FSNamesystem.generationStamp: Could you please use JavaDoc-style comments for the description of this member.
          13. FSDirectory.closeLease() is confusing because it has nothing to do with leases.
            I was first wondering why it is a part of FSDirectory while other lease methods are in FSNamesystem,
            but then realized that it is just closing the file. Please rename, and also the comments to the calls
            of closeLease() say "persist block allocations for this file" which does not reflect the actual action
            any more.
          14. GenerationStamp may implement WritableComparable rather than Writable, Comparable
          15. I like that serialization code for INodeUnderConstruction is separated from the loadFSImage mainstream
            code. I am not so sure it should implement Writable because this class is not supposed to be a part of
            any communications. Alternatively we could define methods
            FSImage.readINodeUnderConstruction(in, INodeUnderConstruction)
            FSImage.writeINodeUnderConstruction(out, INodeUnderConstruction)
            

            or

            static INodeUnderConstruction.readINode(in, INodeUnderConstruction)
            static INodeUnderConstruction.writeINode(out, INodeUnderConstruction)
            
          Show
          Konstantin Shvachko added a comment - - edited ClientProtocol.versionID should be incremented. NameNode.fsync() should use @inheritDoc. FSDirectory.delete(String src, INode[] old) needs documentation on the parameter "old". Since you return only one INode I'd change prototype of unprotectedDelete Block[] unprotectedDelete( String src, long modificationTime, INode[] old) to INode unprotectedDelete( String src, long modificationTime, Collection<Block> deletedBlocks) Replacing Block[] to Collection<Block> will probably be required, since we do not know in advance how many blocks will be returned. It's a good thing anyway, since collections are more flexible than arrays. getFileINode() I am not sure about the consequences of removing waitForReady() from it. I understand you need to look for files under construction before the entire image is loaded, but doesn't it break behavior of other methods. E.g. one will be able to call getBlockLocations() or startFile() before the image is loaded. OP_ADD sounds good to me. It can mean adding a file or adding a block. OP_OPEN = 9; // create for write sounds controversial, doesn't it? So may be you should retain OP_ADD in favor replacing it with OP_OPEN. OP_GENSTAMP: I'd propose OP_SET_GENERATION instead. You use different comparisons, which are equivalent but confusing in detecting the version they refer to. logVersion <= -12 logVersion < -11 I'd just use logVersion <= -12 everywhere. Why generation stamp needs to be serialized as a one element array? Arrays are appropriate if several members need to be serialized at once. In the case you can simply in.readLong() / writeLong(). The same as in FSImage. Do we want to introduce something like ClientID { protected StringBytesWritable clientName; // lease holder protected StringBytesWritable clientMachine; } Is there a use case for this? FSImage.loadFSImage(File) You want to "read in the last generation stamp" after imgVersion and numFile are read and sorted out. Somewhere after this .layoutVersion = imgVersion; Otherwise loading of pre-version images will fail. It's unlikely somebody will need that, but still. FSNamesystem.generationStamp: Could you please use JavaDoc-style comments for the description of this member. FSDirectory.closeLease() is confusing because it has nothing to do with leases. I was first wondering why it is a part of FSDirectory while other lease methods are in FSNamesystem, but then realized that it is just closing the file. Please rename, and also the comments to the calls of closeLease() say "persist block allocations for this file" which does not reflect the actual action any more. GenerationStamp may implement WritableComparable rather than Writable, Comparable I like that serialization code for INodeUnderConstruction is separated from the loadFSImage mainstream code. I am not so sure it should implement Writable because this class is not supposed to be a part of any communications. Alternatively we could define methods FSImage.readINodeUnderConstruction(in, INodeUnderConstruction) FSImage.writeINodeUnderConstruction(out, INodeUnderConstruction) or static INodeUnderConstruction.readINode(in, INodeUnderConstruction) static INodeUnderConstruction.writeINode(out, INodeUnderConstruction)
          Hide
          dhruba borthakur added a comment -

          Incorporated all of Konstantin's comments except 10 and 15.
          I like the serialization code in INodeUnderConstruction, especially because it writes the serialized information to the transaction log.

          Show
          dhruba borthakur added a comment - Incorporated all of Konstantin's comments except 10 and 15. I like the serialization code in INodeUnderConstruction, especially because it writes the serialized information to the transaction log.
          Hide
          dhruba borthakur added a comment -

          Submitting for HadoopQA tests (pending final round of review from Konstantin).

          Show
          dhruba borthakur added a comment - Submitting for HadoopQA tests (pending final round of review from Konstantin).
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12375208/AppendTransactions3.patch
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 6 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac -1. The applied patch generated 633 javac compiler warnings (more than the trunk's current 612 warnings).

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs -1. The patch appears to introduce 1 new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1769/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1769/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1769/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1769/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12375208/AppendTransactions3.patch against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 6 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac -1. The applied patch generated 633 javac compiler warnings (more than the trunk's current 612 warnings). release audit +1. The applied patch does not generate any new release audit warnings. findbugs -1. The patch appears to introduce 1 new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1769/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1769/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1769/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1769/console This message is automatically generated.
          Hide
          dhruba borthakur added a comment -

          Incorporated #15 from Konstantin's list of review comments.

          Show
          dhruba borthakur added a comment - Incorporated #15 from Konstantin's list of review comments.
          Hide
          dhruba borthakur added a comment -

          This patch will have a few javac compiler warnings because it uses UTF8 to serialize leases to the transaction log.

          Show
          dhruba borthakur added a comment - This patch will have a few javac compiler warnings because it uses UTF8 to serialize leases to the transaction log.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12375292/AppendTransactions4.patch
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 6 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac -1. The applied patch generated 629 javac compiler warnings (more than the trunk's current 608 warnings).

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1779/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1779/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1779/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1779/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12375292/AppendTransactions4.patch against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 6 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac -1. The applied patch generated 629 javac compiler warnings (more than the trunk's current 608 warnings). release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1779/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1779/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1779/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1779/console This message is automatically generated.
          Hide
          Konstantin Shvachko added a comment -
          1. FSEditLog.loadFSEdits(). This is what meant in (9):
            case OP_SET_GENSTAMP: {
              long generation = in.readLong();
              fsDir.namesystem.setGenerationStamp(generation);
            
          2. FSNamesystem.generationStamp. In (12) I meant
            /** 
             * The global generation stamp for this file system. Valid values start from 1000.
             */
            private GenerationStamp generationStamp = new GenerationStamp(1000);
            
          3. I like the new prototype of FSDirectory.delete().
            I suggest we simplify the loop statement in FSImage.deleteInternal()
            INode old = dir.delete(src, deletedBlocks);
            if (old == null)
              return false;
            for (Block b : deletedBlocks) {
              .........
            }
            
          Show
          Konstantin Shvachko added a comment - FSEditLog.loadFSEdits(). This is what meant in (9): case OP_SET_GENSTAMP: { long generation = in.readLong(); fsDir.namesystem.setGenerationStamp(generation); FSNamesystem.generationStamp. In (12) I meant /** * The global generation stamp for this file system. Valid values start from 1000. */ private GenerationStamp generationStamp = new GenerationStamp(1000); I like the new prototype of FSDirectory.delete(). I suggest we simplify the loop statement in FSImage.deleteInternal() INode old = dir.delete(src, deletedBlocks); if (old == null ) return false ; for (Block b : deletedBlocks) { ......... }
          Hide
          dhruba borthakur added a comment -

          FSEditLog.OP_SET_GENERATIONSTAMP has to use writables because the transactions are written using FSEditLog.logEdit(). This method allows writables to be written to the edit log.

          I made changes (2) and (3) described above. Thanks Konstantin for your review comments.

          Show
          dhruba borthakur added a comment - FSEditLog.OP_SET_GENERATIONSTAMP has to use writables because the transactions are written using FSEditLog.logEdit(). This method allows writables to be written to the edit log. I made changes (2) and (3) described above. Thanks Konstantin for your review comments.
          Hide
          Konstantin Shvachko added a comment -

          +1. This looks good.

          Show
          Konstantin Shvachko added a comment - +1. This looks good.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12375533/AppendTransactions5.patch
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 6 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac -1. The applied patch generated 624 javac compiler warnings (more than the trunk's current 603 warnings).

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1793/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1793/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1793/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1793/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12375533/AppendTransactions5.patch against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 6 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac -1. The applied patch generated 624 javac compiler warnings (more than the trunk's current 603 warnings). release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1793/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1793/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1793/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1793/console This message is automatically generated.
          Hide
          dhruba borthakur added a comment -

          I just committed this.

          Show
          dhruba borthakur added a comment - I just committed this.
          Hide
          Hudson added a comment -
          Show
          Hudson added a comment - Integrated in Hadoop-trunk #400 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/400/ )
          Hide
          Hudson added a comment -
          Show
          Hudson added a comment - Integrated in Hadoop-trunk #412 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/412/ )
          Hide
          Robert Chansler added a comment -

          Noted as incompatible in changes.txt

          Show
          Robert Chansler added a comment - Noted as incompatible in changes.txt

            People

            • Assignee:
              dhruba borthakur
              Reporter:
              dhruba borthakur
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development