Hadoop HDFS: HDFS-1606

Provide a stronger data guarantee in the write pipeline

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: datanode, hdfs-client, namenode
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Added two configuration properties, dfs.client.block.write.replace-datanode-on-failure.enable and dfs.client.block.write.replace-datanode-on-failure.policy. Added a new feature to replace datanode on failure in DataTransferProtocol. Added getAdditionalDatanode(..) in ClientProtocol.
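      The two new properties are set in hdfs-site.xml. The values below are illustrative (treat the DEFAULT/ALWAYS/NEVER policy value names as an assumption based on the feature's documentation, not as text from this issue):

```xml
<!-- hdfs-site.xml: illustrative settings for the new feature -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <!-- DEFAULT/ALWAYS/NEVER are the value names as I understand them -->
  <value>DEFAULT</value>
</property>
```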

      Description

      In the current design, if there is a datanode/network failure in the write pipeline, DFSClient will try to remove the failed datanode from the pipeline and then continue writing with the remaining datanodes. As a result, the number of datanodes in the pipeline is decreased. Unfortunately, it is possible that DFSClient may incorrectly remove a healthy datanode but leave the failed datanode in the pipeline because failure detection may be inaccurate under erroneous conditions.

      We propose a new mechanism that adds new datanodes to the pipeline in order to provide a stronger data guarantee.
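The client-side recovery described above can be sketched as follows (an illustrative Python sketch, not the actual Java implementation; the function and parameter names are hypothetical):

```python
# Sketch: on a pipeline failure the client drops the suspect datanode,
# obtains a replacement from the namenode (getAdditionalDatanode(..) in
# the real ClientProtocol), transfers the partial block to it, and then
# resumes writing at the original pipeline width.

def rebuild_pipeline(pipeline, failed, live_datanodes):
    """Return a pipeline of the original width after one failure.

    pipeline       -- datanode ids currently in the write pipeline
    failed         -- datanode id reported as failed
    live_datanodes -- ids of all live datanodes (the namenode's view)
    """
    survivors = [dn for dn in pipeline if dn != failed]
    # the namenode picks a node that is not already in the pipeline
    candidates = [dn for dn in live_datanodes if dn not in pipeline]
    if not candidates:
        raise IOError("no additional datanode available")
    replacement = candidates[0]
    # a surviving datanode would transfer the partial block to the
    # replacement here before the client resumes writing
    return survivors + [replacement]
```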

      1. h1606_20110210.patch
        7 kB
        Tsz Wo Nicholas Sze
      2. h1606_20110211.patch
        20 kB
        Tsz Wo Nicholas Sze
      3. h1606_20110217.patch
        25 kB
        Tsz Wo Nicholas Sze
      4. h1606_20110228.patch
        34 kB
        Tsz Wo Nicholas Sze
      5. h1606_20110404.patch
        34 kB
        Tsz Wo Nicholas Sze
      6. h1606_20110405.patch
        35 kB
        Tsz Wo Nicholas Sze
      7. h1606_20110405b.patch
        35 kB
        Tsz Wo Nicholas Sze
      8. h1606_20110406.patch
        41 kB
        Tsz Wo Nicholas Sze
      9. h1606_20110406b.patch
        43 kB
        Tsz Wo Nicholas Sze
      10. h1606_20110407.patch
        48 kB
        Tsz Wo Nicholas Sze
      11. h1606_20110407b.patch
        48 kB
        Tsz Wo Nicholas Sze
      12. h1606_20110407c.patch
        50 kB
        Tsz Wo Nicholas Sze
      13. h1606_20110408.patch
        49 kB
        Tsz Wo Nicholas Sze
      14. h1606_20110408b.patch
        52 kB
        Tsz Wo Nicholas Sze

        Issue Links

        There are no Sub-Tasks for this issue.

          Activity

          Tsz Wo Nicholas Sze added a comment -

          Koji Noguchi has also provided a lot of input on this. Sorry that I failed to mention it in the acknowledgement.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #643 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/)

          Tsz Wo Nicholas Sze added a comment -

          Acknowledgement

          This work could not have been done without the help of, and many discussions with, Hadoop contributors including (in alphabetical order) Dhruba Borthakur, Hairong Kuang, Owen O'Malley, Jitendra Pandey, Sanjay Radia, Sriram Rao, Suresh Srinivas and Kan Zhang. I am simply the programmer who implemented the great ideas they provided.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #588 (See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/588/)
          HDFS-1606. Provide a stronger data guarantee in the write pipeline by adding a new datanode when an existing datanode failed.

          Tsz Wo Nicholas Sze added a comment -

          Thanks Jitendra for the review.

          I have committed this.

          Jitendra Nath Pandey added a comment -

          +1 for the patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12475874/h1606_20110408b.patch
          against trunk revision 1090357.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 31 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/338//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/338//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/338//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          Thanks Jitendra for the review.

          h1606_20110408b.patch:

          > 1. The method findNewDatanode should return all new datanodes ...

          Since it is only an internal method, not a protocol or public API, we can easily change it later when we add the multiple-destination feature.

          > 2. The method addDatanode2ExistingPipeline can be split ...

          I only split the actual transfer out. The remaining code is only 20 lines, excluding comments.

          > 3. DataStreamer#hflush : Should we change it to setHflush(boolean val) to clarify it's just setting a flag?

          Changed.

          > 4. Does it make sense to add a unit test for default ReplaceDatanodeOnFailure policy?

          Added testDefaultPolicy().
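The condition behind the DEFAULT ReplaceDatanodeOnFailure policy can be paraphrased as below (a hedged Python sketch; the thresholds are my reading of the policy, not a copy of the Java code):

```python
def should_add_replacement(replication, n_existing, hflushed_or_appended):
    """Paraphrase of the DEFAULT replace-datanode-on-failure policy:
    add a new datanode only when replication >= 3 and either half or
    fewer of the datanodes remain in the pipeline, or the block has
    been hflushed or appended (so durability matters more)."""
    r, n = replication, n_existing
    if r < 3:
        return False
    return (r // 2 >= n) or (r > n and hflushed_or_appended)
```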

          Jitendra Nath Pandey added a comment -

          1. The method findNewDatanode should return all new datanodes in case there are more than one new datanode.

          2. The method addDatanode2ExistingPipeline can be split into following methods
          a) method to check if transfer is needed.
          b) method to get additional datanodes and determine source and destination
          c) method that does actual transfer.

          3. DataStreamer#hflush : Should we change it to setHflush(boolean val) to clarify it's just setting a flag?

          4. Does it make sense to add a unit test for default ReplaceDatanodeOnFailure policy?
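The three-way split suggested in point 2 could look roughly like this (an illustrative Python sketch; every name except addDatanode2ExistingPipeline is hypothetical):

```python
def transfer_needed(policy, replication, n_existing, hflushed):
    """(a) Decide whether a replacement datanode is required."""
    return policy(replication, n_existing, hflushed)

def pick_source_and_target(pipeline, new_nodes):
    """(b) Choose a surviving source and the new destination."""
    return pipeline[0], new_nodes[0]

def do_transfer(source, target, block):
    """(c) Perform the actual transfer of the partial block (stubbed)."""
    return (source, target, block)

def add_datanode_to_existing_pipeline(policy, pipeline, new_nodes,
                                      replication, hflushed, block):
    """Driver corresponding to addDatanode2ExistingPipeline."""
    if not transfer_needed(policy, replication, len(pipeline), hflushed):
        return pipeline
    source, target = pick_source_and_target(pipeline, new_nodes)
    do_transfer(source, target, block)
    return pipeline + [target]
```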

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12475833/h1606_20110408.patch
          against trunk revision 1090357.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 31 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.TestDFSShell
          org.apache.hadoop.hdfs.TestFileConcurrentReader

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/334//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/334//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/334//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          I suspect that the failures of TestFiDataTransferProtocol were due to resources not being cleaned up completely by MiniDFSCluster. Moved some of the tests around in HDFS-1817. Let's see if it works.

          h1606_20110408.patch: updated with trunk

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12475764/h1606_20110407c.patch
          against trunk revision 1087900.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 31 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.cli.TestHDFSCLI
          org.apache.hadoop.hdfs.TestDFSShell
          org.apache.hadoop.hdfs.TestFileConcurrentReader

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/329//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/329//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/329//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          h1606_20110407c.patch: fixed some bugs in the fault injection tests.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12475747/h1606_20110407b.patch
          against trunk revision 1087900.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 25 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.server.datanode.TestBlockReport

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/327//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/327//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/327//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          I should have mentioned earlier that I cannot reproduce the failure locally, so I submitted to Jenkins multiple times.

          h1606_20110407b.patch: do not add a datanode when closing the block.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12475735/h1606_20110407.patch
          against trunk revision 1087900.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 25 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.server.datanode.TestBlockReport

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/325//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/325//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/325//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          h1606_20110407.patch: delayed opening an output stream.

          Tsz Wo Nicholas Sze added a comment -

          Some FI tests still fail.

          • Too many open files
            2011-04-06 22:27:23,329 WARN  datanode.DataNode (DataXceiverServer.java:run(142)) - DatanodeRegistration(127.0.0.1:41905, storageID=DS-961198735-127.0.1.1-41905-1302128843146, infoPort=35802, ipcPort=59788):DataXceiveServer: java.io.IOException: Too many open files
            	at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
            	at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:152)
            	at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
            	at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:135)
            	at java.lang.Thread.run(Thread.java:662)
            
          • It also seems that there are some timeouts.
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12475654/h1606_20110406b.patch
          against trunk revision 1087900.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 22 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.TestFileConcurrentReader
          org.apache.hadoop.hdfs.TestLargeBlock

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/324//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/324//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/324//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          h1606_20110406b.patch: the fault injection tests may set lastAckedSeqno to -2, a new state which the previous patch did not handle.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12475626/h1606_20110406.patch
          against trunk revision 1087900.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 22 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.TestFileConcurrentReader

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/321//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/321//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/321//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          h1606_20110406.patch: fixed the tests

          Tsz Wo Nicholas Sze added a comment -
          • In build #318,
            java.lang.NullPointerException
            	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.access$2500(DFSOutputStream.java:283)
            	at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1470)
            	at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:110)
            	at org.apache.hadoop.hdfs.TestMultiThreadedHflush$1.run(TestMultiThreadedHflush.java:156)
            

            There are some existing synchronization problems in DFSOutputStream. It is possible to call hflush() after close() without getting any error.
            I will simply check for null in this patch and think about the synchronization problem after that.

          • For the other failed tests, there were simply not enough datanodes, so addDatanode failed.
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12475554/h1606_20110405b.patch
          against trunk revision 1087900.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.server.datanode.TestBlockReport
          org.apache.hadoop.hdfs.TestFileAppend2
          org.apache.hadoop.hdfs.TestFileAppend4
          org.apache.hadoop.hdfs.TestLargeBlock
          org.apache.hadoop.hdfs.TestMultiThreadedHflush
          org.apache.hadoop.hdfs.TestWriteConfigurationToDFS

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/319//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/319//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/319//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12475554/h1606_20110405b.patch
          against trunk revision 1087900.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.TestFileAppend2
          org.apache.hadoop.hdfs.TestFileConcurrentReader
          org.apache.hadoop.hdfs.TestMultiThreadedHflush

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/318//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/318//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/318//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          h1606_20110405b.patch: synchronized.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12475550/h1606_20110405.patch
          against trunk revision 1087900.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.server.datanode.TestBlockReport
          org.apache.hadoop.hdfs.TestFileConcurrentReader

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/317//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/317//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/317//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          h1606_20110405.patch: fixed bugs, warnings.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12475446/h1606_20110404.patch
          against trunk revision 1087900.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.fs.permission.TestStickyBit
          org.apache.hadoop.hdfs.TestFileAppend2
          org.apache.hadoop.hdfs.TestFileAppend3
          org.apache.hadoop.hdfs.TestFileAppend4
          org.apache.hadoop.hdfs.TestLargeBlock
          org.apache.hadoop.hdfs.TestPipelines
          org.apache.hadoop.hdfs.TestReadWhileWriting
          org.apache.hadoop.hdfs.TestWriteConfigurationToDFS

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/316//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/316//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/316//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          h1606_20110404.patch: updated with trunk and added a unit test.

          Tsz Wo Nicholas Sze added a comment -

          h1606_20110228.patch: Seems to have everything except new tests.

          Will separate step (*) into HDFS-1675 to ease the review.

          Tsz Wo Nicholas Sze added a comment -

          h1606_20110217.patch: changed DFSClient to get additional datanodes from the NameNode.

          Remaining items:

          • Transfer an RBW (replica being written) from one datanode to another.
          • Add new unit tests.
          Tsz Wo Nicholas Sze added a comment -

          h1606_20110211.patch: added new configuration properties. Still a long way to go.

          Tsz Wo Nicholas Sze added a comment -

          Below are the proposed new configuration properties.

          <property>
            <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
            <value>true</value>
            <description>
              If there is a datanode/network failure in the write pipeline,
              DFSClient will try to remove the failed datanode from the pipeline
              and then continue writing with the remaining datanodes. As a result,
              the number of datanodes in the pipeline is decreased.  This feature
              adds new datanodes back to the pipeline.
          
              This is a site-wide property to enable/disable the feature.
          
              See also dfs.client.block.write.replace-datanode-on-failure.policy
            </description>
          </property>
          
          <property>
            <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
            <value>DEFAULT</value>
            <description>
              This property is used only if the value of
              dfs.client.block.write.replace-datanode-on-failure.enable is true.
          
              ALWAYS: always add a new datanode when an existing datanode is removed.
              
              NEVER: never add a new datanode.
          
              DEFAULT: add a new datanode only if
                       (1) the number of datanodes in the pipeline drops from 2 to 1; or
                       (2) the block is reopened for append.
            </description>
          </property>
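
          The DEFAULT policy described above can be sketched as a small decision function. This is an illustration only: the class, method, and parameter names below are invented, not the actual DFSClient implementation.

          ```java
          /**
           * Sketch of the replace-datanode-on-failure decision (illustrative
           * names only, not the real DFSClient API).
           */
          public class ReplaceDatanodeOnFailureSketch {
              enum Policy { ALWAYS, NEVER, DEFAULT }

              /**
               * @param policy      configured policy
               * @param replication the file's replication factor
               * @param nPipeline   datanodes left in the pipeline after the failure
               * @param isAppend    true if the block was reopened for append
               * @return whether to ask the NameNode for an additional datanode
               */
              static boolean shouldAddDatanode(Policy policy, int replication,
                                               int nPipeline, boolean isAppend) {
                  switch (policy) {
                      case ALWAYS:
                          return true;
                      case NEVER:
                          return false;
                      default:
                          // DEFAULT: only when replication >= 3, and the pipeline
                          // dropped to a single datanode or the block was
                          // reopened for append.
                          return replication >= 3 && (nPipeline <= 1 || isAppend);
                  }
              }

              public static void main(String[] args) {
                  // 3-replica file whose pipeline dropped from 2 to 1: add one.
                  System.out.println(
                      shouldAddDatanode(Policy.DEFAULT, 3, 1, false)); // prints "true"
              }
          }
          ```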
          
          Tsz Wo Nicholas Sze added a comment -

          h1606_20110210.patch:

          • Adding ClientProtocol.getAdditionalDatanode(..)
          • NameNode side changes

          Remaining work:

          • implement (*)
          • change DFSClient to use the new datanode.
          • add new tests
          Tsz Wo Nicholas Sze added a comment -

          > 1. Find a datanode D by some means.

          I have checked the code. This is easier than I expected since BlockPlacementPolicy is able to find an additional datanode, given a list of already-chosen datanodes. The remaining work for this part is to add a new method to ClientProtocol so that DFSClient can use it.
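
          The idea — ask the placement policy for one more node, excluding those already in the pipeline — can be sketched as follows. This is a simplification with invented names; the real BlockPlacementPolicy also weighs rack locality, node load, and remaining space.

          ```java
          import java.util.HashSet;
          import java.util.List;
          import java.util.Set;

          /** Toy sketch: choose an additional datanode for an existing pipeline. */
          public class AdditionalDatanodeSketch {
              static String chooseAdditional(List<String> clusterNodes,
                                             List<String> alreadyChosen) {
                  Set<String> excluded = new HashSet<>(alreadyChosen);
                  for (String node : clusterNodes) {
                      if (!excluded.contains(node)) {
                          return node;
                      }
                  }
                  return null; // no candidate left; addDatanode would fail
              }

              public static void main(String[] args) {
                  List<String> cluster = List.of("dn1", "dn2", "dn3");
                  List<String> pipeline = List.of("dn1", "dn2");
                  System.out.println(chooseAdditional(cluster, pipeline)); // prints "dn3"
              }
          }
          ```

          Returning null when no candidate exists mirrors the failure mode seen in the test runs above, where small MiniDFSCluster setups did not have enough datanodes for addDatanode to succeed.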

          Tsz Wo Nicholas Sze added a comment -

          > In fact, if we can have a system-wide config ...
          Will do.

          dhruba borthakur added a comment -

          In fact, if we can have a system-wide config on whether to trigger this behaviour or not, that will be great.

          Tsz Wo Nicholas Sze added a comment -
          When to add a datanode?

          Since adding a datanode to an existing pipeline is an expensive operation (see the previous comment), it should not be performed for every pipeline failure. Suppose the replication factor of the file is greater than or equal to 3. When a pipeline fails, the operation will be invoked if

          • the number of datanodes in the pipeline drops from 2 to 1; or
          • the block is reopened for append; or
          • it is specified by the user.

          Note that when the replication factor is less than 3, the operation should not be invoked by default because performance is preferred over the stronger data guarantee.

          Tsz Wo Nicholas Sze added a comment -

          A straightforward approach is to

          (i) start (*) right after #1 and stall #2 until (*) is done.

          If we feel comfortable, we may

          (ii) start (*) right after #1 in a separate thread and start #2 concurrently. Once #3 is done, join the thread and then combine the old data with the new data before #4.

          Depending on the block size, a partial block may hold several hundred megabytes, so (*) is an expensive operation which may take a long time (on the order of seconds). (ii) has lower latency but (i) is a simpler solution. How about implementing (i) first and leaving (ii) as a future improvement?

          Tsz Wo Nicholas Sze added a comment -

          Below are some steps for adding a datanode to a pipeline:

          1. Find a datanode D by some means.
          2. Add D to the existing pipeline.
          3. Continue writing.
          4. Close the pipeline.


          Moreover, we have to:
          (*) Transfer the existing data to D.

          The question is where to put (*)?
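
          The steps above, with the transfer (*) performed before the new datanode joins the pipeline, can be modeled with a toy in-memory sketch. All names are invented for illustration; real replicas are streamed between datanodes over the network.

          ```java
          import java.util.ArrayList;
          import java.util.LinkedHashMap;
          import java.util.List;
          import java.util.Map;

          /** Toy in-memory model of adding a datanode D to a write pipeline. */
          public class PipelineRecoverySketch {
              /** datanode name -> bytes of the partial block it holds */
              final Map<String, List<Byte>> replicas = new LinkedHashMap<>();

              /** Steps #1-#2: D has been found; copy the existing data (*), then add D. */
              void addDatanode(String source, String d) {
                  replicas.put(d, new ArrayList<>(replicas.get(source)));
              }

              /** Step #3: continue writing through every datanode in the pipeline. */
              void write(byte b) {
                  for (List<Byte> replica : replicas.values()) {
                      replica.add(b);
                  }
              }

              public static void main(String[] args) {
                  PipelineRecoverySketch p = new PipelineRecoverySketch();
                  p.replicas.put("dn1", new ArrayList<>(List.of((byte) 1, (byte) 2)));
                  p.addDatanode("dn1", "dn2");
                  p.write((byte) 3);
                  System.out.println(p.replicas); // both replicas now hold [1, 2, 3]
              }
          }
          ```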

          Tsz Wo Nicholas Sze added a comment -

          Below are two important use cases:

          • Long-Lived Pipeline (e.g. HBase logging)

            When a pipeline is short-lived, the failure probability may be negligible. However, when the client writes very slowly, the failure probability becomes significant.

          • File Append

            When a new file is being written, if all the datanodes in a pipeline fail, then the data written will be lost. Although the behavior is not ideal, it is acceptable since DFSClient will fail to close the file and we allow data loss in a never-closed file. Nevertheless, when a closed file is reopened for append, the last block B of the file is reopened and a pipeline is re-created (provided that the pre-append file size is not a multiple of the block size.) B will not be selected for replication until the pipeline is finished. Then, the pre-append data stored in B may be lost if all the datanodes in the pipeline fail and the subsequent block recovery fails. Such behavior is unacceptable since the pre-append data was stored in a closed file.


            People

            • Assignee:
              Tsz Wo Nicholas Sze
              Reporter:
              Tsz Wo Nicholas Sze
            • Votes:
              0
              Watchers:
              16
