Hadoop Common / HADOOP-1595

Add an option to setReplication method to wait for completion of replication

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.15.0
    • Component/s: None
    • Labels: None

      Description

      Currently, a setReplication call requested by a client returns immediately, without waiting for replication to complete. There are situations where the client would like to know when replication has actually finished.
      This option should be available in fs shell and libhdfs (see HADOOP-1551).

      1. 1595_20070809.patch
        17 kB
        Tsz Wo Nicholas Sze

        Activity

        dhruba borthakur added a comment -

        I just committed this. Thanks Nicholas!

        Tsz Wo Nicholas Sze added a comment -
        • changed error messages
        dhruba borthakur added a comment -

        bin/hadoop dfs -setrep
        Usage: java FsShell -setrep [-R] [-w] <rep> <path/file>
        Exception in thread "main" java.lang.IllegalArgumentException: Illegal number of arguments
        at org.apache.hadoop.fs.FsShell$CommandFormat.parse(FsShell.java:341)
        at org.apache.hadoop.fs.FsShell.setReplication(FsShell.java:358)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:1329)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1427)

        The user should not see the exception message.
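
        One way to keep the stack trace away from the user is to catch the parse error at the command boundary and print only a short message plus the usage line. The sketch below is a minimal illustration, not the FsShell code from the patch: the class SetrepCommand, its Options holder, and the return codes are all assumptions made for the example. Only the usage string and the error text are taken from the output above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the shell's -setrep argument handling. */
class SetrepCommand {
    static final String USAGE = "Usage: java FsShell -setrep [-R] [-w] <rep> <path/file>";

    /** Parsed form of the command line; all names here are illustrative. */
    static final class Options {
        boolean recursive;
        boolean waitForCompletion;
        short replication;
        String path;
    }

    /** Throws IllegalArgumentException on a malformed command line. */
    static Options parse(String[] args) {
        Options opts = new Options();
        List<String> positional = new ArrayList<>();
        for (String arg : args) {
            if (arg.equals("-R")) {
                opts.recursive = true;
            } else if (arg.equals("-w")) {
                opts.waitForCompletion = true;
            } else {
                positional.add(arg);
            }
        }
        if (positional.size() != 2) {
            throw new IllegalArgumentException("Illegal number of arguments");
        }
        // NumberFormatException is a subclass of IllegalArgumentException,
        // so a non-numeric <rep> is reported through the same path.
        opts.replication = Short.parseShort(positional.get(0));
        opts.path = positional.get(1);
        return opts;
    }

    /** Returns 0 on success and -1 on a usage error, without printing a stack trace. */
    static int run(String[] args) {
        try {
            Options opts = parse(args);
            System.out.println("setrep " + opts.replication + " " + opts.path);
            return 0;
        } catch (IllegalArgumentException e) {
            System.err.println("setrep: " + e.getMessage());
            System.err.println(USAGE);
            return -1;
        }
    }

    public static void main(String[] args) {
        // Sample invocation with a made-up path.
        System.out.println(run(new String[] {"-w", "3", "/user/example/file"}));
    }
}
```

        Calling run with an empty argument list prints the message and the usage line to standard error and returns -1, so the caller can turn that into an exit code.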

        Hadoop QA added a comment -

        -1, build or testing failed

        2 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12363449/1595_20070808.patch against trunk revision r564012.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/532/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/532/console

        Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

        Tsz Wo Nicholas Sze added a comment -
        • waiting (if -w is set) for a decrease in replication may take a long time, since the default dfs.blockreport.intervalMsec is 1 hour.
        • set dfs.blockreport.intervalMsec to 1 second in the tests.
        • added a warning message if -w is set for decreasing replication.
        Hadoop QA added a comment -

        +0, new Findbugs warnings

        http://issues.apache.org/jira/secure/attachment/12363291/1595_20070806.patch applied and successfully tested against trunk revision r563577, but there appear to be new Findbugs warnings introduced by this patch.

        New Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/521/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/521/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/521/console

        Tsz Wo Nicholas Sze added a comment -
        • it passed all tests on my machine.
        Tsz Wo Nicholas Sze added a comment -
        • renamed the test directory in the JUnit test
        Hadoop QA added a comment -

        -1, build or testing failed

        2 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12363144/1595_20070803.patch against trunk revision r562294.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/513/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/513/console

        Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

        Tsz Wo Nicholas Sze added a comment -

        I was wrong: it did fail on my tests the last few times. The reason is that I had a method whose name begins with "test" but which is not actually a JUnit test, so the run failed on that method.

        • took the setrep test out of TestDFSShell and created TestDFSShellSetrep
        • renamed a helper method in TestDFSShellSetrep
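
        The failure mode described here follows from name-based test discovery. Assuming the JUnit 3 convention (public, void, no-argument methods whose names start with "test" are treated as tests), a helper with a matching name and shape gets run as a test. The sketch below approximates that rule with plain reflection; it is not JUnit's own code, and the class ExampleSetrepTest and its method names are invented for illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TestNameScan {
    /** Hypothetical test class: one helper merely looks like a test. */
    static class ExampleSetrepTest {
        public void testSetrep() { }
        public void testSetrepLow() { }
        public void testSetrepHelper() { }      // helper, but the name matches the convention
        void checkReplication(int expected) { } // differently named helper: not picked up
    }

    /** Approximates the rule: public, void, no parameters, name starts with "test". */
    static List<String> discoveredTests(Class<?> cls) {
        List<String> names = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            boolean looksLikeTest = m.getName().startsWith("test")
                    && m.getParameterCount() == 0
                    && m.getReturnType() == void.class
                    && Modifier.isPublic(m.getModifiers());
            if (looksLikeTest) {
                names.add(m.getName());
            }
        }
        Collections.sort(names); // getDeclaredMethods has no guaranteed order
        return names;
    }

    public static void main(String[] args) {
        System.out.println(discoveredTests(ExampleSetrepTest.class));
    }
}
```

        Renaming the helper so that it no longer starts with "test", as done in this patch revision, removes it from the discovered set.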
        Hadoop QA added a comment -

        -1, build or testing failed

        2 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12363072/1595_20070802.patch against trunk revision r562041.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/508/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/508/console

        Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

        Tsz Wo Nicholas Sze added a comment -

        Not sure why it failed on unrelated tests. Remade the patch against current trunk.

        Hadoop QA added a comment -

        -1, build or testing failed

        2 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12362936/1595_20070731.patch against trunk revision r561935.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/504/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/504/console

        Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

        Hadoop QA added a comment -

        -1, build or testing failed

        2 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12362936/1595_20070731.patch against trunk revision r561603.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/501/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/501/console

        Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

        Hadoop QA added a comment -

        -1, build or testing failed

        2 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12362936/1595_20070731.patch against trunk revision r561603.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/495/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/495/console

        Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

        Hadoop QA added a comment -

        -1, build or testing failed

        2 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12362936/1595_20070731.patch against trunk revision r561603.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/493/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/493/console

        Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

        Tsz Wo Nicholas Sze added a comment -

        Same logic, updated with trunk, added a test for decreasing setrep

        dhruba borthakur added a comment -

        Please add a test case for when the replication factor is decreased.

        dhruba borthakur added a comment -

        It would be nice if we can get a unit test that tests the scenario when the replication factor is decreased.

        Hadoop QA added a comment -

        +1

        http://issues.apache.org/jira/secure/attachment/12362177/1595_20070717b.patch applied and successfully tested against trunk revision r557790.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/441/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/441/console

        Tsz Wo Nicholas Sze added a comment - - edited

        Re-testing the same patch, since the unit tests were not stable; it failed on some unrelated tests.

        Hadoop QA added a comment -

        -1, build or testing failed

        2 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12362020/1595_20070717b.patch against trunk revision r557050.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/430/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/430/console

        Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

        Hairong Kuang added a comment -

        +1 Code looks good.

        Tsz Wo Nicholas Sze added a comment -
        • add an explicit statement to set default replication to 3 in the junit test
        Tsz Wo Nicholas Sze added a comment - - edited
        • Thank you for your comment, Hairong. I have changed the code in this patch so that it waits file by file instead of block by block.
        • Thank you also for your comment, Koji. By fsck, I think you mean NamenodeFsck, which has only two public methods. One of them is fsck(), which checks the whole file system. The class does not provide a way to check a single file, which can easily be done with FileSystem.getFileCacheHints.
        • The other public method in NamenodeFsck is run(), which is empty. I have removed it in this patch.
        Koji Noguchi added a comment -

        I'm not familiar with the Hadoop code, but for this problem can we use 'fsck' and check if the file is under-replicated?

        If the logic is the same, maybe we can have a shared utility that both could use.

        Hairong Kuang added a comment -

        The code looks good. My only concern is its efficiency. The code checks whether a block is fully replicated by asking the namenode for that block's locations, which requires one request to the namenode per block. I think it would be more efficient to get the locations of all of a file's blocks in one request.
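
        The batching idea can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: BlockLocationSource is a hypothetical interface standing in for the namenode, and its single method returns the host list of every block of a file in one call. The comparison uses equality with the target so that the same check also covers a decrease in replication, which is an assumption of this example.

```java
/** Sketch: decide whether a file is at its target replication from one batched lookup. */
class FileReplicationCheck {
    /** Hypothetical stand-in for the namenode: one call returns the hosts of every block. */
    interface BlockLocationSource {
        String[][] hostsPerBlock(String path);
    }

    /** True when every block of the file lists exactly `replication` hosts. */
    static boolean fileHasReplication(BlockLocationSource source, String path, int replication) {
        // A single request covers the whole file, instead of one request per block.
        String[][] hostsPerBlock = source.hostsPerBlock(path);
        for (String[] hosts : hostsPerBlock) {
            if (hosts.length != replication) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Canned answer for a two-block file; host names and path are made up.
        BlockLocationSource stub = path -> new String[][] {
            {"host1", "host2", "host3"},
            {"host2", "host3"}
        };
        System.out.println(fileHasReplication(stub, "/user/example/data", 3));
    }
}
```

        With this shape, the number of namenode requests per check is one per file regardless of how many blocks the file has.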

        Tsz Wo Nicholas Sze added a comment -
        • Hairong pointed out that it is more correct to use csfs.getRawFileSystem() instead of srcFS in FsShell.copyToLocal. I have made this change.
        Tsz Wo Nicholas Sze added a comment -
        • added a "-w" option to the "fs -setrep" command so that, if "-w" is set, the shell will not return until the new replication level has been achieved.
        • added a junit test for this new feature
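
        The waiting behaviour of "-w" amounts to a polling loop: ask for the current replica counts, compare them with the target, and sleep before asking again. The sketch below shows that loop in isolation. It is a minimal illustration, not the code in the patch: the Supplier that reports per-block replica counts is a hypothetical stand-in for a namenode query, and the timeout parameter is an addition made here so that the example always terminates.

```java
import java.util.function.Supplier;

class ReplicationWaiter {
    /** True when every block has exactly `target` replicas. */
    static boolean allEqual(int[] counts, int target) {
        for (int count : counts) {
            if (count != target) {
                return false;
            }
        }
        return true;
    }

    /**
     * Polls until every block reports `target` replicas or the timeout elapses.
     * Returns true if the target was reached, false on timeout or interruption.
     */
    static boolean waitForReplication(Supplier<int[]> replicaCounts, int target,
                                      long pollMillis, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (allEqual(replicaCounts.get(), target)) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated cluster: the replica counts reach the target on the third poll.
        final int[] polls = {0};
        Supplier<int[]> simulated = () -> {
            polls[0]++;
            return polls[0] < 3 ? new int[] {1, 2} : new int[] {3, 3};
        };
        System.out.println(waitForReplication(simulated, 3, 10L, 1000L));
    }
}
```

        Because the check uses equality, the same loop also waits when replication is decreased, in which case completion depends on how quickly excess replicas are reported as removed.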

          People

          • Assignee: Tsz Wo Nicholas Sze
          • Reporter: Christian Kunz