Hadoop Common
  1. Hadoop Common
  2. HADOOP-10560

Update NativeS3FileSystem to issue copy commands for files with in a directory with a configurable number of threads

    Details

    • Type: Improvement Improvement
    • Status: Patch Available
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels:
    • Target Version/s:

      Description

      In NativeS3FileSystem if you do a copy of a directory it will copy all the files to the new location, but it will do this with one thread. Code is below. This jira will allow a configurable number of threads to be used to issue the copy commands to S3.
      do {
      PartialListing listing = store.list(srcKey, S3_MAX_LISTING_LENGTH, priorLastKey, true);
      for (FileMetadata file : listing.getFiles())

      { keysToDelete.add(file.getKey()); store.copy(file.getKey(), dstKey + file.getKey().substring(srcKey.length())); }

      priorLastKey = listing.getPriorLastKey();
      } while (priorLastKey != null);

      1. HADOOP-10560.patch
        3 kB
        Ted Malaska
      2. HADOOP-10560-1.patch
        6 kB
        Ted Malaska

        Activity

        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12643032/HADOOP-10560-1.patch
        against trunk revision 6f9fe76.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5503//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643032/HADOOP-10560-1.patch against trunk revision 6f9fe76. -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5503//console This message is automatically generated.
        Hide
        Steve Loughran added a comment -

        the s3a FS is doing some form of thread pooling in 2.7+. Is this sufficient, or do people think we should have something similar in s3n.

        Given s3a is a replacement for s3n; I'd be tempted to say "leave s3n alone"

        Show
        Steve Loughran added a comment - the s3a FS is doing some form of thread pooling in 2.7+. Is this sufficient, or do people think we should have something similar in s3n. Given s3a is a replacement for s3n; I'd be tempted to say "leave s3n alone"
        Hide
        Andrei Savu added a comment -

        ... and also when fs.s3n.copyThreads = 1 we can use MoreExecutors.sameThreadExecutor() if Guava is available as a library.

        Show
        Andrei Savu added a comment - ... and also when fs.s3n.copyThreads = 1 we can use MoreExecutors.sameThreadExecutor() if Guava is available as a library.
        Hide
        Andrei Savu added a comment -

        I had quick look over the patch. Some improvements ideas:

        • that property should be named fs.s3n.copyThreads to be consistent with other s3n related configs
        • you need to shutdown() s3CopyExecutor in a finally {} block to avoid leaking threads
        • set recognizable thread names to make it easy to debug when stuck for whatever reason (see http://stackoverflow.com/questions/6113746/naming-threads-and-thread-pools-of-executorservice)
        • fix typo in config description - directy - you should probably also add a note on why this is necessary as an workaround
        • nice to have: a better test that actually tests concurrency and not only correctness

        Steve Loughran just curious - is there a better way to get a handle on a Hadoop managed ExecutorService?

        Show
        Andrei Savu added a comment - I had quick look over the patch. Some improvements ideas: that property should be named fs.s3n.copyThreads to be consistent with other s3n related configs you need to shutdown() s3CopyExecutor in a finally {} block to avoid leaking threads set recognizable thread names to make it easy to debug when stuck for whatever reason (see http://stackoverflow.com/questions/6113746/naming-threads-and-thread-pools-of-executorservice ) fix typo in config description - directy - you should probably also add a note on why this is necessary as an workaround nice to have: a better test that actually tests concurrency and not only correctness Steve Loughran just curious - is there a better way to get a handle on a Hadoop managed ExecutorService?
        Hide
        Ted Malaska added a comment -

        Hey Steve,

        I hope your weekend went well. I was thinking about you initial comments on the point of InterruptedIOException and the root cause logic.

        Let me know if you would still want the InterruptedIOException. If you do I will make a new patch today.

        Also let me know if you want me to take the extra step and find the root cause before the runtime exception in comment 7. Also if you do want that I will update the patch today.

        Thanks again for helping me through this jira.

        Ted Malaska

        Show
        Ted Malaska added a comment - Hey Steve, I hope your weekend went well. I was thinking about you initial comments on the point of InterruptedIOException and the root cause logic. Let me know if you would still want the InterruptedIOException. If you do I will make a new patch today. Also let me know if you want me to take the extra step and find the root cause before the runtime exception in comment 7. Also if you do want that I will update the patch today. Thanks again for helping me through this jira. Ted Malaska
        Hide
        Ted Malaska added a comment -

        new patch ready for review

        Show
        Ted Malaska added a comment - new patch ready for review
        Hide
        Ted Malaska added a comment -

        new patch ready

        Show
        Ted Malaska added a comment - new patch ready
        Hide
        Ted Malaska added a comment -

        Changes based on comments:

        1. Done, renamed to s3native.max-copy-threads
        2. Done, set to 1
        3. Done
        4. Done, added to s3Native section
        5. Unsure how to do this one
        6. Done, used S3Exception. Could not use InterruptedIOException because it does not take a root cause in it's constructor.
        7. Done, I added a getCause statement. This will ether return a Runtime exception created by the store or it will get my runtime exception which gives context to the error by letting the user know that the exception happen during a copy from src to dst
        8. This is out of scope of this jira. If desired we can create a new jira for this one

        Additional work:
        1. Fixed find bugs issue by making runnable class a inner static class
        2. Added an addition rename unit test that will test the rename with multiple threads, since the default is one.

        Show
        Ted Malaska added a comment - Changes based on comments: 1. Done, renamed to s3native.max-copy-threads 2. Done, set to 1 3. Done 4. Done, added to s3Native section 5. Unsure how to do this one 6. Done, used S3Exception. Could not use InterruptedIOException because it does not take a root cause in it's constructor. 7. Done, I added a getCause statement. This will ether return a Runtime exception created by the store or it will get my runtime exception which gives context to the error by letting the user know that the exception happen during a copy from src to dst 8. This is out of scope of this jira. If desired we can create a new jira for this one Additional work: 1. Fixed find bugs issue by making runnable class a inner static class 2. Added an addition rename unit test that will test the rename with multiple threads, since the default is one.
        Hide
        Ted Malaska added a comment -

        Hey Steve,

        Thank you for the review. Response by number below

        1. Yes this is great. Do you want it in the same class or is there a Const class that should house this
        2. Agreed
        3. Agreed
        4. Agreed, Will do
        5. I'm not sure how to do this one in a predictable way
        6. Agreed
        7. Agreed
        8. I will check if that is supported with the current implementation. If not I'll leave that change for another Jira

        Regarding Jira 10400: Yes I've very aware of that Jira and I hope it continues in its testing and gets committed. This Jira should not impact that jira in any way. Now this jira benefits may be short lived, because S3A maybe better in the long run. But I have short term needs for people using this File System implementation today and who have tested this FileSystem implementation heavily.

        Regarding the FindBugs issue: I am also updating the code to remove the find bugs issue

        I should have a new patch soon. Thanks again for the review

        Show
        Ted Malaska added a comment - Hey Steve, Thank you for the review. Response by number below 1. Yes this is great. Do you want it in the same class or is there a Const class that should house this 2. Agreed 3. Agreed 4. Agreed, Will do 5. I'm not sure how to do this one in a predictable way 6. Agreed 7. Agreed 8. I will check if that is supported with the current implementation. If not I'll leave that change for another Jira Regarding Jira 10400: Yes I've very aware of that Jira and I hope it continues in its testing and gets committed. This Jira should not impact that jira in any way. Now this jira benefits may be short lived, because S3A maybe better in the long run. But I have short term needs for people using this File System implementation today and who have tested this FileSystem implementation heavily. Regarding the FindBugs issue: I am also updating the code to remove the find bugs issue I should have a new patch soon. Thanks again for the review
        Hide
        Steve Loughran added a comment -

        This has the potential to be useful, but it'll take some more iterations

        1. "fs.s3.copyThreads" must be a public const -we don't need more inline attributes as it forces copy-and-paste reuse in other code.
        2. same for default value
        3. It think we should make it 1 for default/consistency
        4. and added to docs, core-default.xml
        5. if done for all exception strings, the – still to be written tests – can look for the specific text in those tests that simulate failures.
        6. InterruptedException should be converted to an InterruptedIOException, S3Exception on anything else
        7. S3CopyRunnable is going to convert all exceptions to a chain of IOE -> RuntimeE -> real exception. The underlying cause should be grabbed and extracted, if there's an S3Exception somewhere down the tree, that is the one that should be rethrown,
        8. there's a risk that the list of files may change during the execution. Runnable may want to handle that by catching FileNotFoundException and downgrade it from an error to a "did not copy event" -maybe just add a counter of "files that were moved"

        There's some work going on - I forget where -on adding a new S3 client that uses the higher-performance AWS library, not jets3t. If the work went in there, backwards compatibility for thread count would be a non-issue, and the lib would get the newer performance of the rest of the library

        Show
        Steve Loughran added a comment - This has the potential to be useful, but it'll take some more iterations "fs.s3.copyThreads" must be a public const -we don't need more inline attributes as it forces copy-and-paste reuse in other code. same for default value It think we should make it 1 for default/consistency and added to docs, core-default.xml if done for all exception strings, the – still to be written tests – can look for the specific text in those tests that simulate failures. InterruptedException should be converted to an InterruptedIOException, S3Exception on anything else S3CopyRunnable is going to convert all exceptions to a chain of IOE -> RuntimeE -> real exception. The underlying cause should be grabbed and extracted, if there's an S3Exception somewhere down the tree, that is the one that should be rethrown, there's a risk that the list of files may change during the execution. Runnable may want to handle that by catching FileNotFoundException and downgrade it from an error to a "did not copy event" -maybe just add a counter of "files that were moved" There's some work going on - I forget where -on adding a new S3 client that uses the higher-performance AWS library, not jets3t. If the work went in there, backwards compatibility for thread count would be a non-issue, and the lib would get the newer performance of the rest of the library
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12642913/HADOOP-10560.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

        org.apache.hadoop.ha.TestZKFailoverControllerStress

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3882//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/3882//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3882//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642913/HADOOP-10560.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.ha.TestZKFailoverControllerStress +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3882//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/3882//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3882//console This message is automatically generated.
        Hide
        Ted Malaska added a comment -

        First attempt to added threading to copying files in the case a directory is renamed.

        I just use and executorService with futures. Then try to get all the futures before preforming the normal deletes that were there before.

        This is ready for review.

        Thanks again

        Show
        Ted Malaska added a comment - First attempt to added threading to copying files in the case a directory is renamed. I just use and executorService with futures. Then try to get all the futures before preforming the normal deletes that were there before. This is ready for review. Thanks again
        Hide
        Ted Malaska added a comment -

        If you don't mind I would like to do this Jira. I'm being set up as a contributor now. I will assign it to myself as soon as that is finished.

        Thanks

        Show
        Ted Malaska added a comment - If you don't mind I would like to do this Jira. I'm being set up as a contributor now. I will assign it to myself as soon as that is finished. Thanks

          People

          • Assignee:
            Ted Malaska
            Reporter:
            Ted Malaska
          • Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

            • Created:
              Updated:

              Development