Hadoop Common / HADOOP-997

Implement S3 retry mechanism for failed block transfers

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0
    • Fix Version/s: 0.12.0
    • Component/s: fs
    • Labels: None

      Description

      HADOOP-882 improves S3FileSystem so that when certain communications problems with S3 occur the operation is retried. However, the retry mechanism cannot handle a block transfer failure, since blocks may be very large and we don't want to buffer them in memory. This improvement is to write a wrapper (using java.lang.reflect.Proxy if possible - see discussion in HADOOP-882) that can retry block transfers.
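      To make the idea concrete, here is a minimal sketch of such a Proxy-based wrapper. The class name, retry policy, and backoff below are illustrative assumptions rather than the committed code; the essentials are that the handler unwraps InvocationTargetException, retries only IOExceptions, and needs a public interface to work across packages (which becomes relevant in the comments below).

        // Sketch only: names and retry policy are hypothetical.
        import java.io.IOException;
        import java.lang.reflect.InvocationHandler;
        import java.lang.reflect.InvocationTargetException;
        import java.lang.reflect.Method;
        import java.lang.reflect.Proxy;

        public class RetrySketch {
          @SuppressWarnings("unchecked")
          public static <T> T withRetries(final Class<T> iface, final T target,
                                          final int maxAttempts) {
            InvocationHandler handler = new InvocationHandler() {
              public Object invoke(Object proxy, Method method, Object[] args)
                  throws Throwable {
                for (int attempt = 1; ; attempt++) {
                  try {
                    return method.invoke(target, args);
                  } catch (InvocationTargetException e) {
                    // Unwrap the exception thrown by the real method.
                    Throwable cause = e.getTargetException();
                    if (!(cause instanceof IOException) || attempt >= maxAttempts) {
                      throw cause;  // not retryable, or out of attempts
                    }
                    Thread.sleep(1000L * attempt);  // crude linear backoff
                  }
                }
              }
            };
            // Note: iface must be public for callers in other packages.
            return (T) Proxy.newProxyInstance(iface.getClassLoader(),
                                              new Class[] { iface }, handler);
          }
        }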

      Attachments

      1. HADOOP-997-v3.patch (46 kB, Tom White)
      2. HADOOP-997-v2.patch (45 kB, Tom White)
      3. HADOOP-997.patch (39 kB, Tom White)


          Activity

          Tom White added a comment -

          Here's a patch for review.

          It actually has a general retry mechanism, à la HADOOP-601. (There is no annotation support, but it wouldn't be too hard to extend RetryProxy to have a method that introspects the runtime method annotations of a passed-in object and constructs a proxy based on them.) I've put it in org.apache.hadoop.io.retry, but I'm not sure about this as it isn't IO-specific. The idea is that this mechanism can be reused for HDFS, MapRed and IPC retries.
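
          As a rough illustration, wiring the store through such a proxy might look like the sketch below. This is hypothetical: FileSystemStore is assumed visible here, and the RetryProxy/RetryPolicies signatures follow the shape the io.retry API later settled on, which may differ from this patch.

            import java.util.concurrent.TimeUnit;
            import org.apache.hadoop.io.retry.RetryPolicies;
            import org.apache.hadoop.io.retry.RetryProxy;

            class RetryWiringSketch {
              // Wrap a store so every interface method is retried under one
              // policy: up to 4 retries with a fixed 10-second sleep between.
              static FileSystemStore withRetries(FileSystemStore store) {
                return (FileSystemStore) RetryProxy.create(FileSystemStore.class,
                    store, RetryPolicies.retryUpToMaximumCountWithFixedSleep(
                        4, 10, TimeUnit.SECONDS));
              }
            }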

          On the S3 front, I have increased the block size to 64 MB (since we don't have to worry about the whole block being buffered in memory any longer), and I've removed the notes about JetS3t buffering, since this shouldn't need to be changed by users now.

          I haven't tried it with any big files yet...

          stack added a comment -

          I tried the patch and got this:

          hadoop@domU-12-31-33-00-02-D5:/mnt/hadoop-trunk$ ./bin/hadoop fs -fs s3://AWS_ID:AWS_SECRET@BUCKET -put /mnt/search/searcher.dir/index-sorted /1998/20070207040835-1998
          Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
          at org.apache.hadoop.fs.s3.$Proxy0.initialize(Unknown Source)
          at org.apache.hadoop.fs.s3.S3FileSystem.initialize(S3FileSystem.java:63)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:160)
          at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
          at org.apache.hadoop.fs.FsShell.init(FsShell.java:39)
          at org.apache.hadoop.fs.FsShell.run(FsShell.java:796)
          at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
          at org.apache.hadoop.fs.FsShell.main(FsShell.java:895)
          Caused by: java.lang.IllegalAccessException: Class org.apache.hadoop.io.retry.RetryInvocationHandler can not access a member of class org.apache.hadoop.fs.s3.FileSystemStore with modifiers "public abstract"
          at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:65)
          at java.lang.reflect.Method.invoke(Method.java:578)
          at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:62)
          at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:42)
          ... 9 more

          Before the patch, the above command-line worked.

          hadoop@domU-12-31-33-00-02-D5:/mnt/hadoop-trunk$ svn info
          ...
          Revision: 507114
          ...

          Other comments on the patch:

          + The retry mechanism looks sweet.
          + Is it true that Jets3tFileSystemStore#initialize runs after the retry proxy has been set up, so initialization will be retried if an exception occurs? Is this intentional? Perhaps initialization should be outside the purview of retry.
          + Minor comment: in version 2 of the retry mechanism, you'd be able to narrow retries to happen only for certain exception types: e.g. for IOExceptions only, rather than for Exception as it's currently written.
          + In S3OutputStream#newBackupFile, you replace File.createTempFile with your own tmp-file-name fabricator. Any reason you didn't stick with createTempFile and just use the overload that takes a directory: http://java.sun.com/j2se/1.5.0/docs/api/java/io/File.html#createTempFile(java.lang.String,%20java.lang.String,%20java.io.File)? (You do this in a few places in this patch; see the sketch after this list.)
          + In S3FileSystem.java#createDefaultStore, should the retry params be exposed as configurables? (I'm fine if the rejoinder is that this makes for too much configuration).
          + On initialize, should Jets3tFileSystemStore clean up its tmp directory? (If we crash, deleteOnExit won't get a chance to run.)
          + If an IOException occurs in retrieveBlock in Jets3tFileSystemStore, you do a closeQuietly on the out stream. It's followed by a finally that does closeQuietly on both the in and out streams. I don't follow what's going on here. Why not just leave all closing to the finally block? (Suggest adding a comment on why the out stream gets special-case handling in the IOException block.)
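
          For reference, the createTempFile overload suggested above takes a prefix, a suffix, and an explicit directory, so the backup file lands under Hadoop's configured tmp directory rather than the JVM-wide /tmp. A minimal sketch (identifiers hypothetical):

            import java.io.File;
            import java.io.IOException;

            class BackupFileSketch {
              // bufferDir is illustrative; the patch reads it from configuration.
              static File newBackupFile(File bufferDir) throws IOException {
                File result = File.createTempFile("input-", ".tmp", bufferDir);
                result.deleteOnExit();  // best-effort; skipped if the JVM crashes
                return result;
              }
            }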

          Good stuff Tom.

          Tom White added a comment -

          > + Is it true that Jets3tFileSystemStore#initialize runs after the retry proxy has been set up, so initialization will be retried if an exception occurs? Is this intentional? Perhaps initialization should be outside the purview of retry.

          This was not intentional. I think initialization should probably not
          be retried currently (in the future it might be worth thinking through
          the initialization cases that need retrying).
          It looks like this might be the cause of the exception you found.

          > + Minor comment: in version 2 of the retry mechanism, you'd be able to narrow retries to happen only for certain exception types: e.g. for IOExceptions only, rather than for Exception as it's currently written.

          S3 should only retry for IOExceptions - I'll fix this.
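
          One hedged way to express that narrowing, assuming the exception-to-policy map the io.retry package later exposed (names may not match this patch exactly):

            import java.io.IOException;
            import java.util.HashMap;
            import java.util.Map;
            import org.apache.hadoop.io.retry.RetryPolicies;
            import org.apache.hadoop.io.retry.RetryPolicy;

            class S3RetryPolicySketch {
              // Retry IOExceptions; let every other exception type fail at once.
              static RetryPolicy ioOnlyPolicy() {
                Map<Class<? extends Exception>, RetryPolicy> map =
                    new HashMap<Class<? extends Exception>, RetryPolicy>();
                map.put(IOException.class, RetryPolicies.RETRY_FOREVER);  // illustrative choice
                return RetryPolicies.retryByException(
                    RetryPolicies.TRY_ONCE_THEN_FAIL, map);
              }
            }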

          > + In S3OutputStream#newBackupFile, you replace File.createTempFile with your own tmp-file-name fabricator. Any reason you didn't stick with createTempFile and just use the overload that takes a directory: http://java.sun.com/j2se/1.5.0/docs/api/java/io/File.html#createTempFile(java.lang.String,%20java.lang.String,%20java.io.File)? (You do this in a few places in this patch.)

          I did this for consistency with HDFS and also because of a recent
          issue with creating files in /tmp. I agree that createTempFile with
          the signature you suggest is the way to go. I'll change it.

          > + In S3FileSystem.java#createDefaultStore, should the retry params be exposed as configurables? (I'm fine if the rejoinder is that this makes for too much configuration).

          Yes, this crossed my mind and I think we should have it. It also
          occurred to me that with annotation-driven retries (HADOOP-601) you
          wouldn't be able to expose the parameters as configurables (as far as
          I know).

          > + On initialize, should Jets3tFileSystemStore clean up its tmp directory? (If we crash, deleteOnExit won't get a chance to run.)

          Possibly. Although the files should be deleted after transfer anyway.
          The deleteOnExit was a backup, but you are right that it doesn't
          always run.

          > + If an IOException occurs in retrieveBlock in Jets3tFileSystemStore, you do a closeQuietly on the out stream. It's followed by a finally that does closeQuietly on both the in and out streams. I don't follow what's going on here. Why not just leave all closing to the finally block? (Suggest adding a comment on why the out stream gets special-case handling in the IOException block.)

          On failure I want to delete the file, but to do this I close the
          stream first. I agree a better comment is needed.
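
          A sketch of the shape Tom describes (identifiers illustrative, not the patch's code): the out stream is closed early so the partial file can be deleted before a retry, and closing it again in the finally block is a harmless no-op.

            import java.io.*;

            class RetrieveBlockSketch {
              static void copyBlock(InputStream in, File blockFile) throws IOException {
                OutputStream out = null;
                try {
                  out = new BufferedOutputStream(new FileOutputStream(blockFile));
                  byte[] buf = new byte[8192];
                  int n;
                  while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                  }
                } catch (IOException e) {
                  // Close before deleting: an open file may not be deletable.
                  closeQuietly(out);
                  blockFile.delete();  // drop the partial block so a retry starts clean
                  throw e;
                } finally {
                  closeQuietly(in);
                  closeQuietly(out);  // no-op if already closed above
                }
              }

              static void closeQuietly(Closeable c) {
                if (c == null) return;
                try { c.close(); } catch (IOException ignored) { }
              }
            }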

          Thanks for the detailed feedback. I'll look at making improvements
          over the coming days.

          Tom White added a comment -

          Version 2 patch that addresses all the issues listed above, except for clearing out the temp directory on startup.

          The exception while using "hadoop fs" was because FileSystemStore was package-private, which caused problems when creating proxies that are used from other packages. (I hadn't tested this properly since I hadn't enabled retries for S3 when I did my testing!)
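
          The mechanics, in brief: the JDK generates a proxy for a non-public interface inside that interface's own package, so a handler living elsewhere cannot reflectively invoke its methods. A sketch of where it bites (class name and body hypothetical):

            package org.apache.hadoop.io.retry;

            import java.lang.reflect.Method;

            class HandlerVisibilityNote {
              Object invokeMethod(Object target, Method method, Object[] args)
                  throws Throwable {
                // If method is declared on a package-private interface in another
                // package (as FileSystemStore was in v1), this call throws
                // IllegalAccessException. Fixes: make the interface public (the
                // v2 approach), or call method.setAccessible(true) first.
                return method.invoke(target, args);
              }
            }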

          I'm now encountering a different problem when trying to copy local files to S3. I get the error

          put: No such file or directory

          I'm still investigating the cause of this to see if it is related.

          Tom White added a comment -

          Version 3 patch that fixes the "No such file or directory" problem. This was occurring because the S3 buffer directory wasn't being created if it didn't exist (tests were OK because hadoop.tmp.dir is different).
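
          The fix amounts to creating the buffer directory on demand before handing it to createTempFile; a minimal sketch (identifiers hypothetical):

            import java.io.File;
            import java.io.IOException;

            class BufferDirSketch {
              // Make sure the S3 buffer directory exists before use.
              static File ensureBufferDir(String path) throws IOException {
                File dir = new File(path);
                if (!dir.exists() && !dir.mkdirs()) {
                  throw new IOException("Cannot create S3 buffer directory: " + dir);
                }
                return dir;
              }
            }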

          I think this can be committed after review.

          stack added a comment -

          +1

          I ran the previously failing test from above. It now runs to completion, successfully copying gigabytes into S3.

          On the patch: I see you've added the one thing I was going to suggest after trying v2 yesterday and tripping over 'No such file or directory' myself, namely outputting the problematic file path when emitting the 'No such file or directory' message. Good stuff.

          Doug Cutting added a comment -

          +1 This looks good to me too.

          Mike Smith added a comment -

          Tom,

          Using this patch, do we still need to set s3service.stream-retry-buffer-size to 64M in jets3t.properties? Or can it be kept at the default value (100K)?

          Thanks, Mike

          Tom White added a comment -

          No, we can keep the default value because the new retry mechanism will kick in if the JetS3t one fails (runs out of buffer). The JetS3t one is still needed for all other (metadata) operations.
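
          In jets3t.properties terms, that means no change is needed; the line below (commented out, value matching the 100K default mentioned above) is only relevant if you deliberately want to override it:

            # JetS3t's buffered retry still covers metadata operations; Hadoop's
            # retry proxy now handles block transfers too large to buffer.
            # s3service.stream-retry-buffer-size=102400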

          Tom White added a comment -

          I've just committed this.


    People

    • Assignee: Tom White
    • Reporter: Tom White
    • Votes: 0
    • Watchers: 0