Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.6.0
    • Component/s: core
    • Labels: None
    • Environment:

Description

      When the blobstore fails to upload a file, it might be worth retrying a couple of times rather than just aborting
      the whole process.
      OTOH, the failure would occur before any machines are launched, so the loss is limited.
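
      As a purely illustrative sketch of the retry idea, a small wrapper around the upload could look like the following; the uploadToCache helper and the attempt count are hypothetical placeholders, not existing Whirr API:

      import java.io.File;
      import java.io.IOException;

      // Hedged sketch: retry the blob upload a few times before aborting.
      // uploadToCache(...) stands in for the actual BlobCache upload call.
      void uploadWithRetry(File file) throws IOException {
          final int maxAttempts = 3; // illustrative retry budget
          IOException lastFailure = null;
          for (int attempt = 1; attempt <= maxAttempts; attempt++) {
              try {
                  uploadToCache(file); // hypothetical upload helper
                  return;              // success, stop retrying
              } catch (IOException e) {
                  lastFailure = e;     // remember the failure and try again
              }
          }
          throw lastFailure; // all attempts failed; abort as before
      }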

      Here's an example stacktrace of a failed upload:
      Created blob cache container 'fr6mxqa4d6aa' located in '[id=EU, scope=REGION, description=EU, parent=aws-s3, iso3166Codes=[IE], metadata={}]'
      Uploading 'mydecorator-1.0-SNAPSHOT.tar.gz' to 'fr6mxqa4d6aa' blob cache.
      Jul 29, 2011 8:53:50 AM org.jclouds.logging.jdk.JDKLogger logWarn
      WARNING: Cannot retry after server error, command is not replayable: [method=AWSS3AsyncClient.putObject, request=PUT https://fr6mxqa4d6aa.s3.amazonaws.com/mydecorator-1.0-SNAPSHOT.tar.gz HTTP/1.1]
      Exception in thread "main" org.jclouds.http.HttpResponseException: error after writing 335872/637480 bytes to PUT https://fr6mxqa4d6aa.s3.amazonaws.com/mydecorator-1.0-SNAPSHOT.tar.gz HTTP/1.1 connecting to PUT https://fr6mxqa4d6aa.s3.amazonaws.com/mydecorator-1.0-SNAPSHOT.tar.gz HTTP/1.1
      at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:183)
      at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:134)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
      at java.util.concurrent.FutureTask.run(FutureTask.java:166)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
      at java.lang.Thread.run(Thread.java:679)
      Caused by: java.lang.RuntimeException: error after writing 335872/637480 bytes to PUT https://fr6mxqa4d6aa.s3.amazonaws.com/mydecorator-1.0-SNAPSHOT.tar.gz HTTP/1.1
      at org.jclouds.http.internal.JavaUrlHttpCommandExecutorService.convert(JavaUrlHttpCommandExecutorService.java:245)
      at org.jclouds.http.internal.JavaUrlHttpCommandExecutorService.convert(JavaUrlHttpCommandExecutorService.java:77)
      at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:156)
      ... 6 more
      Caused by: java.io.IOException: Error writing request body to server
      at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:2716)
      at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:2699)
      at com.google.common.io.CountingOutputStream.write(CountingOutputStream.java:51)
      at com.google.common.io.ByteStreams.copy(ByteStreams.java:172)
      at org.jclouds.io.payloads.BasePayload.writeTo(BasePayload.java:67)
      at org.jclouds.http.internal.JavaUrlHttpCommandExecutorService.convert(JavaUrlHttpCommandExecutorService.java:243)
      ... 8 more
      Removing blob cache 'fr6mxqa4d6aa'

Attachments

      1. WHIRR-349-new-test.patch
        3 kB
        Andrei Savu
      2. WHIRR-349.patch
        7 kB
        Andrei Savu
      3. failcase.zip
        25 kB
        Karel Vervaeke
      4. WHIRR-349.patch
        5 kB
        Andrei Savu

Activity

          Andrei Savu added a comment -

          I've just committed the new test.

          Andrei Savu added a comment -

          I've created a new test that shows that the upload works as expected with the suggested configuration workaround. I've also added aws-s3 & cloudfiles to the list of BLOBSTORE_APIS.

          Adrian Cole added a comment -

          This issue can be worked around if the endpoint relevant to the S3 region being used is assigned via configuration. However, the current code does not allow properties to be passed to the BlobStore. This is resolved in WHIRR-362.

          RECOMMENDATION:

          We have a reasonable workaround which is much better than saying Whirr can only work in EC2 us-east-1.

          Apply WHIRR-362 and dependencies, then retest with the following configuration, which I've tested to work:

          whirr.blobstore-location-id=EU
          jclouds.aws-s3.endpoint=https://s3-eu-west-1.amazonaws.com
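
          Once WHIRR-362 allows properties through, that endpoint override would reach jclouds roughly as in the sketch below; the factory wiring is illustrative, assuming the jclouds 1.x BlobStoreContextFactory API, and accessKey/secretKey are placeholders:

          import java.util.Properties;
          import com.google.common.collect.ImmutableSet;
          import com.google.inject.Module;
          import org.jclouds.blobstore.BlobStoreContext;
          import org.jclouds.blobstore.BlobStoreContextFactory;

          // Hedged sketch: pass the region endpoint override to jclouds.
          Properties overrides = new Properties();
          overrides.setProperty("jclouds.aws-s3.endpoint",
              "https://s3-eu-west-1.amazonaws.com");
          BlobStoreContext context = new BlobStoreContextFactory().createContext(
              "aws-s3", accessKey, secretKey,     // placeholder credentials
              ImmutableSet.<Module>of(), overrides);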

          Adrian Cole added a comment -

          Note that with WHIRR-361 applied, logging of this problem works correctly (e.g. the warning is emitted via SLF4J rather than a j.u.l.Logger):

          2011-08-13 11:11:25,476 INFO [org.apache.whirr.util.BlobCache] (main) Created blob cache container 'xeyhzrbvvfre' located in '[id=EU, scope=REGION, description=EU, parent=aws-s3, iso3166Codes=[IE], metadata={}]'
          2011-08-13 11:11:31,093 INFO [org.apache.whirr.util.BlobCache] (main) Uploading 'whirr691749320713679426.txt' to 'xeyhzrbvvfre' blob cache.
          2011-08-13 11:12:07,534 INFO [org.apache.whirr.util.integration.BlobCacheTest] (main) (mkdir -p /tmp && cd /tmp && [ ! -f whirr691749320713679426.txt ] && curl -C - -s -q -L --connect-timeout 10 --max-time 600 -X GET -s --retry 20 -H "Host: xeyhzrbvvfre.s3.amazonaws.com" -H "Date: Sat, 13 Aug 2011 10:11:09 GMT" -H "Authorization: AWS 067PW7Z9P0FNH7JDPE82:ROKDCwDiXqXEuG6FgjHxkHyC4t0=" https://xeyhzrbvvfre.s3.amazonaws.com/whirr691749320713679426.txt >whirr691749320713679426.txt)

          2011-08-13 11:12:07,552 INFO [org.apache.whirr.util.BlobCache] (main) Removing blob cache 'xeyhzrbvvfre'
          2011-08-13 11:12:34,789 INFO [org.apache.whirr.util.BlobCache] (main) Created blob cache container 'cpqbe3pmzal8' located in '[id=EU, scope=REGION, description=EU, parent=aws-s3, iso3166Codes=[IE], metadata={}]'
          2011-08-13 11:12:37,701 INFO [org.apache.whirr.util.BlobCache] (main) Uploading 'whirr3766638371999818973.txt' to 'cpqbe3pmzal8' blob cache.
          2011-08-13 11:13:05,662 WARN [org.jclouds.http.handlers.BackoffLimitedRetryHandler] (i/o thread 0) Cannot retry after server error, command has exceeded retry limit 5: [method=AWSS3AsyncClient.putObject, request=PUT https://cpqbe3pmzal8.s3.amazonaws.com/whirr3766638371999818973.txt HTTP/1.1]
          2011-08-13 11:13:05,697 INFO [org.apache.whirr.util.BlobCache] (main) Removing blob cache 'cpqbe3pmzal8'
          2011-08-13 11:13:12,777 INFO [org.apache.whirr.util.integration.BlobCacheTest] (main) Created temporary container 'ntiayrnslawy'
          2011-08-13 11:13:13,178 INFO [org.apache.whirr.util.integration.BlobCacheTest] (main) Removing temporary container 'ntiayrnslawy'

          Adrian Cole added a comment -

          This won't be resolved effectively by a jclouds patch until the BlobStore uses the EnterpriseConfigurationModule. It is that configuration which is tested in the jclouds aws-s3 provider.
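
          As a rough illustration of what "uses the EnterpriseConfigurationModule" would mean, the module would be installed when the blobstore context is created, along these lines (the wiring and credentials are placeholders, not the actual Whirr code):

          import org.jclouds.enterprise.config.EnterpriseConfigurationModule;

          // Hedged sketch: install the enterprise HTTP configuration when
          // building the blobstore context, as suggested above.
          BlobStoreContext context = new BlobStoreContextFactory().createContext(
              "aws-s3", accessKey, secretKey,     // placeholder credentials
              ImmutableSet.<Module>of(new EnterpriseConfigurationModule()),
              new Properties());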

          Adrian Cole added a comment -

          Note that if we wish a jclouds snapshot release to work effectively, we should include WHIRR-361 before releasing version 0.6.0.

          Adrian Cole added a comment -

          FYI, to test this, temporarily place the following into whirr-core-test.properties and run BlobCacheTest via JUnit:

          whirr.blobstore-location-id=EU

          Andrei Savu added a comment -

          I've just committed this. Thanks Karel for raising the issue! It should probably work with a jclouds snapshot in the near future.

          Andrei Savu added a comment - edited

          The jclouds issue: http://code.google.com/p/jclouds/issues/detail?id=656&start=100
          Andrei Savu added a comment -

          I've done some more checking and it seems like the 1MB test is failing for all regions except us-east-1. I will open a jclouds issue and I'll commit this patch because it contains some important changes.

          Adrian Cole added a comment -

          I'm noticing the endpoint used on the bucket you mention (which is in eu-west-1) is still the US one. E.g. https://inxjioeo9itz.s3.amazonaws.com/whirr6849479395193932763.txt vs https://inxjioeo9itz.s3-eu-west-1.amazonaws.com/whirr6849479395193932763.txt

          jclouds should be able to sort out the correct endpoint for a bucket in an alternate region. However, I'm concerned this issue may be a red herring, as the byte count of 163840 is very fishy (no pun intended).

          Can you open a jclouds issue on this? I'll create a test case uploading a blob in a bucket that's in a different region.

          http://code.google.com/p/jclouds/issues

          Andrei Savu added a comment -

          Updated patch (cleanup + multipart).

          Integration tests work for cloudfiles and S3 us-east. Unfortunately I'm still seeing a failure when using eu-west-1.

          testUploadLargeFileToBlobCache(org.apache.whirr.util.integration.BlobCacheTest)  Time elapsed: 91.535 sec  <<< ERROR!
          org.jclouds.http.HttpResponseException: error after writing 163840/1048576 bytes to PUT https://inxjioeo9itz.s3.amazonaws.com/whirr6849479395193932763.txt HTTP/1.1 connecting to PUT https://inxjioeo9itz.s3.amazonaws.com/whirr6849479395193932763.txt HTTP/1.1
                  at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:183)
                  at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:134)
                  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
                  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                  at java.lang.Thread.run(Thread.java:662)
          Caused by: java.lang.RuntimeException: error after writing 163840/1048576 bytes to PUT https://inxjioeo9itz.s3.amazonaws.com/whirr6849479395193932763.txt HTTP/1.1
                  at org.jclouds.http.internal.JavaUrlHttpCommandExecutorService.convert(JavaUrlHttpCommandExecutorService.java:245)
                  at org.jclouds.http.internal.JavaUrlHttpCommandExecutorService.convert(JavaUrlHttpCommandExecutorService.java:77)
                  at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:156)
                  ... 6 more
          Caused by: java.io.IOException: Error writing request body to server
                  at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:2809)
                  at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:2792)
                  at com.google.common.io.CountingOutputStream.write(CountingOutputStream.java:51)
                  at com.google.common.io.ByteStreams.copy(ByteStreams.java:172)
                  at org.jclouds.io.payloads.BasePayload.writeTo(BasePayload.java:67)
                  at org.jclouds.http.internal.JavaUrlHttpCommandExecutorService.convert(JavaUrlHttpCommandExecutorService.java:243)
                  ... 8 more
          
          

          Adrian, any ideas? I've been able to upload a large file to eu-west-1 using the web console.

          Karel Vervaeke added a comment -

          The console output, surefire report and target/test-data output from a 'failed' upload.
          This is with the first patch applied,
          so there's a test case uploading a small file (<20 bytes) and one uploading 1 MB.

          In console.log you can find the timestamps between which the upload should have occurred.
          The jclouds.wire log doesn't contain anything during the uploads.
          (I haven't looked at the log configuration yet; it might be possible to get more logging.)

          Andrei Savu added a comment -

          Attached patch from the Github pull request. I will look into adding multipart support tomorrow.

          Andrei Savu added a comment -

          Good points. I will submit a patch to address the comments. Thanks guys.

          Adrian Cole added a comment -

          Also, if you use the PutOptions multipart() option, it will split the file into a ton of pieces and really help with reliability (thanks Tibor).

          https://github.com/jclouds/jclouds-examples/tree/master/blobstore-largeblob
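
          A minimal sketch of that option, assuming the jclouds BlobStore API; blobStore, containerName, and file are placeholders, and the linked example is the authoritative version:

          import static org.jclouds.blobstore.options.PutOptions.Builder.multipart;
          import org.jclouds.blobstore.domain.Blob;

          // Hedged sketch: multipart() splits a large payload into parts,
          // each of which can be retried independently.
          Blob blob = blobStore.blobBuilder(file.getName())
              .payload(file)                  // File payload, replayable
              .contentLength(file.length())
              .build();
          blobStore.putBlob(containerName, blob, multipart());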

          Adrian Cole added a comment -

          The code in BlobCache needs to be revised. jclouds will by default retry any replayable payload; InputStreams are not replayable. Looks like the smoking gun is here, where a file is converted to a stream:

          public synchronized void putIfAbsent(File file) throws FileNotFoundException {
              putIfAbsent(file.getName(), new FileInputStream(file), file.length());
          }

          It is better to send the File, byte[], String, etc. directly than to do this. All of those payloads are retryable. I hope this helps.

          P.S. We should use BlobBuilder in BlobCache as it looks nicer.
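
          A sketch of the suggested change, assuming blobStore and container fields on BlobCache (the names are placeholders, not the actual Whirr code):

          // Hedged sketch: pass the File itself via BlobBuilder so the
          // payload is replayable and jclouds can retry the upload.
          public synchronized void putIfAbsent(File file) {
              Blob blob = blobStore.blobBuilder(file.getName())
                  .payload(file)                 // replayable File payload
                  .contentLength(file.length())
                  .build();
              blobStore.putBlob(container, blob);
          }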

          Karel Vervaeke added a comment -

          My remark about this happening before any machines are launched is only valid when uploading blobs in the configure phase and should be ignored.


People

  • Assignee: Adrian Cole
  • Reporter: Karel Vervaeke
  • Votes: 0
  • Watchers: 0

Dates

  • Created:
  • Updated:
  • Resolved:

Time Tracking

  • Estimated: 4h
  • Remaining: 4h
  • Logged: Not Specified
