Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.7.0
    • Component/s: fs/s3
    • Labels: None

    Description

      Currently s3a buffers files on disk(s) before uploading. This JIRA investigates adding a memory-based upload implementation.

      The motivation is evidently performance: this would be beneficial for users with high network bandwidth to S3 (EC2?) or users that run Hadoop directly on an S3-compatible object store (FYI: my contributions are made in name of Amplidata).

      Attachments

        1. HADOOP-11183.001.patch
          25 kB
          Thomas Demoor
        2. HADOOP-11183.002.patch
          26 kB
          Thomas Demoor
        3. HADOOP-11183.003.patch
          22 kB
          Thomas Demoor
        4. design-comments.pdf
          44 kB
          Thomas Demoor
        5. HADOOP-11183-004.patch
          22 kB
          Thomas Demoor
        6. HADOOP-11183-005.patch
          26 kB
          Thomas Demoor
        7. HADOOP-11183-006.patch
          28 kB
          Thomas Demoor
        8. HADOOP-11183-007.patch
          29 kB
          Thomas Demoor
        9. HADOOP-11183-008.patch
          30 kB
          Steve Loughran
        10. HADOOP-11183-009.patch
          30 kB
          Thomas Demoor
        11. HADOOP-11183-010.patch
          30 kB
          Thomas Demoor

          Activity

            stevel@apache.org Steve Loughran added a comment -

            given that the cached-to-hdd data is lost if the client fails, there's no enhanced risk to data integrity here; it even improves cleanup.

            It will require a JVM set up with many GB of buffer space though, correct? Or does the AWS API add an append operation for incremental writes?

            thodemoor Thomas Demoor added a comment -

            Overview

            Indeed, I do not immediately see any data integrity regressions. The current file-based implementation does not leverage the durability provided by the disks.

            The key differences between object stores and HDFS to keep in mind here are that an object is immutable (thus no appends) and that an object only "starts existing" after close() returns (no block-by-block transfer, and no reading previous blocks while the last block is being written, as in HDFS).

            One can mitigate the need for buffer space through MultipartUpload, realizing that we can start uploading before the file is closed: more precisely, once the buffer reaches the partSizeThreshold, one can initiate a MultipartUpload and start uploading parts. This will (probably) require using the low-level AWS API instead of the currently used high-level API (TransferManager), and we will thus need to do some bookkeeping (part numbers, etc.) ourselves.

            A rough algorithm

            Write commands append to the in-memory "buffer" (ByteArrayOutputStream?)
            If close() is called while buffer.size < partSizeThreshold, do a regular upload (using TransferManager?)
            Otherwise, once a write causes buffer.size >= partSizeThreshold:

            • initiate a multipart upload: upload the current buffer contents per part (partSize) and "resize" the buffer to length=partSize
            • subsequent writes fill up the buffer; when partSize is exceeded, transfer a part
            • close() flushes the remaining buffer, waits for all parts to be uploaded and completes the MultipartUpload (sketched below)
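
            A minimal sketch of the algorithm above (illustrative names only; the actual S3 calls are left abstract, and this is not the patch itself):

            import java.io.ByteArrayOutputStream;
            import java.io.IOException;
            import java.io.OutputStream;

            // Sketch of the buffering logic described above -- not the real S3AFastOutputStream.
            abstract class MemoryUploadSketch extends OutputStream {
              private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
              private final int partSize;             // part size once multipart starts
              private final int partSizeThreshold;    // switch-over point to multipart
              private boolean multiPartStarted;

              MemoryUploadSketch(int partSize, int partSizeThreshold) {
                this.partSize = partSize;
                this.partSizeThreshold = partSizeThreshold;
              }

              @Override
              public void write(int b) throws IOException {
                buffer.write(b);
                int limit = multiPartStarted ? partSize : partSizeThreshold;
                if (buffer.size() >= limit) {
                  if (!multiPartStarted) {
                    initiateMultiPartUpload();        // low-level AWS API
                    multiPartStarted = true;
                  }
                  uploadPart(buffer.toByteArray());   // blocking variant: buffer is reusable
                  buffer.reset();
                }
              }

              @Override
              public void close() throws IOException {
                if (!multiPartStarted) {
                  putObject(buffer.toByteArray());    // small file: one regular PUT
                } else {
                  if (buffer.size() > 0) {
                    uploadPart(buffer.toByteArray()); // flush the remainder (last part may be < 5MB)
                  }
                  waitForAllPartUploads();
                  completeMultiPartUpload();
                }
              }

              abstract void initiateMultiPartUpload() throws IOException;
              abstract void uploadPart(byte[] part) throws IOException;
              abstract void putObject(byte[] data) throws IOException;
              abstract void waitForAllPartUploads() throws IOException;
              abstract void completeMultiPartUpload() throws IOException;
            }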

            DESIGN DECISION: Transferring parts could block (enabling buffer re-use) or be asynchronous with a threadpool (buffer per thread -> requires more memory), cf. TransferManager. For the following I assume the former.

            Memory usage

            Maximum amount of used memory = (number of open files) x max(partSizeThreshold, partSize)
            The minimum partSize(Threshold) is 5MB, but bigger parts evidently increase throughput. How many files could one expect to be open at the same time for different sensible (for instance, not HBase) use cases? We should provide documentation helping users choose values for partSizeThreshold and partSize according to their use case and available memory. Furthermore, we probably want to keep both implementations around and introduce a config setting to choose between them.
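
            To make the formula concrete (illustrative numbers, not a recommendation): with partSize = 64MB and 10 files open concurrently, the worst case is roughly 10 x 64MB = 640MB of buffer memory.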

            I will submit a patch that lays out these ideas so we have a starting point to kick off the discussion.

            thodemoor Thomas Demoor added a comment -

            Patch 001: synchronous implementation (blocks on every partUpload). A little extra info is provided in info-S3AFastOutputStream-sync.md.

            Additional remarks:

            1. This patch simply sets the stage and kicks off the discussion. I am working on an async version (multiple concurrent part uploads), which I will post asap.
            2. I would really like to bump the aws-sdk version, but in some other jira this was said to be a gargantuan task (probably http versions conflicting with other libraries? azure?). Correct?
            3. I also renamed partSizeThreshold to the correct term multiPartThreshold (creating a separate issue seemed overkill).
            thodemoor Thomas Demoor added a comment -

            002.patch: see 003.patch for the basic functionality, but this patch uses statistics collection similar to S3AOutputStream, which is different from what HDFS does: it counts bytes at the server side (thus double-counting retried bytes) and counts each partUpload of a MultiPartUpload as a write operation.

            (a related note: The swift fs code does not even seem to do write operation counting)

            thodemoor Thomas Demoor added a comment -

            003.patch:

            • concurrent uploads: parallel uploads/partUploads (of the same or different multipartUploads) run in different threads
            • controllable max memory footprint through a threadpool executor with a blocking queue (causes a dependency on HADOOP-11463)
            • More info in: info-003.patch.md
            • Speedup: over my 10 MB/s connection from work, uploading a 128MB file to AWS Ireland is 20% faster. Bigger files and higher throughput increase the speedup: directly on Amplidata's object store the difference is huge.
            • Documentation is being added in HADOOP-11522

            Difference with 002.patch:

            • HDFS-like statistics counting: I prefer this; what do you think? See the previous comment for more info.

            Remarks 2 and 3 from my comment on 001.patch are still open.

            thodemoor Thomas Demoor added a comment -

            Failure handling currently waits for all parts to fail, each typically taking up to a connection timeout. This should be improved, but I'd like some feedback on the happy path first.

            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12695553/info-003.patch.md
            against trunk revision ac8d52b.

            -1 patch. The patch command could not apply the patch.

            Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5668//console

            This message is automatically generated.

            stevel@apache.org Steve Loughran added a comment -

            I don't know enough about the AWS library to comment on that aspect; others will have to.

            One thing I would like to see (and which presumably could be applied to other bits of the S3a code) is better translation of AWS exceptions into common IOE subclasses, e.g. auth exception, file-already-exists exception, ....

            There are enough places where things are being caught and wrapped that you could have a generic static IOException convertException(AmazonClientException e) method somewhere to do the conversion. IOEs on their own are fairly uninformative.

            Also: when wrapping exceptions, always include the toString() value of the nested exception in the extended text. That way, even if the nested exceptions get lost or stack traces don't get printed, the underlying problem is there for someone to look at.
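
            A minimal sketch of what such a helper could look like (hypothetical: the status-code mapping shown is illustrative, and a real version would need to cover the full S3 error-code list):

            import java.io.FileNotFoundException;
            import java.io.IOException;
            import com.amazonaws.AmazonClientException;
            import com.amazonaws.AmazonServiceException;
            import org.apache.hadoop.security.AccessControlException;

            // Sketch, not the actual s3a code: translate AWS SDK exceptions into
            // typed IOExceptions, always keeping the nested exception's toString().
            public static IOException convertException(String operation,
                AmazonClientException e) {
              if (e instanceof AmazonServiceException) {
                AmazonServiceException ase = (AmazonServiceException) e;
                switch (ase.getStatusCode()) {
                case 403:   // AccessDenied and friends
                  return (IOException) new AccessControlException(
                      operation + ": " + ase).initCause(ase);
                case 404:   // NoSuchKey / NoSuchBucket
                  return (IOException) new FileNotFoundException(
                      operation + ": " + ase).initCause(ase);
                default:
                  return new IOException(operation + ": " + ase, ase);
                }
              }
              // client-side problem, e.g. server unreachable
              return new IOException(operation + ": " + e, e);
            }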

            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12698411/info-003.md
            against trunk revision ac8d52b.

            -1 patch. The patch command could not apply the patch.

            Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5670//console

            This message is automatically generated.

            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12695552/HADOOP-11183.003.patch
            against trunk revision ac8d52b.

            +1 @author. The patch does not contain any @author tags.

            +1 tests included. The patch appears to include 1 new or modified test files.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            +1 javadoc. There were no new javadoc warning messages.

            +1 eclipse:eclipse. The patch built with eclipse:eclipse.

            -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            +1 core tests. The patch passed unit tests in hadoop-tools/hadoop-aws.

            Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5669//testReport/
            Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/5669//artifact/patchprocess/newPatchFindbugsWarningshadoop-aws.html
            Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5669//console

            This message is automatically generated.

            stevel@apache.org Steve Loughran added a comment -
            1. markdown docs should just go into src/site/markdown, with relevant links, as part of the overall patch.
            2. please don't say bad things about HDFS performance, especially given ongoing work with erasure coding, TCP stack bypassing for local writes, etc etc. Best to leave all performance numbers out altogether.
            3. consider that within an EC2 VM, memory storage should be faster than virtualized HDD, SSD may change the values, etc. etc.
            4. ...so just say "may offer performance improvements, especially on operations with the object store "close" to the writer, where "close" is defined by network bandwidth and latency"
            thodemoor Thomas Demoor added a comment -

            The info.md markdown file (now a pdf) contained design remarks (i.e., why this can't use TransferManager, cf. the rest of the code), not general documentation for the project. I have refactored it (the paragraph on performance was indeed not of interest to the general community) and renamed it to design-comments.pdf to make the intent clearer.

            stevel@apache.org Steve Loughran added a comment -

            Thomas, if you can submit this in sync with trunk I'll take a look

            thodemoor Thomas Demoor added a comment -

            004.patch: a rebase of 003.patch vs trunk. Thanks, stevel@apache.org.

            thodemoor Thomas Demoor added a comment -

            I have some open questions:

            Exceptions
            Server-side exceptions are thrown as AmazonServiceException (which extends AmazonClientException), which returns quite some info from toString():

            public String getMessage() {
                    return "Status Code: " + getStatusCode() + ", "
                        + "AWS Service: " + getServiceName() + ", "
                        + "AWS Request ID: " + getRequestId() + ", "
                        + "AWS Error Code: " + getErrorCode() + ", "
                        + "AWS Error Message: " + super.getMessage();
                }
            

            The error codes are detailed here: http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html

            A more general AmazonClientException is thrown if the server could not be reached (or there is another client side problem). Do you want me to wrap the entire list in the link above in standard Java exceptions? I agree that typed exceptions are good as they provide more info, but considering the detailed response codes, this might not be the top priority.

            Statistics

            Unlike HDFS, s3a passes null to the wrapping FSDataOutputStream and does the statistics counting itself. It counts the bytes transferred to the server (double-counting retries, etc.) by adding listeners (from the AWS lib) to the uploads. It also calls statistics.incrementWriteOps(1) on every part of a multipart upload. It thus gives an S3-centric view of the filesystem stats, not a Hadoop one.

            The introduced S3AFastOutputStream leaves byte counting to FSDataOutputStream (cf. HDFS) and only counts a writeOp per successful create(). It thus behaves differently. Should I revert to the S3AOutputStream way, or will we change that to be HDFS-like (in a separate jira)?
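
            For concreteness, the S3AOutputStream-style (server-side) counting is roughly the following pattern (a sketch assuming the AWS SDK's ProgressListener API and FileSystem.Statistics; not the literal patch code):

            // Sketch: count bytes as the AWS SDK reports them transferred, so
            // retried bytes are counted twice -- the "S3-centric" view above.
            class CountingListener implements com.amazonaws.event.ProgressListener {
              private final org.apache.hadoop.fs.FileSystem.Statistics statistics;

              CountingListener(org.apache.hadoop.fs.FileSystem.Statistics statistics) {
                this.statistics = statistics;
              }

              @Override
              public void progressChanged(com.amazonaws.event.ProgressEvent event) {
                statistics.incrementBytesWritten(event.getBytesTransferred());
              }
            }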

            Failures

            Currently, failure of a MultiPartUpload is only checked upon closing the file. So, for instance, if the server is unreachable, each part waits for the connection setup timeout to fail, which takes a while. Once one part has failed, we should abort asap. I think adding a callback to each partUpload (ListenableFuture) that sets an AtomicBoolean failed = true on failure, and checking this flag before starting a partUpload, would allow us to throw the exception at the start of the next partUpload (sketched below).
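
            A minimal sketch of that idea (hypothetical names, using Guava's FutureCallback; assumes com.google.common.util.concurrent.{Futures, FutureCallback, ListenableFuture}, com.amazonaws.services.s3.model.PartETag and java.util.concurrent.atomic.AtomicBoolean are imported):

            // Sketch: remember the first part failure; refuse to start new parts.
            private final AtomicBoolean partFailed = new AtomicBoolean(false);

            void submitPartUpload(ListenableFuture<PartETag> partFuture) throws IOException {
              if (partFailed.get()) {
                throw new IOException("An earlier part upload failed; aborting");
              }
              Futures.addCallback(partFuture, new FutureCallback<PartETag>() {
                @Override public void onSuccess(PartETag tag) { /* nothing to do */ }
                @Override public void onFailure(Throwable t) {
                  partFailed.set(true);   // checked before the next part starts
                }
              });
            }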

            stevel@apache.org Steve Loughran added a comment -

            Exceptions

            1. we're going to need them translated into IOExceptions
            2. I'd like typed exceptions for the basic FileNotFound and permissions cases, though not so typed that they lose data

            Statistics.

            I don't have a good opinion here. For the swift stuff I didn't do enough integration with Hadoop stats; I did collect counters on the various HTTP verbs, including average time, which helped track down some throttling of DELETE.

            We can evolve statistics, especially if you highlight "unstable".

            Failures

            Fail-fast would be my preference, unless there's some attempt to retry on a transient outage. That might be a good option on some IOEs, (connection &c), but not others (Authentication failures)

            thodemoor Thomas Demoor added a comment -

            Marked as unstable.

            The underlying httpclient retries retriable errors, and you can control it through fs.s3a.attempts.maximum

            Did some more investigation: the dominant cost on failure is DNS resolution failing. After that, the subsequent parts fail fast, in line with what is set in fs.s3a.establish.timeout. So the fail-fast I had in mind (and have implemented) seems like premature optimization. I have been testing the current code for some time, so I think we shouldn't take the risk of putting fail-fast in so close to 2.7; I'll open a separate jira for fail-fast.

            Added site and core-default documentation. While passing by I corrected the description of the connection timeouts: they are defined in milliseconds, not seconds.

            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12699876/HADOOP-11183-005.patch
            against trunk revision 6f01330.

            +1 @author. The patch does not contain any @author tags.

            +1 tests included. The patch appears to include 1 new or modified test files.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            +1 javadoc. There were no new javadoc warning messages.

            +1 eclipse:eclipse. The patch built with eclipse:eclipse.

            -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws.

            Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5751//testReport/
            Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/5751//artifact/patchprocess/newPatchFindbugsWarningshadoop-aws.html
            Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5751//console

            This message is automatically generated.

            stevel@apache.org Steve Loughran added a comment -

            I have done a code review, but not actually used this yet. I'm worried it's being looked at a bit late for the 2.7 branch, but I can't fault anyone but myself and other reviewers for that.

            If we get it in, I'd like the markdown docs to highlight "experimental, unstable, use at own risk" for this option. Then 2.7 can be viewed as stabilisation of it, with the goal of it being trusted by 2.8 and that warning removed. Perhaps a special subsection of s3a, "experimental fast output stream", with some specifics, rather than just listing a config flag, "fast", which everyone is going to turn on based on its name alone.

            constructor

            1. check for and reject negative buffer sizes. Maybe also check for, and warn about or reject, too small a buffer?
            2. S3AFastOutputStream constructor should log @debug, not info

            write()

            1. would benefit from some comments explaining the two branches, e.g. "copy into buffer" and "buffer full, write it" + "recursively write remainder"
            2. If I created a 1-byte buffer, would that tail recursion trigger a stack overflow? If so, some checks in the ctor may be good, such as a hard-coded minimum, upping low values to it.

            close()

            I'd swap the operations in the finally clause, so that no matter what the superclass does, this close operation always finishes. Better yet, set the closed flag earlier.
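
            i.e., something along these lines (a sketch of the review suggestion, with a hypothetical flushAndFinishUpload() standing in for the stream's own cleanup; not the patch itself):

            // Sketch: mark the stream closed first, then make sure our own
            // cleanup runs regardless of what the superclass close() does.
            @Override
            public synchronized void close() throws IOException {
              if (closed) {
                return;                 // idempotent close
              }
              closed = true;            // set early: no re-entry on failure
              try {
                flushAndFinishUpload(); // hypothetical: complete pending uploads
              } finally {
                super.close();
              }
            }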

            uploadBuffer()

            make private

            putObject()

            InterruptedException shouldn't be swallowed here; it hides problems and can cause process shutdowns to hang. Either convert to InterruptedIOException, or set the interrupted flag on the current thread again.
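
            e.g. (a sketch of the suggested handling, assuming a TransferManager-style Upload handle named upload):

            try {
              upload.waitForUploadResult();
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();          // restore interrupt status
              java.io.InterruptedIOException iioe =
                  new java.io.InterruptedIOException("Interrupted while uploading: " + e);
              iioe.initCause(e);
              throw iioe;
            }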

            waitForAllPartUploads()

            Need a policy on interrupts here too. Imagine that communications with a remote S3 server are blocked, something has just sent a kill -3 to the client app, and is now waiting for it to finish cleanly...

            ProgressableListener

            can be made private

            Other.

            1. Can we have flush() do a partial write? Would that gain anything in durability terms? At the very least, it may make timing of writes more consistent with other filesystems.
            2. I can think of some more tests here. If S3AFastOutputStream added a counter of #of uploads, a test could grab that stream and verify that writes triggered it, flushes triggered it, 0-byte-writes didn't, close cleaned up, etc. Also be interesting to time & compare classic vs fast operations & print that in the test results.
            thodemoor Thomas Demoor added a comment -

            I ran with the mark-as-unstable suggestion and implemented your suggestions.

            write()
            2. If I created a 1-byte buffer, would that tail recursion trigger a stack overflow? If so, some checks in ctor may be good, such as just hard coded minimum & upping low values to it.

            No: the executor's LinkedBlockingQueue blocks writes once max.total.tasks tasks are waiting.
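
            The general pattern (a simplified sketch, not the HADOOP-11463 implementation itself) is a semaphore-bounded executor, so that writers block instead of queueing unbounded buffers:

            import java.util.concurrent.*;

            // Sketch: cap queued + running upload tasks; submit() blocks the
            // writer thread once the cap (cf. max.total.tasks) is reached.
            class BoundedExecutor {
              private final ExecutorService pool;
              private final Semaphore permits;

              BoundedExecutor(int maxThreads, int maxQueuedTasks) {
                this.pool = Executors.newFixedThreadPool(maxThreads);
                this.permits = new Semaphore(maxThreads + maxQueuedTasks);
              }

              <T> Future<T> submit(final Callable<T> task) throws InterruptedException {
                permits.acquire();                         // blocks when saturated
                return pool.submit(new Callable<T>() {
                  @Override public T call() throws Exception {
                    try {
                      return task.call();
                    } finally {
                      permits.release();
                    }
                  }
                });
              }
            }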

            Can we have flush() do a partial write? Would that gain anything in durability terms? At the very least, it may make timing of writes more consistent with other filesystems.

            AWS has a minimum upload size of 5MB, so if the buffer is currently smaller .... I can think of a workaround but it's messy.

            I can think of some more tests here. If S3AFastOutputStream added a counter of #of uploads, a test could grab that stream and verify that writes triggered it, flushes triggered it, 0-byte-writes didn't, close cleaned up, etc. Also be interesting to time & compare classic vs fast operations & print that in the test results.

            I agree we should add these over time. Counting async operations is non-trivial, however.

            hadoopqa Hadoop QA added a comment -

            +1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12700121/HADOOP-11183-006.patch
            against trunk revision fe7a302.

            +1 @author. The patch does not contain any @author tags.

            +1 tests included. The patch appears to include 1 new or modified test files.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            +1 javadoc. There were no new javadoc warning messages.

            +1 eclipse:eclipse. The patch built with eclipse:eclipse.

            +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws.

            Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5757//testReport/
            Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5757//console

            This message is automatically generated.

            thodemoor Thomas Demoor added a comment -

            Addendum to my previous comment, to be clearer on the flush(): the minimum size of 5MB is for a partUpload (of an ongoing multipart upload). Evidently, one can write smaller objects through a single put.

            The reason I think it's safe to put this in 2.7 as unstable is that the code path is never touched when the config flag is set to false (the default). It's a drop-in replacement for S3AOutputStream, but is only used if the user opts in.
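
            For reference, opting in is a single switch in core-site.xml (the flag below is the one this patch adds; the multipart size shown is an illustrative value, not a recommendation):

            <property>
              <name>fs.s3a.fast.upload</name>
              <value>true</value>
            </property>

            <!-- illustrative: the part size also bounds the per-part memory
                 buffer when fast upload is enabled -->
            <property>
              <name>fs.s3a.multipart.size</name>
              <value>10485760</value>
            </property>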

            stevel@apache.org Steve Loughran added a comment -

            OK, doing a test run on this. Like you say, it's harmless unless enabled, and the docs now make that clear.

            The main risk is taking on the obligation to maintain it for an indefinite period. However, on the basis that we could always return the classic output stream, cut this code and simply print a warning if the flag is set, that's a low cost obligation.

            stevel@apache.org Steve Loughran added a comment -
            1. does TestS3ASmallMultiPartUpload actually enable the new feature?
            2. a 40 MB upload/download can still take time on a slow link. Could we make this smaller, so the whole upload size is less than, say 1 MB?
            stevel@apache.org Steve Loughran added a comment -

            looking at the comments, I see that 5MB is the minimum.

            How about a test that writes, say 6MB?

            thodemoor Thomas Demoor added a comment -

            Thanks Steve.

            I agree on the 6MB multipart. Fast upload is not used by default; you still have to enable it in core-site.xml. FastOutputStream does both single uploads and multipart. I added this test just to have a small multipart upload (rather than that commented-out large multipart test somewhere) to test both output stream implementations. Would you like me to force fast.upload to true?

            stevel@apache.org Steve Loughran added a comment -

            For a test, yes, both output streams. You're adding new production code here

            thodemoor Thomas Demoor added a comment -

            The test class now forces fast upload to true. It tests a regular 1MB upload and a 5+1MB multi-part upload.

            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12701903/HADOOP-11183-007.patch
            against trunk revision ca1c00b.

            +1 @author. The patch does not contain any @author tags.

            +1 tests included. The patch appears to include 2 new or modified test files.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            +1 javadoc. There were no new javadoc warning messages.

            +1 eclipse:eclipse. The patch built with eclipse:eclipse.

            +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws:

            org.apache.hadoop.ipc.TestRPCWaitForProxy

            Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5813//testReport/
            Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5813//console

            This message is automatically generated.

            stevel@apache.org Steve Loughran added a comment -

            The patch didn't apply as the (spurious?) changes to the fs contract base test were incompatible; the 008 patch fixes that by removing that file from the diff, and has some formatting and typo fixes for the documentation. I'll do a test run when I next have bandwidth; from the code review itself it looks good.

            One issue is that I didn't fully understand the bit in the docs regarding "the maximum size of a memory buffer ", mapping the properties to the

            The maximum size of a memory buffer is `fs.s3a.multipart.threshold` / `fs.s3a.multipart.size` for an upload / partupload respectively.

            What does that mean?

            hadoopqa Hadoop QA added a comment -

            +1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12702087/HADOOP-11183-008.patch
            against trunk revision d1c6acc.

            +1 @author. The patch does not contain any @author tags.

            +1 tests included. The patch appears to include 1 new or modified test files.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            +1 javadoc. There were no new javadoc warning messages.

            +1 eclipse:eclipse. The patch built with eclipse:eclipse.

            +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws.

            Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5824//testReport/
            Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5824//console

            This message is automatically generated.

            thodemoor Thomas Demoor added a comment -

            Spurious indeed. Clarified threshold and partsize settings.

            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12702156/HADOOP-11183-009.patch
            against trunk revision 4228de9.

            -1 patch. The patch command could not apply the patch.

            Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5831//console

            This message is automatically generated.

            thodemoor Thomas Demoor added a comment -

            Rebased vs trunk

            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12702232/HADOOP-11183-010.patch
            against trunk revision e17e5ba.

            +1 @author. The patch does not contain any @author tags.

            +1 tests included. The patch appears to include 1 new or modified test files.

            -1 javac. The applied patch generated 1236 javac compiler warnings (more than the trunk's current 1199 warnings).

            -1 javadoc. The javadoc tool appears to have generated 25 warning messages.
            See https://builds.apache.org/job/PreCommit-HADOOP-Build/5835//artifact/patchprocess/diffJavadocWarnings.txt for details.

            +1 eclipse:eclipse. The patch built with eclipse:eclipse.

            +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws:

            org.apache.hadoop.ha.TestZKFailoverControllerStress

            Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5835//testReport/
            Javac warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/5835//artifact/patchprocess/diffJavacWarnings.txt
            Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5835//console

            This message is automatically generated.

            stevel@apache.org Steve Loughran added a comment -

            +1

            • Tested locally; the 6 MB upload took "5.504" seconds, which is surprisingly fast, but I couldn't see any evidence that it hadn't happened; compared to a regular one at 1.7s, it scaled proportionally to the 6x size increase.
            • saw an intermittent failure of TestS3AFileSystemContract.testRenameRootDirForbidden(), which is something seen before and unrelated to this patch.
            • docs clearer now
            • javac and javadoc warnings appear unrelated
            • core test failure is the intermittent ZK one again.

            applying, Thanks!

            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-trunk-Commit #7247 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7247/)
            HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) (stevel: rev 15b7076ad5f2ae92d231140b2f8cebc392a92c87)

            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
            • hadoop-common-project/hadoop-common/CHANGES.txt
            • hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
            • hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
            • hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFastOutputStream.java
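            [Editor's note: a short sketch of how the options this commit adds to Constants.java and core-default.xml are wired up; it slots into the timing sketch above in place of the single setBoolean call. Property names are the ones the patch introduces as I read it, and the buffer-size value is illustrative, not a verified default:]

                Configuration conf = new Configuration();
                conf.setBoolean("fs.s3a.fast.upload", true);     // use the memory-buffered S3AFastOutputStream
                conf.setInt("fs.s3a.fast.buffer.size", 1 << 20); // initial in-memory buffer, in bytes
                // Once the buffer reaches the existing fs.s3a.multipart.size threshold, the
                // stream starts an incremental multipart upload rather than growing the buffer.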
            thodemoor Thomas Demoor added a comment -

            Thanks, Steve.

            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #122 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/122/)
            HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) (stevel: rev 15b7076ad5f2ae92d231140b2f8cebc392a92c87)

            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java
            • hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
            • hadoop-common-project/hadoop-common/CHANGES.txt
            • hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
            • hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFastOutputStream.java
            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-Yarn-trunk #856 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/856/)
            HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) (stevel: rev 15b7076ad5f2ae92d231140b2f8cebc392a92c87)

            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
            • hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java
            • hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
            • hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFastOutputStream.java
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
            • hadoop-common-project/hadoop-common/CHANGES.txt
            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-Hdfs-trunk #2054 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2054/)
            HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) (stevel: rev 15b7076ad5f2ae92d231140b2f8cebc392a92c87)

            • hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
            • hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
            • hadoop-common-project/hadoop-common/CHANGES.txt
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
            • hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFastOutputStream.java
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java
            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #113 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/113/)
            HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) (stevel: rev 15b7076ad5f2ae92d231140b2f8cebc392a92c87)

            • hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java
            • hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFastOutputStream.java
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
            • hadoop-common-project/hadoop-common/CHANGES.txt
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
            • hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #122 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/122/)
            HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) (stevel: rev 15b7076ad5f2ae92d231140b2f8cebc392a92c87)

            • hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
            • hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFastOutputStream.java
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
            • hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
            • hadoop-common-project/hadoop-common/CHANGES.txt
            hudson Hudson added a comment -

            SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/)
            HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) (stevel: rev 15b7076ad5f2ae92d231140b2f8cebc392a92c87)

            • hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
            • hadoop-common-project/hadoop-common/CHANGES.txt
            • hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFastOutputStream.java
            • hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
            • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java

            People

              Assignee: Thomas Demoor
              Reporter: Thomas Demoor
              Votes: 1
              Watchers: 10