[CAMEL-14929] camel-aws2-s3 - Doesn't support stream download of large files. - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 3.2.0
Fix Version/s: 3.6.0
Component/s: camel-aws2
Labels: None
Estimated Complexity: Unknown

Description

Hi,

The `camel-aws2-s3` component should support streaming consume/download so that large files can be copied or downloaded from S3. The current implementation either reads the contents of the input stream into memory or disregards the stream entirely, giving the next components no chance to manipulate it. This does not seem to be an ideal implementation.

The issue is essentially in class `org.apache.camel.component.aws2.s3.AWS2S3Endpoint`, on lines 169 to 178 and lines 201 to 212.

The logic on lines 169 to 178 is:

if the parameter `includeBody` is true, it consumes the S3 stream into memory, which is not ideal for large files.
if the parameter `includeBody` is false, it does not consume the S3 stream, but the stream is then lost. I couldn't find any other way to access it, so in this case the S3 stream is opened for nothing. This doesn't seem reasonable either. I think the raw S3 stream should be put in the `body` so the next component in the pipeline can consume it.
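
To illustrate why a raw stream in the body would help, here is a minimal sketch of how a downstream step could copy it in fixed-size chunks, so memory use stays bounded by the buffer rather than by the object size. `StreamCopySketch` and `copy` are hypothetical names for illustration only and are not part of `camel-aws2-s3`; plain `java.io` streams stand in for the S3 stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical illustration; not part of camel-aws2-s3. */
final class StreamCopySketch {

    /**
     * Copies everything from in to out using a fixed-size buffer,
     * so at most bufferSize bytes are held in memory at a time.
     * Returns the number of bytes copied. Does not close either stream.
     */
    static long copy(InputStream in, OutputStream out, int bufferSize) {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

With `includeBody` set to true this kind of chunked copy is not possible, because the whole object has already been read into memory before the next component sees it.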

The logic on lines 201 to 212 is:

if the parameter `includeBody` is false, it surprisingly closes the S3 input stream, confirming that there is no way to consume it afterwards.
if the parameter `includeBody` is true, the S3 input stream is left open, but there is no way to access it: it is created on line 77 of `org.apache.camel.component.aws2.s3.AWS2S3Consumer` and, if it is not included in the body, it gets lost afterwards.

The ideal behaviour I think would be:

if `includeBody` is true, consume the S3 input stream into memory, save it in the body and close it.
if `includeBody` is false, put the raw S3 input stream in the body and don't close it.
if `autoCloseBody` is true, schedule the S3 input stream to be closed when the exchange is finished.
if `autoCloseBody` is false, leave it to the caller to close the stream, although I'm not sure how this can be done in the current implementation.
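
The proposed behaviour could be sketched roughly as below. This is only an illustration under assumptions, not the component's code: `S3BodySketch`, `resolveBody`, `completeExchange` and the `onCompletion` list are hypothetical names, and the list merely stands in for whatever end-of-exchange hook the component would actually use. Plain `java.io` types are used in place of the AWS SDK stream.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.List;

/** Hypothetical helper; not part of camel-aws2-s3. */
final class S3BodySketch {

    /**
     * Decides what would go into the message body.
     * onCompletion is an assumed stand-in for an end-of-exchange hook.
     */
    static Object resolveBody(InputStream s3Stream, boolean includeBody,
                              boolean autoCloseBody, List<Closeable> onCompletion) {
        if (includeBody) {
            // Buffer the whole object in memory, then close the stream right away.
            try (InputStream in = s3Stream) {
                return in.readAllBytes();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        // Hand the raw stream downstream without reading or closing it.
        if (autoCloseBody) {
            // Defer the close until the exchange is finished.
            onCompletion.add(s3Stream);
        }
        // If autoCloseBody is false, closing is left to the caller.
        return s3Stream;
    }

    /** Runs the deferred closes; would be invoked when the exchange is done. */
    static void completeExchange(List<Closeable> onCompletion) {
        for (Closeable c : onCompletion) {
            try {
                c.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        onCompletion.clear();
    }
}
```

In this sketch `autoCloseBody` only matters when `includeBody` is false, since a buffered body no longer needs the stream.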