[JCR-4369] Avoid S3 Incomplete Read Warning - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.16.3, 2.17.5
Fix Version/s: 2.18, 2.17.6
Component/s: jackrabbit-aws-ext
Labels:
None

Description

While using S3DataStore, the following logs are observed occasionally:

WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178] Not all bytes were read from the S3ObjectInputStream,
aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged 
GET or drain the input stream after use.

The warnings are logged not only by HTTP processing threads but also by background threads, which suggests that they originate in the S3DataStore implementation itself rather than only in HTTP connections broken by clients.

This is not a major issue: in this case the AWS toolkit only logs a warning as a recommendation and still closes the underlying HttpRequest object properly. So, for the record, there is no functional problem. It is only about the warning message and possibly sub-optimal HTTP request handling under the hood (on the AWS toolkit side).

After looking at the code, I noticed that CachingDataStore#proactiveCaching is enabled by default, which means the S3DataStore tries to proactively download the binary content, asynchronously in a new thread, even when only metadata is accessed through #getLastModified(...) and #getLength(...).

Anyway, the minor problem is that whenever the S3DataStore reads content (in other words, gets an input stream on an S3Object), it is recommended to either read all the data or abort the input stream. Just closing the input stream is not good enough from the AWS SDK's perspective and results in the warning. See the S3AbortableInputStream#close() method. [1]
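To illustrate the "drain the input stream after use" option mentioned in the warning, here is a minimal sketch using only the JDK. `TrackingStream` is a hypothetical stand-in for the SDK's stream, not an AWS class: it only records whether `close()` was called before end-of-stream, which is a simplification of the condition under which the SDK logs its warning. `drainAndClose` is likewise an illustrative helper name.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DrainBeforeClose {

    /** Hypothetical stand-in for an S3 object stream: notes a close() before EOF. */
    static class TrackingStream extends FilterInputStream {
        private boolean eofSeen = false;
        boolean closedEarly = false;

        TrackingStream(byte[] data) {
            super(new ByteArrayInputStream(data));
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b < 0) {
                eofSeen = true;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n < 0) {
                eofSeen = true;
            }
            return n;
        }

        @Override
        public void close() throws IOException {
            if (!eofSeen) {
                closedEarly = true; // roughly where the real SDK would warn
            }
            super.close();
        }
    }

    /** Reads and discards whatever is left, then closes the stream. */
    static long drainAndClose(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long discarded = 0;
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                discarded += n;
            }
        } finally {
            in.close();
        }
        return discarded;
    }

    public static void main(String[] args) throws IOException {
        // Plain close() after a partial read: the situation behind the warning.
        TrackingStream partial = new TrackingStream(new byte[100]);
        partial.read(new byte[10]);
        partial.close();
        System.out.println("closed early: " + partial.closedEarly); // true

        // Same partial read, but the remainder is drained before closing.
        TrackingStream drained = new TrackingStream(new byte[100]);
        drained.read(new byte[10]);
        long n = drainAndClose(drained);
        System.out.println(n + " " + drained.closedEarly); // 90 false
    }
}
```

Draining keeps the connection reusable but transfers the rest of the object, so for large binaries aborting is likely the cheaper choice.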

Therefore, some S3-related classes (such as org.apache.jackrabbit.core.data.LocalCache#store(String, InputStream), CachingDataStore#getStream(DataIdentifier), etc.) should be improved as follows:

If the local cache file doesn't exist or the cache is in purge mode, it works as it does now: just copy everything to the local cache file and close it.
Otherwise, it should abort the underlying S3ObjectInputStream.
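The two cases above could be sketched roughly as follows. This is not the actual Jackrabbit patch: `CacheStoreSketch`, the `Abortable` interface, `FakeS3Stream`, and the `purgeMode` parameter are hypothetical names introduced so the example runs with the JDK alone. In real code the abort call would be `S3ObjectInputStream#abort()` from the AWS SDK for Java 1.x.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CacheStoreSketch {

    /** Hypothetical stand-in for the abort capability of the SDK stream. */
    interface Abortable {
        void abort();
    }

    /** Fake S3 stream that only remembers whether abort() was called. */
    static class FakeS3Stream extends ByteArrayInputStream implements Abortable {
        boolean aborted = false;

        FakeS3Stream(byte[] data) {
            super(data);
        }

        @Override
        public void abort() {
            aborted = true;
        }
    }

    /**
     * Copies the stream into the cache file when the file is missing or the
     * cache is in purge mode; otherwise aborts the stream instead of reading it.
     * Returns true if the content was copied.
     */
    static boolean store(Path cacheFile, InputStream in, boolean purgeMode)
            throws IOException {
        if (!Files.exists(cacheFile) || purgeMode) {
            // Full read: the stream reaches EOF, so a plain close is fine.
            try (InputStream src = in) {
                Files.copy(src, cacheFile, StandardCopyOption.REPLACE_EXISTING);
            }
            return true;
        }
        try {
            // Content is already cached: abort rather than close a half-read stream.
            if (in instanceof Abortable) {
                ((Abortable) in).abort();
            }
        } finally {
            in.close();
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("cache").resolve("blob");

        FakeS3Stream first = new FakeS3Stream(new byte[5]);
        System.out.println(store(file, first, false) + " " + first.aborted);   // true false

        FakeS3Stream second = new FakeS3Stream(new byte[5]);
        System.out.println(store(file, second, false) + " " + second.aborted); // false true
    }
}
```

The `instanceof` check is one possible design choice: it lets the same method accept ordinary streams, which are simply closed, alongside abortable ones.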

The issue is a known one in the AWS toolkit. [2,3] It seems that clients using the toolkit need to abort the input stream if they don't want to read the data fully.

[1] https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174-L187
[2] https://github.com/aws/aws-sdk-java/issues/1211
[3] https://github.com/aws/aws-sdk-java/issues/1657

Issue Links

is related to: OAK-9646 Avoid S3 Incomplete Read Warning (Open)
links to: GitHub Pull Request #61

People

Assignee: Amit Jain
Reporter: Woonsan Ko
Votes: 0
Watchers: 2

Dates

Created: 04/Sep/18 17:52
Updated: 16/Dec/21 18:36
Resolved: 07/Sep/18 05:50