Jackrabbit Content Repository / JCR-4369

Avoid S3 Incomplete Read Warning

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.16.3, 2.17.5
    • Fix Version/s: 2.18, 2.17.6
    • Component/s: jackrabbit-aws-ext
    • Labels:
      None

      Description

      While using S3DataStore, the following warning is occasionally logged:

      WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178] Not all bytes were read from the S3ObjectInputStream,
      aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged 
      GET or drain the input stream after use.
      

      The warnings are logged not only by HTTP request-processing threads but also by background threads. This suggests there may be an issue in the S3DataStore implementation itself, rather than the warnings being caused only by a client breaking the HTTP connection.

      This is not a major issue: the AWS SDK only logs the warning as a recommendation and still properly closes the underlying HttpRequest object. For the record, there is no functional problem; the concern is only the warning message and possibly sub-optimal HTTP request handling inside the AWS SDK.

      After looking at the code, I noticed that CachingDataStore#proactiveCaching is enabled by default. This means S3DataStore tries to proactively download the binary content asynchronously in a new thread, even when only metadata is accessed through #getLastModified(...) and #getLength(...).
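To make the background-thread path concrete, here is a minimal Java sketch of proactive caching. It is not Jackrabbit's code: ProactiveCachingStore, the in-memory "remote" map standing in for S3, and the single-threaded executor are hypothetical. It only illustrates how a metadata call such as getLength(...) can schedule a full content download on another thread, which is why warnings can also come from background threads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical model of a caching data store with proactive caching:
 * a metadata lookup (getLength) also schedules a background download.
 */
class ProactiveCachingStore {
    private final Map<String, byte[]> remote;   // stands in for the S3 bucket
    private final Map<String, byte[]> localCache = new ConcurrentHashMap<>();
    private final ExecutorService downloader = Executors.newSingleThreadExecutor();
    private final boolean proactiveCaching;
    final AtomicInteger backgroundDownloads = new AtomicInteger();

    ProactiveCachingStore(Map<String, byte[]> remote, boolean proactiveCaching) {
        this.remote = remote;
        this.proactiveCaching = proactiveCaching;
    }

    /** Metadata access: returns the length, but may also trigger a content download. */
    long getLength(String id) {
        if (proactiveCaching && !localCache.containsKey(id)) {
            // Runs on a background thread, so any stream warnings are logged from there too.
            downloader.submit(() -> {
                backgroundDownloads.incrementAndGet();
                localCache.putIfAbsent(id, remote.get(id).clone());
            });
        }
        // In a real store this would be a metadata request only.
        return remote.get(id).length;
    }

    boolean isCached(String id) {
        return localCache.containsKey(id);
    }

    /** Waits for scheduled downloads to finish. */
    void shutdown() throws InterruptedException {
        downloader.shutdown();
        downloader.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With proactive caching disabled, the same getLength(...) call leaves the local cache untouched.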

      In any case, the minor problem is this: whenever S3DataStore reads content (in other words, gets an input stream on an S3Object), the caller is expected either to read all of the data or to abort the input stream. From the AWS SDK's perspective, merely closing the input stream is not good enough and results in the warning; see the S3AbortableInputStream#close() method [1].

      Therefore, some S3-related classes (such as org.apache.jackrabbit.core.data.LocalCache#store(String, InputStream), CachingDataStore#getStream(DataIdentifier), etc.) should be improved as follows:

      • If the local cache file does not exist or the cache is in purge mode, keep the current behavior: copy everything to the local cache file and close the stream.
      • Otherwise, abort the underlying S3ObjectInputStream instead of just closing it.
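The two cases above could look roughly like this in Java. This is a simplified, hypothetical sketch, not the actual LocalCache#store implementation: AbortableInput stands in for S3ObjectInputStream (only an abort() method is modeled), and LocalCacheSketch, its cache directory and the purgeMode flag are illustrative assumptions.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical stand-in for S3ObjectInputStream: only records whether abort() was called. */
class AbortableInput extends FilterInputStream {
    boolean aborted;

    AbortableInput(InputStream in) {
        super(in);
    }

    void abort() {
        aborted = true;
    }
}

/** Simplified, hypothetical version of the store logic proposed in this issue. */
class LocalCacheSketch {
    private final Path cacheDir;
    private final boolean purgeMode;

    LocalCacheSketch(Path cacheDir, boolean purgeMode) {
        this.cacheDir = cacheDir;
        this.purgeMode = purgeMode;
    }

    /** Returns true if the content was copied into the cache, false if the stream was aborted. */
    boolean store(String fileName, AbortableInput in) throws IOException {
        Path target = cacheDir.resolve(fileName);
        try {
            if (!Files.exists(target) || purgeMode) {
                // Unchanged behavior: consume the whole stream into the cache file.
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                return true;
            }
            // Already cached and not purging: the content is not needed, so abort
            // instead of closing a stream that still has unread bytes.
            in.abort();
            return false;
        } finally {
            in.close();
        }
    }
}
```

The stream is always closed in the finally block; on the already-cached path abort() is called first, so a stream with unread bytes is never simply closed.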

      This is a known issue in the AWS SDK [2, 3]. It seems that clients using the SDK need to abort the input stream if they do not intend to read the data fully.
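The read-fully-or-abort rule can be illustrated without the SDK. The sketch below is a hypothetical, simplified model of S3AbortableInputStream#close() [1], not its real code: AbortableStream counts consumed bytes and sets a warned flag (where the SDK would log the warning) when it is closed before the end of the content without abort() having been called. S3StreamHandling.drain(...) shows the other option suggested by the log message, draining the stream after use.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical, simplified model of the SDK's abortable S3 stream. */
class AbortableStream extends FilterInputStream {
    private final long contentLength;
    private long bytesRead;
    private boolean aborted;
    boolean warned; // set where the SDK would log "Not all bytes were read..."

    AbortableStream(byte[] data) {
        super(new ByteArrayInputStream(data));
        this.contentLength = data.length;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) bytesRead++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) bytesRead += n;
        return n;
    }

    /** Marks the stream as deliberately abandoned (the SDK would abort the HTTP request). */
    void abort() {
        aborted = true;
    }

    @Override
    public void close() throws IOException {
        if (!aborted && bytesRead < contentLength) {
            warned = true; // the SDK warns here and aborts the connection itself
        }
        super.close();
    }
}

class S3StreamHandling {
    /** Reads and discards the rest of the stream so close() sees it fully consumed. */
    static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }
}
```

Draining only makes sense when few bytes remain; for large objects, aborting (or requesting only the needed bytes via a ranged GET) generally avoids transferring data that is never used.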

      [1] https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174-L187
      [2] https://github.com/aws/aws-sdk-java/issues/1211
      [3] https://github.com/aws/aws-sdk-java/issues/1657

              People

              • Assignee: Amit Jain
              • Reporter: Woonsan Ko
              • Votes: 0
              • Watchers: 2