Apache Apex Malhar / APEXMALHAR-2174

S3 File Reader reading more data than expected


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.5.0
    • Component/s: None
    • Labels: None

    Description

      The S3 file reader reads more data than expected. This was observed through AWS billing.
      The issue may be in S3InputStream.read(), which is used in readEntity().

      A block can be read directly through the AmazonS3 API, so I propose the following solution:
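      ```
      // Request only the bytes of this block instead of streaming the whole object.
      GetObjectRequest rangeObjectRequest = new GetObjectRequest(bucketName, key);
      // setRange takes inclusive start and end offsets, not a byte count.
      rangeObjectRequest.setRange(startByte, startByte + noOfBytes - 1);
      S3Object objectPortion = s3Client.getObject(rangeObjectRequest);
      S3ObjectInputStream wrappedStream = objectPortion.getObjectContent();
      // ByteStreams is Guava's com.google.common.io.ByteStreams.
      byte[] record = ByteStreams.toByteArray(wrappedStream);
      wrappedStream.close();
      ```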

      Advantage of this solution: parallel reads will work for all types of S3 file systems.
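
As a rough illustration of how ranged reads could support parallel block reads, the sketch below splits a file into inclusive byte ranges, one per block. Each range is in the start/end form that a ranged GetObjectRequest expects, so every block reader could fetch only its own bytes. The class name `BlockRanges`, the `split` method, and the sample sizes are hypothetical and are not part of Malhar or the AWS SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes the inclusive byte range for each block of a file.
final class BlockRanges {

    /** Inclusive byte range [start, end]. */
    static final class Range {
        public final long start;
        public final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Splits a file of fileLength bytes into ranges of at most blockSize bytes. */
    static List<Range> split(long fileLength, long blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and blockSize > 0");
        }
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < fileLength; start += blockSize) {
            // The last block may be shorter than blockSize; end is inclusive.
            long end = Math.min(start + blockSize, fileLength) - 1;
            ranges.add(new Range(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // A 10-byte file read in 4-byte blocks.
        for (Range r : split(10, 4)) {
            System.out.println(r.start + "-" + r.end);
        }
        // Prints:
        // 0-3
        // 4-7
        // 8-9
    }
}
```

Because the ranges do not overlap and together cover the file exactly once, the total bytes requested equal the file length, which is the property the billing observation suggests the stream-based read was missing.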

            People

              Assignee: Chaitanya Chebolu (chaithu)
              Reporter: Chaitanya Chebolu (chaithu)