Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 0.9.8
- None
- None
Description
I am running complex queries against Parquet files on S3 (about 17 GB) on a large machine (an m5d.24xlarge EC2 instance with 96 vCPUs) and get errors like the following:
java.io.InterruptedIOException: getFileStatus on s3a://bucket/folder/file.parquet: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
java.io.InterruptedIOException: Reopen at position 15899845068 on s3a://bucket/folder/file.parquet: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
This seems to originate from the AWS SDK, where this error can apparently occur if (1) an S3Object is not closed properly, or (2) too many requests are made to the bucket. The last time I tried, I found the S3 request limit to be on the order of 6k requests/s; is it possible that my workload reaches that limit?
Let me know what kind of information you need to get to the bottom of the problem.
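To illustrate how a "Timeout waiting for connection from pool" error can arise without any S3-side throttling, here is a minimal, self-contained sketch. It does not use the AWS SDK or Hadoop at all: the pool is modeled with a plain `Semaphore`, and the class name `PoolExhaustionSketch`, the method `countTimeouts`, and all numeric values are hypothetical and chosen only for illustration. The assumption is that each reader thread leases one connection for the duration of a read and gives up if none becomes free within a timeout; on a 96-vCPU machine there may simply be more concurrent readers than pooled connections. If that is what happens here, the S3A setting `fs.s3a.connection.maximum` (which bounds the connection pool) would be the relevant knob, but that remains to be confirmed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolExhaustionSketch {

    // Hypothetical model: `readers` threads each try to lease one of `poolSize`
    // connections, hold it for `holdMs`, and give up after `acquireTimeoutMs`.
    // Returns how many readers timed out while waiting for a connection.
    static int countTimeouts(int poolSize, int readers, long holdMs, long acquireTimeoutMs)
            throws InterruptedException {
        Semaphore pool = new Semaphore(poolSize);           // stands in for the HTTP connection pool
        AtomicInteger timeouts = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(readers);
        ExecutorService executor = Executors.newFixedThreadPool(readers);
        for (int i = 0; i < readers; i++) {
            executor.submit(() -> {
                try {
                    if (pool.tryAcquire(acquireTimeoutMs, TimeUnit.MILLISECONDS)) {
                        try {
                            Thread.sleep(holdMs);           // simulated read over the leased connection
                        } finally {
                            pool.release();                 // always return the connection to the pool
                        }
                    } else {
                        timeouts.incrementAndGet();         // analogous to the pool timeout in the report
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();                                       // wait until every reader has finished
        executor.shutdown();
        return timeouts.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // More readers than connections: some readers are expected to time out.
        System.out.println("pool=2, readers=8: timeouts=" + countTimeouts(2, 8, 500, 100));
        // Pool as large as the number of readers: no reader has to wait.
        System.out.println("pool=8, readers=8: timeouts=" + countTimeouts(8, 8, 500, 100));
    }
}
```

In this model the timeouts disappear once the pool is at least as large as the number of concurrent readers, and they also appear if a connection is never released, which corresponds to cause (1) above.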
Issue Links
- is duplicated by: ASTERIXDB-2945 Ensure S3 connection pool is large enough (Closed)