[ASTERIXDB-2944] "SdkClientException: Timeout waiting for connection from pool" when using Parquet on S3 at large scale - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.9.8
Fix Version/s: None
Component/s: EXT - External data
Labels: None

Description

I am running complex queries against Parquet files on S3 (about 17GB) on a large machine (m5d.24xlarge on EC2, which has 96 vCPUs) and get errors like the following:

java.io.InterruptedIOException: getFileStatus on s3a://bucket/folder/file.parquet: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool

java.io.InterruptedIOException: Reopen at position 15899845068 on s3a://bucket/folder/file.parquet: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool

This seems to originate from the AWS SDK, where this error can apparently occur if (1) an S3Object is not closed properly, or (2) too many requests are made to the bucket. The last time I checked, I found the S3 request limit to be on the order of 6k requests/s; could my workload be hitting that limit?

Let me know what kind of information you need to get to the bottom of the problem.
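
To illustrate how a "Timeout waiting for connection from pool" error can arise, here is a minimal sketch of a bounded pool whose slots are leased but never returned, as would happen if objects were not closed or if more readers held connections than the pool allows. It does not use the AWS SDK or Hadoop: `BoundedPool`, `countTimeouts`, the pool size of 15, and the 5 ms timeout are hypothetical, illustrative choices. Only the figure of 96 (the vCPU count of the machine) comes from this report.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolTimeoutSketch {

    /** Hypothetical stand-in for an HTTP connection pool with a fixed number of slots. */
    static final class BoundedPool {
        private final Semaphore slots;
        private final long timeoutMillis;

        BoundedPool(int size, long timeoutMillis) {
            this.slots = new Semaphore(size);
            this.timeoutMillis = timeoutMillis;
        }

        /** Returns true if a slot was leased before the timeout expired. */
        boolean lease() {
            try {
                return slots.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }

        /** Returns a slot to the pool (what closing a connection would do). */
        void release() {
            slots.release();
        }
    }

    /** Leases `requests` slots without ever releasing one and counts the timeouts. */
    static int countTimeouts(int poolSize, int requests) {
        BoundedPool pool = new BoundedPool(poolSize, 5);
        int timeouts = 0;
        for (int i = 0; i < requests; i++) {
            if (!pool.lease()) {
                timeouts++; // the analogue of "Timeout waiting for connection from pool"
            }
        }
        return timeouts;
    }

    public static void main(String[] args) {
        // 96 held connections against an assumed pool of 15 slots
        System.out.println(countTimeouts(15, 96)); // prints 81
        // the same load against a pool sized to the number of holders
        System.out.println(countTimeouts(96, 96)); // prints 0
    }
}
```

In Hadoop's S3A connector, the size of the corresponding pool is governed by the `fs.s3a.connection.maximum` property. Whether raising it is sufficient for this workload is an assumption that would need to be checked against the actual query.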

Issue Links

is duplicated by

ASTERIXDB-2945 Ensure S3 connection pool is large enough

Closed

People

Assignee: Wail Y. Alkowaileet

Reporter: Ingo Müller

Votes: 0

Watchers: 2

Dates

Created: 07/Aug/21 18:20

Updated: 19/Aug/21 18:48

Resolved: 19/Aug/21 18:48