Apache AsterixDB / ASTERIXDB-2944

"SdkClientException: Timeout waiting for connection from pool" when using Parquet on S3 at large scale


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.8
    • Fix Version/s: None
    • Component/s: EXT - External data
    • Labels: None

    Description

      I am running complex queries against Parquet files on S3 (about 17 GB) on a large machine (m5d.24xlarge on EC2, which has 96 vCPUs) and get errors like the following:

      java.io.InterruptedIOException: getFileStatus on s3a://bucket/folder/file.parquet: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool

      java.io.InterruptedIOException: Reopen at position 15899845068 on s3a://bucket/folder/file.parquet: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool

      This seems to originate from the AWS SDK, where this error can apparently occur if (1) an S3Object is not closed properly, or (2) too many requests are being made to the bucket. The last time I checked, I found the S3 request limit to be on the order of 6k requests/s; is it possible that my workload reaches that limit?
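
      As a rough illustration of how a bounded connection pool produces this kind of timeout, the following Python sketch models the pool as a semaphore. It is a toy model, not the AWS SDK or the S3A connector: the function `simulate_pool` and all of its parameters are hypothetical names chosen for this example. Setting `leak=True` mimics hypothesis (1), a connection that is never returned; `leak=False` with many workers and long hold times mimics plain contention from too many concurrent readers.

```python
import threading
import time


def simulate_pool(pool_size, workers, hold_seconds, acquire_timeout, leak=False):
    """Return how many workers timed out waiting for a pooled connection.

    Hypothetical toy model: a semaphore stands in for the HTTP connection pool.
    """
    pool = threading.BoundedSemaphore(pool_size)
    counter_lock = threading.Lock()
    timeouts = 0

    def worker():
        nonlocal timeouts
        # Analogous to "Timeout waiting for connection from pool".
        if not pool.acquire(timeout=acquire_timeout):
            with counter_lock:
                timeouts += 1
            return
        try:
            time.sleep(hold_seconds)  # pretend to read from the object
        finally:
            if not leak:
                pool.release()  # a leaked connection is never returned

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return timeouts


if __name__ == "__main__":
    # Contention: 8 readers share 2 connections, each held longer than the wait.
    print("contention:", simulate_pool(2, 8, hold_seconds=0.5, acquire_timeout=0.1))
    # Leak: connections are never released, so later readers always time out.
    print("leak:", simulate_pool(2, 8, hold_seconds=0.0, acquire_timeout=0.1, leak=True))
    # Enough headroom: short holds and a generous wait.
    print("healthy:", simulate_pool(2, 8, hold_seconds=0.01, acquire_timeout=2.0))
```

      In the Hadoop S3A connector, the pool size is governed by the `fs.s3a.connection.maximum` setting. Whether that setting, a leaked stream, or S3 request throttling is responsible for the errors above is not established by this sketch.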

      Let me know what kind of information you need to get to the bottom of the problem.

            People

              Assignee: Wail Y. Alkowaileet (wyk)
              Reporter: Ingo Müller (ingomueller.net)