Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: v3.0.0, v3.1.0
Fix Version/s: None
Component/s: None
Labels: None
Description
Environment
- Kylin server 3.0.0
- EMR 5.28
Issue
After extended uptime, both the Kylin query server and the jobs running on EMR stop working. In both cases, the root cause is:
Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]
Following https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/, increasing fs.s3.maxConnections to 10000 only delays the failure, which suggests that the underlying cause is a connection leak. The fact that restarting the Kylin service resolves the problem also points to a leak.
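The reasoning above can be illustrated with a small simulation of a leaking connection pool. The failure is postponed by a larger pool and cleared by a restart. This Java sketch is hypothetical and is not EMRFS, AWS SDK, or Kylin code. The names `ConnectionPoolLeakDemo`, `Pool`, `Lease`, and `successfulRequests` are illustrative. A `java.util.concurrent.Semaphore` stands in for the bounded S3 connection pool that `fs.s3.maxConnections` sizes, and the pool sizes used (50 and 10000) are arbitrary.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation of a bounded connection pool; not EMRFS, AWS SDK, or Kylin code.
public class ConnectionPoolLeakDemo {

    /** Fixed-size pool; acquire() fails with the same message EMRFS reports on exhaustion. */
    static final class Pool {
        private final Semaphore permits;

        Pool(int maxConnections) {
            this.permits = new Semaphore(maxConnections);
        }

        Lease acquire(long timeoutMillis) {
            try {
                if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    throw new IllegalStateException("Timeout waiting for connection from pool");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for connection", e);
            }
            return new Lease(permits);
        }
    }

    /** A borrowed connection; close() returns it to the pool exactly once. */
    static final class Lease implements AutoCloseable {
        private final Semaphore permits;
        private boolean released = false;

        Lease(Semaphore permits) {
            this.permits = permits;
        }

        @Override
        public void close() {
            if (!released) {
                released = true;
                permits.release();
            }
        }
    }

    /** Issues `requests` sequential calls and counts how many obtained a connection. */
    public static int successfulRequests(int maxConnections, int requests, boolean closeConnections) {
        Pool pool = new Pool(maxConnections);
        int ok = 0;
        for (int i = 0; i < requests; i++) {
            try {
                if (closeConnections) {
                    try (Lease lease = pool.acquire(10)) {
                        // ... read from S3 here; the lease is returned when this block exits ...
                    }
                } else {
                    pool.acquire(10); // leak: the lease is dropped without close()
                }
                ok++;
            } catch (IllegalStateException timeout) {
                break; // pool exhausted: every later request would also time out
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        for (int max : new int[] {50, 10000}) {
            System.out.printf("maxConnections=%d leaky=%d closed=%d%n",
                    max,
                    successfulRequests(max, 20000, false),
                    successfulRequests(max, 20000, true));
        }
    }
}
```

In the leaky case, each run stops after exactly `maxConnections` requests. Raising the limit from 50 to 10000 therefore only moves the failure from request 51 to request 10001. Releasing every lease, here via try-with-resources, keeps the pool usable indefinitely. In this simulation, a fresh `Pool` (the analogue of a restart) also starts with all permits available again.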
A full stack trace from the QueryService is attached.