Description
An ATS v2 call times out with the error below when it cannot connect to the HBase instance.
bash-4.2$ curl -i -k -s -1 -H 'Content-Type: application/json' -H 'Accept: application/json' --max-time 5 --negotiate -u : 'https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/YARN_CONTAINER?fields=ALL&_=1526425686092'
curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
ATS log
2018-05-15 23:10:03,623 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=7, retries=7, started=8165 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow, ,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
2018-05-15 23:10:13,651 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=8, retries=8, started=18192 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow, ,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
2018-05-15 23:10:23,730 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=9, retries=9, started=28272 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow, ,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
2018-05-15 23:10:33,788 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=10, retries=10, started=38330 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow, ,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
There are two issues here:
1) Find out why ATS cannot connect to HBase.
2) On a connection error, the ATS call should not hang until the client times out. It should fail promptly with a proper error.
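To illustrate what point 2 is asking for, here is a minimal sketch, not the actual ATS or HBase code. It assumes the storage read can be wrapped in a `Supplier`, and it bounds the wait so that the caller gets an explicit error instead of hanging. The class `FailFastStorageCall`, the `Result` type, and the choice of status 503 are all hypothetical and exist only for this example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of YARN or HBase.
final class FailFastStorageCall {

  /** Minimal stand-in for an HTTP response: status code plus message. */
  static final class Result {
    final int status;
    final String body;

    Result(int status, String body) {
      this.status = status;
      this.body = body;
    }
  }

  /**
   * Runs the storage read on another thread and waits at most deadlineMillis.
   * Returns 200 with the data on success, or an assumed 503 with a message
   * when the read fails or does not finish in time.
   */
  static Result callWithDeadline(Supplier<String> storageRead, long deadlineMillis) {
    CompletableFuture<String> future = CompletableFuture.supplyAsync(storageRead);
    try {
      return new Result(200, future.get(deadlineMillis, TimeUnit.MILLISECONDS));
    } catch (TimeoutException e) {
      // cancel() only marks the future as cancelled; it does not interrupt
      // the underlying read, which may keep retrying in the background.
      future.cancel(true);
      return new Result(503,
          "Timeline storage did not respond within " + deadlineMillis + " ms");
    } catch (ExecutionException e) {
      // The read itself failed, for example with a connection error.
      return new Result(503, "Timeline storage error: " + e.getCause().getMessage());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return new Result(503, "Interrupted while waiting for timeline storage");
    }
  }
}
```

With a deadline shorter than the client's own `--max-time`, a request like the curl call above would receive an error response rather than zero bytes. A real fix would more likely also bound the retries inside the HBase client so the background work stops as well.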
Attachments
Issue Links
- is related to YARN-9374 HBaseTimelineWriterImpl sync writes has to avoid thread blocking if storage down (Resolved)