  1. Hadoop YARN
  2. YARN-8302

ATS v2 should handle HBase connection issue properly

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.2.0, 3.1.1
    • Component/s: ATSv2
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      An ATS v2 call times out with the error below when it cannot connect to the HBase instance.

      bash-4.2$ curl -i -k -s -1  -H 'Content-Type: application/json'  -H 'Accept: application/json' --max-time 5   --negotiate -u : 'https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/YARN_CONTAINER?fields=ALL&_=1526425686092'
      curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
      
      ATS log
      2018-05-15 23:10:03,623 INFO  client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=7, retries=7, started=8165 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow,
      ,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
      2018-05-15 23:10:13,651 INFO  client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=8, retries=8, started=18192 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow,
      ,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
      2018-05-15 23:10:23,730 INFO  client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=9, retries=9, started=28272 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow,
      ,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
      2018-05-15 23:10:33,788 INFO  client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=10, retries=10, started=38330 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow,
      ,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1

      There are two issues here:
      1) Determine why ATS cannot connect to HBase.
      2) On a connection error, the ATS call should not time out; it should fail with a proper error.
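
      One way to get the behavior asked for in item 2 is to keep the slow, retrying connectivity check off the request path: a background task probes the store periodically and caches the result, and each read consults that cached state and fails immediately when the store is known to be down. The sketch below illustrates the idea using only the JDK. It is not taken from the attached patches; the class name StorageMonitorSketch, the Probe interface, and the error message are all hypothetical.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: caches backend health so that reads can fail fast. */
public class StorageMonitorSketch {

  /** Hypothetical probe; a real one might run a cheap, bounded query against HBase. */
  public interface Probe {
    void check() throws Exception;
  }

  private final Probe probe;

  // null means the most recent probe succeeded (or none has run yet);
  // otherwise this holds the most recent probe failure.
  private final AtomicReference<Exception> lastFailure = new AtomicReference<>();

  // Daemon thread, so the monitor never keeps the JVM alive on its own.
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "storage-monitor");
        t.setDaemon(true);
        return t;
      });

  public StorageMonitorSketch(Probe probe) {
    this.probe = probe;
  }

  /** Starts probing in the background; a slow probe blocks only the monitor thread. */
  public void start(long intervalMs) {
    scheduler.scheduleWithFixedDelay(this::runProbe, 0, intervalMs, TimeUnit.MILLISECONDS);
  }

  /** Runs one probe and records the outcome. */
  void runProbe() {
    try {
      probe.check();
      lastFailure.set(null);
    } catch (Exception e) {
      lastFailure.set(e);
    }
  }

  /** Called at the top of each read: throws at once if the last probe failed. */
  public void checkStorageIsUp() throws IOException {
    Exception e = lastFailure.get();
    if (e != null) {
      throw new IOException("Timeline storage is down: " + e.getMessage(), e);
    }
  }

  public void stop() {
    scheduler.shutdownNow();
  }

  public static void main(String[] args) {
    // Simulate an unreachable store with a probe that always fails.
    StorageMonitorSketch monitor = new StorageMonitorSketch(() -> {
      throw new ConnectException("Connection refused");
    });
    monitor.runProbe(); // run one probe synchronously for the demo
    try {
      monitor.checkStorageIsUp();
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
    monitor.stop();
  }
}
```

      With this shape, a REST handler could catch the IOException and return an error response to the client instead of holding the connection open through the client-side retries seen in the log above. One limitation of the sketch is that the store is treated as up until the first probe completes.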

        Attachments

        1. YARN-8302.1.patch
          17 kB
          Billie Rinaldi
        2. YARN-8302.2.patch
          18 kB
          Sunil Govindan
        3. YARN-8302.3.patch
          18 kB
          Sunil Govindan


            People

            • Assignee:
              Billie Rinaldi (billie.rinaldi)
            • Reporter:
              Yesha Vora (yeshavora)
            • Votes:
              0
            • Watchers:
              7
