Hadoop YARN / YARN-8302

ATS v2 should handle HBase connection issue properly

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.2.0, 3.1.1
    • Component/s: ATSv2
    • Hadoop Flags: Reviewed

    Description

      An ATS v2 REST call times out with the error below when ATS cannot connect to the HBase instance.

      bash-4.2$ curl -i -k -s -1  -H 'Content-Type: application/json'  -H 'Accept: application/json' --max-time 5   --negotiate -u : 'https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/YARN_CONTAINER?fields=ALL&_=1526425686092'
      curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
      
      ATS log
      2018-05-15 23:10:03,623 INFO  client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=7, retries=7, started=8165 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow,
      ,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
      2018-05-15 23:10:13,651 INFO  client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=8, retries=8, started=18192 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow,
      ,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
      2018-05-15 23:10:23,730 INFO  client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=9, retries=9, started=28272 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow,
      ,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
      2018-05-15 23:10:33,788 INFO  client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=10, retries=10, started=38330 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow,
      ,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1

      There are two issues here:
      1) Find out why ATS cannot connect to HBase.
      2) On a connection error, the ATS call should not hang until the client times out. It should fail promptly with a proper error.
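
The second point can be illustrated with a minimal sketch of a fail-fast wrapper. This is not the code from the attached patches; the class name `FailFastStorageCall`, the method `callWithDeadline`, and the timeout value are all hypothetical. It only shows the general idea of bounding a storage call with a deadline and turning a hang into an explicit `IOException`.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not part of ATS v2 or the YARN-8302 patches. */
final class FailFastStorageCall {
  private FailFastStorageCall() {}

  /**
   * Runs a storage call on a worker thread and waits at most timeoutMs.
   * A slow or failing call is reported as an IOException instead of
   * leaving the caller blocked while the client retries.
   */
  static <T> T callWithDeadline(Callable<T> storageCall, long timeoutMs)
      throws IOException {
    // Daemon thread so a stuck call cannot keep the JVM alive.
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "storage-call");
      t.setDaemon(true);
      return t;
    });
    Future<T> future = executor.submit(storageCall);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the worker; the call may ignore it
      throw new IOException(
          "Timeline storage did not respond within " + timeoutMs + " ms", e);
    } catch (ExecutionException e) {
      // The call itself failed, e.g. with a connection exception.
      throw new IOException(
          "Timeline storage call failed: " + e.getCause(), e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve interrupt status
      throw new IOException("Interrupted while waiting for timeline storage", e);
    } finally {
      executor.shutdownNow();
    }
  }
}
```

A REST handler could catch the `IOException` and return an error response to the client well before the 5-second `curl` limit shown above. How the error is mapped to an HTTP status is a design choice this sketch leaves open.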

      Attachments

        1. YARN-8302.1.patch
          17 kB
          Billie Rinaldi
        2. YARN-8302.2.patch
          18 kB
          Sunil G
        3. YARN-8302.3.patch
          18 kB
          Sunil G

            People

              Assignee: Billie Rinaldi
              Reporter: Yesha Vora
              Votes: 0
              Watchers: 8
