  1. Ambari
  2. AMBARI-12995

Ambari alerts report "UNKNOWN" status for secondary YARN RM and NM in a Kerberized YARN HA deployment


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.1
    • Fix Version/s: 2.3.0
    • Component/s: alerts
    • Labels:
      None
    • Environment:

      Requires YARN HA with Kerberos

      Description

      What is observed:

      On my currently active YARN NodeManager and ResourceManager, Ambari
      alerts are fine.

      On the secondary YARN NodeManager and ResourceManager, Ambari reports
      "Status: Unknown" / "HTTP 200 response (metrics unavailable)". This
      is for the alerts:

      • NodeManager Health Summary
      • ResourceManager CPU Utilization
      • ResourceManager RPC Latency

      The Ambari web interface does not make this error obvious, as it says
      "0 alerts" in the top bar. However, the alerts with "unknown" status
      are visible on the Ambari alerts page and in the results of a query
      against the alerts API.
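
      As a rough sketch of the API check mentioned above, the function below
      filters an already-parsed alerts response for entries in the UNKNOWN
      state. The payload shape (an "items" list whose entries carry an "Alert"
      object with "state", "definition_name", "host_name" and "text" fields)
      is an assumption for illustration and should be checked against the
      actual API output; the host names in the sample are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def unknown_alerts(payload: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (definition_name, host_name, text) for alerts in state UNKNOWN.

    Assumes the payload shape described in the lead-in; missing fields are
    replaced with empty strings rather than raising.
    """
    found = []
    for item in payload.get("items", []):
        alert = item.get("Alert", {})
        if alert.get("state") == "UNKNOWN":
            found.append((
                alert.get("definition_name", ""),
                alert.get("host_name", ""),
                alert.get("text", ""),
            ))
    return found


# Hypothetical sample payload, not captured from a real cluster.
sample = {
    "items": [
        {"Alert": {"state": "OK", "definition_name": "rm_cpu",
                   "host_name": "rm-active.example.com", "text": "ok"}},
        {"Alert": {"state": "UNKNOWN", "definition_name": "rm_cpu",
                   "host_name": "rm-standby.example.com",
                   "text": "HTTP 200 response (metrics unavailable)"}},
    ]
}

for name, host, text in unknown_alerts(sample):
    print(f"{host}: {name}: {text}")
```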

      What is expected:
      Ambari alerts should not raise any alarms on a secondary YARN HA node as long as the node is responsive.


      A network dump of the ambari poll against the secondary RM looks like:

      Request:
      """
      GET /jmx?qry=Hadoop:service=ResourceManager,name=RMNMInfo HTTP/1.1
      ...
      """

      Response:
      """
      HTTP/1.1 200 OK
      ...
      Refresh: 3; url=http://{my-primary-rm}:8088/jmx
      Content-Length: 106
      Server: Jetty(6.1.26.hwx)

      This is standby RM. Redirecting to the current active RM:
      http://{my-primary-rm}:8088/jmx
      """


      I'm also filing a JIRA against YARN (per request from jhurley) and will post that info here.

      Comment from Jonathan Hurley jhurley@hortonworks.com:

      This is caused by how YARN does HA mode. With two YARN RMs, the standby RM returns a 200 response with a JavaScript redirect instead of a 3xx redirect. When not using Kerberos, Ambari should be able to parse the headers and follow the JS-based redirect. However, on a Kerberized cluster, we use curl, which cannot do this. Therefore, requests against the secondary RM will return an UNKNOWN response, since the request did get a 200 but no metrics. I think a few things can be improved here:

      1) There should be a ticket filed for YARN to have their HA mode use a proper redirect.
      2) Ambari might not want to produce an UNKNOWN response here, since it gives the false impression that something went wrong.
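
      To illustrate the second point, here is a minimal sketch of how a poller
      could recognise the standby RM's response from the network dump above: a
      200 status carrying a "Refresh: <seconds>; url=<target>" header. This is
      not Ambari's actual alert code; the function name and the example host
      are hypothetical, and the header format is taken only from the dump in
      this report.

```python
import re
from typing import Mapping, Optional

# Matches header values such as "3; url=http://host:8088/jmx".
_REFRESH_URL = re.compile(r"^\s*\d+\s*;\s*url\s*=\s*(\S+)\s*$", re.IGNORECASE)


def standby_redirect_target(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the active RM URL if the response looks like a standby redirect.

    Returns None for non-200 responses, for responses without a Refresh
    header, and for Refresh values that do not carry a url= target.
    """
    if status != 200:
        return None
    # HTTP header names are case-insensitive, so compare in lower case.
    refresh = next(
        (value for name, value in headers.items() if name.lower() == "refresh"),
        None,
    )
    if refresh is None:
        return None
    match = _REFRESH_URL.match(refresh)
    return match.group(1) if match else None


# Hypothetical host name standing in for {my-primary-rm}.
standby_headers = {
    "Refresh": "3; url=http://rm-active.example.com:8088/jmx",
    "Server": "Jetty(6.1.26.hwx)",
}
target = standby_redirect_target(200, standby_headers)
if target is not None:
    print("standby RM; active RM is at", target)
```

      A caller could use a non-None result either to re-issue the JMX query
      against the returned URL or to avoid reporting UNKNOWN for the standby
      node; which of the two is appropriate is a design decision this sketch
      does not settle.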

              People

              • Assignee:
                Unassigned
                Reporter:
                arobertson Andrew Robertson
              • Votes:
                1
                Watchers:
                6
