  1. Ambari
  2. AMBARI-23866

Kerberos Service Check failure due to kinit failure on random node

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Implemented
    • Affects Version/s: 2.5.2
    • Fix Version/s: 2.7.0
    • None
    • Environment: Multiple Kerberos Domain Controllers across multiple data centers for single realm.

    Description

      We were seeing Kerberos Service Check failures in Ambari. Specifically, the check would fail on the first run of the day, succeed on the second, fail on the next, succeed when run again, and so forth.

      The operation log showed kinit failures on random nodes:
      kinit: Client XXXX not found in Kerberos database while getting initial credentials

      Since AMBARI-9852, the service check performs the following steps:
        1. Create a unique principal in the relevant KDC (server)
        2. Test that the principal can be used to authenticate via kinit (agent)
        3. Destroy the principal (server)

      This is a very good check of services.
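
The three steps can be sketched as a small Python function. This is only an illustration of the sequence, not Ambari's actual code: `run_service_check`, the `ambari-check-` naming scheme, and the three callables passed in are all hypothetical stand-ins for the server-side and agent-side operations.

```python
import uuid


def run_service_check(create_principal, kinit, destroy_principal):
    """Create a throwaway principal, test it with kinit, and always clean up.

    All three arguments are caller-supplied callables (hypothetical here);
    each receives the principal name as its only argument.
    """
    # Hypothetical naming scheme: a random suffix keeps each run's principal unique.
    principal = "ambari-check-" + uuid.uuid4().hex[:8]

    create_principal(principal)        # step 1: create the principal (server)
    try:
        return bool(kinit(principal))  # step 2: authenticate via kinit (agent)
    finally:
        destroy_principal(principal)   # step 3: destroy it, even if kinit raised
```

Putting the destroy step in `finally` means a failed or crashing kinit does not leave test principals behind in the KDC.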

      So what is happening?

      In our environment we have multiple KDCs (domain controllers) across multiple data centers, all serving the same realm.

      The unique principal is created on a single KDC and then propagated to the others.

      The agents were testing the principal against a different KDC, i.e. before it had a chance to propagate. This is why the second service check would succeed.
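
One way to tolerate this propagation delay is to retry kinit a few times before declaring failure. The sketch below is a generic illustration under that assumption and is not presented as the change that resolved this issue; `kinit_with_retry`, its parameters, and the injected `kinit` callable are hypothetical names.

```python
import time


def kinit_with_retry(kinit, principal, attempts=5, delay=1.0, sleep=time.sleep):
    """Call kinit(principal) until it succeeds or the attempts run out.

    Returns the 1-based attempt number that succeeded, or 0 if none did.
    `kinit` is a caller-supplied callable returning True on success;
    `sleep` is injectable so the wait can be replaced in tests.
    """
    for attempt in range(1, attempts + 1):
        if kinit(principal):
            return attempt
        if attempt < attempts:
            # Give the new principal time to reach the KDC this node talks to.
            sleep(delay)
    return 0
```

With a KDC that only learns about the principal after two failed lookups, `kinit_with_retry` would return 3 instead of failing the service check outright. The appropriate `attempts` and `delay` values depend on how long replication between the data centers actually takes.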

       

            People

              Assignee: David F. Quiroga (quirogadf)
              Reporter: David F. Quiroga (quirogadf)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h