Sentry (Retired) / SENTRY-1703

Solr-Sentry in Kerberos mode makes too many KDC requests and returns "unauthorized" on KDC timeout


Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 1.5.1
    • Fix Version/s: None
    • Component/s: Solr Plugin
    • Labels: None

    Description

      Sentry Version: 1.5.1-cdh5.8.0

      We are seeing intermittent authorization failures with the Sentry Solr plugin in a Kerberos environment.

      1. We write to Solr with the SolrJ client from within Spark jobs on a multi-node Spark/Hadoop cluster, and individual Spark tasks frequently get authorization errors from Solr saying "User XX does not have privileges for YYcollection". These errors are generated by the Solr-Sentry plugin. (The user does have access to the collection, and the same writes succeed the rest of the time.)
      2. The root cause appears to be that, on every Solr call from the client, Sentry contacts the KDC on behalf of solr/hostname. This floods the KDC with many requests per second, and at some point a request fails with a KDC timeout. The calling client then receives the exception: org.apache.sentry.binding.solr.authz.SentrySolrAuthorizationException: User XX does not have privileges for YYcollection

      I haven't had time to investigate why Sentry makes so many KDC calls. It may be doing so for each document in a batched Solr operation, or it may log in from the keytab on every call without caching the ticket.
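      For comparison with the second hypothesis, the sketch below shows the behavior one would expect instead: obtain a credential once and reuse it until shortly before it expires, so that repeated calls do not each reach the KDC. It is a minimal, stdlib-only illustration under stated assumptions. `CachedCredential`, its `Ticket` record, the `fetch` supplier standing in for a KDC round trip, and the refresh margin are all hypothetical and are not Sentry or Hadoop APIs.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical sketch: reuse a credential until it is close to expiry. */
public class CachedCredential {
    /** A credential plus its expiry time; stands in for a Kerberos ticket. */
    public record Ticket(String id, Instant expiresAt) {}

    private final Supplier<Ticket> fetch;    // stands in for a KDC request (hypothetical)
    private final Supplier<Instant> clock;   // injectable time source
    private final Duration refreshMargin;    // renew this long before expiry
    private Ticket cached;

    public CachedCredential(Supplier<Ticket> fetch, Supplier<Instant> clock, Duration refreshMargin) {
        this.fetch = fetch;
        this.clock = clock;
        this.refreshMargin = refreshMargin;
    }

    /** Returns the cached ticket, calling fetch only when none is cached or it is about to expire. */
    public synchronized Ticket get() {
        Instant now = clock.get();
        if (cached == null || !now.isBefore(cached.expiresAt().minus(refreshMargin))) {
            cached = fetch.get();
        }
        return cached;
    }

    public static void main(String[] args) {
        int[] kdcCalls = {0};
        Instant[] now = {Instant.EPOCH};
        CachedCredential cred = new CachedCredential(
            () -> { kdcCalls[0]++; return new Ticket("t" + kdcCalls[0], now[0].plus(Duration.ofHours(10))); },
            () -> now[0],
            Duration.ofMinutes(5));

        for (int i = 0; i < 1000; i++) cred.get();     // 1000 simulated Solr calls
        System.out.println("KDC calls after 1000 requests: " + kdcCalls[0]);  // prints 1

        now[0] = now[0].plus(Duration.ofHours(10));    // move past the ticket's expiry
        cred.get();
        System.out.println("KDC calls after expiry: " + kdcCalls[0]);         // prints 2
    }
}
```

      If the KDC log shows one TGS_REQ per Solr call, the plugin is presumably not behaving like this sketch; that is consistent with, but does not prove, the keytab/ticket-caching hypothesis.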

      Caching the result of authProvider.hasAccess() in SolrAuthzBinding.java for a reasonably short time might be worth considering.
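      A minimal sketch of that suggestion, assuming the access check can be modeled as a function from (user, collection, action) to a boolean. `CachingAccessCheck`, the `AccessCheck` interface, the TTL, and the action string "UPDATE" are hypothetical; the real authProvider.hasAccess() in Sentry takes Sentry-specific subject and authorizable types that are not reproduced here. As a design choice of this sketch (not existing Sentry behavior), only grants are cached, so a false result caused by a transient failure such as a KDC timeout is never remembered as a denial.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical sketch: cache successful authorization checks for a short TTL. */
public class CachingAccessCheck {
    /** Stand-in for the underlying check, e.g. a call through authProvider.hasAccess(). */
    @FunctionalInterface
    public interface AccessCheck {
        boolean hasAccess(String user, String collection, String action);
    }

    private record Key(String user, String collection, String action) {}

    private final AccessCheck delegate;
    private final Duration ttl;
    private final Supplier<Instant> clock;
    // Maps a granted (user, collection, action) to the instant its cache entry expires.
    private final Map<Key, Instant> granted = new ConcurrentHashMap<>();

    public CachingAccessCheck(AccessCheck delegate, Duration ttl, Supplier<Instant> clock) {
        this.delegate = delegate;
        this.ttl = ttl;
        this.clock = clock;
    }

    public boolean hasAccess(String user, String collection, String action) {
        Key key = new Key(user, collection, action);
        Instant now = clock.get();
        Instant expiry = granted.get(key);
        if (expiry != null && now.isBefore(expiry)) {
            return true;                        // recent grant: skip the remote check
        }
        boolean allowed = delegate.hasAccess(user, collection, action);
        if (allowed) {
            granted.put(key, now.plus(ttl));    // cache grants only
        } else {
            granted.remove(key);                // never cache denials or failures
        }
        return allowed;
    }

    public static void main(String[] args) {
        int[] remoteCalls = {0};
        Instant[] now = {Instant.EPOCH};
        CachingAccessCheck check = new CachingAccessCheck(
            (u, c, a) -> { remoteCalls[0]++; return true; },
            Duration.ofSeconds(30),
            () -> now[0]);
        for (int i = 0; i < 500; i++) check.hasAccess("XX", "YYcollection", "UPDATE");
        System.out.println("remote checks: " + remoteCalls[0]);  // prints "remote checks: 1"
    }
}
```

      The trade-off is that a revoked privilege may still be honored for up to one TTL, so the TTL would need to stay short.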

      My question in the meantime: are there any tuning knobs to reduce the load on the KDC, to increase the KDC request timeout, or anything along these lines?

      Relevant stack traces captured from Solr Admin are attached:
      1. stacktrace1.log: the KDC timeout for the Sentry call
      2. stacktrace2.log: Sentry failing to authenticate with the KDC because of #1
      3. stacktrace3.log: the SolrException raised when authProvider.hasAccess() returns false because of #2

      Also attached is a snippet from the KDC log. The full log grows to 17 MB within a minute and is full of messages like:

      Apr 10 17:06:37 a0 krb5kdc[20427](info): TGS_REQ (1 etypes {23}) 10.0.0.1: ISSUE: authtime 1491818430, etypes {rep=23 tkt=23 ses=23}, solr/a0@REALM.COM for sentry/a0@REALM.COM
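      To quantify the load from a KDC log like the attached one, one option is to count TGS_REQ lines per (client principal, service principal) pair. The sketch below assumes the krb5kdc line format of the single sample above, where the pair appears as "<client> for <service>" at the very end of the line; the regular expression is fitted to that sample only and is not a specification of the log format. Lines where the pair is followed by an error message are not counted. `TgsReqCounter` is a hypothetical helper.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: count krb5kdc TGS_REQ lines per (client, service) pair. */
public class TgsReqCounter {
    // Matches "... TGS_REQ ... <client> for <service>" where the pair ends the line,
    // as in the sample: "solr/a0@REALM.COM for sentry/a0@REALM.COM".
    private static final Pattern TGS = Pattern.compile("TGS_REQ .*?(\\S+) for (\\S+)\\s*$");

    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = TGS.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1) + " -> " + m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "Apr 10 17:06:37 a0 krb5kdc[20427](info): TGS_REQ (1 etypes {23}) 10.0.0.1: ISSUE: "
                + "authtime 1491818430, etypes {rep=23 tkt=23 ses=23}, solr/a0@REALM.COM for sentry/a0@REALM.COM");
        System.out.println(count(sample));  // prints {solr/a0@REALM.COM -> sentry/a0@REALM.COM=1}
    }
}
```

      Reading the full log line by line (for example with java.nio.file.Files.readAllLines) and passing it to count() would give the number of service-ticket requests solr/a0 made for sentry/a0 in that window.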
      

      This is reproducible in two separate clusters with different environments: CDH 5.10.1 and CDH 5.8.0.

      Please let me know if I've left out any key information.

      Attachments

        1. kdc.log.txt (6 kB, Tushar I)
        2. solr-sentry-test-master.zip (11 kB, Tushar I)
        3. stacktrace1.log.txt (9 kB, Tushar I)
        4. stacktrace2.log.txt (7 kB, Tushar I)
        5. stacktrace3.log.txt (5 kB, Tushar I)


            People

              Assignee: Unassigned
              Reporter: Tushar I
              Votes: 0
              Watchers: 4
