Apache NiFi / NIFI-7527

AbstractKuduProcessor deadlocks after TGT refresh

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.12.0
    • Component/s: Extensions
    • Labels: None

    Description

      The fix for https://issues.apache.org/jira/browse/NIFI-7453 (PutKudu Kerberos issue after the TGT expires) introduced a new bug: after a TGT refresh, the processor deadlocks.

      The cause is that onTrigger acquires a read lock and holds it for the whole delegated onTrigger call:

          @Override
          public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
              kuduClientReadLock.lock();
              try {
                  onTrigger(context, session, kuduClientR);
              } finally {
                  kuduClientReadLock.unlock();
              }
          }
      

      While that read lock is still held, further down the same call stack, a write lock is requested if a TGT refresh occurs:

      ...
                  public synchronized boolean checkTGTAndRelogin() throws LoginException {
                      boolean didRelogin = super.checkTGTAndRelogin();
      
                      if (didRelogin) {
                          createKuduClient(context);
                      }
      
                      return didRelogin;
                  }
      ...
      
          protected void createKuduClient(ProcessContext context) {
              kuduClientWriteLock.lock();
              try {
                  if (this.kuduClientR.get() != null) {
                      try {
                          this.kuduClientR.get().close();
                      } catch (KuduException e) {
                          getLogger().error("Couldn't close Kudu client.");
                      }
                  }
      
                  if (kerberosUser != null) {
                      final KerberosAction<KuduClient> kerberosAction = new KerberosAction<>(kerberosUser, () -> buildClient(context), getLogger());
                      this.kuduClientR.set(kerberosAction.execute());
                  } else {
                      this.kuduClientR.set(buildClient(context));
                  }
              } finally {
                  kuduClientWriteLock.unlock();
              }
          }
      

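      This write-lock attempt blocks indefinitely, waiting for the read lock to be released. However, the read lock is held by the same thread, which cannot release it until createKuduClient returns, so the thread ends up waiting on itself. (Java's ReentrantReadWriteLock, for example, does not allow a thread to upgrade a read lock it holds to the write lock.)
      (Other threads may also hold the read lock, but they eventually release it, unless they too attempt to acquire the write lock.)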
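      A minimal, self-contained illustration of why the write-lock attempt never succeeds, assuming kuduClientReadLock and kuduClientWriteLock are the two halves of one java.util.concurrent.locks.ReentrantReadWriteLock (as their names suggest; the issue does not show their declaration). The class and method names below are hypothetical. It uses tryLock() instead of lock() so the demo returns instead of hanging: a read-to-write upgrade fails, while a write-to-read downgrade succeeds.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; not part of NiFi.
public class LockUpgradeDemo {
    // Returns true if a thread holding the read lock can also obtain the write lock.
    static boolean readToWriteUpgrade() {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        rw.readLock().lock();
        try {
            // tryLock() returns immediately; lock() here would block this thread forever.
            boolean acquired = rw.writeLock().tryLock();
            if (acquired) {
                rw.writeLock().unlock();
            }
            return acquired;
        } finally {
            rw.readLock().unlock();
        }
    }

    // Returns true if a thread holding the write lock can also obtain the read lock.
    static boolean writeToReadDowngrade() {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        rw.writeLock().lock();
        try {
            boolean acquired = rw.readLock().tryLock();
            if (acquired) {
                rw.readLock().unlock();
            }
            return acquired;
        } finally {
            rw.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("read -> write upgrade: " + readToWriteUpgrade());     // false
        System.out.println("write -> read downgrade: " + writeToReadDowngrade()); // true
    }
}
```

      In the processor, the equivalent of readToWriteUpgrade() happens with lock() rather than tryLock(), which is why the thread hangs instead of failing fast.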

      For the fix, it seemed best to re-evaluate the locking logic.
      Previously, essentially the whole onTrigger logic ran under the read lock, including checking the Kudu client and recreating it as needed (as explained above).
      It is better to hold the read lock only around the actual privileged action, so that refreshing the TGT and recreating the Kudu client can safely be done beforehand under the write lock.
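      A simplified sketch of the lock ordering described above, not the committed NiFi patch. FakeClient, refreshIfNeeded and KuduLockingSketch are hypothetical names; the BooleanSupplier stands in for checkTGTAndRelogin, and closing the old client is omitted. The relogin check and client re-creation run under the write lock first, and only afterwards is the read lock taken around the action, so the thread never tries to upgrade a read lock it already holds.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Hypothetical, simplified model of the lock ordering; not the actual NiFi code.
public class KuduLockingSketch {
    // Hypothetical stand-in for KuduClient; only records which "generation" it is.
    public static final class FakeClient {
        final int generation;
        FakeClient(int generation) { this.generation = generation; }
    }

    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final Lock readLock = rwLock.readLock();
    private final Lock writeLock = rwLock.writeLock();
    private final AtomicReference<FakeClient> clientRef = new AtomicReference<>();
    private int generations = 0; // only modified while holding the write lock

    // Step 1: relogin check and client re-creation under the write lock.
    // This thread holds no read lock at this point, so no upgrade is attempted.
    private void refreshIfNeeded(BooleanSupplier checkTgtAndRelogin) {
        writeLock.lock();
        try {
            boolean didRelogin = checkTgtAndRelogin.getAsBoolean();
            if (didRelogin || clientRef.get() == null) {
                clientRef.set(new FakeClient(++generations));
            }
        } finally {
            writeLock.unlock();
        }
    }

    // Step 2: hold the read lock only around the action that uses the client.
    public void onTrigger(BooleanSupplier checkTgtAndRelogin, Consumer<FakeClient> action) {
        refreshIfNeeded(checkTgtAndRelogin);
        readLock.lock();
        try {
            action.accept(clientRef.get());
        } finally {
            readLock.unlock();
        }
    }

    public int currentGeneration() {
        return clientRef.get().generation;
    }

    public static void main(String[] args) {
        KuduLockingSketch processor = new KuduLockingSketch();
        // First trigger creates the client; the second simulates a TGT refresh.
        processor.onTrigger(() -> false, c -> System.out.println("using generation " + c.generation));
        processor.onTrigger(() -> true, c -> System.out.println("using generation " + c.generation));
    }
}
```

      Running main prints "using generation 1" and then "using generation 2" without blocking. The trade-off in this sketch is that every trigger briefly takes the write lock, which serializes the relogin check across threads; the read lock still lets multiple threads use the client concurrently once it is in place.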

    People

      Assignee: Tamas Palfy (tpalfy)
      Reporter: Tamas Palfy (tpalfy)
      Votes: 1
      Watchers: 2

    Time Tracking

      Estimated: Not Specified
      Remaining: 0h
      Logged: 0.5h