Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The fix for https://issues.apache.org/jira/browse/NIFI-7453 (PutKudu Kerberos issue after TGT expires) introduced a new bug: after a TGT refresh, the processor ends up in a deadlock.
The reason is that onTrigger acquires a read lock:
    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        kuduClientReadLock.lock();
        try {
            onTrigger(context, session, kuduClientR);
        } finally {
            kuduClientReadLock.unlock();
        }
    }
While that read lock is still held, a TGT refresh later in the same call stack attempts to acquire the write lock:
    ...
    public synchronized boolean checkTGTAndRelogin() throws LoginException {
        boolean didRelogin = super.checkTGTAndRelogin();
        if (didRelogin) {
            createKuduClient(context);
        }
        return didRelogin;
    }
    ...
    protected void createKuduClient(ProcessContext context) {
        kuduClientWriteLock.lock();
        try {
            if (this.kuduClientR.get() != null) {
                try {
                    this.kuduClientR.get().close();
                } catch (KuduException e) {
                    getLogger().error("Couldn't close Kudu client.");
                }
            }
            if (kerberosUser != null) {
                final KerberosAction<KuduClient> kerberosAction = new KerberosAction<>(kerberosUser, () -> buildClient(context), getLogger());
                this.kuduClientR.set(kerberosAction.execute());
            } else {
                this.kuduClientR.set(buildClient(context));
            }
        } finally {
            kuduClientWriteLock.unlock();
        }
    }
This attempt to acquire the write lock blocks forever: it waits for the read lock to be released, but that read lock is held by the same thread, which cannot release it until the write lock attempt returns.
(Other threads may hold the same read lock, and they can eventually release it, unless they too try to acquire the write lock.)
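The self-deadlock described above can be reproduced in isolation. The following is a minimal sketch, assuming the processor's locks come from a java.util.concurrent.locks.ReentrantReadWriteLock (the excerpts above do not show how they are created). The class and method names are hypothetical, and a timed tryLock is used in place of lock() so that the demo terminates instead of hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadToWriteUpgradeDemo {

    // Hypothetical helper: holds the read lock, then tries to take the write lock
    // on the same thread. Returns true only if the write lock was acquired.
    static boolean tryUpgrade(long timeoutMillis) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            // With lock() instead of tryLock(...) this call would never return.
            boolean acquired = lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    // The opposite direction (write lock held, read lock requested) is permitted.
    static boolean tryDowngrade() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock();
        try {
            boolean acquired = lock.readLock().tryLock();
            if (acquired) {
                lock.readLock().unlock();
            }
            return acquired;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("upgrade acquired: " + tryUpgrade(200));   // prints "upgrade acquired: false"
        System.out.println("downgrade acquired: " + tryDowngrade()); // prints "downgrade acquired: true"
    }
}
```

The failed upgrade is why createKuduClient can never proceed when it is reached from inside the read-locked onTrigger.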
For the fix, it seemed best to re-evaluate the locking logic.
Previously, essentially the whole onTrigger logic ran inside the read lock, including checking the Kudu client and recreating it as needed (as explained above).
It is better to keep only the actual privileged action inside the read lock, so that refreshing the TGT and re-creating the Kudu client can safely be done under the write lock before that.
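The revised ordering can be sketched roughly as follows. This is not the actual NiFi patch: the class, the String stand-in for the Kudu client, and the tgtRefreshed flag are all hypothetical, and the sketch assumes a ReentrantReadWriteLock. It only illustrates that the write-locked refresh runs while no read lock is held, and that the read lock covers just the work that uses the client.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ClientLockingSketch {
    private final ReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final Lock readLock = rwLock.readLock();
    private final Lock writeLock = rwLock.writeLock();

    // Hypothetical stand-in for the Kudu client reference.
    private final AtomicReference<String> client = new AtomicReference<>("client-0");
    private int generation = 0;

    // Stand-in for the relogin check plus client re-creation; takes the write lock
    // only when a refresh actually happened. Called with no read lock held.
    void refreshClientIfNeeded(boolean tgtRefreshed) {
        if (!tgtRefreshed) {
            return;
        }
        writeLock.lock();
        try {
            generation++;
            client.set("client-" + generation);
        } finally {
            writeLock.unlock();
        }
    }

    // Stand-in for the privileged action; only this part runs under the read lock.
    String doWork() {
        readLock.lock();
        try {
            return "wrote with " + client.get();
        } finally {
            readLock.unlock();
        }
    }

    String onTrigger(boolean tgtRefreshed) {
        refreshClientIfNeeded(tgtRefreshed); // write lock (if any) is released before the read lock is taken
        return doWork();
    }

    public static void main(String[] args) {
        ClientLockingSketch sketch = new ClientLockingSketch();
        System.out.println(sketch.onTrigger(false)); // prints "wrote with client-0"
        System.out.println(sketch.onTrigger(true));  // prints "wrote with client-1"
    }
}
```

Because the same thread never holds the read lock while requesting the write lock, the refresh only has to wait for other threads to finish their read-locked work.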