[NIFI-10052] Avoid obtaining any locks when creating/sending heartbeats - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.18.0
Component/s: Core Framework
Labels:

Description

When NiFi creates a heartbeat to send to the cluster coordinator, it must obtain several locks in order to generate that heartbeat. We should avoid obtaining any read locks, write locks, or synchronized monitors on this path, especially those that may be held for a long time. Obtaining these locks can result in NiFi getting disconnected from the cluster if a write lock is held for a long time, because the heartbeat cannot be created until the lock is released.
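To illustrate the concern, here is a minimal sketch in plain Java. It is not NiFi code: the class `LockedVersusLockFreeRead`, its field names, and the value 42 are all hypothetical. It shows that a read guarded by a `ReentrantReadWriteLock` cannot proceed while another thread holds the write lock, whereas a read of an `AtomicLong` returns immediately.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for state that a heartbeat thread needs to read.
final class LockedVersusLockFreeRead {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final AtomicLong flowFileCount = new AtomicLong(42L);

    // Lock-based read: returns -1 if the read lock is not available within the timeout.
    long readWithLock(long timeoutMillis) throws InterruptedException {
        if (!rwLock.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return -1L;
        }
        try {
            return flowFileCount.get();
        } finally {
            rwLock.readLock().unlock();
        }
    }

    // Lock-free read: never waits for the write lock.
    long readLockFree() {
        return flowFileCount.get();
    }

    // Returns {locked read during update, lock-free read during update, locked read after update}.
    static long[] demo() {
        LockedVersusLockFreeRead state = new LockedVersusLockFreeRead();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Simulates a long-running update that holds the write lock until released.
        Thread writer = new Thread(() -> {
            state.rwLock.writeLock().lock();
            try {
                lockHeld.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                state.rwLock.writeLock().unlock();
            }
        });
        writer.start();

        try {
            lockHeld.await();
            long lockedDuringUpdate = state.readWithLock(50);
            long lockFreeDuringUpdate = state.readLockFree();
            release.countDown();
            writer.join();
            long lockedAfterUpdate = state.readWithLock(1000);
            return new long[] {lockedDuringUpdate, lockFreeDuringUpdate, lockedAfterUpdate};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        } finally {
            release.countDown(); // make sure the writer thread can always finish
        }
    }
}
```

In this sketch the lock-based read gives up after a timeout so that the demo terminates. A plain `lock()` call would instead wait for as long as the update takes, which is the situation the description warns about.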

Specifically, the following locks are obtained, at minimum:

FlowController readLock in the createHeartbeatMessage() method. Due to refactoring, this read lock is no longer necessary at all.
revisionManager.getRevisionUpdateCount() is synchronized. However, the synchronization here is not needed, as the method just returns the result of AtomicLong.get(). This is perhaps the most important lock to avoid, because any update to a component or group of components happens within revisionManager.updateRevision, which is also synchronized. So a large request, like deleting thousands of components, will block heartbeats from being created until it completes.
FlowController.getTotalFlowFileCount - this may be the most challenging to eliminate. It calls ProcessGroup.getConnections() and ProcessGroup.getProcessGroups(), which means that it must obtain the read lock of the Process Group twice for every Process Group in the flow. We may be able to change StandardProcessGroup's connections and processGroups maps to ConcurrentHashMap instances and introduce a getQueueSize() method on ProcessGroup that avoids most of this locking.
The createHeartbeatMessage() method also appears to reference FlowController's connectionStatus member variable without any locks, although the variable is not volatile and its documentation indicates that it is guarded by the read/write lock. This needs to be addressed in order to ensure that the connectionStatus is always accurately reported.
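The ideas listed above can be sketched as follows. This is an illustration only, not the change made in the linked pull request: the classes `SketchRevisionManager`, `SketchProcessGroup`, and `SketchFlowController`, the `ConnectionState` enum, and the string-valued heartbeat are hypothetical simplifications. Only the method names getRevisionUpdateCount(), updateRevision, getQueueSize(), and createHeartbeatMessage() come from the description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified stand-in; not NiFi's real RevisionManager.
final class SketchRevisionManager {
    private final AtomicLong updateCount = new AtomicLong();

    // Updates stay serialized, as the description says updateRevision is synchronized.
    synchronized void updateRevision(Runnable update) {
        update.run();
        updateCount.incrementAndGet();
    }

    // Not synchronized: AtomicLong.get() is already a thread-safe read.
    long getRevisionUpdateCount() {
        return updateCount.get();
    }
}

// Hypothetical, simplified stand-in; not NiFi's real ProcessGroup.
final class SketchProcessGroup {
    private final Map<String, AtomicLong> connectionQueueSizes = new ConcurrentHashMap<>();
    private final Map<String, SketchProcessGroup> childGroups = new ConcurrentHashMap<>();

    void setQueueSize(String connectionId, long size) {
        connectionQueueSizes.computeIfAbsent(connectionId, id -> new AtomicLong()).set(size);
    }

    void addChild(String groupId, SketchProcessGroup child) {
        childGroups.put(groupId, child);
    }

    // Walks the group tree without taking any lock. ConcurrentHashMap iteration is
    // weakly consistent, so the total is a best-effort snapshot under concurrent updates.
    long getQueueSize() {
        long total = 0L;
        for (AtomicLong size : connectionQueueSizes.values()) {
            total += size.get();
        }
        for (SketchProcessGroup child : childGroups.values()) {
            total += child.getQueueSize();
        }
        return total;
    }
}

// Hypothetical, simplified stand-in; not NiFi's real FlowController.
final class SketchFlowController {
    enum ConnectionState { CONNECTED, DISCONNECTED }

    // volatile makes the latest write visible to the heartbeat thread without a lock.
    private volatile ConnectionState connectionStatus = ConnectionState.DISCONNECTED;
    private final SketchRevisionManager revisionManager;
    private final SketchProcessGroup rootGroup;

    SketchFlowController(SketchRevisionManager revisionManager, SketchProcessGroup rootGroup) {
        this.revisionManager = revisionManager;
        this.rootGroup = rootGroup;
    }

    void setConnectionStatus(ConnectionState status) {
        this.connectionStatus = status;
    }

    // Builds a heartbeat summary without acquiring any lock or monitor.
    String createHeartbeatMessage() {
        return connectionStatus
                + " revisions=" + revisionManager.getRevisionUpdateCount()
                + " queued=" + rootGroup.getQueueSize();
    }
}
```

One trade-off of this approach is that the queue total is no longer an atomic snapshot of the whole flow. For a periodic heartbeat that may be acceptable, but that is a judgment this sketch assumes rather than one stated in the issue.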

Issue Links

links to

GitHub Pull Request #6298

People

Assignee: Hsin-Ying Lee

Reporter: Mark Payne

Votes: 0

Watchers: 3

Dates

Created: 24/May/22 21:48

Updated: 17/Aug/22 18:08

Resolved: 17/Aug/22 18:06

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

0.5h