[NIFI-6736] If not given enough threads, Load Balanced Connections may block for long periods of time without making progress - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.10.0
Component/s: Core Framework
Labels: None

Description

Load-balanced connections are governed by a few configurable properties. The relevant ones, with their default values, are:

nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec

If the max thread count is below (connections per node * number of nodes in the cluster), everything still works well as long as all load-balanced connections carry reasonably high data volumes. However, if one of the connections has a low data volume, the load-balanced connections can stop pushing data for some period of time, typically roughly a multiple of the "comms.timeout" property.
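The threshold described above can be checked with simple arithmetic. The following is a minimal sketch, not NiFi code: the class and method names are hypothetical, and it assumes "number of nodes" is the full cluster size as stated in the description.

```java
// Hypothetical helper (not part of NiFi) illustrating the condition from the description.
class LoadBalanceThreadCheck {

    // Threads needed so that every load-balance connection can have its own thread.
    static int requiredThreads(int connectionsPerNode, int nodeCount) {
        return connectionsPerNode * nodeCount;
    }

    // True when the configured max thread count is below that product,
    // which is the situation in which the stall described here can occur.
    static boolean isUndersized(int maxThreadCount, int connectionsPerNode, int nodeCount) {
        return maxThreadCount < requiredThreads(connectionsPerNode, nodeCount);
    }

    public static void main(String[] args) {
        int connectionsPerNode = 4; // default of nifi.cluster.load.balance.connections.per.node
        int maxThreadCount = 8;     // default of nifi.cluster.load.balance.max.thread.count

        for (int nodes = 2; nodes <= 4; nodes++) {
            System.out.println(nodes + " nodes: required="
                    + requiredThreads(connectionsPerNode, nodes)
                    + ", undersized="
                    + isUndersized(maxThreadCount, connectionsPerNode, nodes));
        }
    }
}
```

With the default values, the product already exceeds the default thread count for a three-node cluster (4 * 3 = 12 > 8), while a two-node cluster sits exactly at the limit.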

This appears to be because the server uses blocking socket I/O rather than NIO. Once data has been received on a connection, the server checks whether more data is available. If nothing arrives within the timeout period, the read times out, and only then is the socket connection returned to the pool of connections to read from. As a result, a thread can be stuck waiting for more data on an idle connection, blocking progress on every other connection that thread would otherwise service.
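The stall can be reproduced in isolation with plain java.net sockets. This is a minimal sketch of the general blocking-read behavior, not the NiFi server code: the class name, method name, and the 200 ms timeout are assumptions chosen for illustration (the NiFi default for comms.timeout is 30 sec).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo (not NiFi code): a blocking read on an idle connection
// holds the calling thread until the socket timeout expires.
class BlockingReadStall {

    // Returns roughly how many milliseconds the thread spent waiting on an idle peer.
    static long timeIdleRead(int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {

            accepted.setSoTimeout(timeoutMillis); // stands in for comms.timeout
            InputStream in = accepted.getInputStream();

            long start = System.nanoTime();
            try {
                in.read(); // the client never writes, so this call blocks
            } catch (SocketTimeoutException expected) {
                // Only at this point is the thread free to service another connection.
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Thread was blocked for about " + timeIdleRead(200) + " ms");
    }
}
```

While the thread sits in `read()`, it cannot service any other connection. For contrast, a read on a non-blocking NIO `SocketChannel` returns immediately when no data is available.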

Issue Links

links to GitHub Pull Request #3784

People

Assignee: Mark Payne

Reporter: Mark Payne

Votes: 0

Watchers: 4

Dates

Created: 01/Oct/19 21:02

Updated: 02/Oct/19 18:29

Resolved: 02/Oct/19 18:29

Time Tracking

Estimated: Not Specified

Remaining:

Logged:

40m