When NiFi is deployed as a large cluster with many components, the UI often starts to feel sluggish, and refreshing the stats can take many seconds. To find the culprit, I created a 10-node cluster. I then created a Process Group and added 100 Processors to it (20 sets of 5, with the 5 Processors in each set connected together).
I then refreshed the stats many times. Each refresh took several seconds.
I narrowed the time spent down to 3 key areas:
- On the backend, the longest part of the request was the Cluster Coordinator merging the responses from all nodes. The cost could be traced to the TimeAdapter, which is used on many elements (such as ProcessorStatus) when Jackson parses a timestamp and turns it into a Date object. The adapter uses a DateTimeFormatter but creates a new one on every invocation, which is expensive. Because DateTimeFormatter is immutable and thread-safe, a single instance can be cached and reused.
- There is a bug in ThreadPoolRequestReplicator. nifi.properties has both nifi.cluster.node.protocol.threads and nifi.cluster.node.protocol.max.threads, but because of the way the thread pool is used, it never actually grows beyond the value of nifi.cluster.node.protocol.threads. By default, that means we never use more than 10 threads. In a 10-node cluster where each UI refresh makes 4 requests, each request must be replicated to every node, giving 40 replicated requests per refresh. Those requests get queued instead of the thread pool growing to handle them. We can address this by dropping the nifi.cluster.node.protocol.threads property and simply scaling up to nifi.cluster.node.protocol.max.threads threads, while allowing the pool to scale back down to 0 threads when there are no active requests.
- The UI rendering is slow. Using Chrome's profiler, I found that by far the largest share of canvas rendering time is spent in the ellipsis() method of nf-canvas-utils, which determines whether ellipses are needed. Specifically, the call to node.getSubStringLength() is very expensive, and it is made for every processor name, type, and bundle, every connection name, every Process Group name, and so on. We can significantly improve this by keeping a cache that maps [text, component width, text style] to the length at which to trim the text before adding the ellipsis (or -1 to indicate that it should not be trimmed). This will eliminate a very large proportion of these calls.
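For the TimeAdapter item, here is a minimal sketch of the caching idea: build the DateTimeFormatter once as a static constant and reuse it for every parse and format call. The class name CachedTimeAdapter and the timestamp pattern are hypothetical, chosen only for illustration; the real adapter's pattern and its JAXB/Jackson wiring are not shown here.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

// Hypothetical class name and timestamp pattern, for illustration only.
public class CachedTimeAdapter {

    // DateTimeFormatter is immutable and thread-safe, so one shared instance
    // can serve every call on every thread instead of being rebuilt per call.
    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS xxx", Locale.US);

    // Parses a timestamp string (e.g. "2024-01-15 10:30:45.123 +00:00") into a Date.
    public Date unmarshal(String text) {
        return Date.from(OffsetDateTime.parse(text, FORMATTER).toInstant());
    }

    // Formats a Date as a UTC timestamp string using the same cached formatter.
    public String marshal(Date date) {
        return FORMATTER.format(date.toInstant().atOffset(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        CachedTimeAdapter adapter = new CachedTimeAdapter();
        Date parsed = adapter.unmarshal("2024-01-15 10:30:45.123 +00:00");
        System.out.println(parsed.getTime());        // 1705314645123
        System.out.println(adapter.marshal(parsed)); // 2024-01-15 10:30:45.123 +00:00
    }
}
```

The only change from a per-call formatter is where FORMATTER is constructed; parsing and formatting behave the same, but the pattern is compiled once per JVM rather than once per timestamp per node response.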
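For the ThreadPoolRequestReplicator item, the sketch below illustrates one well-known way a java.util.concurrent.ThreadPoolExecutor can fail to grow past its core size, and the shape of the proposed fix. It assumes, without confirmation from the issue, that the replicator pairs a core size with an unbounded work queue. The class name, the 5-second keep-alive, and the max value of 50 are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical names; a sketch of the scaling behavior, not NiFi's actual code.
public class ReplicatorPoolSketch {

    // Assumed mechanism: ThreadPoolExecutor only starts threads beyond
    // corePoolSize when the work queue refuses a task. An unbounded
    // LinkedBlockingQueue never refuses, so the pool stays at coreThreads,
    // maxThreads has no effect, and extra requests wait in the queue.
    static ThreadPoolExecutor coreBounded(int coreThreads, int maxThreads) {
        return new ThreadPoolExecutor(coreThreads, maxThreads, 5, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
    }

    // Proposed shape: use maxThreads as the core size so each concurrent
    // request up to that limit gets its own thread, and let idle core threads
    // time out so the pool can shrink to 0 when there are no active requests.
    static ThreadPoolExecutor scalable(int maxThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(maxThreads, maxThreads, 5, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Submits `tasks` tasks that block until released, records
    // {live threads, queued tasks}, then releases them and shuts down.
    static int[] threadsAndQueued(ThreadPoolExecutor pool, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int[] result = {pool.getPoolSize(), pool.getQueue().size()};
        release.countDown();
        pool.shutdown();
        return result;
    }

    public static void main(String[] args) {
        // 10 nodes x 4 requests per refresh = 40 replicated requests.
        int[] before = threadsAndQueued(coreBounded(10, 50), 40);
        int[] after = threadsAndQueued(scalable(50), 40);
        System.out.println("core-bounded: threads=" + before[0] + ", queued=" + before[1]); // threads=10, queued=30
        System.out.println("scalable:     threads=" + after[0] + ", queued=" + after[1]);  // threads=40, queued=0
    }
}
```

Note that allowCoreThreadTimeOut(true) requires a keep-alive time greater than zero; it is what lets the pool drop back to 0 threads once the replicated requests finish.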
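For the ellipsis() item, the real code is JavaScript running in the browser, so the following is only a language-neutral model of the proposed cache, written in Java. The TextMeasurer interface is a hypothetical stand-in for node.getSubStringLength(), and the backwards linear scan for the trim point is an assumed strategy, not necessarily what nf-canvas-utils does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Java model of the caching idea; the real ellipsis() code is
// JavaScript and measures text with SVG getSubStringLength().
public class EllipsisCache {

    // Stand-in for the expensive measurement: rendered width of the first
    // `chars` characters of `text` when drawn in `style`.
    @FunctionalInterface
    public interface TextMeasurer {
        double width(String text, int chars, String style);
    }

    // Cache key: [text, component width, text style].
    private record Key(String text, double width, String style) {}

    private static final String ELLIPSIS = "\u2026";

    private final Map<Key, Integer> cache = new HashMap<>();
    private final TextMeasurer measurer;

    public EllipsisCache(TextMeasurer measurer) {
        this.measurer = measurer;
    }

    // Returns how many characters to keep before appending an ellipsis,
    // or -1 if the text already fits. Only the first lookup for a given key
    // calls the measurer; later lookups are a map hit.
    public int trimLength(String text, double width, String style) {
        return cache.computeIfAbsent(new Key(text, width, style), this::measure);
    }

    private int measure(Key k) {
        if (measurer.width(k.text(), k.text().length(), k.style()) <= k.width()) {
            return -1;
        }
        double ellipsisWidth = measurer.width(ELLIPSIS, 1, k.style());
        // Walk back from the full length until prefix + ellipsis fits.
        for (int n = k.text().length() - 1; n > 0; n--) {
            if (measurer.width(k.text(), n, k.style()) + ellipsisWidth <= k.width()) {
                return n;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fake monospace measurer: every character is 10px wide.
        EllipsisCache cache = new EllipsisCache((text, chars, style) -> {
            calls[0]++;
            return chars * 10.0;
        });
        System.out.println(cache.trimLength("ProcessorName", 100, "font-12")); // 9
        int afterFirst = calls[0];
        System.out.println(cache.trimLength("ProcessorName", 100, "font-12")); // 9 (cached)
        System.out.println(calls[0] == afterFirst);                            // true
        System.out.println(cache.trimLength("Short", 100, "font-12"));         // -1
    }
}
```

Because canvas refreshes mostly redraw the same names at the same widths and styles, nearly every lookup after the first render would be a cache hit. A long-lived cache like this one is unbounded, so a real implementation might want to cap or periodically clear it.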