[NIFI-5112] Inefficiency in replicating requests across cluster - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.7.0
Component/s: Core Framework
Labels: None

Description

When replicating requests across the cluster, we do some things that are rather inefficient, which can cause the UI to feel sluggish. Because all of this is done while the UI awaits a response, we need to ensure that this area of the application is very responsive. Through profiling and code review, I have identified the following places where we can improve our efficiency:

Use of Jersey Client. Jersey Client provides an easy-to-use, powerful API, including the ability to scan class paths and automatically detect interceptors, etc. However, doing this comes at a cost. Profiling shows that, on my laptop, replicating a single request took about 100 milliseconds on average, nearly all of which was spent constructing the Jersey objects. Less than 1 millisecond was spent writing the message to the socket, awaiting the reply, and parsing the response. By using a different client, we can significantly improve this.
Flow Serialization holds a Flow Controller Read Lock for the entire duration. This means that we block any mutating operations, such as HTTP PUT, POST, or DELETE requests, while we build the appropriate DOM object for the flow, transform that DOM object into a String, and write that String to the output stream (including compression). We should be able to hold the Read Lock only while building the appropriate DOM object and then perform the transformation/serialization outside of the lock.
Template Serialization is inefficient. Currently, for each template, we serialize the DTO object to a String, then deserialize that String into a DOM object (all of this is done in order to avoid XML-based injection attacks). We then add that DOM object into our flow's DOM object. We should instead cache that DOM object so that we can skip the serialize/deserialize round trip on every serialization after the first.
ReflectionUtils is used when a Processor is created in order to call any method annotated with @OnAdded. The implementation uses some Spring-based reflection utils in order to find any Bridged methods. Doing this is expensive (on the order of 1 ms on my laptop). While this may not sound like a concern, it means that importing a template consisting of 5,000 processors will take 5 seconds just to find annotated methods, all within the context of a web request. Since these methods will not change, we should instead cache a list of Methods that carry the annotations so that we don't have to look them up every time.
Authorization uses InvocationHandlers. These InvocationHandlers use reflection to compare the method being called to a well-known method. The call to Method.equals() is not expensive. However, the call to Class.getMethod() is expensive and is done for every single authorization check, which can add up to a significant amount of time. Instead, we can store the method of interest in a member variable and reference that.
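For the Flow Serialization item, the narrower lock scope could look roughly like the sketch below. It is illustrative only: `NarrowReadLockSketch`, `addProcessor`, and `serializeFlow` are hypothetical names, not NiFi's actual classes, and a plain list of processor names stands in for the real flow. The read lock is held only while the DOM is built; the DOM-to-String transformation runs after the lock is released.

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stand-in for the flow controller; not NiFi's real API.
class NarrowReadLockSketch {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> processorNames = new ArrayList<>();

    // A mutating operation: needs the write lock.
    void addProcessor(String name) {
        lock.writeLock().lock();
        try {
            processorNames.add(name);
        } finally {
            lock.writeLock().unlock();
        }
    }

    String serializeFlow() {
        final Document dom;
        lock.readLock().lock();
        try {
            // Only the DOM construction reads shared state, so only it is locked.
            dom = buildDom();
        } finally {
            lock.readLock().unlock();
        }
        // The DOM is a private snapshot; transforming it no longer blocks writers.
        return toXml(dom);
    }

    private Document buildDom() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("flow");
            doc.appendChild(root);
            for (String name : processorNames) {
                Element processor = doc.createElement("processor");
                processor.setTextContent(name);
                root.appendChild(processor);
            }
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static String toXml(Document dom) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

This only works if the DOM built under the lock is not shared with any code that mutates it afterwards; the sketch assumes a fresh Document per call.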
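For the ReflectionUtils item, a per-class cache of annotated methods might be sketched as follows. The annotation `OnAddedExample` and the classes `AnnotatedMethodCache` and `SampleProcessor` are hypothetical stand-ins, and the scan uses plain `Class.getMethods()` rather than the Spring-based bridged-method lookup the description mentions, so this shows the caching idea only.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical annotation standing in for @OnAdded.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface OnAddedExample {}

class AnnotatedMethodCache {
    // One scan result per class; the annotated methods of a class never change.
    private static final Map<Class<?>, List<Method>> CACHE = new ConcurrentHashMap<>();

    static List<Method> onAddedMethods(Class<?> type) {
        // The scan runs only on the first lookup for a given class.
        return CACHE.computeIfAbsent(type, AnnotatedMethodCache::scan);
    }

    private static List<Method> scan(Class<?> type) {
        List<Method> found = new ArrayList<>();
        for (Method method : type.getMethods()) {
            if (method.isAnnotationPresent(OnAddedExample.class)) {
                found.add(method);
            }
        }
        return Collections.unmodifiableList(found);
    }
}

// Hypothetical processor with one lifecycle method.
class SampleProcessor {
    @OnAddedExample
    public void init() {}

    public void onTrigger() {}
}
```

With this shape, creating 5,000 instances of the same processor class would pay the scan cost once rather than 5,000 times. Keying the map on `Class` objects keeps those classes strongly reachable, which a real implementation would need to weigh against class loader unloading.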
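For the Authorization item, storing the well-known Method once could look like the sketch below. `AuthorizerExample` and `CachedMethodHandler` are hypothetical names, and the counter is there only to make the comparison observable; the point is that `Class.getMethod()` runs once in a static initializer and each invocation pays only for `Method.equals()`.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical interface; not NiFi's Authorizer.
interface AuthorizerExample {
    String authorize(String user);

    String describe();
}

class CachedMethodHandler implements InvocationHandler {
    // Looked up once at class initialization instead of on every call.
    private static final Method AUTHORIZE;

    static {
        try {
            AUTHORIZE = AuthorizerExample.class.getMethod("authorize", String.class);
        } catch (NoSuchMethodException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final AuthorizerExample delegate;
    int authorizeCalls = 0; // illustrative bookkeeping only

    CachedMethodHandler(AuthorizerExample delegate) {
        this.delegate = delegate;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // Cheap comparison against the cached Method.
        if (AUTHORIZE.equals(method)) {
            authorizeCalls++;
        }
        try {
            return method.invoke(delegate, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // surface the delegate's own exception
        }
    }

    AuthorizerExample proxy() {
        return (AuthorizerExample) Proxy.newProxyInstance(
                AuthorizerExample.class.getClassLoader(),
                new Class<?>[] {AuthorizerExample.class},
                this);
    }
}
```

A static field fits here because the method of interest is fixed for the interface; an instance field would work equally well if the handler were constructed once and reused.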

People

Assignee: Mark Payne

Reporter: Mark Payne

Votes: 0

Watchers: 2

Dates

Created: 23/Apr/18 18:33

Updated: 16/May/18 18:42

Resolved: 16/May/18 18:42