Apache NiFi / NIFI-6900

Request to retrieve flow in clustered environment should be much less expensive




      When a request is made to the `/nifi-api/flow/process-groups/{pgId}` endpoint, the request must be replicated and the responses merged. The things that need to be merged include bulletins, component permissions, statuses, load balance indicators, validation errors, and perhaps a few others.
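
      To make the merge step concrete, here is a minimal sketch of folding one node's view of a component into the response returned to the client. The class `NodeMergeSketch`, the type `ComponentView`, and its fields are hypothetical stand-ins, not NiFi's actual DTO or merger classes, and the merge rules (most restrictive permission wins, numeric statuses are summed, bulletins and validation errors are collected) are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for the per-component data that must be merged.
// These are NOT NiFi's actual DTO or merger classes.
final class NodeMergeSketch {

    static final class ComponentView {
        boolean canRead;
        boolean canWrite;
        int activeThreadCount;
        final List<String> bulletins = new ArrayList<>();
        final Set<String> validationErrors = new LinkedHashSet<>();
    }

    // Folds one node's view into the response that will be returned to the client.
    static void mergeInto(ComponentView client, ComponentView node) {
        // Assumption: a permission is reported only if every node grants it.
        client.canRead = client.canRead && node.canRead;
        client.canWrite = client.canWrite && node.canWrite;
        // Assumption: numeric statuses are summed across nodes.
        client.activeThreadCount += node.activeThreadCount;
        // Bulletins are concatenated; duplicate validation errors collapse in the set.
        client.bulletins.addAll(node.bulletins);
        client.validationErrors.addAll(node.validationErrors);
    }

    public static void main(String[] args) {
        ComponentView client = new ComponentView();
        client.canRead = true;
        client.canWrite = true;
        client.activeThreadCount = 2;
        client.bulletins.add("node1: queue is full");

        ComponentView node2 = new ComponentView();
        node2.canRead = true;
        node2.canWrite = false;
        node2.activeThreadCount = 3;
        node2.validationErrors.add("'Directory' is invalid");

        mergeInto(client, node2);
        System.out.println("canWrite=" + client.canWrite
                + " threads=" + client.activeThreadCount
                + " bulletins=" + client.bulletins
                + " errors=" + client.validationErrors);
    }
}
```

      Note that none of these merge rules touch component configuration, which is why fields that only describe configuration add parsing cost without contributing to the merged result.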

      However, each node currently responds with a fully populated `ProcessGroupFlowEntity`. This entity contains all information that is needed to display the current Process Group in the UI, as well as a lot of other details. For example, it contains the Property Descriptors for every component, including the property description, default value, etc. These should only be needed when configuring a component, not to display the canvas.

      This request can take a while when the flow is large or when the cluster is large, because the JSON response from every node in the cluster must be parsed in order to merge the responses. Profiling shows that the expense can be broken down into two steps: parsing the nodes' responses into DTO objects and merging the responses, with parsing being the dominant cost (over 80%).

      There are two big improvements that I think can be made:

      • Null out some fields of the DTO before returning the response, such as Property Descriptors, Property Values, and most other component configuration. These should be fetched only when the component is configured. However, this change may require significant changes to the UI as well.
      • Add a query parameter to the endpoint, such as `minimal=true`. This query parameter would default to `false` in order to maintain backward compatibility, but if set to `true`, each node's response would contain only the information needed to assemble a fully populated response to the client. To accomplish this, one response would need to be fully populated (likely the one from whichever node is the Cluster Coordinator), and that response would include a 'fullyPopulated' flag. This would be the 'clientResponse' that is used when merging the node responses. All other nodes would first null out the elements that are not required for merging, so their responses would include only things like the bulletins, validation errors, status, etc. Even the status could be further reduced by including only the raw numeric values and not the "human readable" values, since the human readable values are ignored when merging anyway.
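
      The second option could look roughly like the sketch below. It assumes a hypothetical `ProcessorView` type and a hypothetical `prepare` helper; only the `minimal` parameter, the 'fullyPopulated' flag, and the idea that the Cluster Coordinator stays fully populated come from the proposal above. The field names are illustrative and do not match NiFi's real DTOs.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed `minimal=true` behavior; not NiFi's actual code.
final class MinimalResponseSketch {

    static final class ProcessorView {
        String id;
        boolean fullyPopulated;
        // Configuration details: only needed when configuring the component.
        Map<String, String> propertyDescriptors;
        Map<String, String> propertyValues;
        // Status: the raw value is needed for merging, the formatted one is not.
        Long bytesIn;
        String readableBytesIn;
        // Always needed for merging.
        List<String> bulletins;
        List<String> validationErrors;
    }

    // Decides what this node returns for one component.
    static ProcessorView prepare(ProcessorView view, boolean minimal, boolean coordinator) {
        if (!minimal || coordinator) {
            // Backward-compatible path, or the node whose response becomes the client response.
            view.fullyPopulated = true;
            return view;
        }
        // Every other node drops whatever the merge step never reads.
        view.fullyPopulated = false;
        view.propertyDescriptors = null;
        view.propertyValues = null;
        view.readableBytesIn = null;
        return view;
    }

    // Builds an illustrative component with every field set.
    static ProcessorView sample() {
        ProcessorView view = new ProcessorView();
        view.id = "processor-1";
        view.propertyDescriptors = Collections.singletonMap("Directory", "The directory to read from");
        view.propertyValues = Collections.singletonMap("Directory", "/data/in");
        view.bytesIn = 1536L;
        view.readableBytesIn = "1.5 KB";
        view.bulletins = Collections.singletonList("queue is full");
        view.validationErrors = Collections.emptyList();
        return view;
    }

    public static void main(String[] args) {
        ProcessorView trimmed = prepare(sample(), true, false);
        System.out.println("fullyPopulated=" + trimmed.fullyPopulated
                + " descriptors=" + trimmed.propertyDescriptors
                + " bytesIn=" + trimmed.bytesIn
                + " bulletins=" + trimmed.bulletins);
    }
}
```

      With this shape, the merge logic can tell from the 'fullyPopulated' flag which response to use as the base, and the trimmed responses only carry fields that contribute to the merged result.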

      This would significantly reduce the amount of time taken to replicate this request, which would provide the user with a far better experience due to the significantly shorter response times.


      Assignee: Unassigned
      Reporter: Mark Payne (markap14)