While doing some CPU sampling in our production environment, we came across a surprising result: during the evaluation of NiFi expressions, modifying a HashSet appears to be the single most expensive operation in the whole process.
This seems implausible given all the other work involved in evaluating a NiFi expression.
After reviewing the code and profiling further, it looks like this HashSet modification is simply performed far more often than necessary; in particular, it is repeated on every single evaluation.
This profiling output was produced with the following unit test:
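The unit test itself is not reproduced in this excerpt. A minimal stand-in with the same shape would compile one expression and then evaluate it in a tight loop, which is enough to make the per-evaluation rebuild dominate a CPU profile. The class and method names below are simplified stand-ins that only mimic the reported behavior; they are not NiFi's actual implementation:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the observed behavior: evaluate() re-runs the expensive
// compile-time work (including the HashSet bookkeeping) on every call.
public class EvaluateLoopBenchmark {

    // Hypothetical, simplified stand-in for NiFi's CompiledExpression.
    static final class CompiledExpression {
        final String expression;

        CompiledExpression(String expression) {
            this.expression = expression;
        }

        String evaluate(Map<String, String> attributes) {
            // Mirrors the reported behavior: a fresh evaluator set is built
            // here on every evaluation instead of being reused.
            Set<String> allEvaluators = new HashSet<>();
            for (String token : expression.split("\\W+")) {
                allEvaluators.add(token); // the hot HashSet.add in the profile
            }
            return attributes.getOrDefault(expression, "");
        }
    }

    public static void main(String[] args) {
        CompiledExpression expr = new CompiledExpression("filename");
        Map<String, String> attrs = Map.of("filename", "report.txt");

        String result = "";
        for (int i = 0; i < 1_000_000; i++) {
            result = expr.evaluate(attrs); // rebuilds evaluators a million times
        }
        System.out.println(result);
    }
}
```

Run under a sampling profiler, a loop like this spends most of its time in the per-call setup rather than in the actual attribute lookup, which matches what we see in production.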
The key question is: why are the Evaluator objects (and everything related to them) built twice:
- Once in ExpressionCompiler.compile()
- Once again in CompiledExpression.evaluate()
In other words, every call to CompiledExpression.evaluate() leads to a new ExpressionCompiler being created and the same expensive calls being made again. Why not simply reuse the Evaluator objects that were created beforehand and are already stored in the CompiledExpression?
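The reuse we would expect, sketched on the same toy model as above (hypothetical names, not the actual NiFi code): the evaluator state is built once, when the expression is compiled, and evaluate() only reads it:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the proposed reuse: evaluator state is built once at compile
// time and shared by all subsequent evaluations.
public class ReusedEvaluatorSketch {

    // Hypothetical, simplified stand-in for NiFi's CompiledExpression.
    static final class CompiledExpression {
        final String expression;
        final Set<String> evaluators; // built once, reused by every evaluation

        CompiledExpression(String expression) {
            this.expression = expression;
            Set<String> built = new HashSet<>();
            for (String token : expression.split("\\W+")) {
                built.add(token); // HashSet cost paid once, at compile time
            }
            this.evaluators = Collections.unmodifiableSet(built);
        }

        String evaluate(Map<String, String> attributes) {
            // No allocation and no HashSet mutation on the hot path.
            return attributes.getOrDefault(expression, "");
        }
    }

    public static void main(String[] args) {
        CompiledExpression expr = new CompiledExpression("filename");
        Map<String, String> attrs = Map.of("filename", "report.txt");
        for (int i = 0; i < 1_000_000; i++) {
            expr.evaluate(attrs); // reuses the evaluators built above
        }
        System.out.println(expr.evaluate(attrs)); // prints report.txt
    }
}
```

Since the evaluator set is immutable after construction, a CompiledExpression built this way could also be shared safely across threads, which seems relevant for heavily used processors.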
Is there a specific design decision behind this? It looks like there is room for a substantial performance improvement, especially for heavily used processors.
On our live system, where we also run expensive tasks such as language detection and mail parsing, expression language evaluation is nevertheless the single biggest consumer of CPU because of this behavior.
Thank you very much for looking into this.