  Apache NiFi / NIFI-3648

Address Excessive Garbage Collection


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: Core Framework, Extensions
    • Labels: None

    Description

      We have a lot of places in the codebase where we generate unnecessary garbage, especially byte arrays. We need to clean this up in order to relieve pressure on the garbage collector.

      Specific points that I've found create unnecessary garbage:

      Provenance CompressableRecordWriter creates a new BufferedOutputStream for each 'compression block' that it creates, and each one allocates a 64 KB byte[]. This is very wasteful. We should instead subclass BufferedOutputStream so that we can supply the byte[] to use, rather than an int that indicates the size. This way we can keep re-using the same byte[] for each writer, saving roughly 32,000 of these 64 KB byte[] allocations per writer. And we create more than one writer per minute.
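
      A minimal sketch of that approach, assuming we swap in the caller's array via the protected buf field of java.io.BufferedOutputStream (the class and constructor here are hypothetical, not the actual fix):

{code:java}
import java.io.BufferedOutputStream;
import java.io.OutputStream;

/**
 * Sketch only: a BufferedOutputStream whose internal buffer is supplied by the
 * caller, so a record writer can allocate one 64 KB byte[] up front and reuse
 * it for every compression block it writes.
 */
public class ReusableBufferedOutputStream extends BufferedOutputStream {

    public ReusableBufferedOutputStream(final OutputStream out, final byte[] reusableBuffer) {
        super(out, 1);             // the superclass allocates a 1-byte array that we immediately discard
        this.buf = reusableBuffer; // 'buf' is a protected field of BufferedOutputStream
        this.count = 0;            // nothing buffered yet
    }
}
{code}

      The writer would then hold the 64 KB byte[] as a field and hand it to this constructor each time it starts a new compression block, after flushing the previous stream.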

      EvaluateJsonPath uses a BufferedInputStream, but it is not necessary because the underlying library buffers the data itself. As a result, we are creating a lot of unnecessary byte[] allocations.

      CompressContent wraps both its input and its output in buffered streams, each with a 64 KB byte[], and it doesn't need either of them, because it already reads and writes through its own byte[] buffer via StreamUtils.copy.
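
      For illustration, here is a minimal sketch of a copy loop in the style of StreamUtils.copy (class name and buffer size are illustrative). Because each read and write already moves a full buffer at a time, wrapping the raw streams in BufferedInputStream/BufferedOutputStream only adds two more large byte[] allocations:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class UnbufferedCopy {

    /** Copies all bytes from 'in' to 'out' through one explicit, reusable buffer. */
    public static long copy(final InputStream in, final OutputStream out) throws IOException {
        final byte[] buffer = new byte[8192];
        long total = 0L;
        int len;
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len);  // each call already hands the raw stream a large block
            total += len;
        }
        return total;
    }

    private UnbufferedCopy() {
    }
}
{code}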

      Site-to-site uses CompressionInputStream. This stream continually creates a new byte[] in its readChunkHeader() method. We should instead create a new byte[] only when we need a bigger buffer, and otherwise track the data with offset and length variables.
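
      A minimal sketch of that reuse pattern (names hypothetical, not the actual CompressionInputStream code):

{code:java}
import java.io.DataInputStream;
import java.io.IOException;

/**
 * Sketch only: keep one byte[] across chunks, grow it only when an incoming
 * chunk is larger than anything seen so far, and track the valid data with a
 * separate length field instead of allocating a fresh array per chunk.
 */
public class ReusableChunkBuffer {

    private byte[] buffer = new byte[0];
    private int chunkLength = 0;

    public void readChunk(final DataInputStream in) throws IOException {
        final int chunkSize = in.readInt();
        if (chunkSize > buffer.length) {
            buffer = new byte[chunkSize];   // allocate only when we truly need a bigger buffer
        }
        in.readFully(buffer, 0, chunkSize); // otherwise reuse the existing array
        chunkLength = chunkSize;            // bytes past chunkLength are stale data from a prior chunk
    }

    public byte[] getBuffer() {
        return buffer;
    }

    public int getChunkLength() {
        return chunkLength;
    }
}
{code}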

      Right now, SplitText uses TextLineDemarcator. The fill() method grows the internal byte[] by only 8 KB each time. When dealing with a large chunk of data, this is very expensive on GC because we continually create a byte[] and then discard it to create a new one. Take for example an 800 KB chunk: growing by 8 KB at a time means allocating and discarding roughly 100 of these arrays, whereas doubling the size each time would require only about 8.
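
      A minimal sketch of the doubling strategy (not the actual TextLineDemarcator code):

{code:java}
import java.util.Arrays;

/**
 * Sketch only: grow an internal buffer by doubling rather than by a fixed
 * 8 KB step, so reaching 800 KB of capacity takes roughly 7 or 8 reallocations
 * instead of about 100.
 */
public class GrowableBuffer {

    private byte[] data = new byte[8 * 1024];

    /** Ensures the buffer can hold at least 'requiredCapacity' bytes. */
    public void ensureCapacity(final int requiredCapacity) {
        if (requiredCapacity <= data.length) {
            return;
        }
        int newSize = data.length;
        while (newSize < requiredCapacity) {
            newSize *= 2;                   // double instead of adding 8 KB each time
        }
        data = Arrays.copyOf(data, newSize);
    }

    public byte[] getData() {
        return data;
    }
}
{code}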

      Other Processors that use Buffered streams unnecessarily:

      ConvertJSONToSQL
      ExecuteProcess
      ExecuteStreamCommand
      AttributesToJSON
      EvaluateJsonPath (when writing to content)
      ExtractGrok
      JmsConsumer

    People

      Assignee: markap14 Mark Payne
      Reporter: markap14 Mark Payne