[NIFI-2851] Improve performance of SplitText - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.1.0
Component/s: Core Framework
Labels: None

Description

SplitText is fairly CPU-intensive and quite slow. A simple flow that splits a 1.4 million line text file into 5k-line chunks and then splits those chunks into single lines pushes through only about 10k lines per second, which equates to about 10 MB/sec. JVisualVM shows that the majority of the time is spent in the locateSplitPoint() method. After isolating this code, inspecting how it works, and micro-benchmarking it, it appears that we can improve performance by refactoring the single-byte calls to InputStream.read() so that they read into a byte array instead.
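
To illustrate the proposed change, here is a minimal sketch of the two read patterns. It is not the actual locateSplitPoint() code. The class name SplitPointSketch, both method names, and the buffer size are hypothetical, and only '\n' is treated as a line ending to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not taken from the NiFi code base.
public class SplitPointSketch {

    // Baseline pattern: one InputStream.read() call per byte.
    // Returns the offset immediately after each '\n'.
    static List<Long> lineEndsSingleByte(InputStream in) throws IOException {
        List<Long> ends = new ArrayList<>();
        long offset = 0;
        int b;
        while ((b = in.read()) != -1) {
            offset++;
            if (b == '\n') {
                ends.add(offset);
            }
        }
        return ends;
    }

    // Proposed pattern: fill a byte array with one call, then scan it in memory.
    // Returns the same offsets as lineEndsSingleByte.
    static List<Long> lineEndsBuffered(InputStream in, int bufferSize) throws IOException {
        List<Long> ends = new ArrayList<>();
        byte[] buffer = new byte[bufferSize];
        long base = 0; // stream offset of buffer[0]
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buffer[i] == '\n') {
                    ends.add(base + i + 1);
                }
            }
            base += n;
        }
        return ends;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a\nbb\nccc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(lineEndsSingleByte(new ByteArrayInputStream(data)));
        System.out.println(lineEndsBuffered(new ByteArrayInputStream(data), 4));
    }
}
```

Both methods report the same split points; the buffered variant simply makes far fewer calls on the stream, which is the effect the micro-benchmarking above points to.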

Issue Links

Dependent: NIFI-2876 Refactor TextLineDemarcator and StreamDemarcator into a common abstract class (Resolved)
relates to: NIFI-3255 SplitText fails with IllegalArgumentException: Destination cannot be within sources (Resolved)
links to: GitHub Pull Request #1116, GitHub Pull Request #1215

People

Assignee: Oleg Zhurakousky
Reporter: Mark Payne
Votes: 0
Watchers: 4

Dates

Created: 01/Oct/16 00:04
Updated: 27/Dec/16 08:33
Resolved: 11/Nov/16 20:46