[NIFI-12700] PutKudu memory optimization for unbatched flush mode (AUTO_FLUSH_SYNC) - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.26.0, 2.0.0-M3
Component/s: None
Labels: None

Description

The PutKudu processor's existing implementation uses a Map of KuduOperation -> FlowFile to keep track of which FlowFile was being processed when each KuduOperation was created. This mapping is eventually used to associate FlowFiles with a RowError (if any occurs), which is necessary for, among other things, transferring FlowFiles to the success/failure relationships and logging failures.

For very large inputs, the Kudu Operation objects retained in this map can consume a large amount of memory. There is no memory leak, but this can still cause OutOfMemory errors on very large input data. It may be possible to avoid the KuduOperation -> FlowFile map for unbatched flush modes (e.g. the AUTO_FLUSH_SYNC flush mode, where KuduSession.apply() has already flushed the buffer before returning; see https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html).
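The difference between the two bookkeeping strategies can be illustrated with a small, self-contained sketch. It is not the PutKudu code and does not use the Kudu client: FlowFile, Operation, RowError, FakeSession and the "rows starting with bad are rejected" rule are all hypothetical stand-ins invented for this example. The only behavior it borrows from the description above is that in a synchronous flush mode the outcome of an operation is known when apply() returns, so the error can be attributed to the current FlowFile without a lookup map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// All types below are hypothetical stand-ins, not Kudu or NiFi classes.
class FlushModeSketch {
    public static final class FlowFile {
        public final String name;
        public final List<String> rows;
        public FlowFile(String name, List<String> rows) { this.name = name; this.rows = rows; }
    }

    public static final class Operation {
        public final String row;
        public Operation(String row) { this.row = row; }
    }

    public static final class RowError {
        public final Operation operation;
        public RowError(Operation operation) { this.operation = operation; }
    }

    public static final class Result {
        public final Set<String> failed;       // names of FlowFiles with at least one row error
        public final int trackedOperations;    // entries held in the Operation -> FlowFile map
        public Result(Set<String> failed, int trackedOperations) {
            this.failed = failed;
            this.trackedOperations = trackedOperations;
        }
    }

    // Stand-in session. Assumption for the demo: rows starting with "bad" are rejected.
    static final class FakeSession {
        private final boolean sync;
        private final List<Operation> buffer = new ArrayList<>();
        FakeSession(boolean sync) { this.sync = sync; }

        // Sync mode: the outcome is returned immediately. Batched mode: buffered, returns null.
        RowError apply(Operation op) {
            if (sync) {
                return check(op);
            }
            buffer.add(op);
            return null;
        }

        // Batched mode only: errors surface here, after many operations were applied.
        List<RowError> flush() {
            List<RowError> errors = new ArrayList<>();
            for (Operation op : buffer) {
                RowError e = check(op);
                if (e != null) {
                    errors.add(e);
                }
            }
            buffer.clear();
            return errors;
        }

        private static RowError check(Operation op) {
            return op.row.startsWith("bad") ? new RowError(op) : null;
        }
    }

    // Batched: every Operation must be remembered until flush() reports errors.
    public static Result routeBatched(List<FlowFile> input) {
        Map<Operation, FlowFile> opToFlowFile = new HashMap<>();
        FakeSession session = new FakeSession(false);
        for (FlowFile ff : input) {
            for (String row : ff.rows) {
                Operation op = new Operation(row);
                opToFlowFile.put(op, ff);
                session.apply(op);
            }
        }
        Set<String> failed = new TreeSet<>();
        for (RowError e : session.flush()) {
            failed.add(opToFlowFile.get(e.operation).name);
        }
        return new Result(failed, opToFlowFile.size());
    }

    // Unbatched: the error is attributed to the FlowFile currently being processed.
    public static Result routeSync(List<FlowFile> input) {
        FakeSession session = new FakeSession(true);
        Set<String> failed = new TreeSet<>();
        for (FlowFile ff : input) {
            for (String row : ff.rows) {
                if (session.apply(new Operation(row)) != null) {
                    failed.add(ff.name);
                }
            }
        }
        return new Result(failed, 0);
    }
}
```

Both methods report the same failed FlowFiles for the same input, but routeBatched holds one map entry per operation until the flush, while routeSync holds none. In the real Kudu Java client, KuduSession.apply() returns an OperationResponse in AUTO_FLUSH_SYNC mode, which is what would play the role of the immediate return value here; how PutKudu actually uses it is left to the linked pull requests.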

This Jira captures the effort to refactor the PutKudu processor to make it more memory-efficient.

Issue Links

links to

GitHub Pull Request #8322

GitHub Pull Request #8501

People

Assignee: Emilio Setiadarma

Reporter: Emilio Setiadarma

Votes: 0

Watchers: 2

Dates

Created: 30/Jan/24 23:49

Updated: 15/Mar/24 15:45

Resolved: 15/Mar/24 15:45

Time Tracking

Estimated: Not Specified

Remaining:

Logged: