Apache NiFi / NIFI-4385

Adjust the QueryDatabaseTable processor for handling big tables.


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.3.0
    • Fix Version/s: None
    • Component/s: Core Framework
    • Labels: None

    Description

      When querying large database tables, the QueryDatabaseTable processor does not perform well.
      The processor always executes the full query and collects every FlowFile in memory,
      transferring them all as one list at the end instead of transferring each FlowFile
      as soon as the ResultSet fetches the next rows (if a fetch size is given).
      If you want to query a billion rows from a table,
      the processor will add all FlowFiles to an ArrayList<FlowFile> in memory
      before transferring the whole list after the last row is fetched by the ResultSet.
      I've checked the code in org.apache.nifi.processors.standard.QueryDatabaseTable.java,
      and in my opinion it would be no big deal to move the session.transfer call to a proper
      position in the code (into the while loop where the FlowFile is added to the list)
      to achieve real streaming support; see the sketch below.
      There was also a bug report for this problem
      which resulted in adding the new property Maximum Number of Fragments,
      but this property only limits the result size.
      Now you have to multiply Max Rows Per Flow File by Maximum Number of Fragments to get your limit
      (e.g., 10,000 rows per FlowFile × 100 fragments caps a run at 1,000,000 rows),
      which is not really a solution to the original problem, imho.
      The workaround with the GenerateTableFetch and/or ExecuteSQL processors is also
      much slower than using a database cursor or a ResultSet
      and streaming the rows into FlowFiles directly on the queue.
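
      For illustration, here is a minimal sketch of the two transfer strategies, assuming
      NiFi's ProcessSession API; the class, the method names, and the fragmentCount
      parameter are hypothetical placeholders, not the actual code in QueryDatabaseTable.java:

{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;

// Hypothetical sketch -- class, method names, and fragmentCount are
// placeholders, not the actual fields in QueryDatabaseTable.java.
class TransferSketch {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .build();

    // Current behavior (simplified): every fragment FlowFile is buffered in an
    // ArrayList until the ResultSet is exhausted, so a billion-row query holds
    // references to all fragments in memory before anything is transferred.
    void transferAtEnd(final ProcessSession session, final int fragmentCount) {
        final List<FlowFile> resultSetFlowFiles = new ArrayList<>();
        for (int i = 0; i < fragmentCount; i++) {
            // one fragment per "Max Rows Per Flow File" rows of the ResultSet
            resultSetFlowFiles.add(session.create());
        }
        session.transfer(resultSetFlowFiles, REL_SUCCESS); // all at once, at the end
    }

    // Proposed behavior (simplified): transfer and commit each fragment as soon
    // as it is written, so memory holds only one fragment at a time.
    void transferPerFragment(final ProcessSession session, final int fragmentCount) {
        for (int i = 0; i < fragmentCount; i++) {
            final FlowFile fragment = session.create();
            session.transfer(fragment, REL_SUCCESS);
            session.commit(); // make the fragment visible in the output queue now
        }
    }
}
{code}

      With the per-fragment variant, memory is bounded by a single fragment instead of the
      whole result set, and downstream processors can start working while the query is still
      streaming rows; the trade-off is that a failure mid-query leaves already-committed
      fragments in the queue instead of rolling the whole run back.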

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: Tim Späth (tspaeth)
            Votes: 0
            Watchers: 8

            Dates

              Created:
              Updated: