Apache NiFi / NIFI-4385

Adjust the QueryDatabaseTable processor for handling big tables.


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.3.0
    • Fix Version/s: None
    • Component/s: Core Framework
    • Labels: None

    Description

      When querying large database tables, the QueryDatabaseTable processor does not perform well.
      The processor always executes the full query and collects every FlowFile in memory,
      transferring them all as one list at the end instead of transferring each FlowFile
      as soon as the ResultSet fetches the next rows (if a fetch size is given).
      If you want to query a billion rows from a table,
      the processor will add all FlowFiles to an ArrayList<FlowFile> in memory
      before transferring the whole list after the last row is fetched by the ResultSet.
      I've checked the code in org.apache.nifi.processors.standard.QueryDatabaseTable.java,
      and in my opinion it would be no big deal to move the session.transfer call to a proper
      position in the code (into the while loop where the FlowFile is added to the list)
      to achieve real streaming support; see the sketch below.
      There was also a bug report for this problem
      which resulted in adding the new property Maximum Number of Fragments,
      but this property only limits the result size.
      Now you have to multiply Max Rows Per Flow File by Maximum Number of Fragments to get your limit
      (e.g., 10,000 rows per FlowFile × 100 fragments caps a run at 1,000,000 rows),
      which is not really a solution to the original problem, imho.
      The workaround with the GenerateTableFetch and/or ExecuteSQL processors is also
      much slower than using a database cursor or a ResultSet
      and streaming the rows into FlowFiles directly on the queue.
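
      For illustration, here is a minimal sketch of the two transfer strategies, assuming
      NiFi's ProcessSession API; the class, the method names, and the fragmentCount
      parameter are hypothetical placeholders, not the actual code in QueryDatabaseTable.java:

{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;

// Hypothetical sketch -- class, method names, and fragmentCount are
// placeholders, not the actual fields in QueryDatabaseTable.java.
class TransferSketch {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .build();

    // Current behavior (simplified): every fragment FlowFile is buffered in an
    // ArrayList until the ResultSet is exhausted, so a billion-row query holds
    // references to all fragments in memory before anything is transferred.
    void transferAtEnd(final ProcessSession session, final int fragmentCount) {
        final List<FlowFile> resultSetFlowFiles = new ArrayList<>();
        for (int i = 0; i < fragmentCount; i++) {
            // one fragment per "Max Rows Per Flow File" rows of the ResultSet
            resultSetFlowFiles.add(session.create());
        }
        session.transfer(resultSetFlowFiles, REL_SUCCESS); // all at once, at the end
    }

    // Proposed behavior (simplified): transfer and commit each fragment as soon
    // as it is written, so memory holds only one fragment at a time.
    void transferPerFragment(final ProcessSession session, final int fragmentCount) {
        for (int i = 0; i < fragmentCount; i++) {
            final FlowFile fragment = session.create();
            session.transfer(fragment, REL_SUCCESS);
            session.commit(); // make the fragment visible in the output queue now
        }
    }
}
{code}

      With the per-fragment variant, memory is bounded by a single fragment instead of the
      whole result set, and downstream processors can start working while the query is still
      streaming rows; the trade-off is that a failure mid-query leaves already-committed
      fragments in the queue instead of rolling the whole run back.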

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: Tim Späth (tspaeth)
            Votes: 0
            Watchers: 8

            Dates

              Created:
              Updated: