Apache NiFi / NIFI-3268

Add AUTO_INCREMENT column in GenerateTableFetch to benefit index

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.1
    • Fix Version/s: None
    • Component/s: Core Framework
    • Labels:
    • Environment:
      - ubuntu 16.04
      - java version "1.8.0_111"
      - Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
      - Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)

      Description

I added an AUTO_INCREMENT (indexed) column property to GenerateTableFetch to take advantage of a column index.
By default this processor pages through the table with OFFSET, and I have problems with that on large data sets. Some columns have an index, so we could use the index instead to speed up query time.

I posted a question here:

      https://community.hortonworks.com/questions/72586/how-can-i-use-an-array-with-putelasticsearch.html

If you have an indexed column (id), you could use this SQL:

```
select xxx
from xxxxx
where id >= 200000
order by id
limit 200000
```

      “OFFSET is bad for skipping previous rows.” [Online]. Available: http://Use-The-Index-Luke.com/sql/partial-results/fetch-next-page. [Accessed: 27-Dec-2016].
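To make the difference concrete, here is a minimal sketch in plain Java (not NiFi code; `my_table` and `id` are placeholder names, and the dense-ids-starting-near-zero assumption matches the property description in the PR) of how page queries could be generated with OFFSET versus by seeking on an indexed column:

```java
// Minimal sketch: OFFSET paging vs. index (seek) paging.
// my_table and id are illustrative placeholders, not from the processor.
public class PagingSketch {
    public static void main(String[] args) {
        final int partitionSize = 200_000;
        final int numberOfFetches = 3;
        for (int i = 0; i < numberOfFetches; i++) {
            // OFFSET paging: the database must scan and discard i * partitionSize
            // rows before returning the page, so later pages get slower and slower.
            String offsetQuery = "SELECT * FROM my_table LIMIT " + partitionSize
                    + " OFFSET " + (i * partitionSize);

            // Seek paging: the index on id lets the database jump straight to the
            // first row of the page, so every page costs about the same.
            String seekQuery = "SELECT * FROM my_table WHERE id >= " + (i * partitionSize)
                    + " ORDER BY id LIMIT " + partitionSize;

            System.out.println(offsetQuery);
            System.out.println(seekQuery);
        }
    }
}
```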

      Thank you in advance

        Issue Links

          Activity

ASF GitHub Bot added a comment -

          GitHub user qfdk opened a pull request:

          https://github.com/apache/nifi/pull/1376

          NIFI-3268 Add AUTO_INCREMENT column in GenerateTableFetch to benefit index

          Thank you for submitting a contribution to Apache NiFi.

          In order to streamline the review of the contribution we ask you
          to ensure the following steps have been taken:

              1. For all changes:
          • [X] Is there a JIRA ticket associated with this PR? Is it referenced
            in the commit message?
          • [X] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
          • [X] Has your PR been rebased against the latest commit within the target branch (typically master)?
          • [X] Is your initial contribution a single, squashed commit?
    2. For code changes:
          • [X] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
          • [X] Have you written or updated unit tests to verify your changes?
          • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
          • [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
          • [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
          • [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
    3. For documentation related changes:
          • [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    4. Note:
      Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/qfdk/nifi NIFI-JOUVE

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/nifi/pull/1376.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #1376


          commit a488d4ae5f4212196ceb30bafae7f3050c5ff8cc
          Author: HAOZHI Li <lhaozhi@jouve.fr>
          Date: 2016-12-30T13:38:38Z

          Add AUTO_INCREMENT column in GenerateTableFetch to benefit index


ASF GitHub Bot added a comment -

          Github user ijokarumawak commented on the issue:

          https://github.com/apache/nifi/pull/1376

          @qfdk Thanks for your contribution! Excuse us for taking so long to respond.

Please note that when you submit a PR for a JIRA, push the "Submit Patch" button to move the JIRA status forward to "Patch Available". This ensures other developers know it's ready for review.
          https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide#ContributorGuide-CodeReviewProcess

I like the idea of using an indexed column instead of OFFSET.
I wanted to test this PR. However, the same GenerateTableFetch class was recently updated by #1407, and this PR now has merge conflicts.

Would you update this PR by rebasing onto the latest master branch and fixing the conflicts? And could you also add Expression Language support to the "AUTO_INCREMENT(index) column name" property you added?

ASF GitHub Bot added a comment -

          Github user ijokarumawak commented on a diff in the pull request:

          https://github.com/apache/nifi/pull/1376#discussion_r100967405

— Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GenerateTableFetch.java —

```
@@ -87,6 +87,18 @@
     .addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
     .build();

+    public static final PropertyDescriptor AUTO_INCREMENT_KEY = new PropertyDescriptor.Builder()
+        .name("gen-table-fetch-partition-index")
+        .displayName("AUTO_INCREMENT(index) column name")
+        .description("The column has AUTO_INCREMENT attribute and index."
+            + "If there is a column with AUTO_INCREMENT property and index in the database, we can use index instead of using OFFSET."
+            + "The value must start by 1")
+        .defaultValue("null")
+        .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+        .required(true)
```

— End diff —

I think this property should be optional; `required(true)` and `defaultValue("null")` can be removed.
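For illustration, the descriptor might look like this after the suggested change, and after the Expression Language support requested earlier in the thread. A sketch only, assuming the NiFi 1.x PropertyDescriptor.Builder API (including its boolean `expressionLanguageSupported` flag); this is not the final PR code:

```java
// Sketch of the reviewer's suggestion, not the actual PR code: the property
// becomes optional, so required(true) and the "null" default are dropped, and
// the NiFi 1.x expressionLanguageSupported flag is enabled as requested.
public static final PropertyDescriptor AUTO_INCREMENT_KEY = new PropertyDescriptor.Builder()
    .name("gen-table-fetch-partition-index")
    .displayName("AUTO_INCREMENT(index) column name")
    .description("Optional. Name of an indexed AUTO_INCREMENT column. When set, "
            + "paging seeks on this column's index instead of using OFFSET.")
    .required(false)
    .expressionLanguageSupported(true)
    .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    .build();
```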

ASF GitHub Bot added a comment -

          Github user ijokarumawak commented on a diff in the pull request:

          https://github.com/apache/nifi/pull/1376#discussion_r100969417

— Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GenerateTableFetch.java —

```
@@ -223,19 +237,34 @@ public void onTrigger(final ProcessContext context, final ProcessSessionFactory
     }
     final int numberOfFetches = (partitionSize == 0) ? rowCount : (rowCount / partitionSize) + (rowCount % partitionSize == 0 ? 0 : 1);

+    if ("null".equals(indexValue)) {
+        // Generate SQL statements to read "pages" of data
+        for (int i = 0; i < numberOfFetches; i++) {
+            FlowFile sqlFlowFile;
-    // Generate SQL statements to read "pages" of data
-    for (int i = 0; i < numberOfFetches; i++) {
-        FlowFile sqlFlowFile;
+            Integer limit = partitionSize == 0 ? null : partitionSize;
+            Integer offset = partitionSize == 0 ? null : i * partitionSize;
+            final String query = dbAdapter.getSelectStatement(tableName, columnNames, whereClause, StringUtils.join(maxValueColumnNameList, ", "), limit, offset);
+            sqlFlowFile = session.create();
+            sqlFlowFile = session.write(sqlFlowFile, out -> {
+                out.write(query.getBytes());
+            });
+            session.transfer(sqlFlowFile, REL_SUCCESS);
+        }
+    } else {
+        for (int i = 0; i < numberOfFetches; i++) {
+            FlowFile sqlFlowFile;
-        Integer limit = partitionSize == 0 ? null : partitionSize;
-        Integer offset = partitionSize == 0 ? null : i * partitionSize;
-        final String query = dbAdapter.getSelectStatement(tableName, columnNames, whereClause, StringUtils.join(maxValueColumnNameList, ", "), limit, offset);
-        sqlFlowFile = session.create();
-        sqlFlowFile = session.write(sqlFlowFile, out -> {
-            out.write(query.getBytes());
-        });
-        session.transfer(sqlFlowFile, REL_SUCCESS);
+            Integer limit = partitionSize;
+            whereClause = indexValue + " >= " + limit * i;
```

— End diff —

GenerateTableFetch stores max value columns in managed state, so that when it runs again, it fetches only records that have greater max-value column values.

If we rewrite the `whereClause` here based only on partitionSize, it breaks this behavior on the 2nd and later runs.
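For illustration, a fix along the lines the reviewer is hinting at might combine the two predicates rather than overwrite the clause. A sketch only, using names from the diff above; `maxValueWhereClause` is a hypothetical stand-in for the state-derived condition:

```java
// Illustrative sketch, not the PR code: keep the state-derived max-value
// predicate and AND the paging predicate onto it, instead of overwriting
// whereClause with the paging condition alone.
final class WhereClauseSketch {
    // maxValueWhereClause is a hypothetical stand-in for the condition that
    // GenerateTableFetch derives from its managed state.
    static String pageWhereClause(String maxValueWhereClause, String indexColumn,
                                  int partitionSize, int page) {
        String pagingPredicate = indexColumn + " >= " + ((long) partitionSize * page);
        return (maxValueWhereClause == null || maxValueWhereClause.isEmpty())
                ? pagingPredicate
                : maxValueWhereClause + " AND " + pagingPredicate;
    }
}
```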

ASF GitHub Bot added a comment -

          Github user qfdk commented on a diff in the pull request:

          https://github.com/apache/nifi/pull/1376#discussion_r101170880

— Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GenerateTableFetch.java —

```
@@ -87,6 +87,18 @@
     .addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
     .build();

+    public static final PropertyDescriptor AUTO_INCREMENT_KEY = new PropertyDescriptor.Builder()
+        .name("gen-table-fetch-partition-index")
+        .displayName("AUTO_INCREMENT(index) column name")
+        .description("The column has AUTO_INCREMENT attribute and index."
+            + "If there is a column with AUTO_INCREMENT property and index in the database, we can use index instead of using OFFSET."
+            + "The value must start by 1")
+        .defaultValue("null")
+        .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+        .required(true)
```

— End diff —

          I agree with you. I will change it 👍

ASF GitHub Bot added a comment -

          Github user qfdk commented on a diff in the pull request:

          https://github.com/apache/nifi/pull/1376#discussion_r101172356

— Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GenerateTableFetch.java —

```
@@ -223,19 +237,34 @@ public void onTrigger(final ProcessContext context, final ProcessSessionFactory
     }
     final int numberOfFetches = (partitionSize == 0) ? rowCount : (rowCount / partitionSize) + (rowCount % partitionSize == 0 ? 0 : 1);

+    if ("null".equals(indexValue)) {
+        // Generate SQL statements to read "pages" of data
+        for (int i = 0; i < numberOfFetches; i++) {
+            FlowFile sqlFlowFile;
-    // Generate SQL statements to read "pages" of data
-    for (int i = 0; i < numberOfFetches; i++) {
-        FlowFile sqlFlowFile;
+            Integer limit = partitionSize == 0 ? null : partitionSize;
+            Integer offset = partitionSize == 0 ? null : i * partitionSize;
+            final String query = dbAdapter.getSelectStatement(tableName, columnNames, whereClause, StringUtils.join(maxValueColumnNameList, ", "), limit, offset);
+            sqlFlowFile = session.create();
+            sqlFlowFile = session.write(sqlFlowFile, out -> {
+                out.write(query.getBytes());
+            });
+            session.transfer(sqlFlowFile, REL_SUCCESS);
+        }
+    } else {
+        for (int i = 0; i < numberOfFetches; i++) {
+            FlowFile sqlFlowFile;
-        Integer limit = partitionSize == 0 ? null : partitionSize;
-        Integer offset = partitionSize == 0 ? null : i * partitionSize;
-        final String query = dbAdapter.getSelectStatement(tableName, columnNames, whereClause, StringUtils.join(maxValueColumnNameList, ", "), limit, offset);
-        sqlFlowFile = session.create();
-        sqlFlowFile = session.write(sqlFlowFile, out -> {
-            out.write(query.getBytes());
-        });
-        session.transfer(sqlFlowFile, REL_SUCCESS);
+            Integer limit = partitionSize;
+            whereClause = indexValue + " >= " + limit * i;
```

— End diff —

I will take a look at that; thank you for your advice.

ASF GitHub Bot added a comment -

          Github user qfdk commented on a diff in the pull request:

          https://github.com/apache/nifi/pull/1376#discussion_r101172664

— Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GenerateTableFetch.java —

```
@@ -223,19 +237,34 @@ public void onTrigger(final ProcessContext context, final ProcessSessionFactory
     }
     final int numberOfFetches = (partitionSize == 0) ? rowCount : (rowCount / partitionSize) + (rowCount % partitionSize == 0 ? 0 : 1);

+    if ("null".equals(indexValue)) {
+        // Generate SQL statements to read "pages" of data
+        for (int i = 0; i < numberOfFetches; i++) {
+            FlowFile sqlFlowFile;
-    // Generate SQL statements to read "pages" of data
-    for (int i = 0; i < numberOfFetches; i++) {
-        FlowFile sqlFlowFile;
+            Integer limit = partitionSize == 0 ? null : partitionSize;
+            Integer offset = partitionSize == 0 ? null : i * partitionSize;
+            final String query = dbAdapter.getSelectStatement(tableName, columnNames, whereClause, StringUtils.join(maxValueColumnNameList, ", "), limit, offset);
+            sqlFlowFile = session.create();
+            sqlFlowFile = session.write(sqlFlowFile, out -> {
+                out.write(query.getBytes());
+            });
+            session.transfer(sqlFlowFile, REL_SUCCESS);
+        }
+    } else {
+        for (int i = 0; i < numberOfFetches; i++) {
+            FlowFile sqlFlowFile;
-        Integer limit = partitionSize == 0 ? null : partitionSize;
-        Integer offset = partitionSize == 0 ? null : i * partitionSize;
-        final String query = dbAdapter.getSelectStatement(tableName, columnNames, whereClause, StringUtils.join(maxValueColumnNameList, ", "), limit, offset);
-        sqlFlowFile = session.create();
-        sqlFlowFile = session.write(sqlFlowFile, out -> {
-            out.write(query.getBytes());
-        });
-        session.transfer(sqlFlowFile, REL_SUCCESS);
+            Integer limit = partitionSize;
+            whereClause = indexValue + " >= " + limit * i;
```

— End diff —

Thank you for your advice. I will work on it and resolve the conflict.

ASF GitHub Bot added a comment -

          Github user patricker commented on the issue:

          https://github.com/apache/nifi/pull/1376

How database-specific is this? I hadn't heard of it before; looking around online, it looks like it's for MySQL/MariaDB?

It sounds like a great idea for the systems that can support it; I'm just wondering where it belongs. Is this something that should be in the DatabaseAdapter layer?
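To make the question concrete: the diff above already goes through `dbAdapter.getSelectStatement(...)`, so one hypothetical placement would be a dialect-aware hook on the adapter. The sketch below is purely illustrative; `getIndexedPageStatement` is an invented name, not part of NiFi's actual DatabaseAdapter interface:

```java
// Hypothetical sketch only: an adapter-level hook for index-based paging.
// getIndexedPageStatement is an invented method name, not part of NiFi's
// actual DatabaseAdapter interface.
public interface IndexedPagingAdapter {
    // Returns a SELECT that pages by seeking on an indexed column rather than
    // using OFFSET; each database dialect could render its own syntax.
    String getIndexedPageStatement(String tableName, String columnNames,
                                   String indexColumn, long firstIndexValue, Integer limit);
}
```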

ASF GitHub Bot added a comment -

          Github user ijokarumawak commented on the issue:

          https://github.com/apache/nifi/pull/1376

@patricker Thank you for pointing that out. It can vary among databases. Peter, what RDB do you usually use? @qfdk I thought you were using PostgreSQL, based on the [HCC question](https://community.hortonworks.com/questions/72586/how-can-i-use-an-array-with-putelasticsearch.html) you posted. It would be great if we could review it together to see whether this PR works with different databases.

@qfdk I saw you've committed several updates; is it ready for another review cycle?

ASF GitHub Bot added a comment -

          Github user qfdk commented on the issue:

          https://github.com/apache/nifi/pull/1376

@ijokarumawak Yes, you are right, it's my post. Because the *new master* couldn't pass travis-ci, I tried to compile it with travis-ci, so I committed several updates to launch the travis-ci job.

ASF GitHub Bot added a comment -

          Github user ijokarumawak commented on the issue:

          https://github.com/apache/nifi/pull/1376

@qfdk I understand. It's OK to have a Travis test failure with a USER_LANGUAGE other than English for now. Since the test has passed with the English locale, we can move forward. We're aware that several tests can fail with other locales, and we are actively fixing those tests right now, for example NIFI-3466 (https://issues.apache.org/jira/browse/NIFI-3466).

In order to add a new capability to this generic processor, we need to test it with a variety of database engines, as many users run it against different databases. If you have verified it with PostgreSQL, then I'd try reviewing this with other databases such as Oracle, SQL Server or MySQL.

qfdk added a comment -

          submit patch for testing

ASF GitHub Bot added a comment -

          Github user ijokarumawak commented on the issue:

          https://github.com/apache/nifi/pull/1376

Hi @qfdk, excuse me for the delay in providing further review comments. Today I wanted to test this updated PR, but unfortunately it has merge conflicts. Would you rebase it onto the latest master, resolve the conflicts, and squash the commits into one? Please let us know when it is ready so we can resume the review.

It may not be ideal, but I think it will take time to get this PR fully reviewed against various DBMSs to see whether it works as expected and how much it improves performance. Meanwhile, the NiFi community is [discussing releasing 1.2.0](http://apache-nifi-developer-list.39713.n7.nabble.com/Closing-in-on-a-NiFi-1-2-0-release-td14907.html). To align this effort with the release cycle, can we remove 1.2.0 as the Fix Version from JIRA NIFI-3268 (https://issues.apache.org/jira/browse/NIFI-3268)? The Fix Version is used to track the version in which an issue is fixed, and it should stay empty until the issue is resolved.

          Thanks for your contribution.

ASF GitHub Bot added a comment -

          Github user ijokarumawak commented on the issue:

          https://github.com/apache/nifi/pull/1376

          @qfdk I updated the JIRA and emptied Fix Version for now.

Please let me request one more thing: is it possible for you to provide test results with PostgreSQL (or whatever DBMS you're comfortable with) that illustrate how much this enhancement improves performance? It would really help shorten the review cycle. Thanks!

ASF GitHub Bot added a comment -

          Github user qfdk commented on the issue:

          https://github.com/apache/nifi/pull/1376

@ijokarumawak Hey, I will take a look as soon as possible, because I found some new modifications on master, so I will try to adapt to them. PS: `nifi-1.1.0.2.1.1.0-2` works with my modification; we have 26,360,516 documents (5.6 GB), and in 5 minutes we could finish reading and converting Avro to JSON.

ASF GitHub Bot added a comment -

          Github user ijokarumawak commented on the issue:

          https://github.com/apache/nifi/pull/1376

          @qfdk "26,360,516 documents (5.6 Gb), in 5 minutes" is amazing! A video would be very helpful, but a image or two might be enough. I'd like to know the difference between the time to read all those documents (records) with this proposed index column, and without it. Thank you!

ASF GitHub Bot added a comment -

          Github user qfdk commented on the issue:

          https://github.com/apache/nifi/pull/1376

@ijokarumawak I made a short video: https://youtu.be/ZD-CXgfJ6Xg
This video was recorded with *nifi-1.1.0.2.1.1.0-2* and PostgreSQL; I will recheck with the latest version of NiFi.

ASF GitHub Bot added a comment -

          Github user ijokarumawak commented on the issue:

          https://github.com/apache/nifi/pull/1376

@qfdk Thank you very much for recording and sharing the video. I can see how it improves the execution time of the generated SQL. This enhancement is significant for the first execution of the GenerateTableFetch processor when there are many records to fetch. The performance difference is huge!

For the 2nd run and later, GenerateTableFetch builds a 'where' clause from the 'Max value columns', so it will not be a problem after the 1st run if the max-value columns are properly indexed. But if many records are inserted or updated between GenerateTableFetch runs, it will have the same issue if we use `limit offset`.

I understand how it should work now. Please let me know when you finish updating the PR. I will review it as soon as possible. Thanks!
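As a made-up illustration of the `limit offset` point above (the table name, id values, and page sizes below are invented; the strings are not actual GenerateTableFetch output):

```java
// Made-up illustration of the reviewer's point; not actual NiFi output.
public class SecondRunSketch {
    public static void main(String[] args) {
        long lastMaxId = 26_000_000L;  // max id recorded in state after run 1
        int partitionSize = 200_000;
        int page = 2;                  // third page of rows inserted since run 1

        // OFFSET paging: the max-value predicate narrows the row set, but the
        // database still skips page * partitionSize rows inside that set.
        String offsetQuery = "SELECT * FROM my_table WHERE id > " + lastMaxId
                + " LIMIT " + partitionSize + " OFFSET " + (page * partitionSize);

        // Seek paging: the page start is folded into the WHERE clause, so the
        // index takes the database straight to the first row of the page.
        long pageStart = lastMaxId + (long) page * partitionSize;
        String seekQuery = "SELECT * FROM my_table WHERE id > " + pageStart
                + " ORDER BY id LIMIT " + partitionSize;

        System.out.println(offsetQuery);
        System.out.println(seekQuery);
    }
}
```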

ASF GitHub Bot added a comment -

          Github user qfdk commented on the issue:

          https://github.com/apache/nifi/pull/1376

@ijokarumawak I'm very glad to work on this code in my free time. There is an issue now that I want to debug: the processor launches but nothing happens... nifi-1.1.0.2.1.1.0-2 works very well, but the latest version doesn't work correctly. I will change some code this weekend and try to fix it.

ASF GitHub Bot added a comment -

          Github user ijokarumawak commented on the issue:

          https://github.com/apache/nifi/pull/1376

@qfdk Sometimes I run into issues while debugging, such as forgetting to terminate an old IntelliJ debugger session, or having another thread stopped at a different breakpoint, so the NiFi Jetty thread can't respond. If the NiFi UI keeps showing a little spinning circle and doesn't update, that is probably the case.
Otherwise, I've been able to debug it with IntelliJ fine.


            People

• Assignee: Unassigned
• Reporter: qfdk
• Votes: 0
• Watchers: 1

              Dates

              • Created:
                Updated:

                Development