Uploaded image for project: 'Apache NiFi'
  1. Apache NiFi
  2. NIFI-4371

Add support for query timeout in Hive processors

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Extensions
    • Labels:
      None

      Description

      With HIVE-4924 it is possible to set a query timeout when executing a query against Hive (starting with Hive 2.1). Right now, NiFi is built using Hive 1.2.1 and this feature is not available by default (the method is not implemented in the driver). However, if building NiFi with specific profiles this feature can be used.

      The objective is to expose the query timeout parameter in the processor and enable expression language. If the version of the driver is not implementing the query timeout the processor will be in invalid state (unless expression language is used, and in this case, the flow file will be routed to the failure relationship).

      1. Screen Shot 2017-09-09 at 4.31.21 PM.png
        189 kB
        Pierre Villard
      2. Screen Shot 2017-09-09 at 6.38.51 PM.png
        285 kB
        Pierre Villard
      3. Screen Shot 2017-09-09 at 6.40.48 PM.png
        68 kB
        Pierre Villard

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user pvillard31 opened a pull request:

          https://github.com/apache/nifi/pull/2138

          NIFI-4371 - add support for query timeout in Hive processors

          Thank you for submitting a contribution to Apache NiFi.

          In order to streamline the review of the contribution we ask you
          to ensure the following steps have been taken:

              1. For all changes:
          • [ ] Is there a JIRA ticket associated with this PR? Is it referenced
            in the commit message?
          • [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
          • [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
          • [ ] Is your initial contribution a single, squashed commit?
              1. For code changes:
          • [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
          • [ ] Have you written or updated unit tests to verify your changes?
          • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
          • [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
          • [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
          • [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
              1. For documentation related changes:
          • [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
              1. Note:
                Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/pvillard31/nifi NIFI-4371

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/nifi/pull/2138.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #2138


          commit 8a4ec6cd7788dd83899ffd9536caa9317690134e
          Author: Pierre Villard <pierre.villard.fr@gmail.com>
          Date: 2017-09-09T16:57:34Z

          NIFI-4371 - add support for query timeout in Hive processors


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user pvillard31 opened a pull request: https://github.com/apache/nifi/pull/2138 NIFI-4371 - add support for query timeout in Hive processors Thank you for submitting a contribution to Apache NiFi. In order to streamline the review of the contribution we ask you to ensure the following steps have been taken: For all changes: [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message? [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. [ ] Has your PR been rebased against the latest commit within the target branch (typically master)? [ ] Is your initial contribution a single, squashed commit? For code changes: [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder? [ ] Have you written or updated unit tests to verify your changes? [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0] ( http://www.apache.org/legal/resolved.html#category-a)? [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly? [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly? [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties? For documentation related changes: [ ] Have you ensured that format looks appropriate for the output in which it is rendered? Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible. You can merge this pull request into a Git repository by running: $ git pull https://github.com/pvillard31/nifi NIFI-4371 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/nifi/pull/2138.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2138 commit 8a4ec6cd7788dd83899ffd9536caa9317690134e Author: Pierre Villard <pierre.villard.fr@gmail.com> Date: 2017-09-09T16:57:34Z NIFI-4371 - add support for query timeout in Hive processors
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user mattyb149 commented on a diff in the pull request:

          https://github.com/apache/nifi/pull/2138#discussion_r143055024

          — Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/SelectHiveQL.java —
          @@ -290,6 +292,9 @@ public void process(final OutputStream out) throws IOException {
          }
          }

          + // set query timeout
          + st.setQueryTimeout(queryTimeout);
          — End diff –

          Nothing to be done here necessarily, but just wanted to mention that the success of this method is dependent on the customValidate() preventing the processor from running if the Hive JDBC driver does not support setQueryTimeout.

          Show
          githubbot ASF GitHub Bot added a comment - Github user mattyb149 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2138#discussion_r143055024 — Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/SelectHiveQL.java — @@ -290,6 +292,9 @@ public void process(final OutputStream out) throws IOException { } } + // set query timeout + st.setQueryTimeout(queryTimeout); — End diff – Nothing to be done here necessarily, but just wanted to mention that the success of this method is dependent on the customValidate() preventing the processor from running if the Hive JDBC driver does not support setQueryTimeout.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user joewitt commented on a diff in the pull request:

          https://github.com/apache/nifi/pull/2138#discussion_r143279988

          — Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/SelectHiveQL.java —
          @@ -290,6 +292,9 @@ public void process(final OutputStream out) throws IOException {
          }
          }

          + // set query timeout
          + st.setQueryTimeout(queryTimeout);
          — End diff –

          I get the point of the customValidate but I'd keep its check but mark it as valid whether the timeout method is supported or not. YOu can set some processor instance boolean to advise whether it is supported then in this call/check set the timeout if it is and ignore it if not. Otherwise we're requiring them to use a version of hive which supports it and that seems a little heavy handed. It is probably a good idea to do the current validation logic in some other lifecycle call like 'onAdded' or something, warn if timeouts not supported and point out hanging threads are possible, and keep going.

          Show
          githubbot ASF GitHub Bot added a comment - Github user joewitt commented on a diff in the pull request: https://github.com/apache/nifi/pull/2138#discussion_r143279988 — Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/SelectHiveQL.java — @@ -290,6 +292,9 @@ public void process(final OutputStream out) throws IOException { } } + // set query timeout + st.setQueryTimeout(queryTimeout); — End diff – I get the point of the customValidate but I'd keep its check but mark it as valid whether the timeout method is supported or not. YOu can set some processor instance boolean to advise whether it is supported then in this call/check set the timeout if it is and ignore it if not. Otherwise we're requiring them to use a version of hive which supports it and that seems a little heavy handed. It is probably a good idea to do the current validation logic in some other lifecycle call like 'onAdded' or something, warn if timeouts not supported and point out hanging threads are possible, and keep going.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user joewitt commented on a diff in the pull request:

          https://github.com/apache/nifi/pull/2138#discussion_r143280686

          — Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/SelectHiveQL.java —
          @@ -290,6 +292,9 @@ public void process(final OutputStream out) throws IOException {
          }
          }

          + // set query timeout
          + st.setQueryTimeout(queryTimeout);
          — End diff –

          disregard what i said. matt pointed out that the code checks IF a timeout was set and is non-zero. Based on that I think the way it was implemented is awesome.

          Show
          githubbot ASF GitHub Bot added a comment - Github user joewitt commented on a diff in the pull request: https://github.com/apache/nifi/pull/2138#discussion_r143280686 — Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/SelectHiveQL.java — @@ -290,6 +292,9 @@ public void process(final OutputStream out) throws IOException { } } + // set query timeout + st.setQueryTimeout(queryTimeout); — End diff – disregard what i said. matt pointed out that the code checks IF a timeout was set and is non-zero. Based on that I think the way it was implemented is awesome.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user joewitt commented on a diff in the pull request:

          https://github.com/apache/nifi/pull/2138#discussion_r143281431

          — Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/AbstractHiveQLProcessor.java —
          @@ -66,6 +71,38 @@
          .addValidator(StandardValidators.CHARACTER_SET_VALIDATOR)
          .build();

          + public static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder()
          + .name("hive-query-timeout")
          + .displayName("Query timeout")
          + .description("Sets the number of seconds the driver will wait for a query to execute. "
          + + "A value of 0 means no timeout. This feature is available starting with Hive 2.1.")
          — End diff –

          I'd remove the 'This feature is available starting with Hive 2.1". Otherwise this all lgtm

          Show
          githubbot ASF GitHub Bot added a comment - Github user joewitt commented on a diff in the pull request: https://github.com/apache/nifi/pull/2138#discussion_r143281431 — Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/AbstractHiveQLProcessor.java — @@ -66,6 +71,38 @@ .addValidator(StandardValidators.CHARACTER_SET_VALIDATOR) .build(); + public static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder() + .name("hive-query-timeout") + .displayName("Query timeout") + .description("Sets the number of seconds the driver will wait for a query to execute. " + + "A value of 0 means no timeout. This feature is available starting with Hive 2.1.") — End diff – I'd remove the 'This feature is available starting with Hive 2.1". Otherwise this all lgtm
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user mattyb149 commented on a diff in the pull request:

          https://github.com/apache/nifi/pull/2138#discussion_r143281834

          — Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/AbstractHiveQLProcessor.java —
          @@ -66,6 +71,38 @@
          .addValidator(StandardValidators.CHARACTER_SET_VALIDATOR)
          .build();

          + public static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder()
          + .name("hive-query-timeout")
          + .displayName("Query timeout")
          + .description("Sets the number of seconds the driver will wait for a query to execute. "
          + + "A value of 0 means no timeout. This feature is available starting with Hive 2.1.")
          — End diff –

          The part about the feature availability is nice if/when there's a choice, but for now it's not technically germane, since the Hive NAR ships with a particular version of Hive. I will remove that part and merge, the rest LGTM

          Show
          githubbot ASF GitHub Bot added a comment - Github user mattyb149 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2138#discussion_r143281834 — Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/AbstractHiveQLProcessor.java — @@ -66,6 +71,38 @@ .addValidator(StandardValidators.CHARACTER_SET_VALIDATOR) .build(); + public static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder() + .name("hive-query-timeout") + .displayName("Query timeout") + .description("Sets the number of seconds the driver will wait for a query to execute. " + + "A value of 0 means no timeout. This feature is available starting with Hive 2.1.") — End diff – The part about the feature availability is nice if/when there's a choice, but for now it's not technically germane, since the Hive NAR ships with a particular version of Hive. I will remove that part and merge, the rest LGTM
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user mattyb149 commented on the issue:

          https://github.com/apache/nifi/pull/2138

          The "unit tests" for TestSelectHiveQL use Derby as the database, only to test the functionality of getting the "HiveQL" statement to the database and parsing its results. In that vein, Derby supports setQueryTimeout (DERBY-31(https://issues.apache.org/jira/browse/DERBY-31)), so can we add a unit test that sets the value, to exercise that part of the code?

          Show
          githubbot ASF GitHub Bot added a comment - Github user mattyb149 commented on the issue: https://github.com/apache/nifi/pull/2138 The "unit tests" for TestSelectHiveQL use Derby as the database, only to test the functionality of getting the "HiveQL" statement to the database and parsing its results. In that vein, Derby supports setQueryTimeout ( DERBY-31 ( https://issues.apache.org/jira/browse/DERBY-31 )), so can we add a unit test that sets the value, to exercise that part of the code?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user mattyb149 commented on a diff in the pull request:

          https://github.com/apache/nifi/pull/2138#discussion_r143296927

          — Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/AbstractHiveQLProcessor.java —
          @@ -66,6 +71,38 @@
          .addValidator(StandardValidators.CHARACTER_SET_VALIDATOR)
          .build();

          + public static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder()
          + .name("hive-query-timeout")
          + .displayName("Query timeout")
          + .description("Sets the number of seconds the driver will wait for a query to execute. "
          + + "A value of 0 means no timeout. This feature is available starting with Hive 2.1.")
          — End diff –

          How about we replace the "This feature is available..." sentence with "NOTE: Non-zero values may not be supported by the driver"

          Show
          githubbot ASF GitHub Bot added a comment - Github user mattyb149 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2138#discussion_r143296927 — Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/AbstractHiveQLProcessor.java — @@ -66,6 +71,38 @@ .addValidator(StandardValidators.CHARACTER_SET_VALIDATOR) .build(); + public static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder() + .name("hive-query-timeout") + .displayName("Query timeout") + .description("Sets the number of seconds the driver will wait for a query to execute. " + + "A value of 0 means no timeout. This feature is available starting with Hive 2.1.") — End diff – How about we replace the "This feature is available..." sentence with "NOTE: Non-zero values may not be supported by the driver"
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user pvillard31 commented on the issue:

          https://github.com/apache/nifi/pull/2138

          Thanks for the review @mattyb149 and @joewitt. I updated the property description based on your comments. Regarding the unit test, since I'm using a ``HiveStatement`` object in the custom validate method, I'm not sure I can easily test the property (it'll always fail in a default build even with the Derby backend). And, if adding a unit test where I expect the validation to raise an error, this test may not have the expected result if using different profiles in the maven build.

          Show
          githubbot ASF GitHub Bot added a comment - Github user pvillard31 commented on the issue: https://github.com/apache/nifi/pull/2138 Thanks for the review @mattyb149 and @joewitt. I updated the property description based on your comments. Regarding the unit test, since I'm using a ``HiveStatement`` object in the custom validate method, I'm not sure I can easily test the property (it'll always fail in a default build even with the Derby backend). And, if adding a unit test where I expect the validation to raise an error, this test may not have the expected result if using different profiles in the maven build.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user mattyb149 commented on the issue:

          https://github.com/apache/nifi/pull/2138

          I'm getting NPEs in the unit tests, something weird with MockPropertyValue getting created without "expectExpressions" being set to anything, causing isExpressionLanguagePresent() to throw the NPE

          Show
          githubbot ASF GitHub Bot added a comment - Github user mattyb149 commented on the issue: https://github.com/apache/nifi/pull/2138 I'm getting NPEs in the unit tests, something weird with MockPropertyValue getting created without "expectExpressions" being set to anything, causing isExpressionLanguagePresent() to throw the NPE
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user pvillard31 commented on the issue:

          https://github.com/apache/nifi/pull/2138

          I already noticed this error while working on others PRs (I'm a bit surprised I didn't notice the NPE on this PR...). It's because we're checking if the processor is valid before enabling expression validation (https://github.com/apache/nifi/blob/master/nifi-mock/src/main/java/org/apache/nifi/util/StandardProcessorTestRunner.java#L169). We can't really do it the other way around without changing a lot of things.

          I updated the PR to just check if ``expectExpressions`` is null and, if yes, return false. This way we can use ``isExpressionLanguagePresent()`` in a custom validate method.

          Show
          githubbot ASF GitHub Bot added a comment - Github user pvillard31 commented on the issue: https://github.com/apache/nifi/pull/2138 I already noticed this error while working on others PRs (I'm a bit surprised I didn't notice the NPE on this PR...). It's because we're checking if the processor is valid before enabling expression validation ( https://github.com/apache/nifi/blob/master/nifi-mock/src/main/java/org/apache/nifi/util/StandardProcessorTestRunner.java#L169 ). We can't really do it the other way around without changing a lot of things. I updated the PR to just check if ``expectExpressions`` is null and, if yes, return false. This way we can use ``isExpressionLanguagePresent()`` in a custom validate method.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user mattyb149 commented on the issue:

          https://github.com/apache/nifi/pull/2138

          I talked to @markap14 about it, perhaps this fix is fine or we can just change it to a boolean, but I'll let him take a look too.

          Show
          githubbot ASF GitHub Bot added a comment - Github user mattyb149 commented on the issue: https://github.com/apache/nifi/pull/2138 I talked to @markap14 about it, perhaps this fix is fine or we can just change it to a boolean, but I'll let him take a look too.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user mattyb149 commented on a diff in the pull request:

          https://github.com/apache/nifi/pull/2138#discussion_r150299869

          — Diff: nifi-mock/src/main/java/org/apache/nifi/util/MockPropertyValue.java —
          @@ -225,7 +225,7 @@ public String toString() {

          @Override
          public boolean isExpressionLanguagePresent() {

          • if (!expectExpressions) {
            + if (expectExpressions == null || !expectExpressions) {
              • End diff –

          This fix (in whatever form it takes) should be as part of NIFI-4590(https://issues.apache.org/jira/browse/NIFI-4590). As a temporary fix in my own PR #2260 , I put a try/catch around isExpressionLanguagePresent() with a comment about NIFI-4590. You don't have to do that specifically, but if you can get around changing the mock stuff in this PR, I can merge and add a reference to your workaround in NIFI-4590, to ask that your workaround be removed as part of that Jira.

          Show
          githubbot ASF GitHub Bot added a comment - Github user mattyb149 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2138#discussion_r150299869 — Diff: nifi-mock/src/main/java/org/apache/nifi/util/MockPropertyValue.java — @@ -225,7 +225,7 @@ public String toString() { @Override public boolean isExpressionLanguagePresent() { if (!expectExpressions) { + if (expectExpressions == null || !expectExpressions) { End diff – This fix (in whatever form it takes) should be as part of NIFI-4590 ( https://issues.apache.org/jira/browse/NIFI-4590 ). As a temporary fix in my own PR #2260 , I put a try/catch around isExpressionLanguagePresent() with a comment about NIFI-4590 . You don't have to do that specifically, but if you can get around changing the mock stuff in this PR, I can merge and add a reference to your workaround in NIFI-4590 , to ask that your workaround be removed as part of that Jira.

            People

            • Assignee:
              pvillard Pierre Villard
              Reporter:
              pvillard Pierre Villard
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:

                Development