Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.0
    • Fix Version/s: None
    • Component/s: Core Framework
    • Labels:
      None

      Description

      PutParquet doesn't support S3 targets due to the lack of the hadoop-aws dependency.

      To reproduce:
      1) Modify core-site.xml so that the default filesystem URI (fs.defaultFS; fs.default.name in older configurations) starts with s3://
      2) Add a PutParquet processor
      3) It will fail to run. The logs will show the missing hadoop-aws dependency.

      The simple fix is to add the hadoop-aws dependency.
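
For illustration, step 1 can be sketched as below. This is a minimal sketch, not the reporter's actual configuration: it assumes the property in question is Hadoop's `fs.defaultFS`, uses the `s3a://` scheme (substitute whichever S3 scheme your Hadoop version provides), and `example-bucket` is a hypothetical bucket name. The file is written to a scratch directory so no real configuration is modified.

```sh
# Sketch: write a minimal core-site.xml whose default filesystem is an S3 URI.
# "example-bucket" is a hypothetical bucket name; the file goes to a scratch
# directory so no real Hadoop configuration is touched.
conf_dir="$(mktemp -d)"
printf '%s\n' \
  '<?xml version="1.0"?>' \
  '<configuration>' \
  '  <property>' \
  '    <name>fs.defaultFS</name>' \
  '    <value>s3a://example-bucket</value>' \
  '  </property>' \
  '</configuration>' > "$conf_dir/core-site.xml"
echo "wrote $conf_dir/core-site.xml"
```

Pointing PutParquet's Hadoop configuration at a file like this should be enough to trigger the failure described in step 3 when hadoop-aws is not on the classpath.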

          Activity

          githubbot ASF GitHub Bot added a comment -

          Github user joewitt commented on the issue:

          https://github.com/apache/nifi/pull/2293

          @baank so given joey's comments/JIRA/example do you feel this gets you to a good state?

          githubbot ASF GitHub Bot added a comment -

          Github user jfrazee commented on the issue:

          https://github.com/apache/nifi/pull/2293

          @joewitt I don't think so. If we get NIFI-4650 in, then we can probably support builds against 2.8.x, 2.9.x, etc. on official releases without introducing more deps to PutParquet.

          @baank I know the suggested change would be a lot more convenient, and maybe it's a little opaque that we intend for the additional classpath to be used. What might be super helpful in these cases is "Additional Details" documentation on the processor that lays out some known jar configurations and build scenarios.

          githubbot ASF GitHub Bot added a comment -

          Github user baank closed the pull request at:

          https://github.com/apache/nifi/pull/2293

          githubbot ASF GitHub Bot added a comment -

          Github user baank commented on the issue:

          https://github.com/apache/nifi/pull/2293

          @joewitt .. I appreciate the situation and what I meant was that HDF is likely to adopt whatever changes are part of the main line branch. But I appreciate it's an open community.

          Unfortunately for us, building our own custom NiFi is not acceptable unless we fully own the support risk.

          Happy to close this.

          githubbot ASF GitHub Bot added a comment -

          Github user joewitt commented on the issue:

          https://github.com/apache/nifi/pull/2293

          @jfrazee did I misunderstand your suggestion?

          githubbot ASF GitHub Bot added a comment -

          Github user joewitt commented on the issue:

          https://github.com/apache/nifi/pull/2293

          @baank To be clear this is the Apache NiFi open source community. There are no customers here - just an open community.

          Regarding this contribution: As provided so far, it creates a different problem, and a couple of committers have said we should find a better way. What Joey is showing/suggesting above, specifically with NIFI-4650, would give you the flexibility you appear to need and doesn't seem to introduce that same problem. So this is probably a good path to pursue.

          Regarding the comments about a particular vendor distribution obviously this community cannot speak to that.

          Hopefully that helps you see that what is being suggested does not require you to do any forking.

          Thanks

          githubbot ASF GitHub Bot added a comment -

          Github user baank commented on the issue:

          https://github.com/apache/nifi/pull/2293

          People like myself who represent very large enterprise customers are looking to run a supported platform in production, i.e. HDF. So custom builds are a very big change in direction.

          Can I request a final decision whether this will be merged or not ?

          githubbot ASF GitHub Bot added a comment -

          Github user jfrazee commented on the issue:

          https://github.com/apache/nifi/pull/2293

          @baank So, we still wouldn't need to update the Hadoop version on everything yet, because in principle you can just do a build of NiFi overriding the hadoop.version property and use 2.8.x. For example:

          ```sh
          $ mvn -T 2.0C clean install -Dhadoop.version=2.8.2 -Dhadoop.guava.version=12.0.1 -Dhadoop.http.client.version=4.5.2 -Dhadoop.http.core.version=4.4.4 -DskipTests
          ```
          That said, this is a little bit of a lie because in later versions of HttpComponents, HttpClient and HttpCore aren't versioned identically, and we currently use a single property, hadoop.http.client.version, for both; i.e., the hadoop.http.core.version property doesn't exist yet. See NIFI-4650 (https://issues.apache.org/jira/browse/NIFI-4650) though.

          So, I did the build above with the new property and tested with the following jars and things seem to work:

          ```
          aws-java-sdk-core-1.11.68.jar
          aws-java-sdk-kms-1.11.68.jar
          aws-java-sdk-s3-1.11.68.jar
          hadoop-aws-2.8.2.jar
          hadoop-common-2.8.2.jar
          httpclient-4.5.2.jar
          httpcore-4.4.4.jar
          jackson-annotations-2.6.0.jar
          jackson-core-2.6.1.jar
          jackson-databind-2.6.1.jar
          joda-time-2.8.2.jar
          ```

          We're trying to be very cautious about updating the default to the next major version of Hadoop so it might be best to stick with this still being a property override.
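
As a rough aid for collecting the jars listed above, the sketch below builds Maven Central download URLs from Maven coordinates using the standard repository layout. The function name `maven_central_url` is hypothetical, and the group IDs shown (`org.apache.hadoop`, `com.amazonaws`) are assumptions not stated in the comment; only the artifact names and versions come from the list. The script prints URLs and downloads nothing.

```sh
# Sketch: turn Maven coordinates into a Maven Central jar URL.
# Usage: maven_central_url <groupId> <artifactId> <version>
# The repository layout is <group/with/slashes>/<artifact>/<version>/<artifact>-<version>.jar
maven_central_url() {
  group_path="$(printf '%s' "$1" | tr '.' '/')"
  printf 'https://repo1.maven.org/maven2/%s/%s/%s/%s-%s.jar\n' \
    "$group_path" "$2" "$3" "$2" "$3"
}

# Two of the jars from the list above (group IDs are assumptions).
maven_central_url org.apache.hadoop hadoop-aws 2.8.2
maven_central_url com.amazonaws aws-java-sdk-s3 1.11.68
```

The printed URLs can be passed to a download tool of your choice, with the resulting files placed in the directory referenced by the processor's additional classpath setting.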

          githubbot ASF GitHub Bot added a comment -

          Github user baank commented on the issue:

          https://github.com/apache/nifi/pull/2293

          @jfrazee .. The version of Hadoop-AWS you are referring to does not support ANY S3 encryption which rules that solution out for many organisations. To support S3 encryption you need hadoop-aws-2.8.1 as a minimum.

          So unless I am mistaken, the Additional Classpath Resources property is not a solution, since it will result in JAR conflicts.

          I am happy to look into upgrading all of the other processors to 2.8.1 as we need encryption and really don't want to be running a fork of NiFi in production.

          githubbot ASF GitHub Bot added a comment -

          Github user joewitt commented on the issue:

          https://github.com/apache/nifi/pull/2293

          I share Joey's view on this. We'd also have to factor in the license/notice updates necessary for this. Finally, we'll need to be careful about the Hadoop version used and stay consistent with other processors.

          githubbot ASF GitHub Bot added a comment -

          Github user jfrazee commented on the issue:

          https://github.com/apache/nifi/pull/2293

          @baank Definitely thanks for the contrib, but unfortunately I don't think this is the right solution; PutParquet can already support S3 without any code changes.

          PutParquet provides an "Additional Classpath Resources" property that you can point at a directory and provide all the S3 dependencies. Here's what I used:

          ```
          aws-java-sdk-1.7.4.jar
          hadoop-aws-2.7.3.jar
          hadoop-common-2.7.3.jar
          httpclient-4.5.3.jar
          httpcore-4.4.4.jar
          jackson-annotations-2.6.0.jar
          jackson-core-2.6.1.jar
          jackson-databind-2.6.1.jar
          ```

          We take the same approach with PutHDFS for filesystems that aren't included in the core Hadoop libs, so it seems to make sense to keep doing the same here.
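
Before pointing "Additional Classpath Resources" at a directory, it can help to confirm that every jar from the list above is actually present. The following is a minimal sketch under stated assumptions: `missing_jars` is a hypothetical helper, and the demo runs against a scratch directory containing only two placeholder files rather than a real NiFi installation.

```sh
# Sketch: report which expected jars are absent from a directory.
# Usage: missing_jars <dir> <jar>...
# Prints one missing file name per line; prints nothing if all are present.
missing_jars() {
  dir="$1"
  shift
  for jar in "$@"; do
    [ -f "$dir/$jar" ] || printf '%s\n' "$jar"
  done
}

# Jar names taken from the comment above.
S3_JARS="aws-java-sdk-1.7.4.jar hadoop-aws-2.7.3.jar hadoop-common-2.7.3.jar httpclient-4.5.3.jar httpcore-4.4.4.jar jackson-annotations-2.6.0.jar jackson-core-2.6.1.jar jackson-databind-2.6.1.jar"

# Demo against a scratch directory that holds only two of the jars
# (empty placeholder files, not real libraries).
demo_dir="$(mktemp -d)"
touch "$demo_dir/hadoop-aws-2.7.3.jar" "$demo_dir/hadoop-common-2.7.3.jar"
missing_jars "$demo_dir" $S3_JARS
```

In real use, replace `demo_dir` with the directory configured on the processor; an empty result means all listed jars were found.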

          githubbot ASF GitHub Bot added a comment -

          GitHub user baank opened a pull request:

          https://github.com/apache/nifi/pull/2293

          NIFI-4565 - PutParquet doesn't support S3

          Thank you for submitting a contribution to Apache NiFi.

          In order to streamline the review of the contribution we ask you
          to ensure the following steps have been taken:

          For all changes:
          • [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
          • [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
          • [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
          • [ ] Is your initial contribution a single, squashed commit?

          For code changes:
          • [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
          • [ ] Have you written or updated unit tests to verify your changes?
          • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
          • [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
          • [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
          • [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

          For documentation related changes:
          • [ ] Have you ensured that format looks appropriate for the output in which it is rendered?

          Note:
          Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/baank/nifi NIFI-4565

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/nifi/pull/2293.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #2293


          commit 18e91e1d4ccbb05595d6a39c50a57068da9962d3
          Author: Naden Franciscus <nadenfranciscus@nbnco.com.au>
          Date: 2017-11-24T06:17:27Z

          NIFI-4565: PutParquet doesn't support S3



            People

            • Assignee: jfrazee Joey Frazee
            • Reporter: nadenf Franco
            • Votes: 0
            • Watchers: 2
