Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: None
    • Labels:
      None

      Description

      Compiling using the Hadoop 2.4 profile and running against Hadoop 2.6 is broken due to an updated jackson dependency in Hadoop 2.6.

      We need to create a profile for Hadoop 2.6 that fixes this issue, similar to what was done for Spark 1.3 vs. Spark 1.2.

      PR to follow shortly

        Issue Links

          Activity

          githubbot ASF GitHub Bot added a comment -

          GitHub user RamVenkatesh opened a pull request:

          https://github.com/apache/incubator-zeppelin/pull/31

          ZEPPELIN-33 Need a maven profile for Hadoop 2.6

          Trivial change to create a new maven profile for Hadoop 2.6, tested by running

          mvn clean install -DskipTests -Pspark-1.2 -Phadoop-2.6 -Pyarn

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/RamVenkatesh/incubator-zeppelin ZEPPELIN-33

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/incubator-zeppelin/pull/31.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #31


          commit 17c81ac3bf399cb7d16d77b61d3e42fb89fae633
          Author: Ram Venkatesh <rvenkatesh@hortonworks.com>
          Date: 2015-04-07T14:03:00Z

          ZEPPELIN-33 Need a maven profile for Hadoop 2.6
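
          For illustration, a profile of the kind this pull request describes might look like the sketch below. It is not the actual diff from the PR: the property names `hadoop.version` and `codehaus.jackson.version` are borrowed from the flags and properties mentioned elsewhere in this issue, and the values (Hadoop 2.6.0, Jackson 1.9.13) are assumptions based on the versions discussed here.

          ```xml
          <!-- Hypothetical sketch of a hadoop-2.6 profile; not copied from the PR -->
          <profile>
            <id>hadoop-2.6</id>
            <properties>
              <!-- Hadoop release to build against -->
              <hadoop.version>2.6.0</hadoop.version>
              <!-- Jackson 1.x version assumed to match the one Hadoop 2.6 uses -->
              <codehaus.jackson.version>1.9.13</codehaus.jackson.version>
            </properties>
          </profile>
          ```

          A profile like this would sit inside the `<profiles>` element of pom.xml and be selected with -Phadoop-2.6, as in the mvn command quoted in the comment above.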


          githubbot ASF GitHub Bot added a comment -

          Github user jongyoul commented on the pull request:

          https://github.com/apache/incubator-zeppelin/pull/31#issuecomment-90939463

          I recall that we decided not to add a hadoop-2.6 profile because it does not differ from hadoop-2.4 and we follow Spark's profiles, but I think Zeppelin has a minor mistake in codehaus.jackson.version. @RamVenkatesh How about setting the Jackson version to 1.9.13 in the hadoop-2.4 profile? I think this is the only change between hadoop-2.4 and hadoop-2.6.

          githubbot ASF GitHub Bot added a comment -

          Github user RamVenkatesh commented on the pull request:

          https://github.com/apache/incubator-zeppelin/pull/31#issuecomment-91082014

          @jongyoul I am not sure if Hadoop 2.4 will work with this updated version of Jackson (at a minimum, it is an untested configuration). Also, if the user does not specify the right version of Jackson for hadoop-2.6, the result is a ClassNotFoundException that is hard to debug.

          githubbot ASF GitHub Bot added a comment -

          Github user jongyoul commented on the pull request:

          https://github.com/apache/incubator-zeppelin/pull/31#issuecomment-91090235

          @RamVenkatesh Yes, it works correctly with Hadoop 2.4 on Jackson 1.9.13. You can use Hadoop 2.6 like this:

          ```
          mvn clean package -Pspark-1.3 -Dhadoop.version=2.6.0 -Phadoop-2.4 -Pyarn -DskipTests
          ```

          I hope that is okay; it should build Zeppelin with Hadoop correctly. Feel free to talk to me if it doesn't work.

          githubbot ASF GitHub Bot added a comment -

          Github user jongyoul commented on the pull request:

          https://github.com/apache/incubator-zeppelin/pull/31#issuecomment-91091244

          Additionally, in your build, jackson.version is 1.8.8 because you chose Spark 1.2, where jackson.version defaults to 1.8.8. I think this is a mistake: jackson.version actually affects Hadoop, yet it is configured in the spark-1.3 profile. If you change the hadoop-2.4 profile's jackson.version to 1.9.13, you can build Zeppelin with Spark 1.2 and Hadoop 2.6:

          ```
          mvn clean install -DskipTests -Pspark-1.2 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Pyarn
          ```

          It may work correctly.

          githubbot ASF GitHub Bot added a comment -

          Github user RamVenkatesh commented on the pull request:

          https://github.com/apache/incubator-zeppelin/pull/31#issuecomment-91567618

          @jongyoul I don't see the benefit of changing the Hadoop 2.4 profile's dependencies. I have confirmed that 2.4 actually depends on version 1.8.8 of Jackson. I think the profiles should reflect the dependencies correctly; otherwise it is confusing and the profile name just becomes an arbitrary label.

          githubbot ASF GitHub Bot added a comment -

          Github user jongyoul commented on the pull request:

          https://github.com/apache/incubator-zeppelin/pull/31#issuecomment-91604854

          @RamVenkatesh Yes, I understand what you are saying, but, in line with the earlier discussion, I don't want to break from Spark's existing profile conventions. If you check Spark's profiles, there is no profile for Hadoop 2.6. Because Zeppelin still follows Spark, I think it's important not to diverge. And why make a Hadoop 2.6 profile when there is no difference between Hadoop 2.4 and 2.6? Are you sure that Hadoop 2.4 depends on Jackson 1.8.8? When I checked earlier, Avro used Jackson 1.9.13 even though Hive used Jackson 1.8.8. I've patched that version issue in Spark. Please check again and comment on this.

          githubbot ASF GitHub Bot added a comment -

          Github user RamVenkatesh commented on the pull request:

          https://github.com/apache/incubator-zeppelin/pull/31#issuecomment-91729957

          @jongyoul please see https://github.com/apache/hadoop/blob/branch-2.4/hadoop-project/pom.xml

          <dependency>
            <groupId>org.codehaus.jackson</groupId>
            <artifactId>jackson-mapper-asl</artifactId>
            <version>1.8.8</version>
          </dependency>

          Also, longer term, when submitting to a (secure) Hadoop cluster, there are differences between Hadoop 2.4 and Hadoop 2.6. I understand your point that Spark 1.2.1 is not tested with Hadoop 2.6, but this option enables us to keep the dependencies correct. What do others think?

          githubbot ASF GitHub Bot added a comment -

          Github user jongyoul commented on the pull request:

          https://github.com/apache/incubator-zeppelin/pull/31#issuecomment-91746744

          @RamVenkatesh It looks like Spark and vanilla Hadoop differ here. Look at https://github.com/apache/spark/blob/master/pom.xml#L1620-L1631 where Spark's hadoop-2.4 profile sets jackson.version to 1.9.13. As I've told you, I've tested Jackson 1.8.8 and 1.9.13 for JSON parsing with Spark 1.2.0, 1.2.1 and 1.3.0 on Hadoop 2.3, 2.4, 2.5, 2.3.0-cdh5.0.1, 2.5.0-cdh5.3.0 and 2.5.0-cdh5.3.1 at production level. It works fine with Hive and Hive on Spark. In fact, the exact Jackson version matters less than keeping the same version across jackson-core-asl, jackson-mapper-asl, jackson-xc and jackson-jaxrs. You can find these at https://github.com/apache/spark/blob/master/pom.xml#L934-L955 in Spark's pom.xml. Finally, I think pinning Jackson to 1.9.13 is enough to support hadoop-2.6. As for a (secure) Hadoop cluster, I haven't tested it, but it should be OK because I believe Spark is already being tested there. For now, Zeppelin actually uses Hadoop via Spark. If you find a serious problem while using Hadoop, feel free to talk to me. I know you patched hiveInterpreter. Is there any problem using hiveInterpreter with this Jackson version?
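
          The point about keeping the four Jackson artifacts on one version can be sketched as a pom.xml fragment like the one below. This is only an illustration, not Spark's or Zeppelin's actual configuration: the `codehaus.jackson.version` property name and the 1.9.13 value are assumptions taken from this thread, and the fragment is meant to sit inside the `<project>` element.

          ```xml
          <!-- Hypothetical sketch: pin all Jackson 1.x artifacts to a single version -->
          <properties>
            <codehaus.jackson.version>1.9.13</codehaus.jackson.version>
          </properties>

          <dependencyManagement>
            <dependencies>
              <dependency>
                <groupId>org.codehaus.jackson</groupId>
                <artifactId>jackson-core-asl</artifactId>
                <version>${codehaus.jackson.version}</version>
              </dependency>
              <dependency>
                <groupId>org.codehaus.jackson</groupId>
                <artifactId>jackson-mapper-asl</artifactId>
                <version>${codehaus.jackson.version}</version>
              </dependency>
              <dependency>
                <groupId>org.codehaus.jackson</groupId>
                <artifactId>jackson-xc</artifactId>
                <version>${codehaus.jackson.version}</version>
              </dependency>
              <dependency>
                <groupId>org.codehaus.jackson</groupId>
                <artifactId>jackson-jaxrs</artifactId>
                <version>${codehaus.jackson.version}</version>
              </dependency>
            </dependencies>
          </dependencyManagement>
          ```

          Managing the version in one property means a profile only has to override that single property, and transitive Jackson dependencies resolve to the same version across all four artifacts.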

          githubbot ASF GitHub Bot added a comment -

          Github user jongyoul commented on the pull request:

          https://github.com/apache/incubator-zeppelin/pull/31#issuecomment-91747685

          @Leemoonsoo @swkimme, @RamVenkatesh and I are discussing whether to add a new profile for Hadoop 2.6. What do you think about this? I think we should follow Spark's way, but @RamVenkatesh thinks it's clearer to have a profile for each version. Both @RamVenkatesh and I have valid opinions on maintaining dependencies. I hope you can set a policy on this.

          githubbot ASF GitHub Bot added a comment -

          Github user echarles commented on the pull request:

          https://github.com/apache/incubator-zeppelin/pull/31#issuecomment-91997703

          Quick input: I favor adding a profile for each version; it makes things clear.

          I am confused every time I need to build Spark with options like "..-Phadoop-2.4 -Dhadoop.version=2.6.0 -Dyarn.version=2.6.0...".

          githubbot ASF GitHub Bot added a comment -

          Github user Leemoonsoo commented on the pull request:

          https://github.com/apache/incubator-zeppelin/pull/31#issuecomment-92101256

          If it is documented and any user can easily find and follow the instructions, both are okay with me.

          I think
          @jongyoul's suggested way - follow Spark's way - is better for code maintenance, and
          @RamVenkatesh's suggested way - have a profile for each Hadoop version - is better for users.

          Is there a good way to take advantage of both? Any good ideas?

          githubbot ASF GitHub Bot added a comment -

          Github user RamVenkatesh commented on the pull request:

          https://github.com/apache/incubator-zeppelin/pull/31#issuecomment-92158417

          @jongyoul , @Leemoonsoo @echarles thank you all for your comments, appreciate it.

          A few more points:
          1. Without this change, our Zeppelin installation is completely broken, so I am raising the priority of this bug to Blocker to expedite resolution.
          2. Even though the Spark project builds against Hadoop 2.4, Spark 1.2.1 + Hadoop 2.6 is a valid, tested configuration in some Spark distros, HDP for example. And this is optional, so people who want the stock Spark experience can specify the 2.4 profile.
          3. Hadoop 2.4 and Hadoop 2.6 are not identical. There are a number of fixes in 2.6 that we will need, and independent of this discussion we will need a way to build with Hadoop 2.6. Modifying the 2.4 profile to specify 2.6 does not seem the best option to me.

          IMO everything we can do to help with adoption out of the box should be our priority now.

          To address the maintenance concern, may I suggest we add a Hadoop 2.6 profile now, which can be removed at what the community thinks is the right time?

          githubbot ASF GitHub Bot added a comment -

          Github user Leemoonsoo commented on the pull request:

          https://github.com/apache/incubator-zeppelin/pull/31#issuecomment-93354325

          @RamVenkatesh Your suggestion about 2.6 profile sounds like a good plan. I think we can try and see how it goes. @jongyoul What do you think?

          githubbot ASF GitHub Bot added a comment -

          Github user jongyoul commented on the pull request:

          https://github.com/apache/incubator-zeppelin/pull/31#issuecomment-93615854

          @Leemoonsoo Okay, I fully understand @RamVenkatesh's opinion and facts. The only thing I'm worried about is that we may end up making more profiles for each version of Hadoop and Spark. I hope that doesn't happen. We should make an effort to avoid it.

          githubbot ASF GitHub Bot added a comment -

          Github user Leemoonsoo commented on the pull request:

          https://github.com/apache/incubator-zeppelin/pull/31#issuecomment-93683246

          @jongyoul Thank you for the answer. Then I'll merge this PR. How about creating some guidelines for Maven profile creation? Are you interested? As we get more and more Interpreter implementations as well as new Spark releases, I think we might need them.

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/incubator-zeppelin/pull/31

          moon Lee moon soo added a comment -

          Issue resolved by pull request 31 https://github.com/apache/incubator-zeppelin/pulls/31

            People

            • Assignee:
              venkateshrin Ram Venkatesh
              Reporter:
              venkateshrin Ram Venkatesh
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:

                Development