Hive / HIVE-487

Hive does not compile with Hadoop 0.20.0

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.3.0
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags:
      Reviewed

      Description

      Attempting to compile Hive with Hadoop 0.20.0 fails:

      aaron@jargon:~/src/ext/svn/hive-0.3.0$ ant -Dhadoop.version=0.20.0 package
      (several lines elided)
      compile:
      [echo] Compiling: hive
      [javac] Compiling 261 source files to /home/aaron/src/ext/svn/hive-0.3.0/build/ql/classes
      [javac] /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java:94: cannot find symbol
      [javac] symbol : method getCommandLineConfig()
      [javac] location: class org.apache.hadoop.mapred.JobClient
      [javac] Configuration commandConf = JobClient.getCommandLineConfig();
      [javac] ^
      [javac] /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:241: cannot find symbol
      [javac] symbol : method validateInput(org.apache.hadoop.mapred.JobConf)
      [javac] location: interface org.apache.hadoop.mapred.InputFormat
      [javac] inputFormat.validateInput(newjob);
      [javac] ^
      [javac] Note: Some input files use or override a deprecated API.
      [javac] Note: Recompile with -Xlint:deprecation for details.
      [javac] Note: Some input files use unchecked or unsafe operations.
      [javac] Note: Recompile with -Xlint:unchecked for details.
      [javac] 2 errors

      BUILD FAILED
      /home/aaron/src/ext/svn/hive-0.3.0/build.xml:145: The following error occurred while executing this line:
      /home/aaron/src/ext/svn/hive-0.3.0/ql/build.xml:135: Compile failed; see the compiler error output for details.

      Attachments

      1. dynamic-proxy.tar.gz
        2 kB
        Todd Lipcon
      2. hive-487.3.patch
        9 kB
        Joydeep Sen Sarma
      3. hive-487.4.patch
        9 kB
        Joydeep Sen Sarma
      4. HIVE-487.patch
        40 kB
        Justin Lynn
      5. hive-487.txt
        34 kB
        Todd Lipcon
      6. hive-487.txt
        34 kB
        Todd Lipcon
      7. HIVE-487-2.patch
        5 kB
        Justin Lynn
      8. hive-487-jetty.patch
        17 kB
        Edward Capriolo
      9. hive-487-jetty-2.diff
        17 kB
        Edward Capriolo
      10. hive-487-runtime.patch
        56 kB
        Todd Lipcon
      11. hive-487-with-cli-changes.2.patch
        40 kB
        Joydeep Sen Sarma
      12. hive-487-with-cli-changes.3.patch
        41 kB
        Todd Lipcon
      13. hive-487-with-cli-changes.patch
        38 kB
        Joydeep Sen Sarma
      14. jetty-patch.patch
        8 kB
        Edward Capriolo
      15. junit-patch1.html
        372 kB
        Justin Lynn
      16. patch-487.txt
        37 kB
        Ashish Thusoo


          Activity

          Justin Lynn added a comment -

          This patch (based on trunk) allows Hive to compile and passes all but two unit tests (see attached junit test report). This is my first time contributing, so I'm going to need a bit of help with the SQL result differences.

          Justin Lynn added a comment -

          Actually it appears that those test failures are /not/ related to my changes (I checked out vanilla trunk and built and tested it, and those two tests still failed).

          Ashish Thusoo added a comment -

          Assigned.

          Ashish Thusoo added a comment -

          Thanks for your contribution.
          Please do a submit patch when you attach a patch so that we get it in the patch submitted queue.

          Ashish Thusoo added a comment -

          Hi Justin, why have all the @Overrides been taken out from the code?

          Justin Lynn added a comment -

          The @Override annotations were removed because they were causing runtime exceptions to be generated in the latest Sun Java VM. This is because those annotations were used on methods that were not correctly overriding anything in the respective superclass.

          Prasad Chakka added a comment -

          Which version of the Java VM are you using?

          The VM below doesn't throw such errors, but some configurations of Eclipse do.

          pchakka@dev111 ~ > java -version
          java version "1.6.0_07"
          Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
          Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)

          Justin Lynn added a comment -

          Output of my java -version reads:
          java version "1.6.0_13"
          Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
          Java HotSpot(TM) Server VM (build 11.3-b02, mixed mode)

          Looking for more differences or reasons why this might happen. Double checking some things.

          Justin Lynn added a comment -

          This patch reverts all the extraneous @Override removals (they were potentially unnecessary and should have been done in another bug). Passes all unit tests on trunk (rev. 778559).

          Justin Lynn added a comment -

          This patch does not attempt to re-implement the Hadoop version-specific functionality that was removed. This could potentially cause a bug when running trunk Hive on Hadoop 0.17.x, as indicated by the comment on the removed code. Is this acceptable, or should an attempt be made to reimplement the intent using non-deprecated/non-removed interfaces?

          Ashish Thusoo added a comment -

          Actually this has to compile with hadoop 0.17.x, otherwise we will not be able to deploy this internally at FB. We are still on hadoop 0.17 and we have already launched trunk into production.

          Justin Lynn added a comment -

          Understood, I'll see what I can do. However, it appears that the API is starting to pull away from what would easily be backward compatible. A 0.20.0+ branch might be warranted.

          Aaron Kimball added a comment -

          I'm +1 on an 0.20 branch. Cloudera's Distribution for Hadoop will be moving toward an 0.20 base and we would like to offer a compatible edition of Hive.

          Ashwani Kumar added a comment -

          Another +1 for the 0.20 branch.

          Joydeep Sen Sarma added a comment -

          I agree that at some point we would drop support for older branches (<=17). But it's still worth looking at whether we can avoid doing this right now. For example, for the ExecDriver change:

          // workaround for hadoop-17 - jobclient only looks at commandlineconfig
          Configuration commandConf = JobClient.getCommandLineConfig();
          if (commandConf != null) {
            commandConf.set("tmpfiles", realFiles);

          we can do this using reflection - (grep -i declaredmethod HiveInputFormat.java).

          are the mapredtask.java changes necessary?
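
          A minimal sketch of what such a reflection-based call might look like (names taken from the snippet quoted above; this is an illustration, not the committed patch):

          // Look up JobClient.getCommandLineConfig() reflectively so the code still
          // compiles against Hadoop 0.20, where the method no longer exists.
          Configuration commandConf = null;
          try {
            java.lang.reflect.Method m =
                JobClient.class.getDeclaredMethod("getCommandLineConfig");
            commandConf = (Configuration) m.invoke(null);   // static method, no receiver
          } catch (NoSuchMethodException e) {
            // Hadoop 0.20+: method removed, leave commandConf as null
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
          if (commandConf != null) {
            commandConf.set("tmpfiles", realFiles);
          }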

          Joydeep Sen Sarma added a comment -

          this version uses reflection in a couple of places so that ql continues to compile with older versions.

          i haven't had a chance to look at the hwi code and make that portable (it doesn't compile with 19 currently). If Edward could take a look - that would be awesome.

          Edward Capriolo added a comment -

          I gave it a quick try and was getting hunk errors. Are these patches cumulative? Should I apply them in order HIVE-487.patch, HIVE-487-2.patch, hive-487.3.patch?

          I looked at the Jetty-based HWI server changes in the patch. The changes are cosmetic. We might be able to make that portable with reflection as well, if that is what we want.

          Joydeep Sen Sarma added a comment -

          regenerated.

          the HWI stuff does not compile against 0.19 and prior because of the change to using new jetty apis. one option is to bundle the new jetty jar with hive. hadoop seems to have moved to using ivy and i am wondering if we should do the same.

          Joydeep Sen Sarma added a comment -

          have to use reflection. putting new jetty jars in hive does not matter since hadoop's jars take precedence at runtime (since we launch everything via hadoop)

          can take a shot - looks simple ..

          Prasad Chakka added a comment -

          i don't think we can keep single version of Hive for all active versions of Hadoop. Why don't we release a branch for 20 and periodically merge from trunk to 20?

          Joydeep Sen Sarma added a comment -

          branching will get fairly complicated. where will 0.4.0 be branched off? sure there will be a time to deprecate support for older hadoop versions - just not convinced this is it.

          Note that the dependency in this case is particularly frivolous. We could ship Hive with a single version of jetty that Hive components require - but instead we are depending on Hadoop to provide it. This seems more like a setup problem on our side.

          One simple option (to not use reflection) is to add the right version of jetty into the runtime classpath (if it's not there already). the compile time works fine already. (since we control the classpath from build xmls)

          Joydeep Sen Sarma added a comment -

          so i tried a custom classloader to force the new jetty jars to be used preferentially for loading classes.

          alas - it does not work. it seems that some classes from the hadoop jetty jars are already loaded by the time control is transferred to hive/hwi. trying to load remaining classes from the new jetty jar causes a 'sealing violation exception'. (this is with hadoop-19)

          only reasonable alternative i can think of now is to run the hwiserver by spawning a new jvm (with a modified classpath that omits hadoop's jetty jars)

          Edward Capriolo added a comment -

          Joydeep,
          We do depend on Hadoop to provide Jetty. The rationale was not to require extra or external libraries for the user. At the time, Hive had just become its own project after being a Hadoop contrib project, so it made sense to depend on Hadoop's Jetty. We have a few other options. We can use a completely different web server, http://tjws.sourceforge.net/. Now we have no conflicts. Or we can just build a war with no embedded-type options. Even if we switch to tjws we still might end up using reflection since the API could change over time, although we would choose when to upgrade the servlet engine, not Hadoop. For now I will make a version that uses reflection to start up the server, since these changes are mostly cosmetic.

          Edward Capriolo added a comment -

          Ok, halfway there. I added an abstraction that uses reflection to completely kick up the HWI Jetty server. Right now I have only added the code for Jetty 5 (Hadoop 0.19.0); 0.20 soon. Is this what everyone had in mind?

          Ashish Thusoo added a comment -

          This sounds reasonable to me. Will go over the patch in more detail. Are you planning to upload another one soon or should I just review this one?

          Edward Capriolo added a comment -

          This is going a little slow. The reflection aspect is pretty painful coding. I think I am 98% complete. New test cases added. Hopefully I can have a final take in a day or two; sometimes it is hard to decide what exceptions to throw, etc., since there are very few design patterns based around reflecting entire applications.

          Edward Capriolo added a comment -

          This patch passes all unit tests. Also deployed and tested to a 0.19.0 cluster and a 0.20.0 cluster.

          Changes to hive-default.conf correct hive-hwi.war to hive_hwi.war.

          My changes to HWIServer.java clobbered other changes in the previous 0.20 patch since HWIServer.java delegates most duty to ServerWrapper.java

          Namit Jain added a comment -

          Overall, it looks good - but can you make a few simple cosmetic changes?

          1. Have 1 patch instead of different patches for jetty and 487
          2. Add more comments:

          /**
          Hadoop 17-19 used Jetty5. Hadoop 20 uses jetty6. Hive still should compile and
          run with all versions.
          Java is strongly typed Class based language. The Reflection API is required to
          circumvent the strong typing. We have used the reflection API to deal with the
          known versions of Jetty we must work with.
          CS students: If you are ever in a debate about classless VS classful programming
          be sure to reference this code.
          */

          is very good.

          Can you repeat a subset of this in HadoopVersion also?
          "Hive still should compile and run with all versions (17-20)."

          3. usesJobShell: can you add more comments here – it is true for some Hadoop versions but not for others.
          4. This may be outside the scope of this - but should some unit tests run for hadoop 17, and some for hadoop 20, as part of ant test?
          Currently, all of them use the default 19. As I mentioned before, this can be done in a follow-up also.

          Namit Jain added a comment -

          4. is not needed - we can enable 20 from hudson

          +1

          Can you add some more comments, and then I can commit it

          Todd Lipcon added a comment -

          A couple thoughts:

          • Does the same compiled jar truly work in all versions of Hadoop between 0.17 and 0.19? That is to say, can we consider an option in which we use some build.xml rules to, depending on the value of a hadoop.version variable, swap between two implementations of the same .java file (one compatible with Jetty 5, one with Jetty 6)? Then in the build product we could simply include two jars and have the wrapper scripts swap between them based on version. If size is a concern, the variant classes could be put in their own jar that would only be a few KB.
          • The reflection code in this patch is pretty messy. I mocked up an idea for a slightly cleaner way to do it, and will attach it as a tarball momentarily. The idea is to define our own interfaces which have the same methods as we need to use in Jetty, and use a dynamic proxy to forward those invocations through to the actual implementation class. Dynamically choosing between the two interfaces is simple at runtime by simply checking that the method signatures correspond. This is still dirty (and a bad role model for CS students) but it should reduce the number of Class.forName and .getMethod calls in the wrapper class.
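
          A minimal sketch of the dynamic-proxy idea described above (the interface and class names here are hypothetical, not the contents of the attached tarball):

          import java.lang.reflect.InvocationHandler;
          import java.lang.reflect.Method;
          import java.lang.reflect.Proxy;

          // Our own interface listing only the Jetty methods Hive actually calls.
          interface JettyServerShim {
            void start() throws Exception;
            void stop() throws Exception;
          }

          final class JettyProxyFactory {
            // Wrap whichever real Jetty Server object is on the classpath behind our
            // interface, forwarding each call to the same-named method on that object.
            static JettyServerShim wrap(final Object realServer) {
              InvocationHandler handler = new InvocationHandler() {
                public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                  Method target = realServer.getClass()
                      .getMethod(method.getName(), method.getParameterTypes());
                  return target.invoke(realServer, args);
                }
              };
              return (JettyServerShim) Proxy.newProxyInstance(
                  JettyServerShim.class.getClassLoader(),
                  new Class<?>[] { JettyServerShim.class },
                  handler);
            }
          }
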
          Todd Lipcon added a comment -

          Here's a tarball showing the technique mentioned in the comment above. The script "run.sh" will compile and run the example once with "v1" on the classpath, and a second time with "v2" on the classpath. I'm not certain that this will cover all the cases that are needed for Jetty, but I figured I would throw it out there.

          Edward Capriolo added a comment -

          @Todd - Where were you a few weeks ago?

          Then in the build product we could simply include two jars and have the wrapper scripts swap between them based on version
          

          The jars are upstream in Hadoop core. I did not look into this closely but the talk about 'Sealing exceptions' above led me to believe I should not try this.

          I have wrapped my head around most of your Dynamic Proxy idea. My only concern is will the ant process cooperate? Will eclipse think the HWI classes are broken? Can we translate your run.sh into something ant/eclipse can deal with?

          public class WebServer {
            public void someMethod(String arg) {
              System.out.println("Webserver v1: " + arg);
            }
          }
          

          I really don't want to have one 'someMethod' for each Jetty method. Just start(), stop(), init(). I like your implementation, but this is such a 'hacky' thing, I wonder if it is worth thinking that hard. Hopefully the Jetty crew will be happy with their API for the next few years. Hopefully, we will not be supporting Hadoop 0.17.0 indefinitely. Honestly all that reflection has me 'burnt out'.

          If you/we can tackle the ant/eclipse issues I would be happy to use the 'Dynamic Proxy', but maybe we tackle it in a different Jira because this is a pretty big blocker and I am sure many people want to see this in the trunk.

          Todd Lipcon added a comment -

          @Todd - Where were you a few weeks ago?

          Chillin' over on the HADOOP JIRA. We're gearing up for the release of our distribution that includes Hadoop 0.20.0, so I just started watching this one more carefully.

          The jars are upstream in Hadoop core. I did not look into this closely but the talk about 'Sealing exceptions' above led me to believe I should not try this.

          Sorry, what I meant here is that the hive tarball would include lib/hive-0.4.0.jar, lib/jetty-shims/hive-jetty-shim-v6.jar and lib/jetty-shims/hive-jetty-shim-v5.jar

          In those jars we'd have two different implementations of the shim. The hive wrapper script would then do something like:

          HADOOP_JAR=$HADOOP_HOME/hadoop*core*jar
          if [[ $HADOOP_JAR =~ 0.1[789] ]]; then
            JETTY_SHIM=lib/jetty-shims/jetty-shim-v5.jar
          else
            JETTY_SHIM=lib/jetty-shims/jetty-shim-v6.jar
          fi
          CLASSPATH=$CLASSPATH:$JETTY_SHIM
          

          To generate the shim jars at compile time, we'd compile two different JettyShim.java files - one against the v5 API, and one against the v6 API.

          As for eclipse properly completing/warning for the right versions for the right files, I haven't the foggiest idea. But I am pretty sure it's not going to warn if your reflective calls are broken either

          My only concern is will the ant process cooperate?

          I don't see why not - my example build here is just to show how it works in a self contained way. The stuff inside v1-classes and v2-classes in the example are the equivalent of the two jetty jar versions - we don't have to compile them. The only code that has to compile is DynamicProxy.java which is completely normal code.

          If you/we can tackle the ant/eclipse issues I would be happy to use the 'Dynamic Proxy', but maybe we tackle it in a different Jira because this is a pretty big blocker and I am sure many people want to see this in the trunk.

          As for committing now and not worrying, that sounds pretty reasonable, as long as there's some kind of deprecation timeline set out. (e.g "in Hive 0.5.0 we will drop support for versions of Hadoop that use Jetty v5" or whatever). As someone who isn't a major Hive contributor, I'll defer to you guys completely – I just wanted to throw the idea up on the JIRA.

          Joydeep Sen Sarma added a comment -

          need to add the shell script hack to switch the -libjars option as well based on the jar version.

          Todd Lipcon added a comment -

          Here's a patch which adds a project called "shims" with separate source directories for 0.17, 0.18, 0.19, and 0.20. Inside each there is an implementation of JettyShims and HadoopShims which encapsulate all of the version-dependent code. The build.xml is set up in such a way that ${hadoop.version} determines which one gets compiled.

          This probably needs a bit more javadoc before it's committable, but I think it's worth considering this approach over reflection.

          Also, it seems like hadoop.version may be 0.18.0, 0.18.1, 0.18.2, etc. As long as it's kosher by Apache SVN standards, we should put a symlink for each of those versions in the shims/src/ directory pointing to 0.18, and same for the other minor releases. If symlinks aren't kosher, we need some way of parsing out the major version from within ant.

          Not being a regular contributor, I don't have a good test environment set up, but I've verified that this at least builds in all of the above versions.
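
          As a rough illustration only (the method names below are assumptions drawn from the errors quoted earlier in this issue, not the contents of the attached patch), a version-dependent shim interface might look like:

          // Common interface compiled once; one implementation per Hadoop version
          // lives under shims/src/0.XX/ and is selected by build.xml.
          public interface HadoopShims {
            /** True if this Hadoop version still routes job submission through JobShell. */
            boolean usesJobShell();

            /** Wraps JobClient.getCommandLineConfig(), which only exists on older Hadoop versions. */
            org.apache.hadoop.conf.Configuration getCommandLineConfig();
          }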

          Ashish Thusoo added a comment -

          I had added a GetVersionPref.java some time back to ant extensions in hive. It was later not used because we decided not to use preprocessing for 0.19 changes to validateInput and instead decided to rely on reflection. That can easily be resurrected.

          Let me look at this version as well. Also I am going to change this to a blocker as many people are waiting for this.

          Ashish Thusoo added a comment -

          changing to a blocker.

          Ashish Thusoo added a comment -

          Looks like a clean implementation. However, I do think that this will need some changes to the eclipse-templates to make it work with eclipse. We would want to conditionally add the src directory in shims corresponding to the proper version of hadoop to the eclipse launch templates in hive/eclipse-templates. Will try this out.

          Ashish Thusoo added a comment -

          Seems to not compile with 0.17.0

          ant -Dhadoop.version=0.17.0 clean package ....

          [ivy:retrieve] 1 artifacts copied, 0 already retrieved (14101kB/79ms)

          install-hadoopcore-internal:
          [untar] Expanding: /data/users/athusoo/commits/hive_trunk_ws9/.ptest_0/build/hadoopcore/hadoop-0.17.0.tar.gz into /data/users/athusoo/commits/hive_trunk_ws9/.ptest_0/build/hadoopcore
          [touch] Creating /data/users/athusoo/commits/hive_trunk_ws9/.ptest_0/build/hadoopcore/hadoop-0.17.0.installed

          compile:
          [echo] Compiling: shims
          [javac] Compiling 2 source files to /data/users/athusoo/commits/hive_trunk_ws9/.ptest_0/build/shims/classes
          [javac] /data/users/athusoo/commits/hive_trunk_ws9/.ptest_0/shims/src/0.17.0/java/org/apache/hadoop/hive/shims/HadoopShims.java:48: cannot find symbol
          [javac] symbol : variable JobClient
          [javac] location: class org.apache.hadoop.hive.shims.HadoopShims
          [javac] Configuration conf = JobClient.getCommandLineConfig();
          [javac] ^
          [javac] 1 error

          Todd Lipcon added a comment -

          Woops, sorry about that. Simply add an import for o.a.h.mapred.JobClient and it compiles. New patch in a second

          Todd Lipcon added a comment -

          Fixes the missing import. Now compiles with hadoop.version=0.17.0

          Ashish Thusoo added a comment -

          Modified Todd's patch so that it compiles cleanly with 0.20 and 0.17 as well. I have also added support in this patch to generate the proper eclipse files and have verified that this works with eclipse. Additionally, the directory names within shims have been renamed to 0.17, 0.18, ... 0.20 instead of 0.17.0, ... 0.20.0.

          Please take a look at this. Want to get this in as soon as possible so that we can move ahead with the branching.

          Joy was mentioning that an additional change to the cli shell script needs to be made for -libjars support. Joy, can you elaborate on that?

          Todd Lipcon added a comment -

          Patch looks good to me (just inspected it visually over here).

          One question: once we use these shims, is it possible that we could have just a single hive distribution which works for all versions of Hadoop? I think we may be able to accomplish this by making the shim jar output be libs/shims/hive_shims-${hadoop.version.prefix}.jar. Then either through ClassLoader magic or shell wrapper magic, we put the right one on the classpath at runtime based on which hadoop version is on the classpath.

          Is this possible? Having different tarballs of hive for different versions of hadoop makes our lives slightly difficult for packaging.

          Ashish Thusoo added a comment -

          It would be ideal if we can make a single jar work with different hadoop versions through classloader magic. There are also some things that are needed in the hive cli script which would have to be abstracted away through a configuration/environment variable. Let's try to do that for the long term, but get this in for 0.4.0. Does that sound reasonable?

          Todd Lipcon added a comment -

          Hi Ashish,

          That does sound reasonable, though I will likely take it on in the short term, as we will be distributing packages for hadoop-0.18 and hadoop-0.20 until the majority of the community and our customers have transitioned over. During that time period we'd like to have a single "hive" package which will function with either. We can apply my work on top of the 0.4.0 release for our distribution, so it shouldn't block it, but I do think it would be nice if this feature were "upstream" in the Apache release.

          I've got some time blocked off to work on this - if I get something working this week do you think it might be able to go into 0.4.0?

          -Todd

          Ashish Thusoo added a comment -

          sure. We can get it into 0.4.0. We can wait for your checkin before freezing on 0.4.0 but we would like to at least branch 0.4.0 this week (will have a vote out for it soon). All that means is that if we branch before your checkin, you will have to provide patches for trunk and 0.4.0. Is that ok?

          Todd Lipcon added a comment -

          Yep, that's fine. I'm a git user, branches don't faze me

          Joydeep Sen Sarma added a comment -

          attached the previous version with modifications for cli.sh. these modifications are required (even though they don't fix the problem entirely - see below).

          the reason i am not able to fix the problem entirely is because -libjars is no longer processed automatically by RunJar. We have to convert CliDriver to implement 'Tool' interface for this to happen. this is easy - but i would rather not hold up things for that.

          I would suggest incorporating the patch as such - open a new jira for auxlibs/auxpath not working in 0.4/trunk and fix it there.
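
          For reference, a minimal sketch of what converting CliDriver to the Tool interface could look like (the run() body here is a placeholder, not Hive's actual CLI code):

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.conf.Configured;
          import org.apache.hadoop.util.Tool;
          import org.apache.hadoop.util.ToolRunner;

          // Implementing Tool lets ToolRunner/GenericOptionsParser strip and apply
          // generic options such as -libjars before the CLI sees its own arguments.
          public class CliDriver extends Configured implements Tool {
            public int run(String[] args) throws Exception {
              // ... existing CLI argument handling and query loop would go here ...
              return 0;
            }

            public static void main(String[] args) throws Exception {
              System.exit(ToolRunner.run(new Configuration(), new CliDriver(), args));
            }
          }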

          Joydeep Sen Sarma added a comment -

          on second thoughts - there's some code in execdriver that i need to call from CliDriver and things should work. will upload another one soon.

          Joydeep Sen Sarma added a comment -

          one more:

          • --auxpath works in both 19 and 20 now. auxlib should also work - i haven't tested it separately
          • removed -libjars for hadoop versions 20 and above from cli shell script. changes to CliDriver to add aux jars to classpath at runtime

          note that hive server and hwi don't work with auxpath/lib in 20 and above (since that would also require non trivial changes to HWIServer and HiveServer). we can fix this as a followon (in case someone is using the two in combination - which seems doubtful).

          please review changes to bin/ext/cli.sh and CliDriver
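
          A rough sketch of one way aux jars could be added to the classpath at runtime (illustrative only; the class and method names below are assumptions, not the code in the patch):

          import java.io.File;
          import java.net.URL;
          import java.net.URLClassLoader;

          // Build a new classloader that includes the aux jars, then install it as
          // the thread context classloader so later class lookups can see them.
          public final class AuxJarLoader {
            public static void addToClassPath(String[] jarPaths) throws Exception {
              ClassLoader current = Thread.currentThread().getContextClassLoader();
              URL[] urls = new URL[jarPaths.length];
              for (int i = 0; i < jarPaths.length; i++) {
                urls[i] = new File(jarPaths[i]).toURI().toURL();
              }
              Thread.currentThread().setContextClassLoader(new URLClassLoader(urls, current));
            }
          }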

          Namit Jain added a comment -

          [junit] Begin query: alter2.q
          [junit] diff -a -I (file:)\|(/tmp/.*) /data/users/njain/hive_commit1/hive_commit1/build/ql/test/logs/clientpositive/alter2.q.out /data/users/njain/hive_commit1/hive_commit1/ql/src/test/results/clientpositive/alter2.q.out
          [junit] Done query: alter2.q
          [junit] Begin query: alter3.q
          [junit] plan = /tmp/plan60193.xml
          [junit] java.lang.NoClassDefFoundError: org/apache/hadoop/hive/shims/HadoopShims
          [junit] at org.apache.hadoop.hive.ql.exec.ExecDriver.initializeFiles(ExecDriver.java:95)
          [junit] at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:358)
          [junit] at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:571)
          [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          [junit] at java.lang.reflect.Method.invoke(Method.java:597)
          [junit] at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
          [junit] at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
          [junit] at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
          [junit] at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
          [junit] at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
          [junit] Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.shims.HadoopShims
          [junit] at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
          [junit] at java.security.AccessController.doPrivileged(Native Method)
          [junit] at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
          [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
          [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
          [junit] at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
          [junit] ... 12 more

          Most of the tests are failing

          Todd Lipcon added a comment -

          Taking a look at these failing tests now... any chance someone could hop on IRC in ##hive on freenode? I'm happy to do the work, but would appreciate having someone to hit with quick questions since I'm not too familiar with the code base.

          Todd Lipcon added a comment -

          The issue turned out to be that the shim classes weren't getting built into hive_exec.jar, which seems to include the built classes of many other of the components. I'm not entirely sure why this is designed like this (why not just have hive_exec.jar add the other jars to its own classloader at startup?) but including build/shims/classes in there fixed the tests. Attaching new patch momentarily

          Joydeep Sen Sarma added a comment -

          hive_exec is the jar that gets submitted to Hadoop to execute the map-reduce jobs, so we bundle all the required classes in it up front.

          It could be done differently (using libjars), but this was the path of least resistance at the start.
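          (For illustration only, the libjars route Joydeep mentions would amount to shipping the extra jars with the job through the distributed cache rather than baking their classes into hive_exec.jar. The class name and jar paths below are made up for the example; this is not what Hive currently does.)

          import java.io.IOException;
          import org.apache.hadoop.filecache.DistributedCache;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.mapred.JobConf;

          public class AuxJarsSketch {
            // Roughly what -libjars does under the covers: the jars are placed on the
            // task classpath via the distributed cache. Paths are hypothetical.
            public static void addAuxJars(JobConf job) throws IOException {
              DistributedCache.addFileToClassPath(new Path("/user/hive/aux/hive_shims.jar"), job);
              DistributedCache.addFileToClassPath(new Path("/user/hive/aux/hive_serde.jar"), job);
            }
          }

          Either way the jars still have to reach the task side; bundling just front-loads that work at build time.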

          Todd Lipcon added a comment -

          Attaching a new patch which makes the shim behavior happen at runtime. Here's the general idea:

          • the shims/build.xml now uses Ivy to download tarballs for Hadoop 17, 18, 19, and 20. It builds each of the shim sources (from src/0.XX/), which have been renamed so that each class name is unique (e.g. Hadoop20Shims.class).
          • The results of all of these builds end up in a single hive_shims.jar
          • Instead of being classes with all static methods, the shim classes are now non-static and are instantiated through a ShimLoader class that lives in a new shims/src/common/ directory
          • ShimLoader uses o.a.h.util.VersionInfo to detect the running Hadoop version and reflection to instantiate the matching shim implementation (a minimal sketch follows this list)
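
          To make the loading step concrete, here is a minimal sketch of the approach described above, assuming a HadoopShims interface and per-version implementations named Hadoop17Shims through Hadoop20Shims; the map-based lookup and exact class names are illustrative rather than the committed code.

          package org.apache.hadoop.hive.shims;

          import java.util.HashMap;
          import java.util.Map;
          import org.apache.hadoop.util.VersionInfo;

          public abstract class ShimLoader {
            // Illustrative mapping from the A.B prefix of the Hadoop version to a shim class.
            private static final Map<String, String> SHIM_CLASSES = new HashMap<String, String>();
            static {
              SHIM_CLASSES.put("0.17", "org.apache.hadoop.hive.shims.Hadoop17Shims");
              SHIM_CLASSES.put("0.18", "org.apache.hadoop.hive.shims.Hadoop18Shims");
              SHIM_CLASSES.put("0.19", "org.apache.hadoop.hive.shims.Hadoop19Shims");
              SHIM_CLASSES.put("0.20", "org.apache.hadoop.hive.shims.Hadoop20Shims");
            }

            private static HadoopShims hadoopShims;

            public static synchronized HadoopShims getHadoopShims() {
              if (hadoopShims == null) {
                String major = getMajorVersion();           // e.g. "0.20"
                String className = SHIM_CLASSES.get(major);
                try {
                  // Reflection keeps hive_exec free of compile-time references to any one Hadoop version.
                  hadoopShims = (HadoopShims) Class.forName(className).newInstance();
                } catch (Exception e) {
                  throw new RuntimeException("Could not load shims for Hadoop version " + major, e);
                }
              }
              return hadoopShims;
            }

            // Reduce VersionInfo.getVersion() to its A.B prefix so that patch releases
            // (0.20.0, 0.20.1, ...) all map to the same shim.
            static String getMajorVersion() {
              String version = VersionInfo.getVersion();
              String[] parts = version.split("\\.");
              if (parts.length < 2) {
                throw new RuntimeException("Illegal Hadoop Version: " + version + " (expected A.B.* format)");
              }
              return parts[0] + "." + parts[1];
            }
          }

          Keying the lookup on the A.B prefix is what lets a single hive_shims.jar serve whichever Hadoop happens to be on the classpath at runtime.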

          I've tested this against pseudodistributed 18 and 20 clusters and it seemed to work. Unit tests also appear to work, though I haven't had a chance to let them run all the way through. I have not tested HWI at all as of yet.

          Still TODO:

          • I may have broken eclipse integration somewhat. I'm hoping someone who uses Eclipse can twiddle the necessary stuff there.
          • I would appreciate a review of the javadocs for the HadoopShims interface. I don't know the specifics of some of the 17 behavior, so my docs are lame and vague.
          • I think build.xml needs to be modified just a bit more so that the output directory/tarball no longer includes ${hadoop.version} in it. Additionally, there are one or two ant conditionals based on hadoop version - I haven't had a chance to investigate them, but they should probably be removed

          • I think we should have a policy that hadoop.version defaults to the most recently released apache trunk - right now it defaults to 0.19.
          • To compile the shims we're downloading the entire release tarballs off the apache mirror. Would be nicer if we could just download the specific jars we need to compile against, but that might be a pipe dream.
          Joydeep Sen Sarma added a comment -

          Hey - how do I apply this git diff using patch?

          Todd Lipcon added a comment -

          Normal patch -p0 ought to work:

          todd@todd-laptop:~/cloudera/cdh/repos/hive$ patch -p0 < /tmp/hive-487-runtime.patch
          patching file ant/build.xml
          patching file bin/ext/cli.sh
          patching file build-common.xml
          patching file build.xml
          etc...

          (from a clean trunk checkout)

          Joydeep Sen Sarma added a comment -

          My bad - the code looks pretty clean.

          One concern is the issue you mentioned already - that all Hadoop versions need to be downloaded. In particular, sometime back I made some fixes to allow Hive to compile against a specific Hadoop tree (see http://bit.ly/2d4Ch), but I imagine this would revert that.

          Namit Jain added a comment -

          Committed. Thanks Todd

          Zheng Shao added a comment -

          When I try to start Hive with hadoop built by myself, I saw an exception:

          java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* format)
          at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:101)
          at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:80)
          at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:62)
          at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:226)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.hadoop.util.RunJar.main(RunJar.java:166)
          at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
          at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)

          I guess I need to specify some Hadoop version information when compiling Hadoop?

          Todd Lipcon added a comment -

          Hi Zheng,

          There's some kind of bug I've seen before in Hadoop's build process where the version info doesn't get generated on your first compile. It's silly, but try running 'ant package' a second time in your Hadoop build tree. Running "hadoop version" should tell you whether the version info got compiled in.

          -Todd
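
          To make the failure mode concrete: when the version-stamping step hasn't run, VersionInfo.getVersion() reports "Unknown" (inferred from the error message above), which has no A.B prefix for the shim loader to parse. A small hedged illustration of why that string trips the check Zheng saw - not the exact Hive code:

          import org.apache.hadoop.util.VersionInfo;

          public class VersionCheckDemo {
            public static void main(String[] args) {
              String version = VersionInfo.getVersion();   // "Unknown" on a build without version info
              String[] parts = version.split("\\.");       // "Unknown" splits into a single element
              if (parts.length < 2) {
                throw new RuntimeException("Illegal Hadoop Version: " + version
                    + " (expected A.B.* format)");
              }
              System.out.println("Major version: " + parts[0] + "." + parts[1]);
            }
          }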

          Zheng Shao added a comment -

          By doing "ant ... package package" I am able to generate the hadoop distribution with version information, and Hive runs fine with it now! Thanks, Todd!


            People

            • Assignee: Justin Lynn
            • Reporter: Aaron Kimball
            • Votes: 3
            • Watchers: 13
