Hive / HIVE-3017

hive-exec jar contains classes from other modules (hive-serde, hive-shims, hive-common, etc.), duplicating those classes in two jars

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      HIVE-2646 added the classes from hive-serde to the hive-exec jar:

      ...
           0 Wed May 09 20:56:30 PDT 2012 org/apache/hadoop/hive/serde2/typeinfo/
        1971 Wed May 09 20:56:28 PDT 2012 org/apache/hadoop/hive/serde2/typeinfo/ListTypeInfo.class
        2396 Wed May 09 20:56:28 PDT 2012 org/apache/hadoop/hive/serde2/typeinfo/MapTypeInfo.class
        2788 Wed May 09 20:56:28 PDT 2012 org/apache/hadoop/hive/serde2/typeinfo/PrimitiveTypeInfo.class
        4408 Wed May 09 20:56:28 PDT 2012 org/apache/hadoop/hive/serde2/typeinfo/StructTypeInfo.class
         900 Wed May 09 20:56:28 PDT 2012 org/apache/hadoop/hive/serde2/typeinfo/TypeInfo.class
        6576 Wed May 09 20:56:28 PDT 2012 org/apache/hadoop/hive/serde2/typeinfo/TypeInfoFactory.class
        1231 Wed May 09 20:56:28 PDT 2012 org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils$1.class
        1239 Wed May 09 20:56:28 PDT 2012 org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils$TypeInfoParser$Token.class
        7145 Wed May 09 20:56:28 PDT 2012 org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils$TypeInfoParser.class
       14482 Wed May 09 20:56:28 PDT 2012 org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils.class
        2594 Wed May 09 20:56:28 PDT 2012 org/apache/hadoop/hive/serde2/typeinfo/UnionTypeInfo.class
         144 Wed May 09 20:56:30 PDT 2012 org/apache/hadoop/hive/serde2/typeinfo/package-info.class
      ...

      Was this intentional? If so, the serde jar should be deprecated. If not, the serde classes should be removed since this creates two sources of truth for them and can cause other problems (see HCATALOG-407).
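One way to check how much two jars overlap is to compare their entry names. The following is a minimal sketch, not part of the Hive build: the class name `JarOverlap` and its helper methods are hypothetical, and the jar paths are whatever the caller passes on the command line.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper: reports .class entries that appear in both jars.
public class JarOverlap {

    // Keep only the .class entries of `a` that are also present in `b`.
    static Set<String> sharedClasses(Collection<String> a, Collection<String> b) {
        Set<String> shared = new TreeSet<>();
        for (String name : a) {
            if (name.endsWith(".class")) {
                shared.add(name);
            }
        }
        shared.retainAll(b);
        return shared;
    }

    // Read every entry name (directories included) from a jar file.
    static Set<String> entryNames(Path jar) throws IOException {
        try (JarFile jf = new JarFile(jar.toFile())) {
            return jf.stream()
                     .map(JarEntry::getName)
                     .collect(Collectors.toCollection(TreeSet::new));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarOverlap <first.jar> <second.jar>");
            return;
        }
        Set<String> shared = sharedClasses(entryNames(Paths.get(args[0])),
                                           entryNames(Paths.get(args[1])));
        shared.forEach(System.out::println);
        System.out.println(shared.size() + " class entries appear in both jars");
    }
}
```

Running it against a hive-exec jar and a hive-serde jar would list entries such as the `org/apache/hadoop/hive/serde2/typeinfo/` classes shown above if they are present in both.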

          Activity

          Ashutosh Chauhan added a comment -

          I agree that having the same classes in two different jars is confusing. I don't think the HIVE-2646 work intended to change the contents of the jar files. I think we should revert to the original contents and remove any classes that were added to the exec jar and were not there earlier.

          Travis Crawford added a comment -

          I just checked the contents of hive-exec-0.8.0.jar which is the earliest version available in Maven. HIVE-2646 has fix version 0.9.1.

          Looking in the 0.8.0 jar we do see the serde2 classes listed above. So it seems they've been bundled into hive-exec for some time.

          While providing the same classes in two jars is janky, I don't think this behavior has changed in a while. With that in mind, should we close this? At least you can depend on hive-serde2 directly if you want (unlike the ql classes, which are only in the exec jar).

          Jakob Homan added a comment -

          Something this bad should be fixed. It's particularly vexing when trying to develop against Hive and needing to update two jars for one class. Fat-jarring (which this essentially is) is evil. If the intention is to provide a convenient package for deployment, can't that be done in Maven by declaring a meta project with the other jars?

          Arup Malakar added a comment -

          I found that hive-exec also contains all the classes from hive-shims-0.10.0-SNAPSHOT.jar, hive-common-0.10.0-SNAPSHOT.jar, etc.
          I haven't checked what else it duplicates. Having the same class present in two jars leads to confusion and is more error prone. If both jars are on the classpath and they don't contain the same version of a class, we wouldn't know which class got used.

          (Edited the title)
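When both jars are on the classpath, the JVM can be asked which location a loaded class actually came from. A minimal sketch, assuming the class of interest is on the classpath; `WhichJar` and `locationOf` are hypothetical names, and only standard `java.lang.Class` and `java.security` calls are used.

```java
import java.security.CodeSource;

// Hypothetical helper: prints the jar or directory a class was loaded from.
public class WhichJar {

    // Returns the code source location, or a placeholder when the JVM
    // does not record one (for example, for classes from the bootstrap loader).
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        if (args.length == 0) {
            System.err.println("usage: java WhichJar <fully.qualified.ClassName>...");
            return;
        }
        for (String name : args) {
            // Class.forName throws ClassNotFoundException if the class is absent.
            System.out.println(name + " -> " + locationOf(Class.forName(name)));
        }
    }
}
```

For example, passing `org.apache.hadoop.hive.serde2.typeinfo.TypeInfo` with both hive-exec and hive-serde on the classpath would show which of the two jars supplied that class in that particular run.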

          Edward Capriolo added a comment -

          The core issue is that Hive needs things to work. For example, Hive might use CommonUtil.String.isEmpty() from 2.5, while some user code might require CommonUtil.String.isEmpty() from 2.6. The only foolproof solution is for Hive to shade (that is, rename) every class it uses so that it cannot conflict with anything else.

          For example, I use Hive with Cassandra; at the moment the two have different versions of thrift and different versions of antlr as well. The only answers are to upgrade one of the projects so the libs match, or to make a fat jar and rename all the conflicts.

          These options have come up and been discussed before.

          Travis Crawford added a comment -

          From my perspective I don't see an issue with hive-exec.jar being a fat jar, since it really simplifies running Hive queries. With all dependencies repackaged as a fat jar there's nothing else to "add jar" or put on your classpath.

          It is challenging to build tools on top of Hive, though, because of the fat jar classpath surprises. The area I think could be improved is publishing jars of each subproject independently, as well as hive-exec. For example, I think users would get a lot of value from publishing hive-ql.jar for use by other projects that integrate with Hive.

          Hive queries would continue to work as they do today, and people building tools on top of Hive could also control their classpath. Thoughts?

          Edward Capriolo added a comment -

          It is more than just a fat jar issue. If you are living in the same class loader you can only load a class once. So if hive-ql uses common-util-X.Y.Z and some other piece of code uses common-util-X.Y.G, and there is a breaking change in common-util, you cannot satisfy both.

          Now as to the fat jar issue: you have a choice between a fat jar and Hadoop's -distjar. Either produces the same result: all the jars are on the classpath.

          I am not saying there is no way out of this problem, but it is like this on purpose. Other solutions have been proposed but there has been no follow-through. Some other tickets talk about this, but I do not have the numbers offhand.
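The point about a class being loaded only once per class loader can be illustrated with a simplified model, assuming a loader that searches its jars in order (as `java.net.URLClassLoader` is documented to do with its URLs). This is a sketch only: `ClasspathOrder`, `winner`, and the entry name `commonutil/Strings.class` are hypothetical, and the jar names reuse the placeholders from the comment above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Simplified model of an ordered classpath: jar name -> entries it contains.
public class ClasspathOrder {

    // Returns the first jar, in classpath order, that contains the entry.
    // Later jars with the same entry are never consulted for that class.
    static Optional<String> winner(Map<String, Set<String>> classpath, String entry) {
        for (Map.Entry<String, Set<String>> jar : classpath.entrySet()) {
            if (jar.getValue().contains(entry)) {
                return Optional.of(jar.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves insertion order, standing in for classpath order.
        Map<String, Set<String>> classpath = new LinkedHashMap<>();
        classpath.put("common-util-X.Y.Z.jar",
                      Collections.singleton("commonutil/Strings.class"));
        classpath.put("common-util-X.Y.G.jar",
                      Collections.singleton("commonutil/Strings.class"));

        System.out.println(winner(classpath, "commonutil/Strings.class")
                .orElse("(not found)"));
    }
}
```

In this model only one of the two versions is ever visible to callers, which is why code compiled against the other version can break when the versions are incompatible.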


            People

            • Assignee:
              Unassigned
              Reporter:
              Jakob Homan
            • Votes:
              0
              Watchers:
              5
