Hive
  1. Hive
  2. HIVE-574

Hive should use ClassLoader from hadoop Configuration

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.3.0
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels:
      None

      Description

      See HIVE-338.
      Hive should always use the getClassByName method from hadoop Configuration, so that we choose the correct ClassLoader. Examples include all plug-in interfaces, including UDF/GenericUDF/UDAF, SerDe, and FileFormats. Basically the following code snippet shows the idea:

      package org.apache.hadoop.conf;
      public class Configuration implements Iterable<Map.Entry<String,String>> {
         ...
        /**
         * Load a class by name.
         * 
         * @param name the class name.
         * @return the class object.
         * @throws ClassNotFoundException if the class is not found.
         */
        public Class<?> getClassByName(String name) throws ClassNotFoundException {
          return Class.forName(name, true, classLoader);
        }
      
      1. HIVE-574.3.patch
        20 kB
        Zheng Shao
      2. HIVE-574.2.patch
        18 kB
        Zheng Shao
      3. HIVE-574.1.patch
        9 kB
        Zheng Shao

        Issue Links

          Activity

          Hide
          He Yongqiang added a comment -

          For adding EnumSet support in Hadoop, i added a method loadClass(Configuration conf, String className) in ObjectWritable.
          <noformat>
          /**

          • Find and load the class with given name <tt>className</tt> by first finding
          • it in the specified <tt>conf</tt>. If the specified <tt>conf</tt> is null,
          • try load it directly.
            */
            public static Class<?> loadClass(Configuration conf, String className)
            Unknown macro: { Class<?> declaredClass = null; try { if (conf != null) declaredClass = conf.getClassByName(className); else declaredClass = Class.forName(className); } catch (ClassNotFoundException e) { throw new RuntimeException("readObject can't find class " + className, e); } return declaredClass; }

            </noformat>
            I donot know if it can be used here.

          Show
          He Yongqiang added a comment - For adding EnumSet support in Hadoop, i added a method loadClass(Configuration conf, String className) in ObjectWritable. <noformat> /** Find and load the class with given name <tt>className</tt> by first finding it in the specified <tt>conf</tt>. If the specified <tt>conf</tt> is null, try load it directly. */ public static Class<?> loadClass(Configuration conf, String className) Unknown macro: { Class<?> declaredClass = null; try { if (conf != null) declaredClass = conf.getClassByName(className); else declaredClass = Class.forName(className); } catch (ClassNotFoundException e) { throw new RuntimeException("readObject can't find class " + className, e); } return declaredClass; } </noformat> I donot know if it can be used here.
          Hide
          Zheng Shao added a comment -

          This patch uses the ClassLoader from all paths sets the class loader in most cases.
          It also adds the "addedJars" to HIVEADDEDJARS so that ExecDriver can add it into the ClassLoader for both the ClassLoader in the conf and the thread contextClassLoader.

          Show
          Zheng Shao added a comment - This patch uses the ClassLoader from all paths sets the class loader in most cases. It also adds the "addedJars" to HIVEADDEDJARS so that ExecDriver can add it into the ClassLoader for both the ClassLoader in the conf and the thread contextClassLoader.
          Hide
          Joydeep Sen Sarma added a comment -
          • there's a bunch of places where JavaUtils.getClassLoader() is called - all of these need to use the conf classloader instead.
          • where are we invoking conf.setClassLoader? (addToClassPath should be calling this it seems)

          if conf.set/getClassLoader is supported in all versions 17 onwards - then we can just remove the -libjars business - since that doesn't help us any bit now (ExecDriver is not a Tool - so it doesn't even get the conf with the classpath set that a Tool would normally get by specifying the -libjars option).

          Show
          Joydeep Sen Sarma added a comment - there's a bunch of places where JavaUtils.getClassLoader() is called - all of these need to use the conf classloader instead. where are we invoking conf.setClassLoader? (addToClassPath should be calling this it seems) if conf.set/getClassLoader is supported in all versions 17 onwards - then we can just remove the -libjars business - since that doesn't help us any bit now (ExecDriver is not a Tool - so it doesn't even get the conf with the classpath set that a Tool would normally get by specifying the -libjars option).
          Hide
          Zheng Shao added a comment -

          This patch fixed a bug in ExecDriver.initialize where we were setting "tmpjars", which were overwritten by ExecDriver.execute().

          > there's a bunch of places where JavaUtils.getClassLoader() is called - all of these need to use the conf classloader instead.
          I tried to change all of these to use conf classloader, but in lots of places there are no shared conf objects, so I will leave it for now. I changed the comment so it's clear that in hive (outside hadoop map-reduce jobs) we are using thread classloader, while in hadoop map-reduce jobs we are using conf classloaders.

          > where are we invoking conf.setClassLoader? (addToClassPath should be calling this it seems)
          I moved that logic outside of the addToClassPath function. All callers will do that set.

          > if conf.set/getClassLoader is supported in all versions 17 onwards - then we can just remove the -libjars business - since that doesn't help us any bit now (ExecDriver is not a Tool - so it doesn't even get the conf with the classpath set that a Tool would normally get by specifying the -libjars option).

          Maybe in the future we should move ExecDriver to a tool? I think we should rely more on hadoop to set the classpath for us, so maybe in the future we should remove the special "if (local)" block in ExecDriver where we set the classpath ourselves.

          Show
          Zheng Shao added a comment - This patch fixed a bug in ExecDriver.initialize where we were setting "tmpjars", which were overwritten by ExecDriver.execute(). > there's a bunch of places where JavaUtils.getClassLoader() is called - all of these need to use the conf classloader instead. I tried to change all of these to use conf classloader, but in lots of places there are no shared conf objects, so I will leave it for now. I changed the comment so it's clear that in hive (outside hadoop map-reduce jobs) we are using thread classloader, while in hadoop map-reduce jobs we are using conf classloaders. > where are we invoking conf.setClassLoader? (addToClassPath should be calling this it seems) I moved that logic outside of the addToClassPath function. All callers will do that set. > if conf.set/getClassLoader is supported in all versions 17 onwards - then we can just remove the -libjars business - since that doesn't help us any bit now (ExecDriver is not a Tool - so it doesn't even get the conf with the classpath set that a Tool would normally get by specifying the -libjars option). Maybe in the future we should move ExecDriver to a tool? I think we should rely more on hadoop to set the classpath for us, so maybe in the future we should remove the special "if (local)" block in ExecDriver where we set the classpath ourselves.
          Hide
          Zheng Shao added a comment -

          Make ExecMapper and ExecReducer print out class path with l4j.info(), to ease debugging of classpaths in the future.

          Show
          Zheng Shao added a comment - Make ExecMapper and ExecReducer print out class path with l4j.info(), to ease debugging of classpaths in the future.
          Hide
          Joydeep Sen Sarma added a comment -

          we should always be able to get a Conf from our session (SessionState.get()) from any part of the code. the same conf would be used throughout a session.

          Show
          Joydeep Sen Sarma added a comment - we should always be able to get a Conf from our session (SessionState.get()) from any part of the code. the same conf would be used throughout a session.
          Hide
          Min Zhou added a comment -

          +1 for Zheng, thanks
          It worked fine here, nothing abnormal.

          Show
          Min Zhou added a comment - +1 for Zheng, thanks It worked fine here, nothing abnormal.
          Hide
          Joydeep Sen Sarma added a comment -

          hi zheng - i would prefer to add a getClassByName to JavaUtils (replace getClassLoader) and then use the Session Configuration object's getclassbyname (SessionState.get().getConf()) if it's available - or else default to thread classloader or else finally default to current classloader.

          i feel very uncomfortable that right now we are sometimes using the conf classloader and sometimes using the thread classloader. it's sheer chance that all the uses of JavaUtils.getClassLoader() are happening within the same thread as the thread that adds jar files (or the main thread of the ExecDriver). This can change anytime tomorrow.

          Show
          Joydeep Sen Sarma added a comment - hi zheng - i would prefer to add a getClassByName to JavaUtils (replace getClassLoader) and then use the Session Configuration object's getclassbyname (SessionState.get().getConf()) if it's available - or else default to thread classloader or else finally default to current classloader. i feel very uncomfortable that right now we are sometimes using the conf classloader and sometimes using the thread classloader. it's sheer chance that all the uses of JavaUtils.getClassLoader() are happening within the same thread as the thread that adds jar files (or the main thread of the ExecDriver). This can change anytime tomorrow.
          Hide
          Zheng Shao added a comment -

          @Joydeep,

          I will use SessionState.get().getConf().getClassLoader() for all compilation time code and ExecDriver (JobClient side) code. For execution time code (Mapper and Reducer) I will use the conf from Mapper/Reducer.configure() method.

          Is that good?

          Show
          Zheng Shao added a comment - @Joydeep, I will use SessionState.get().getConf().getClassLoader() for all compilation time code and ExecDriver (JobClient side) code. For execution time code (Mapper and Reducer) I will use the conf from Mapper/Reducer.configure() method. Is that good?
          Hide
          Joydeep Sen Sarma added a comment -

          yes - makes total sense.

          Show
          Joydeep Sen Sarma added a comment - yes - makes total sense.
          Hide
          Zheng Shao added a comment -

          I was trying to make the changes and found some other problems.

          The SessionState is only available in ql, while JavaUtils is in common right now. SessionState depends on a bunch of Hive ql classes so I cannot move it to common, and JavaUtils is called from serde, metastore, and ql.

          I will change both serde and metastore methods to require hadoop configuration, so that they can get the classloader from it.

          I will also move JavaUtils to ql.

          Show
          Zheng Shao added a comment - I was trying to make the changes and found some other problems. The SessionState is only available in ql, while JavaUtils is in common right now. SessionState depends on a bunch of Hive ql classes so I cannot move it to common, and JavaUtils is called from serde, metastore, and ql. I will change both serde and metastore methods to require hadoop configuration, so that they can get the classloader from it. I will also move JavaUtils to ql.
          Hide
          Zheng Shao added a comment -

          I couldn't figure out how to do the job cleanly after 6 hours of debugging. I would suggest to commit the current patch and open a jira for follow-up.

          We need a major refactoring of the use of HiveConf in the code.

          Show
          Zheng Shao added a comment - I couldn't figure out how to do the job cleanly after 6 hours of debugging. I would suggest to commit the current patch and open a jira for follow-up. We need a major refactoring of the use of HiveConf in the code.
          Hide
          Zheng Shao added a comment -

          Let's get this in first since it fixes the bug.

          The follow-up JIRA is HIVE-584.

          Show
          Zheng Shao added a comment - Let's get this in first since it fixes the bug. The follow-up JIRA is HIVE-584 .
          Hide
          Joydeep Sen Sarma added a comment -

          changes look good! will commit after tests.

          Show
          Joydeep Sen Sarma added a comment - changes look good! will commit after tests.
          Hide
          Joydeep Sen Sarma added a comment -

          committed - thanks Zheng!

          Show
          Joydeep Sen Sarma added a comment - committed - thanks Zheng!

            People

            • Assignee:
              Zheng Shao
              Reporter:
              Zheng Shao
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development