Hive
HIVE-338

Executing CLI commands on the Thrift server

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 0.3.0
    • Fix Version/s: 0.4.0
    • Component/s: Server Infrastructure
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      - 'add/delete jar' commands to add/remove jars from the classpath used during query compilation and execution
      - CLI commands (set/add/delete/dfs) are now executable on the Hive server

      Description

      Let the Thrift server support set, add/delete file/jar, and normal HQL queries.

      Attachments

      1. hiveserver-v1.patch
        11 kB
        Min Zhou
      2. hiveserver-v2.patch
        22 kB
        Min Zhou
      3. hiveserver-v3.patch
        42 kB
        Joydeep Sen Sarma
      4. hive-338.final.patch
        55 kB
        Joydeep Sen Sarma
      5. HIVE-338.postfix.1.patch
        3 kB
        Zheng Shao

          Activity

          Min Zhou added a comment - edited

          Supports add file/jar now.

          Python example:

          #!/usr/bin/env python
          # Python 2 Thrift client for the Hive server.

          from hive import ThriftHive
          from hive.ttypes import HiveServerException
          from thrift import Thrift
          from thrift.transport import TSocket
          from thrift.transport import TTransport
          from thrift.protocol import TBinaryProtocol

          try:
              # Connect to the Hive server on its default port.
              transport = TSocket.TSocket('localhost', 10000)
              transport = TTransport.TBufferedTransport(transport)
              protocol = TBinaryProtocol.TBinaryProtocol(transport)

              client = ThriftHive.Client(protocol)
              transport.open()

              # Ship the streaming script (and a helper file) to the server.
              client.execute('ADD FILE /home/zhoumin/py/foo')
              client.execute('ADD FILE /home/zhoumin/py/streaming.py')
              query = '''
                  INSERT OVERWRITE TABLE streaming_pokes
                  MAP (pokes.foo, pokes.bar)
                      USING 'streaming.py'
                  AS new_foo, new_bar
                  FROM pokes'''

              client.execute(query)
              row = client.fetchOne()
              print row

              transport.close()

          except Thrift.TException, tx:
              print '%s' % (tx.message)
          

          Java example:

          package zhoumin.example;

          import java.util.ArrayList;
          import java.util.concurrent.Callable;
          import java.util.concurrent.ExecutionException;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;
          import java.util.concurrent.Future;

          import org.apache.hadoop.hive.metastore.api.MetaException;
          import org.apache.hadoop.hive.service.HiveClient;
          import org.apache.hadoop.hive.service.HiveServerException;

          import com.facebook.thrift.TException;
          import com.facebook.thrift.protocol.TBinaryProtocol;
          import com.facebook.thrift.protocol.TProtocol;
          import com.facebook.thrift.transport.TSocket;
          import com.facebook.thrift.transport.TTransport;
          import com.facebook.thrift.transport.TTransportException;

          public class MyClient {
            public static final int THREADS_NUMBER = 10;

            // Each worker opens its own connection, registers a UDF from an
            // added jar, runs a query, and returns the first result row.
            public static class Worker implements Callable<String> {

              TTransport transport;
              TProtocol protocol;
              HiveClient client;

              public Worker() {
                transport = new TSocket("localhost", 10000);
                protocol = new TBinaryProtocol(transport);
                client = new HiveClient(protocol);
              }

              public String call() throws Exception {
                transport.open();
                client.execute("add jar /home/zhoumin/hadoop/mapreduce/zhoumin/dist/zhoumin-0.00.1.jar");
                client.execute("CREATE TEMPORARY FUNCTION strlen AS 'hadoop.hive.udf.UdfStringLength'");
                client.execute("select strlen(mid) from log_data");
                String row = client.fetchOne();
                transport.close();
                return row;
              }

            }

            public static void main(String[] args) throws TTransportException,
                TException, HiveServerException, MetaException {
              // Exercise the server with several concurrent clients.
              ExecutorService exec = Executors.newCachedThreadPool();

              ArrayList<Future<String>> results = new ArrayList<Future<String>>();
              for (int i = 0; i < THREADS_NUMBER; i++) {
                results.add(exec.submit(new Worker()));
              }

              for (Future<String> fs : results) {
                try {
                  System.out.println(fs.get());
                } catch (InterruptedException e) {
                  System.out.println(e);
                } catch (ExecutionException e) {
                  System.out.println(e);
                }
              }
              // Shut down once, after all results have been collected.
              exec.shutdown();
            }
          }
          

          The add jar command is also supported in the CLI now.
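
          For illustration, a CLI session using these commands might look like this (the jar path is hypothetical):

          hive> add jar /tmp/my_udfs.jar;
          hive> list jars;
          hive> delete jar /tmp/my_udfs.jar;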

          Joydeep Sen Sarma added a comment -

          some comments:

          HiveServer.java:

          • where is 'sp' constructed?
          • can you encapsulate the 'add'/'delete'/'list' commands in a new processor and call that from both CliDriver and HiveServer? Even though the logic is trivial - duplicating code sucks. (A rough sketch follows this comment.)
          • SessionState.java: addToClassPath() - this looks the same as the one in ExecDriver.java - can you just make the latter public static and invoke that?
          • metadata/Hive.java: can you tell why this change was made?
          • exec/FunctionTask.java: is it necessary to specify the loader in the Class.forName call? I thought that the current thread context loader was always the first loader to be tried during name resolution anyway.
          • This is missing one change in MapRedTask.java - take a look at the execute() that generates a command line that executes ExecDriver in a separate jvm (we use this mode in tests) - here we are setting the -libjars option, and this needs to add the ones from the jar resources as well.
          • One problem is that this will not work for hadoop-17 (at least local mode) - see ExecDriver:main(), where addToClassPath is invoked on auxjars as a workaround for hadoop-17. This would need to be done for other jars added via 'add jar' as well - except there would be no way to do this unless the list of jar file resources was also passed in as a conf variable.
          • Related point - we need a test for this. Some dummy udf in a separate jar file that is added and then invoked from a query would be great (and would have revealed the above two issues).
          • finally - 'delete jar' doesn't seem to get rid of the jar from the classpath. Perhaps this was not required at this time - but it would be good to add just for the sake of completeness. The delete resource codepath is missing a callback (hook) - that would need to be added as well.

          thanks for taking this on - too many small hadoop related complexities here ..
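
          A rough sketch of the shared processor idea (interface and names are hypothetical; a later comment confirms the committed patch added a command-processor factory):

          // Hypothetical sketch: one processor per command keyword, shared by
          // CliDriver and HiveServer so add/delete/list logic lives in one place.
          interface CommandProcessor {
            int run(String commandArgs); // returns 0 on success
          }

          class AddResourceProcessor implements CommandProcessor {
            public int run(String commandArgs) {
              // e.g. "jar /path/foo.jar" -> register the jar with the session
              System.out.println("adding resource: " + commandArgs);
              return 0;
            }
          }

          class CommandProcessorFactory {
            // Both the CLI and the server dispatch on the first token instead
            // of duplicating the parsing logic.
            static CommandProcessor get(String firstToken) {
              if ("add".equalsIgnoreCase(firstToken)) {
                return new AddResourceProcessor();
              }
              return null; // not a built-in command; fall through to the SQL driver
            }
          }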

          Min Zhou added a comment -

          Oops, some mistakes were made when migrating code from another repository.
          I was considering running the CLI as a client of the Thrift server, so that junior users can run SQL from anywhere, rather than being limited to executing on a server node that is not open to everybody. The Thrift server would act as a multi-user interface: CLI clients could submit commands under their respective privileges; they could run set and add/delete commands, and even upload resources or download results when they are too large to display.

          • metadata/Hive.java: can you tell why this change was made?
            Please see https://issues.apache.org/jira/browse/HIVE-324

          I'll update this patch later to solve the problems you mentioned.

          Joydeep Sen Sarma added a comment -

          can we complete this one? or at least get the changes required to HiveServer in as part of some other Jira? (since the server is pretty broken without these thread related fixes)

          Min Zhou added a comment -
          1. I think we need a more efficient RPC layer rather than Thrift. Lots of attempts have been made here, and they suggest it is not very suitable for a multi-user server.
          2. SessionState in Hive must be an abuse of ThreadLocal, treating its thread-confinement property as a license to use global variables or as a means of creating "hidden" method arguments. Like global variables, thread-local variables can detract from reusability and introduce hidden couplings.
          Joydeep Sen Sarma added a comment -

          there's a singleton SessionState per thread that captures session settings; a thread can switch from one session to another.

          yes - the variables in SessionState are pseudo global variables - but this is easier to organize than passing those properties to each and every method. Do you consider the System object in Java to be an abuse of global variables? Or do you consider environment variables and system properties available globally to processes as things that prevent reuse? If you look at the things that are part of sessionstate - they are similar in nature (input/output streams, list of systemwide resources and so on).

          this is of course open-source and re-writes are as welcome as any other contribution. i have not been a heavy user of thrift myself so cannot comment on its speed. it would be easy enough to write more rpc layers on top of hive if required. however - the current problems in the threaded server code need immediate fixing since users are trying to use it right now. probably we will have to do this as part of a separate jira.
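
          A minimal sketch of the per-thread session pattern being described (names simplified and hypothetical, not the actual SessionState code):

          import java.io.PrintStream;
          import java.util.ArrayList;
          import java.util.List;

          // Each server thread binds a session once; downstream code reaches it
          // through the thread-local instead of threading it through every
          // method signature.
          class Session {
            PrintStream out = System.out;                         // session output stream
            List<String> jarResources = new ArrayList<String>();  // 'add jar' entries

            private static final ThreadLocal<Session> CURRENT = new ThreadLocal<Session>();

            static void start(Session s) { CURRENT.set(s); } // a thread can switch sessions
            static Session get() { return CURRENT.get(); }
          }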

          Min Zhou added a comment - edited
          • exec/FunctionTask.java: is it necessary to specify the loader in the Class.forName call? I thought that the current thread context loader was always the first loader to be tried during name resolution anyway.
            Yes, of course. The class loader held by HiveConf is older than that of the current thread.

          This patch supports dfs, add/delete file/jar, and set now.

          btw, Joydeep, would you do me a favor and write some of the test code that I am not familiar with? you know, 'add jar' needs a separate jar, and I am not quite sure how to organize them.

          Joydeep Sen Sarma added a comment -

          thanks! - this looks pretty good. i will add a test case - and there's also an extra addToClassPath in ExecDriver.main() (for hadoop-17) that needs to get added.

          will commit after adding/running some tests.

          Joydeep Sen Sarma added a comment -

          modified version:

          • clean up commandprocessors to use factory and put into separate directory
          • added test for add jar (reusing existing jar - TestSerDe.jar)
          • added jars should augment -libjars in command line (required for local mode execution)

          i didn't understand the comment about the FunctionTask change.

          Min Zhou added a comment -

          I think you should take a look at these lines of org.apache.hadoop.conf.Configuration:

            private ClassLoader classLoader;
            {
              classLoader = Thread.currentThread().getContextClassLoader();
              if (classLoader == null) {
                classLoader = Configuration.class.getClassLoader();
              }
            }
          ...
          
            public Class<?> getClassByName(String name) throws ClassNotFoundException {
              return Class.forName(name, true, classLoader);
            }
          

          The class loader of the current thread changes when jars are added to the classpath, but the conf does not synchronously pick up that change.
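
          That is, after 'add jar' the conf has to be re-pointed at the new loader before it can resolve the class. A minimal sketch (Configuration.setClassLoader is the standard Hadoop API; the helper class is hypothetical):

          import org.apache.hadoop.conf.Configuration;

          public class RefreshConfLoader {
            public static Class<?> resolve(Configuration conf, String name)
                throws ClassNotFoundException {
              // 'add jar' installs a new URLClassLoader on the current thread;
              // re-point the conf at it, otherwise getClassByName keeps using the
              // loader captured when the Configuration was constructed.
              conf.setClassLoader(Thread.currentThread().getContextClassLoader());
              return conf.getClassByName(name);
            }
          }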

          Joydeep Sen Sarma added a comment -

          ok - this is making a little more sense to me now. aside from looking at the thread classloader - we should look at the current classloader (if the former is null). the tests probably don't catch this since the thread classloader is always set.

          Joydeep Sen Sarma added a comment -

          final version:

          • add test for delete jar
          • use the thread class loader preferentially in a whole bunch of places (wherever configurable classes are used)
          • remove ^Ms

          also forgot to mention - the addToClassPath logic was only adding new jars and not retaining existing jars in the classpath (see the sketch below)
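
          A sketch of classpath augmentation that retains the existing entries (hypothetical helper; the real logic lives in Utilities.addToClassPath):

          import java.io.File;
          import java.net.URL;
          import java.net.URLClassLoader;
          import java.util.ArrayList;
          import java.util.List;

          public class ClassPathUtil {
            // Build a loader containing BOTH the existing URLs and the new jars,
            // then install it as the thread context loader. Building from only
            // the new jars would drop everything already on the classpath.
            public static void addToClassPath(String[] jarPaths) throws Exception {
              ClassLoader current = Thread.currentThread().getContextClassLoader();
              List<URL> urls = new ArrayList<URL>();
              if (current instanceof URLClassLoader) {
                for (URL u : ((URLClassLoader) current).getURLs()) {
                  urls.add(u); // retain what is already there
                }
              }
              for (String p : jarPaths) {
                urls.add(new File(p).toURI().toURL());
              }
              Thread.currentThread().setContextClassLoader(
                  new URLClassLoader(urls.toArray(new URL[0]), current));
            }
          }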

          Joydeep Sen Sarma added a comment -

          committed - thanks Min!

          Zheng Shao added a comment -

          This patch fixes the unit test failure as in:

          ant -lib testlibs test -Dtestcase=TestCliDriver -Dqfile=input16_cc.q
          

          It basically adds the resource jars to auxjars, so that in local mode Hive can add those jars to the classpath.

          Joydeep Sen Sarma added a comment -

          hmmm - the MapRedTask changes don't seem good to me ..

          the conf variable that gets passed to MapRedTask is a shared global object i think. if we set added jars into it - then delete jar command will not work.

          also - -libjars is the proper way of passing jars to hadoop based commands. it also sets the supplied jars into the sub-process's classpath - something that we depend on hadoop to do. so if the execdriver subprocess needs to access (by any chance) classes in the added/aux jars - then things will not work. bin/hive also uses -libjars to make sure jars are added to the process classpath.

          if the problem is primarily with the file:/ uri - why do we need the mapredtask changes?

          Zheng Shao added a comment -

          There are two problems:

          1. The jars added by the "add jar" command only go to -libjars but not "-jobconf hive.aux.jars"
          2. For hadoop 0.17.0 local mode, ExecDriver reads "hive.aux.jars" and we set the classpath ourselves (instead of depending on hadoop).

          In order to let the hadoop 0.17.0 local mode work, I have to add the added jars to hive.aux.jars and pass it to ExecDriver through the command-line jobconf, for example:
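
          (Illustrative invocation only; the paths, jar names, and exact argument layout are made up:)

          hadoop jar hive_exec.jar org.apache.hadoop.hive.ql.exec.ExecDriver \
              -libjars /tmp/added.jar \
              -jobconf hive.aux.jars=/tmp/added.jar \
              -jobconf mapred.job.tracker=local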

          ExecDriver.java:526
              // workaround for hadoop-17 - libjars are not added to classpath. this
              // affects local
              // mode execution
              boolean localMode = HiveConf.getVar(conf, HiveConf.ConfVars.HADOOPJT)
                  .equals("local");
              if (localMode) {
                String auxJars = HiveConf.getVar(conf, HiveConf.ConfVars.HIVEAUXJARS);
                if (StringUtils.isNotBlank(auxJars)) {
                  try {
                    Utilities.addToClassPath(StringUtils.split(auxJars, ","));
                  } catch (Exception e) {
                    throw new HiveException(e.getMessage(), e);
                  }
                }
              }
          

          Several ways to fix the global object problem:
          A. copy-create a new conf variable
          B. add a new jobconf to replicate the value of -libjars (e.g. hive.hadoop.libjars)
          C. set the auxjars before starting the MapRed job, and reset it after the job has started

          I prefer approach B (sketched below). Thoughts?
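
          A minimal sketch of approach B (the key name hive.hadoop.libjars comes from the suggestion above; the helper class is hypothetical):

          import org.apache.hadoop.conf.Configuration;

          public class LibJarsConf {
            // Carry the session's added jars in a separate jobconf key that
            // mirrors -libjars, instead of mutating the shared hive.aux.jars
            // value; 'delete jar' then never has to un-edit the global conf.
            public static final String HIVE_HADOOP_LIBJARS = "hive.hadoop.libjars";

            public static void setAddedJars(Configuration jobConf, String commaSeparatedJars) {
              jobConf.set(HIVE_HADOOP_LIBJARS, commaSeparatedJars);
            }

            public static String[] getAddedJars(Configuration jobConf) {
              String v = jobConf.get(HIVE_HADOOP_LIBJARS, "");
              return (v.length() == 0) ? new String[0] : v.split(",");
            }
          }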

          Joydeep Sen Sarma added a comment -

          B sounds good to me as well.

          once we deprecate support for older versions of hadoop - we can get rid of this special case code in execdriver then.

          Min Zhou added a comment - edited

          Can you explain why you made a change to FunctionTask.java? It caused a java.lang.ClassNotFoundException when I executed my UDF, with the MR jobs submitted by the Hive CLI.
          The ClassLoader did not work.

          Joydeep Sen Sarma added a comment -

          the initial change only used the thread context classloader for looking up the class.

          the change i made was to (in addition) use the current (default) class loader if the thread context loader was null. this is in sync with the example you pointed out in Configuration.java (and in general seems like the right way after reading a few documents on the web). Note that it's not just FunctionTask - wherever we pick up user configured classes (inputformat being another obvious place) - we need to use this logic (and this was missing from the patch as well).

          it's possible there's some other issue if we are getting classnotfound. Zheng just committed the postfix patch that he posted here earlier in the day - it might make sense to revert to an earlier version prior to this commit (in case you are synced to the latest trunk) and check whether that works.

          Joydeep Sen Sarma added a comment -

          it turns out that the way we did the classloader (thread and then current) is not working for hadoop-20. The issue is that the local job runner runs in a separate thread, so setting the thread context classloader is useless. Based on the changes for hadoop-4612 - it seems that the way to set the classloader is to set it in the Conf object and then to always use the Conf object's classloader for locating classes. Yuck.

          For some reason it works (only) on hadoop-19. I have no idea how this stuff works in hadoop-19 (and why setting the thread context loader makes a difference in 17/18).

          it might be best if we just fixed this as part of hive-487 (so that we could address 20 related issues as well).

          Zheng Shao added a comment -

          It seems this one is getting bigger. I will open a new jira for this.

          Min Zhou added a comment -

          @Joydeep
          even 0.19 did not work here after applying your patch.

          Joydeep Sen Sarma added a comment -

          i added a test for add jar (input16*.q) - that seems to work mysteriously in hadoop-19 (where of course i ran all the tests) - and nowhere else.

          since i don't really understand why the heck it's working in 19 - let's just try this one more time via 574.


            People

            • Assignee:
              Min Zhou
              Reporter:
              Min Zhou
            • Votes:
              0
            • Watchers:
              4
