Hive / HIVE-5518

ADD JAR should add entries to local classpath

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.12.0
    • Fix Version/s: None
    • Component/s: CLI
    • Labels:
      None

      Description

      Jars referenced in ADD JAR statements are not made available on the immediate (local) classpath. That makes them useless for scripts that need to initialize external output formats at job submission (e.g., the HBase storage handler). Is this expected behavior?

      For example, the table 'pagecounts_hbase' is an HBase table defined using the HBaseStorageHandler:

      $ cat foo.hql
      ADD FILE /etc/hbase/conf/hbase-site.xml;
      ADD JAR /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-68-hadoop2.jar;
      ADD JAR /usr/lib/hbase/lib/hbase-server-0.96.0.2.0.6.0-68-hadoop2.jar;
      ADD JAR /usr/lib/hbase/lib/hbase-client-0.96.0.2.0.6.0-68-hadoop2.jar;
      ADD JAR /usr/lib/hbase/lib/hbase-protocol-0.96.0.2.0.6.0-68-hadoop2.jar;
      
      FROM pgc INSERT INTO TABLE pagecounts_hbase SELECT pgc.* WHERE rowkey LIKE 'en/q%' LIMIT 10;
      $ hive -f foo.hql
      ...
      Added resource: /etc/hbase/conf/hbase-site.xml
      Added /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-68-hadoop2.jar to class path
      Added resource: /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-68-hadoop2.jar
      ...
      Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/TableInputFormatBase
              at java.lang.ClassLoader.defineClass1(Native Method)
              at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
              at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
              at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
              at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
              at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
              at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
              at java.security.AccessController.doPrivileged(Native Method)
              at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
              at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
              at java.lang.Class.forName0(Native Method)
              at java.lang.Class.forName(Class.java:266)
              at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:305)
              at org.apache.hadoop.hive.ql.metadata.Table.<init>(Table.java:98)
              at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:989)
              at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:892)
              at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:730)
              at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:707)
              at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1196)
              at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1053)
              at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8342)
              at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
              at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:441)
              at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
              at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
              at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
              at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
              at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
              at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
              at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
              at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:446)
              at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:456)
              at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:737)
              at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
              at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:601)
              at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
      Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
              at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
              at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
              at java.security.AccessController.doPrivileged(Native Method)
              at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
              at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
              ... 48 more
      

      The ADDed jar hbase-server.jar contains the missing class:

      $ unzip -l /usr/lib/hbase/lib/hbase-server-0.96.0.2.0.6.0-68-hadoop2.jar | grep TableInputFormatBase
           5363  10-09-2013 19:45   org/apache/hadoop/hbase/mapred/TableInputFormatBase.class
           7460  10-09-2013 19:45   org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatBase.class
           8803  10-09-2013 19:45   org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.class
      
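The check above matches any entry whose name contains TableInputFormatBase. A minimal sketch of a stricter lookup follows, assuming `unzip` and `grep` are installed; the helper name `find_class_jar` is hypothetical and not part of Hive or HBase. It prints each jar that contains an entry for the exact class path given.

```shell
# Hypothetical helper: print every jar (arguments 2..n) that contains
# an entry matching the class path given as argument 1.
find_class_jar() {
  local class_path="$1"
  shift
  local jar
  for jar in "$@"; do
    # -F treats the class path as a fixed string, so '.' and '$' are literal.
    if unzip -l "$jar" 2>/dev/null | grep -qF -- "$class_path"; then
      echo "$jar"
    fi
  done
}

# Usage with the class from the stack trace and the jar directory from this report.
find_class_jar 'org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.class' \
  /usr/lib/hbase/lib/hbase-*.jar
```

If no jar matches, the function prints nothing.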

        Activity

        Edward Capriolo added a comment -

        Let's look into this. I do not see a reason why the auxpath and ADD JAR list cannot be combined. It sure would make many things easier.

        Nick Dimiduk added a comment -

        It sure would make many things easier.

        +1

        The combination of auxpath, classpath, and jar list makes this stuff pretty opaque. How does each of these affect the operations I want to run? Which omissions cause immediate failure, and which won't fail until I'm into the third job in my plan? It seems most people I talk to put their jars everywhere and cross their fingers.

        Edward Capriolo added a comment -

        Anecdotally, anything required as part of an input format needs to be on the aux_path. Those classes are needed to read the data, whereas UDFs need not be on the aux_path because they are used inside operators. It would be great if we could unify these concepts without making the classpath needed to launch every job very large.
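As a rough sketch of the aux_path approach described above: the snippet below assembles the four HBase jars from the original report into one comma-separated list and prints a `hive --auxpath` invocation. It assumes `--auxpath` accepts a comma-separated list of jar paths. The helper name `join_auxpath` is hypothetical, and nothing in this thread confirms that this resolves the error reported here.

```shell
# Hypothetical helper: join its arguments with commas.
join_auxpath() {
  local IFS=,
  echo "$*"
}

# Paths and version string are taken from the ADD JAR lines in the report.
HBASE_LIB=/usr/lib/hbase/lib
HBASE_VERSION=0.96.0.2.0.6.0-68-hadoop2

AUX_JARS=$(join_auxpath \
  "$HBASE_LIB/hbase-common-$HBASE_VERSION.jar" \
  "$HBASE_LIB/hbase-server-$HBASE_VERSION.jar" \
  "$HBASE_LIB/hbase-client-$HBASE_VERSION.jar" \
  "$HBASE_LIB/hbase-protocol-$HBASE_VERSION.jar")

# Print the command instead of running it, so the sketch works without Hive installed.
echo "hive --auxpath $AUX_JARS -f foo.hql"
```

Setting the `HIVE_AUX_JARS_PATH` environment variable to the same list before launching `hive` is another way to supply it.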

        Nick Dimiduk added a comment -

        For MapReduce jobs involving HBase, we expect the client classpath to contain all of the necessary jars. We have a utility for determining which jars should be shipped to the cluster based on job parameters (TableMapReduceUtil#addDependencyJars). The user can also call these methods explicitly to add additional jars by naming a class they contain.

        IMHO, it would be nice if a Hive user didn't have to distinguish between jars needed locally and jars needed by the jobs. Just toss them all in and let the system sort it out.

        Nick Dimiduk added a comment -

        A related question: what is the precise behavior of ADD JAR? Does it copy the file from the client's local machine up to the job's distributed cache and include it on the task classpaths?


          People

          • Assignee:
            Unassigned
            Reporter:
            Nick Dimiduk
          • Votes:
            0
            Watchers:
            5
