Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      I have found some minor optimization issues in the codebase, which I would like to rectify and contribute. Specifically, the optimizations that could be applied to Hive's code base are as follows:

      1. Use StringBuffer when appending strings - In 184 instances, the concatenation operator (+=) was used to append strings. This is inherently inefficient; Java's StringBuffer or StringBuilder class should be used instead. 12 instances of this optimization can be applied to the GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver uses the + operator inside a loop, as do the column projection utilities class (ColumnProjectionUtils) and the aforementioned skew-join processor. Tests showed that appending strings with a StringBuilder is 57% faster than using the + operator (the StringBuilder took 122 milliseconds whilst the + operator took 284 milliseconds). The reason why the StringBuilder class is preferred over the + operator is that

      String third = first + second;

      gets compiled to:

      StringBuilder builder = new StringBuilder(first);
      builder.append(second);
      String third = builder.toString();

      Therefore, building complex strings (for example, inside loops) requires many StringBuilder instantiations, and, as discussed below, creating new objects inside loops is inefficient.
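      A minimal sketch of the two approaches (hypothetical variable names; not taken from Hive's code) illustrates the difference. The += form allocates a new StringBuilder and a new String on every iteration, whilst the explicit form reuses a single builder:

      ```java
      public class AppendDemo {
          public static void main(String[] args) {
              // Inefficient: each += compiles to a fresh StringBuilder
              // plus a new String per iteration.
              String slow = "";
              for (int i = 0; i < 5; i++) {
                  slow += i;
              }

              // Efficient: one StringBuilder reused across the whole loop.
              StringBuilder sb = new StringBuilder();
              for (int i = 0; i < 5; i++) {
                  sb.append(i);
              }
              String fast = sb.toString();

              // Both build the same string; only the allocation count differs.
              System.out.println(slow.equals(fast));
          }
      }
      ```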

      2. Use arrays instead of List - The asList method of Java's java.util.Arrays class is more efficient at creating lists from arrays than using loops to manually iterate over the elements (using asList is computationally very cheap, O(1), as it merely creates a wrapper object around the array; looping through the array, however, has a complexity of O(n), since a new list is created and every element in the array is added to this new list). As confirmed by the experiment detailed in Appendix D, the Java compiler does not automatically optimize and replace tight-loop copying with asList: the loop-copying of 1,000,000 items took 15 milliseconds whilst using asList was instant.

      Four instances of this optimization can be applied to Hive's codebase (two of these should be applied to the Map-Join container - MapJoinRowContainer) - lines 92 to 98:

      for (obj = other.first(); obj != null; obj = other.next()) {
          ArrayList<Object> ele = new ArrayList<Object>(obj.length);
          for (int i = 0; i < obj.length; i++) {
              ele.add(obj[i]);
          }
          list.add((Row) ele);
      }
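      A hedged sketch of the replacement, simplified to a plain Object[] rather than Hive's Row type: the inner copy loop can be collapsed into a single Arrays.asList call (note that asList returns a fixed-size view backed by the array, so it should be wrapped in a new ArrayList only if the list must remain independently mutable):

      ```java
      import java.util.ArrayList;
      import java.util.Arrays;
      import java.util.List;

      public class AsListDemo {
          public static void main(String[] args) {
              Object[] obj = {"a", "b", "c"};

              // Manual copy: O(n), one add() call per element.
              List<Object> manual = new ArrayList<Object>(obj.length);
              for (int i = 0; i < obj.length; i++) {
                  manual.add(obj[i]);
              }

              // Arrays.asList: O(1) wrapper around the array.
              List<Object> wrapped = Arrays.asList(obj);

              // Element-wise equal to the manually copied list.
              System.out.println(manual.equals(wrapped));
          }
      }
      ```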

      3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation could be avoided by simply using the provided static conversion methods. As noted in the PMD documentation, "using these avoids the cost of creating objects that also need to be garbage-collected later."

      For example, line 587 of the SemanticAnalyzer class could be replaced by the more efficient parseDouble method call:

      // Inefficient: creates a Double wrapper only to unwrap it again
      Double percent = Double.valueOf(value).doubleValue();
      // To be replaced by the primitive conversion, which creates no wrapper:
      double percent = Double.parseDouble(value);

      Our test case in Appendix D confirms this: converting 10,000 strings into integers using Integer.valueOf(gen.nextSessionId()).intValue() (i.e. creating an unnecessary wrapper object) took 119 milliseconds on average; using Integer.parseInt(gen.nextSessionId()) took only 38 milliseconds. Therefore, creating even just one unnecessary wrapper object per conversion can make your code up to 68% slower.

      4. Converting literals to strings using + "" - Converting literals to strings by appending an empty string (+ "") is quite inefficient (see Appendix D); the toString() method should be called instead: converting 1,000,000 integers to strings using + "" took, on average, 1340 milliseconds whilst using the toString() method only required 1183 milliseconds (hence using toString() is nearly 12% faster).

      89 instances of using + "" to convert literals were found in Hive's codebase - one of these is found in JoinUtil.
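      A minimal sketch of the two conversions (illustrative only; not taken from JoinUtil). The + "" form goes through an implicit StringBuilder append, whilst Integer.toString converts directly:

      ```java
      public class ToStringDemo {
          public static void main(String[] args) {
              int value = 42;

              // Inefficient: value + "" compiles to a StringBuilder append.
              String viaConcat = value + "";

              // Preferred: direct conversion with no intermediate builder.
              String viaToString = Integer.toString(value);

              // Both yield the same string.
              System.out.println(viaConcat.equals(viaToString));
          }
      }
      ```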

      5. Avoid manual copying of arrays - Instead of copying arrays element by element, as is done in GroupByOperator on line 1040 (see below), the more efficient System.arraycopy can be used (arraycopy is a native method, meaning that the entire memory block is copied using memcpy or memmove).

      // Line 1040 of the GroupByOperator
      for (int i = 0; i < keys.length; i++) {
          forwardCache[i] = keys[i];
      }

      Using System.arraycopy on an array of 10,000 strings was (close to) instant whilst the manual copy took 6 milliseconds.
      11 instances of this optimization should be applied to the Hive codebase.
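      The replacement can be sketched as follows (simplified, with illustrative array contents; the real forwardCache and keys live in GroupByOperator):

      ```java
      import java.util.Arrays;

      public class ArrayCopyDemo {
          public static void main(String[] args) {
              String[] keys = {"k1", "k2", "k3"};
              Object[] forwardCache = new Object[keys.length];

              // Single native bulk copy instead of an element-by-element loop.
              System.arraycopy(keys, 0, forwardCache, 0, keys.length);

              // The destination now holds the same elements as the source.
              System.out.println(Arrays.equals(keys, forwardCache));
          }
      }
      ```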

      6. Avoiding instantiation inside loops - As noted in the PMD documentation, "new objects created within loops should be checked to see if they can be created outside them and reused."

      Declaring variables inside a loop (i from 0 to 10,000) took 300 milliseconds, whilst declaring them outside took only 88 milliseconds. This can be explained by the fact that a variable declared outside the loop has its reference re-used on each iteration, whereas declaring a variable inside the loop creates a new reference on every iteration; in our case, 10,000 references will have been created by the time the loop finishes, meaning a lot of work in terms of memory allocation and garbage collection. 1623 instances of this optimization can be applied.
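      A minimal sketch of hoisting an allocation out of a loop (illustrative names; this assumes the object can safely be reset and reused, which must be checked case by case, as the PMD guidance says):

      ```java
      import java.util.ArrayList;
      import java.util.List;

      public class HoistDemo {
          public static void main(String[] args) {
              List<String> out = new ArrayList<String>();

              // The StringBuilder is created once, outside the loop, and
              // reset with setLength(0) instead of being re-instantiated
              // on every iteration.
              StringBuilder sb = new StringBuilder();
              for (int i = 0; i < 3; i++) {
                  sb.setLength(0);
                  sb.append("row-").append(i);
                  out.add(sb.toString());
              }

              System.out.println(out);
          }
      }
      ```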

      To summarize, I propose to modify the code to address issues 1 and 6; the remaining issues (2 to 5) will be addressed later. Details are specified as sub-tasks.

        Activity

        Xuefu Zhang added a comment -

        Good findings. While these principles (plus others) may seem insignificant individually, it's a good thing for a developer to keep them in mind, as they may add up to an overall performance issue.

        Benjamin Jakobus added a comment -

        This would be my first contribution here - how does it usually work? Will someone assign me the issue and then I commit and have changes reviewed?

        Xuefu Zhang added a comment -

        Someone should be able to put you in the contributor list. With that, you should be able to assign the JIRA to yourself. In the meantime, you can work on a patch and attach it here for review.

        Check out developer's guide at https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide.

        Benjamin Jakobus added a comment -

        Cool, thanks. Will do.

        Thejas M Nair added a comment -

        Regarding the process of contributing, more information is here - https://cwiki.apache.org/confluence/display/Hive/HowToContribute
        As Edward Capriolo mentioned on the mailing list, you might want to break this work into pieces so that you get feedback sooner. You can create subtask JIRAs for this one to help manage that.
        Note that some of the files in your list are Thrift-generated files (see https://cwiki.apache.org/Hive/howtocontribute.html#HowToContribute-GeneratingThriftCode). You should not be changing these files directly, as the next time they are regenerated the changes will be lost. To fix those you would need to fix Thrift.

        Edward Capriolo added a comment -

        You're not going to be able to fix any of the Thrift / protobuf generated files. You will just have to ignore them.

        Benjamin Jakobus added a comment -

        ok, thanks.

        Benjamin Jakobus added a comment -

        I'm trying to test my code (just cloned a new copy) but keep on getting a runtime exception (I haven't applied my patch yet):

        1) Downloaded Hadoop 1.2.1.
        2) ant -Dhadoop.version=1.2.1 clean package (OK - no problems)
        3) export HIVE_OPTS='-hiveconf mapred.job.tracker=local -hiveconf fs.default.name=file:///tmp \
        -hiveconf hive.metastore.warehouse.dir=file:///tmp/warehouse \
        -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/tmp/metastore_db;create=true'
        4) export HADOOP_HOME=~/Workspace/hadoop-1.2.1/
        5) Running Hive via CLI (/build/dist/bin/hive) - show tables; quit; (works)
        6) Trying to run some test scripts I get:

        java.lang.RuntimeException: Cannot serialize object
        at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
        at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:426)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:330)
        at org.apache.hadoop.hive.ql.exec.Utilities.serializeObject(Utilities.java:611)
        at org.apache.hadoop.hive.ql.plan.MapredWork.toXML(MapredWork.java:88)
        at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.processCurrentTask(CommonJoinTaskDispatcher.java:505)
        at org.apache.hadoop.hive.ql.optimizer.physical.AbstractJoinTaskDispatcher.dispatch(AbstractJoinTaskDispatcher.java:182)
        at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
        at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
        at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
        at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:79)
        at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:90)
        at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.compile(MapReduceCompiler.java:292)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8333)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:341)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:966)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:878)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
        at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:446)
        at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:456)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:737)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
        Caused by: java.lang.Exception: XMLEncoder: discarding statement XMLEncoder.writeObject(MapredWork);
        ... 32 more
        Caused by: java.lang.RuntimeException: Cannot serialize object
        at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
        at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:256)
        at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:400)
        at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeObject1(Encoder.java:258)
        at java.beans.Encoder.cloneStatement(Encoder.java:271)
        at java.beans.Encoder.writeStatement(Encoder.java:301)
        at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:400)
        ... 31 more
        Caused by: java.lang.RuntimeException: Cannot serialize object
        at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
        at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:256)
        at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:400)
        at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at java.beans.DefaultPersistenceDelegate.doProperty(DefaultPersistenceDelegate.java:194)
        at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:253)
        ... 44 more
        Caused by: java.lang.RuntimeException: Cannot serialize object
        at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
        at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:256)
        at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:400)
        at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at java.beans.DefaultPersistenceDelegate.doProperty(DefaultPersistenceDelegate.java:194)
        at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:253)
        ... 52 more
        Caused by: java.lang.RuntimeException: Cannot serialize object
        at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
        at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:426)
        at java.beans.DefaultPersistenceDelegate.invokeStatement(DefaultPersistenceDelegate.java:217)
        at java.beans.java_util_List_PersistenceDelegate.initialize(MetaData.java:649)
        at java.beans.PersistenceDelegate.initialize(PersistenceDelegate.java:212)
        at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:398)
        at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at java.beans.DefaultPersistenceDelegate.doProperty(DefaultPersistenceDelegate.java:194)
        at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:253)
        ... 65 more
        Caused by: java.lang.Exception: XMLEncoder: discarding statement ArrayList.add(ArrayList);
        ... 82 more
        Caused by: java.lang.RuntimeException: Cannot serialize object
        at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
        at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:426)
        at java.beans.DefaultPersistenceDelegate.invokeStatement(DefaultPersistenceDelegate.java:217)
        at java.beans.java_util_List_PersistenceDelegate.initialize(MetaData.java:649)
        at java.beans.PersistenceDelegate.initialize(PersistenceDelegate.java:212)
        at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:398)
        at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeObject1(Encoder.java:258)
        at java.beans.Encoder.cloneStatement(Encoder.java:271)
        at java.beans.Encoder.writeStatement(Encoder.java:301)
        at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:400)
        ... 81 more
        Caused by: java.lang.Exception: XMLEncoder: discarding statement ArrayList.add(ASTNode);
        ... 98 more
        Caused by: java.lang.RuntimeException: Cannot serialize object
        at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
        at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:238)
        at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:400)
        at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeObject1(Encoder.java:258)
        at java.beans.Encoder.cloneStatement(Encoder.java:271)
        at java.beans.Encoder.writeStatement(Encoder.java:301)
        at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:400)
        ... 97 more
        Caused by: java.lang.RuntimeException: Cannot serialize object
        at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
        at java.beans.Encoder.getValue(Encoder.java:108)
        at java.beans.Encoder.get(Encoder.java:252)
        at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:112)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:232)
        ... 110 more
        Caused by: java.lang.InstantiationException: org.antlr.runtime.CommonToken
        at java.lang.Class.newInstance0(Class.java:359)
        at java.lang.Class.newInstance(Class.java:327)
        at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
        at sun.reflect.GeneratedMethodAccessor81.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
        at java.beans.Statement.invokeInternal(Statement.java:292)
        at java.beans.Statement.access$000(Statement.java:58)
        at java.beans.Statement$2.run(Statement.java:185)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.beans.Statement.invoke(Statement.java:182)
        at java.beans.Expression.getValue(Expression.java:153)
        at java.beans.Encoder.getValue(Encoder.java:105)
        ... 122 more
        FAILED: SemanticException Generate Map Join Task Error: Cannot serialize object

        Show
        Benjamin Jakobus added a comment - I'm trying to test my code (just cloned a new copy) but keep on getting a runtime exception (I haven't applied my patch yet): 1) Downloaded Hadoop 1.2.1. 2) ant -Dhadoop.version=1.2.1 clean package (OK - no problems) 3) export HIVE_OPTS='-hiveconf mapred.job.tracker=local -hiveconf fs.default.name= file:///tmp \ -hiveconf hive.metastore.warehouse.dir= file:///tmp/warehouse \ -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/tmp/metastore_db;create=true' 4) export HADOOP_HOME=~/Workspace/hadoop-1.2.1/ 5) Running Hive via CLI (/build/dist/bin/hive) - show tables; quit; (works) 6) Trying to run some test scripts I get: java.lang.RuntimeException: Cannot serialize object at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598) at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:426) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:330) at org.apache.hadoop.hive.ql.exec.Utilities.serializeObject(Utilities.java:611) at org.apache.hadoop.hive.ql.plan.MapredWork.toXML(MapredWork.java:88) at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.processCurrentTask(CommonJoinTaskDispatcher.java:505) at org.apache.hadoop.hive.ql.optimizer.physical.AbstractJoinTaskDispatcher.dispatch(AbstractJoinTaskDispatcher.java:182) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139) at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:79) at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:90) at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.compile(MapReduceCompiler.java:292) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8333) at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:341) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:966) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:878) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:446) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:456) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:737) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) Caused by: java.lang.Exception: XMLEncoder: discarding statement XMLEncoder.writeObject(MapredWork); ... 
32 more Caused by: java.lang.RuntimeException: Cannot serialize object at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598) at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:256) at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:400) at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118) at java.beans.Encoder.writeObject(Encoder.java:74) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327) at java.beans.Encoder.writeExpression(Encoder.java:330) at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454) at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115) at java.beans.Encoder.writeObject(Encoder.java:74) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327) at java.beans.Encoder.writeObject1(Encoder.java:258) at java.beans.Encoder.cloneStatement(Encoder.java:271) at java.beans.Encoder.writeStatement(Encoder.java:301) at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:400) ... 31 more Caused by: java.lang.RuntimeException: Cannot serialize object at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598) at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:256) at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:400) at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118) at java.beans.Encoder.writeObject(Encoder.java:74) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327) at java.beans.Encoder.writeExpression(Encoder.java:330) at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454) at java.beans.DefaultPersistenceDelegate.doProperty(DefaultPersistenceDelegate.java:194) at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:253) ... 
44 more Caused by: java.lang.RuntimeException: Cannot serialize object at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598) at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:256) at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:400) at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118) at java.beans.Encoder.writeObject(Encoder.java:74) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327) at java.beans.Encoder.writeExpression(Encoder.java:330) at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454) at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115) at java.beans.Encoder.writeObject(Encoder.java:74) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327) at java.beans.Encoder.writeExpression(Encoder.java:330) at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454) at java.beans.DefaultPersistenceDelegate.doProperty(DefaultPersistenceDelegate.java:194) at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:253) ... 
... 52 more
Caused by: java.lang.RuntimeException: Cannot serialize object
    at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
    at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:426)
    at java.beans.DefaultPersistenceDelegate.invokeStatement(DefaultPersistenceDelegate.java:217)
    at java.beans.java_util_List_PersistenceDelegate.initialize(MetaData.java:649)
    at java.beans.PersistenceDelegate.initialize(PersistenceDelegate.java:212)
    at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:398)
    at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
    at java.beans.Encoder.writeObject(Encoder.java:74)
    at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
    at java.beans.Encoder.writeExpression(Encoder.java:330)
    at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
    at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
    at java.beans.Encoder.writeObject(Encoder.java:74)
    at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
    at java.beans.Encoder.writeExpression(Encoder.java:330)
    at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
    at java.beans.DefaultPersistenceDelegate.doProperty(DefaultPersistenceDelegate.java:194)
    at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:253)
    ... 65 more
Caused by: java.lang.Exception: XMLEncoder: discarding statement ArrayList.add(ArrayList);
    ... 82 more
Caused by: java.lang.RuntimeException: Cannot serialize object
    at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
    at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:426)
    at java.beans.DefaultPersistenceDelegate.invokeStatement(DefaultPersistenceDelegate.java:217)
    at java.beans.java_util_List_PersistenceDelegate.initialize(MetaData.java:649)
    at java.beans.PersistenceDelegate.initialize(PersistenceDelegate.java:212)
    at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:398)
    at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
    at java.beans.Encoder.writeObject(Encoder.java:74)
    at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
    at java.beans.Encoder.writeExpression(Encoder.java:330)
    at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
    at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
    at java.beans.Encoder.writeObject(Encoder.java:74)
    at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
    at java.beans.Encoder.writeObject1(Encoder.java:258)
    at java.beans.Encoder.cloneStatement(Encoder.java:271)
    at java.beans.Encoder.writeStatement(Encoder.java:301)
    at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:400)
    ... 81 more
Caused by: java.lang.Exception: XMLEncoder: discarding statement ArrayList.add(ASTNode);
    ... 98 more
Caused by: java.lang.RuntimeException: Cannot serialize object
    at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
    at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:238)
    at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:400)
    at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
    at java.beans.Encoder.writeObject(Encoder.java:74)
    at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
    at java.beans.Encoder.writeExpression(Encoder.java:330)
    at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
    at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
    at java.beans.Encoder.writeObject(Encoder.java:74)
    at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
    at java.beans.Encoder.writeObject1(Encoder.java:258)
    at java.beans.Encoder.cloneStatement(Encoder.java:271)
    at java.beans.Encoder.writeStatement(Encoder.java:301)
    at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:400)
    ... 97 more
Caused by: java.lang.RuntimeException: Cannot serialize object
    at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
    at java.beans.Encoder.getValue(Encoder.java:108)
    at java.beans.Encoder.get(Encoder.java:252)
    at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:112)
    at java.beans.Encoder.writeObject(Encoder.java:74)
    at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
    at java.beans.Encoder.writeExpression(Encoder.java:330)
    at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
    at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
    at java.beans.Encoder.writeObject(Encoder.java:74)
    at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
    at java.beans.Encoder.writeExpression(Encoder.java:330)
    at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
    at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:232)
    ... 110 more
Caused by: java.lang.InstantiationException: org.antlr.runtime.CommonToken
    at java.lang.Class.newInstance0(Class.java:359)
    at java.lang.Class.newInstance(Class.java:327)
    at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
    at sun.reflect.GeneratedMethodAccessor81.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
    at java.beans.Statement.invokeInternal(Statement.java:292)
    at java.beans.Statement.access$000(Statement.java:58)
    at java.beans.Statement$2.run(Statement.java:185)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.beans.Statement.invoke(Statement.java:182)
    at java.beans.Expression.getValue(Expression.java:153)
    at java.beans.Encoder.getValue(Encoder.java:105)
    ... 122 more
FAILED: SemanticException Generate Map Join Task Error: Cannot serialize object
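The root cause in the trace is `java.beans.XMLEncoder` hitting a class with no public no-arg constructor (`org.antlr.runtime.CommonToken`), which surfaces as `InstantiationException`. A minimal sketch of the constraint, using only JDK classes (the class name `XmlEncoderDemo` is illustrative, not part of Hive):

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;

public class XmlEncoderDemo {
    public static void main(String[] args) {
        // XMLEncoder reconstructs objects by calling a public no-arg
        // constructor and replaying mutations. ArrayList has such a
        // constructor, so encoding it succeeds.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        XMLEncoder encoder = new XMLEncoder(out);
        ArrayList<String> list = new ArrayList<String>();
        list.add("hello");
        encoder.writeObject(list);
        encoder.close();
        String xml = out.toString();
        System.out.println(xml.contains("<string>hello</string>"));  // true

        // A class without a public no-arg constructor (such as
        // org.antlr.runtime.CommonToken in the trace above) makes
        // Class.newInstance() throw InstantiationException, which
        // XMLEncoder reports as "Cannot serialize object".
    }
}
```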
        Thejas M Nair added a comment -

Based on the comments in HIVE-3739, it might be related to your JDK version. If you are using JDK 7, you might want to check whether JDK 6 helps. Please let us know on this JIRA if you find a way out.

        Benjamin Jakobus added a comment -

        Thanks. Yes, that did the trick!

        Benjamin Jakobus added a comment -

Mhh, another silly question: my changes don't seem to take effect after compiling.
1) Edit a file (e.g. add console.printInfo(">>>>> DEBUG: exec time: " + ((end - offset) / 1000));)
2) ant -Dhadoop.version=1.2.1 clean package
3) Run the test script, but no output is written to the log or console.

        Any advice?

        Benjamin Jakobus added a comment -

        Never mind - resolved. Problem was me being an idiot.

        Benjamin Jakobus added a comment -

However, is there a faster way to compile - or do I need to rely on Ivy, Maven etc. every time?
ant -Dhadoop.version=1.2.1 clean package takes about 3 minutes every time.

        Edward Capriolo added a comment -

Our build process is slow. Technically you do not need clean every time; mostly you only need it when changing the Hadoop version or updating one of the libs. However, the build is still slow regardless of running clean first. It's just something we have to deal with for a bit until we refactor everything.

        Benjamin Jakobus added a comment -

        OK, thanks.

        Benjamin Jakobus added a comment -

        Bump

        Thejas M Nair added a comment -

        Preparing for 0.12 release. Removing fix version of 0.12 for those that are not in 0.12 branch.


People

    • Assignee:
      Benjamin Jakobus
    • Reporter:
      Benjamin Jakobus
    • Votes:
      0
    • Watchers:
      4

Time Tracking

    • Original Estimate: 48h
    • Remaining Estimate: 48h
    • Time Spent: Not Specified