Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
NPE is thrown if both hive.optimize.skewjoin.compiletime and hive.auto.convert.join are enabled:
java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertMapJoin(MapJoinProcessor.java:329) at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinOpAndLocalWork(MapJoinProcessor.java:236) at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.convertTaskToMapJoinTask(CommonJoinTaskDispatcher.java:181) at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.processCurrentTask(CommonJoinTaskDispatcher.java:463) at org.apache.hadoop.hive.ql.optimizer.physical.AbstractJoinTaskDispatcher.dispatch(AbstractJoinTaskDispatcher.java:182) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139) at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:79) at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107) at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:275) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:223) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10028) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:415) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:303) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1068) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1130) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1005) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:995) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
A simple way to reproduce this issue is to add set hive.auto.convert.join=true to one of the skew join qfile, e.g. skewjoinopt2.q.
While reduce side join can produce correct results, we kind of lost the point of skew join optimization - join skewed data via a map join to avoid one reducer getting too much records.
Attachments
Attachments
Issue Links
- is related to
-
HIVE-8535 Enable compile time skew join optimization for spark [Spark Branch]
- Resolved
- links to