Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-9718

Insert into dynamic partitions with same column structure in the "distibute by" clause barfs

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Duplicate
    • 0.14.0, 1.0.0, 1.1.0
    • 1.2.0
    • Logical Optimizer
    • None

    Description

      Sample reproducible query:

      SET hive.exec.dynamic.partition.mode=nonstrict;
      SET hive.exec.dynamic.partition=true;
      
       insert overwrite table nation_new_p partition (some)
      select n_name as name1, n_name as name2, n_name as name3 from nation distribute by name3;
      

      Note: Make sure there is data in the source table to reproduce the issue.

      During the optimizations done for Jira: https://issues.apache.org/jira/browse/HIVE-4867, an optimization of deduplication of columns is done. But, when one of the columns is used as part of partitioned/distribute by, its not taken care of.

      The above query produces exception as follows:

      Diagnostic Messages for this Task:
      java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. carefully final deposits detect slyly agai"}
      	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
      	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
      	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
      	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
      	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
      	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
      	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. carefully final deposits detect slyly agai"}
      	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
      	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
      	... 12 more
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
      	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
      	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
      	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
      	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
      	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
      	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
      	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
      	... 13 more
      Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
      	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
      	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
      	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
      	at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
      	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
      	... 19 more
      

      Tables used are:

      CREATE EXTERNAL TABLE `nation`(
        `n_nationkey` int,
        `n_name` string,
        `n_regionkey` int,
        `n_comment` string)
      ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '|'
      STORED AS INPUTFORMAT
        'org.apache.hadoop.mapred.TextInputFormat'
      OUTPUTFORMAT
        'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
      

      and

      CREATE TABLE `nation_new_p`(
        `n_name1` string,
        `n_name2` string)
      PARTITIONED BY (
        `some` string)
      ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '|'
      STORED AS INPUTFORMAT
        'org.apache.hadoop.mapred.TextInputFormat'
      OUTPUTFORMAT
        'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
      

      Sample data for the table is provided by the file attached with.

      Attachments

        1. HIVE-9718.patch
          101 kB
          Pavan Srinivas
        2. HIVE-9718-0.14.patch
          101 kB
          Pavan Srinivas
        3. HIVE-9718-1.0.patch
          101 kB
          Pavan Srinivas
        4. nation.tbl
          2 kB
          Pavan Srinivas
        5. patch.txt
          101 kB
          Pavan Srinivas

        Issue Links

          Activity

            People

              Unassigned Unassigned
              pavan101 Pavan Srinivas
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: