[HIVE-10316] same query works with TEXTFILE and fails with ORC

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.14.0
    • Fix Version/s: None
    • Component/s: Compression
    • Labels: None
    • Environment: Hortonworks HDP 2.2 running on Linux

    Description

      See also the related answer on the mailing list:
      http://mail-archives.apache.org/mod_mbox/hive-user/201504.mbox/%3CD15184D6.27779%25gopal%40hortonworks.com%3E

      I'm getting an error in Hive when executing a query on a table in ORC format.
      After several attempts, I managed to run the same query on the same table in TEXTFILE format.
      I've been able to reproduce the error with the simple SQL script below.
      I create the same table in TEXTFILE and in ORC, and I run a SELECT ... GROUP BY on both tables.
      The first SELECT, issued on the TEXTFILE table, succeeds.
      The second SELECT, issued on the ORC table, fails.
      NB: There is a CONCAT in the query. If I remove the CONCAT, the query runs fine on both tables.

      Example script to reproduce the error:

      USE pvr_temp;
      DROP TABLE IF EXISTS students_text;
      CREATE TABLE students_text (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS TEXTFILE;
      INSERT INTO TABLE students_text VALUES ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
      SELECT CONCAT(TO_DATE(datetime), ''), SUM(gpa) FROM students_text GROUP BY CONCAT(TO_DATE(datetime), '');
      DROP TABLE IF EXISTS students_orc;
      CREATE TABLE students_orc (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS ORC;
      INSERT INTO TABLE students_orc VALUES ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
      SELECT CONCAT(TO_DATE(datetime), ''), SUM(gpa) FROM students_orc GROUP BY CONCAT(TO_DATE(datetime), '');

      Log showing the error:

      [pvr@tpcalr01s ~]$ cat test.log
      scan complete in 9ms
      Connecting to jdbc:hive2://tpcrmm03s:10000
      SLF4J: Class path contains multiple SLF4J bindings.
      SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
      SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
      Connected to: Apache Hive (version 0.14.0.2.2.0.0-2041)
      Driver: Hive JDBC (version 0.14.0.2.2.0.0-2041)
      Transaction isolation: TRANSACTION_REPEATABLE_READ
      0: jdbc:hive2://tpcrmm03s:10000> USE pvr_temp;
      No rows affected (0.061 seconds)
      0: jdbc:hive2://tpcrmm03s:10000> DROP TABLE IF EXISTS students_text;
      No rows affected (0.12 seconds)
      0: jdbc:hive2://tpcrmm03s:10000> CREATE TABLE students_text (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS TEXTFILE;
      No rows affected (0.057 seconds)
      0: jdbc:hive2://tpcrmm03s:10000> INSERT INTO TABLE students_text VALUES ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
      INFO : Tez session hasn't been created yet. Opening session
      INFO :

      INFO : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047)

      INFO : Map 1: /
      INFO : Map 1: 0/1
      No rows affected (14.134 seconds)
      INFO : Map 1: 0/1
      INFO : Map 1: 0(+1)/1
      INFO : Map 1: 0(+1)/1
      INFO : Map 1: 1/1
      INFO : Loading data to table pvr_temp.students_text from hdfs://tpcrmm01s.priv.atos.fr:8020/tmp/hive/hive/bf19c354-de67-45ae-a3e4-cd57d81acd71/hive_2015-04-13_14-15-08_445_2811483497310651606-20/-ext-10000
      INFO : Table pvr_temp.students_text stats: [numFiles=1, numRows=2, totalSize=86, rawDataSize=84]
      0: jdbc:hive2://tpcrmm03s:10000> SELECT CONCAT(TO_DATE(datetime), ''), SUM(gpa) FROM students_text GROUP BY CONCAT(TO_DATE(datetime), '');
      INFO : Session is already open
      INFO :

      INFO : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047)

      INFO : Map 1: / Reducer 2: 0/1
      INFO : Map 1: 0/1 Reducer 2: 0/1
      INFO : Map 1: 0(+1)/1 Reducer 2: 0/1
      INFO : Map 1: 1/1 Reducer 2: 0(+1)/1
      INFO : Map 1: 1/1 Reducer 2: 1/1
      ------------------
      _c0 _c1
      ------------------
      2015-04-13- 3.6
      ------------------
      1 row selected (3.258 seconds)
      0: jdbc:hive2://tpcrmm03s:10000> DROP TABLE IF EXISTS students_orc;
      No rows affected (0.109 seconds)
      0: jdbc:hive2://tpcrmm03s:10000> CREATE TABLE students_orc (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS ORC;
      No rows affected (0.063 seconds)
      0: jdbc:hive2://tpcrmm03s:10000> INSERT INTO TABLE students_orc VALUES ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
      No rows affected (2.125 seconds)
      INFO : Session is already open
      INFO :

      INFO : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047)

      INFO : Map 1: 0/1
      INFO : Map 1: 0(+1)/1
      INFO : Map 1: 1/1
      INFO : Loading data to table pvr_temp.students_orc from hdfs://tpcrmm01s.priv.atos.fr:8020/tmp/hive/hive/bf19c354-de67-45ae-a3e4-cd57d81acd71/hive_2015-04-13_14-15-26_056_1247475009666467472-20/-ext-10000
      INFO : Table pvr_temp.students_orc stats: [numFiles=1, numRows=2, totalSize=590, rawDataSize=508]
      0: jdbc:hive2://tpcrmm03s:10000> SELECT CONCAT(TO_DATE(datetime), ''), SUM(gpa) FROM students_orc GROUP BY CONCAT(TO_DATE(datetime), '');
      INFO : Session is already open
      INFO :

      INFO : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047)

      INFO : Map 1: / Reducer 2: 0/1
      INFO : Map 1: 0(+1)/1 Reducer 2: 0/1
      INFO : Map 1: 0(+1,-1)/1 Reducer 2: 0/1
      INFO : Map 1: 0(+1,-1)/1 Reducer 2: 0/1
      INFO : Map 1: 0(+1,-2)/1 Reducer 2: 0/1
      INFO : Map 1: 0(+1,-2)/1 Reducer 2: 0/1
      INFO : Map 1: 0(+1,-3)/1 Reducer 2: 0/1
      INFO : Map 1: 0(+1,-3)/1 Reducer 2: 0/1
      ERROR : Status: Failed
      ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1428656093356_0047_4_00, diagnostics=[Task failed, taskId=task_1428656093356_0047_4_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
      at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.RuntimeException: Map operator initialization failed
      at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
      ... 13 more
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
      at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
      at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
      at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
      at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
      at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
      at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
      at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
      at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
      at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
      ... 14 more
      ], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
      at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.RuntimeException: Map operator initialization failed
      at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
      ... 13 more
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
      at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
      at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
      at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
      at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
      at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
      at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
      at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
      at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
      at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
      ... 14 more
      ], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
      at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.RuntimeException: Map operator initialization failed
      at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
      ... 13 more
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
      at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
      at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
      at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
      at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
      at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
      at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
      at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
      at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
      at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
      ... 14 more
      ], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
      at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.RuntimeException: Map operator initialization failed
      at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
      ... 13 more
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
      at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
      at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
      at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
      at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
      at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
      at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
      at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
      at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
      at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
      ... 14 more
      ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1428656093356_0047_4_00 [Map 1] killed/failed due to:null]
      ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1428656093356_0047_4_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1428656093356_0047_4_01 [Reducer 2] killed/failed due to:null]
      ERROR : DAG failed due to vertex failure. failedVertices:1 killedVertices:1
      Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=2)

      Closing: 0: jdbc:hive2://tpcrmm03s:10000
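
      The innermost cause in the trace above is raised from VectorGroupByOperator.initializeOp, which suggests the failure sits in Hive's vectorized execution path. In Hive 0.14 vectorization is applied to ORC input, so this would be consistent with the TEXTFILE table being unaffected. A minimal sketch of how one might check this, assuming vectorization is indeed the trigger (not confirmed in this report): turn vectorization off for the session with the standard hive.vectorized.execution.enabled setting and rerun the failing query.

```sql
-- Sketch only: assumes the error comes from the vectorized GROUP BY path.
-- Table and database names are the ones from the reproduction script above.
USE pvr_temp;

-- Disable vectorized execution for this session only.
SET hive.vectorized.execution.enabled=false;

-- Rerun the query that failed on the ORC table.
SELECT CONCAT(TO_DATE(datetime), ''), SUM(gpa)
FROM students_orc
GROUP BY CONCAT(TO_DATE(datetime), '');

-- Restore the previous behaviour afterwards (assumes it was enabled before).
SET hive.vectorized.execution.enabled=true;
```

      If the query succeeds with vectorization disabled, that would narrow the problem to the vectorized handling of the CONCAT group-by key rather than to the ORC file format itself.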

          People

            Assignee: Unassigned
            Reporter: Philippe Verhaeghe