HIVE-15756

Update/deletes on ACID table throw ArrayIndexOutOfBoundsException


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not A Bug
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: Transactions
    • Labels: None

    Description

      Update and delete queries on ACID tables fail, throwing an ArrayIndexOutOfBoundsException.
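
      For context, a table definition consistent with the properties in the explain plan below (ORC, 8 buckets on c_custkey, transactional) would look roughly like the following sketch. This DDL is reconstructed from the plan's table properties, not the script actually used to create the table:

      -- Reconstructed from the table properties in the explain plan
      -- (bucket_count 8, bucket_field_name c_custkey, ORC, transactional);
      -- the actual DDL used to create the table may differ.
      CREATE TABLE customer_acid (
        c_custkey    INT,
        c_name       STRING,
        c_address    STRING,
        c_nationkey  INT,
        c_phone      CHAR(15),
        c_acctbal    DECIMAL(15,2),
        c_mktsegment CHAR(10),
        c_comment    STRING
      )
      CLUSTERED BY (c_custkey) INTO 8 BUCKETS
      STORED AS ORC
      TBLPROPERTIES ('transactional'='true');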

      hive> update customer_acid set c_comment = 'foo bar' where c_custkey % 100 = 1;
      Query ID = cstm-hdfs_20170128005823_efa1cdb7-2ad2-4371-ac80-0e35868ad17c
      Total jobs = 1
      Launching Job 1 out of 1
      Tez session was closed. Reopening...
      Session re-established.
      
      
      Status: Running (Executing on YARN cluster with App id application_1485331877667_0036)
      
      --------------------------------------------------------------------------------
              VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
      --------------------------------------------------------------------------------
      Map 1 ..........   SUCCEEDED     14         14        0        0       0       0
      Reducer 2             FAILED      1          0        0        1       1       0
      --------------------------------------------------------------------------------
      VERTICES: 01/02  [========================>>--] 93%   ELAPSED TIME: 23.68 s    
      --------------------------------------------------------------------------------
      Status: Failed
      Vertex failed, vertexName=Reducer 2, vertexId=vertex_1485331877667_0036_1_01, diagnostics=[Task failed, taskId=task_1485331877667_0036_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":72,"bucketid":1,"rowid":0}},"value":{"_col0":103601,"_col1":"Customer#000103601","_col2":"3cYSrJtAA36vth35 emuIk","_col3":20,"_col4":"30-526-248-3190","_col5":8047.21,"_col6":"MACHINERY "}}
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
      	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
      	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":72,"bucketid":1,"rowid":0}},"value":{"_col0":103601,"_col1":"Customer#000103601","_col2":"3cYSrJtAA36vth35 emuIk","_col3":20,"_col4":"30-526-248-3190","_col5":8047.21,"_col6":"MACHINERY "}}
      	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:284)
      	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:252)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
      	... 14 more
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":72,"bucketid":1,"rowid":0}},"value":{"_col0":103601,"_col1":"Customer#000103601","_col2":"3cYSrJtAA36vth35 emuIk","_col3":20,"_col4":"30-526-248-3190","_col5":8047.21,"_col6":"MACHINERY "}}
      	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
      	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
      	... 16 more
      Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
      	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:780)
      	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
      	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
      	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
      	... 17 more
      ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1485331877667_0036_1_01 [Reducer 2] killed/failed due to:OWN_TASK_FAILURE]
      DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
      FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 2, vertexId=vertex_1485331877667_0036_1_01 [diagnostics repeat the stack trace above, elided]
      DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
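
      Note the FileSinkOperator.process frame in the trace and the bucketid of 1 in the failing row's key. One detail that may be relevant, and that would fit the eventual Not A Bug resolution: the explain plan below reports numFiles 12 against bucket_count 8, so it is worth checking whether the files in the table directory actually follow the ACID layout; an ArrayIndexOutOfBoundsException in FileSinkOperator during an update can occur when the data files do not match the table's declared bucketing (for example, files loaded or copied in outside of ACID inserts). A quick, hypothetical check is to list the table directory and look for delta_* directories containing bucket_0000N files:

      -- Hypothetical diagnostic; a flat directory of arbitrarily named
      -- files here would suggest data written outside of ACID inserts.
      hive> dfs -ls -R /apps/hive/warehouse/tpch.db/customer_acid;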
      
      hive> explain extended update customer_acid set c_comment = 'foo bar' where c_custkey % 100 = 1;
      OK
      ABSTRACT SYNTAX TREE:
        
      TOK_UPDATE_TABLE
         TOK_TABNAME
            customer_acid
         TOK_SET_COLUMNS_CLAUSE
            =
               TOK_TABLE_OR_COL
                  c_comment
               'foo bar'
         TOK_WHERE
            =
               %
                  TOK_TABLE_OR_COL
                     c_custkey
                  100
               1
      
      
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-2 depends on stages: Stage-1
        Stage-0 depends on stages: Stage-2
        Stage-3 depends on stages: Stage-0
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
            DagId: cstm-hdfs_20170128012834_4d41e184-1e40-443c-9990-147cfdc6ea15:5
            Edges:
              Reducer 2 <- Map 1 (SIMPLE_EDGE)
            DagName: 
            Vertices:
              Map 1 
                  Map Operator Tree:
                      TableScan
                        alias: customer_acid
                        filterExpr: ((c_custkey % 100) = 1) (type: boolean)
                        Statistics: Num rows: 25219 Data size: 8700894 Basic stats: COMPLETE Column stats: NONE
                        GatherStats: false
                        Filter Operator
                          isSamplingPred: false
                          predicate: ((c_custkey % 100) = 1) (type: boolean)
                          Statistics: Num rows: 12609 Data size: 4350274 Basic stats: COMPLETE Column stats: NONE
                          Select Operator
                            expressions: ROW__ID (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>), c_custkey (type: int), c_name (type: string), c_address (type: string), c_nationkey (type: int), c_phone (type: char(15)), c_acctbal (type: decimal(15,2)), c_mktsegment (type: char(10))
                            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
                            Statistics: Num rows: 12609 Data size: 4350274 Basic stats: COMPLETE Column stats: NONE
                            Reduce Output Operator
                              key expressions: _col0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
                              sort order: +
                              Statistics: Num rows: 12609 Data size: 4350274 Basic stats: COMPLETE Column stats: NONE
                              tag: -1
                              value expressions: _col1 (type: int), _col2 (type: string), _col3 (type: string), _col4 (type: int), _col5 (type: char(15)), _col6 (type: decimal(15,2)), _col7 (type: char(10))
                              auto parallelism: true
                  Path -> Alias:
                    hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid [customer_acid]
                  Path -> Partition:
                    hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid 
                      Partition
                        base file name: customer_acid
                        input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                        output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                        properties:
                          bucket_count 8
                          bucket_field_name c_custkey
                          columns c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment
                          columns.comments 
                          columns.types int:string:string:int:char(15):decimal(15,2):char(10):string
                          file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                          file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                          location hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid
                          name tpch.customer_acid
                          numFiles 12
                          numRows 0
                          rawDataSize 0
                          serialization.ddl struct customer_acid { i32 c_custkey, string c_name, string c_address, i32 c_nationkey, char(15) c_phone, decimal(15,2) c_acctbal, char(10) c_mktsegment, string c_comment}
                          serialization.format 1
                          serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
                          totalSize 8700894
                          transactional true
                          transient_lastDdlTime 1485548417
                        serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                      
                          input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                          output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                          properties:
                            bucket_count 8
                            bucket_field_name c_custkey
                            columns c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment
                            columns.comments 
                            columns.types int:string:string:int:char(15):decimal(15,2):char(10):string
                            file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                            file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                            location hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid
                            name tpch.customer_acid
                            numFiles 12
                            numRows 0
                            rawDataSize 0
                            serialization.ddl struct customer_acid { i32 c_custkey, string c_name, string c_address, i32 c_nationkey, char(15) c_phone, decimal(15,2) c_acctbal, char(10) c_mktsegment, string c_comment}
                            serialization.format 1
                            serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
                            totalSize 8700894
                            transactional true
                            transient_lastDdlTime 1485548417
                          serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                          name: tpch.customer_acid
                        name: tpch.customer_acid
                  Truncated Path -> Alias:
                    /tpch.db/customer_acid [customer_acid]
              Reducer 2 
                  Needs Tagging: false
                  Reduce Operator Tree:
                    Select Operator
                      expressions: KEY.reducesinkkey0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>), VALUE._col0 (type: int), VALUE._col1 (type: string), VALUE._col2 (type: string), VALUE._col3 (type: int), VALUE._col4 (type: char(15)), VALUE._col5 (type: decimal(15,2)), VALUE._col6 (type: char(10)), 'foo bar' (type: string)
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
                      Statistics: Num rows: 12609 Data size: 4350274 Basic stats: COMPLETE Column stats: NONE
                      File Output Operator
                        compressed: false
                        GlobalTableId: 1
                        directory: hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid/.hive-staging_hive_2017-01-28_01-28-34_547_5091220054599015088-1/-ext-10000
                        NumFilesPerFileSink: 1
                        Statistics: Num rows: 12609 Data size: 4350274 Basic stats: COMPLETE Column stats: NONE
                        Stats Publishing Key Prefix: hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid/.hive-staging_hive_2017-01-28_01-28-34_547_5091220054599015088-1/-ext-10000/
                        table:
                            input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                            output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                            properties:
                              bucket_count 8
                              bucket_field_name c_custkey
                              columns c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment
                              columns.comments 
                              columns.types int:string:string:int:char(15):decimal(15,2):char(10):string
                              file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                              file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                              location hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid
                              name tpch.customer_acid
                              numFiles 12
                              numRows 0
                              rawDataSize 0
                              serialization.ddl struct customer_acid { i32 c_custkey, string c_name, string c_address, i32 c_nationkey, char(15) c_phone, decimal(15,2) c_acctbal, char(10) c_mktsegment, string c_comment}
                              serialization.format 1
                              serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
                              totalSize 8700894
                              transactional true
                              transient_lastDdlTime 1485548417
                            serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                            name: tpch.customer_acid
                        TotalFiles: 1
                        GatherStats: true
                        MultiFileSpray: false
      
        Stage: Stage-2
          Dependency Collection
      
        Stage: Stage-0
          Move Operator
            tables:
                replace: false
                source: hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid/.hive-staging_hive_2017-01-28_01-28-34_547_5091220054599015088-1/-ext-10000
                table:
                    input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    properties:
                      bucket_count 8
                      bucket_field_name c_custkey
                      columns c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment
                      columns.comments 
                      columns.types int:string:string:int:char(15):decimal(15,2):char(10):string
                      file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                      file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                      location hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid
                      name tpch.customer_acid
                      numFiles 12
                      numRows 0
                      rawDataSize 0
                      serialization.ddl struct customer_acid { i32 c_custkey, string c_name, string c_address, i32 c_nationkey, char(15) c_phone, decimal(15,2) c_acctbal, char(10) c_mktsegment, string c_comment}
                      serialization.format 1
                      serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
                      totalSize 8700894
                      transactional true
                      transient_lastDdlTime 1485548417
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: tpch.customer_acid
      
        Stage: Stage-3
          Stats-Aggr Operator
            Stats Aggregation Key Prefix: hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid/.hive-staging_hive_2017-01-28_01-28-34_547_5091220054599015088-1/-ext-10000/
      
      Time taken: 0.422 seconds, Fetched: 189 row(s)
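
      If the root cause does turn out to be data files that do not match the declared bucketing (consistent with the Not A Bug resolution), one possible remediation, sketched here with a hypothetical table name, is to rewrite the data through a regular ACID insert so that Hive lays out the bucket files itself:

      -- Sketch only; customer_acid_fixed is a hypothetical name, and
      -- CREATE TABLE ... LIKE may not carry over every property, so verify
      -- 'transactional'='true' and the bucketing on the new table before
      -- swapping it in for the original.
      CREATE TABLE customer_acid_fixed LIKE customer_acid;
      INSERT INTO customer_acid_fixed SELECT * FROM customer_acid;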
      
      

    People

      Assignee: ekoifman (Eugene Koifman)
      Reporter: kavansuresh@gmail.com (Kavan Suresh)
      Votes: 0
      Watchers: 4
