[HIVE-15756] Update/deletes on ACID table throws ArrayIndexOutOfBoundsException - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Not A Bug
Affects Version/s: 2.0.0
Fix Version/s: None
Component/s: Transactions
Labels:
None

Description

Update and delete queries on ACID tables fail throwing ArrayIndexOutOfBoundsException.

hive> update customer_acid set c_comment = 'foo bar' where c_custkey % 100 = 1;
Query ID = cstm-hdfs_20170128005823_efa1cdb7-2ad2-4371-ac80-0e35868ad17c
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.


Status: Running (Executing on YARN cluster with App id application_1485331877667_0036)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED     14         14        0        0       0       0
Reducer 2             FAILED      1          0        0        1       1       0
--------------------------------------------------------------------------------
VERTICES: 01/02  [========================>>--] 93%   ELAPSED TIME: 23.68 s    
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1485331877667_0036_1_01, diagnostics=[Task failed, taskId=task_1485331877667_0036_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":72,"bucketid":1,"rowid":0}},"value":{"_col0":103601,"_col1":"Customer#000103601","_col2":"3cYSrJtAA36vth35 emuIk","_col3":20,"_col4":"30-526-248-3190","_col5":8047.21,"_col6":"MACHINERY "}}
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":72,"bucketid":1,"rowid":0}},"value":{"_col0":103601,"_col1":"Customer#000103601","_col2":"3cYSrJtAA36vth35 emuIk","_col3":20,"_col4":"30-526-248-3190","_col5":8047.21,"_col6":"MACHINERY "}}
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:284)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:252)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
	... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":72,"bucketid":1,"rowid":0}},"value":{"_col0":103601,"_col1":"Customer#000103601","_col2":"3cYSrJtAA36vth35 emuIk","_col3":20,"_col4":"30-526-248-3190","_col5":8047.21,"_col6":"MACHINERY "}}
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
	... 16 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:780)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
	... 17 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1485331877667_0036_1_01 [Reducer 2] killed/failed due to:OWN_TASK_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 2, vertexId=vertex_1485331877667_0036_1_01, diagnostics=[Task failed, taskId=task_1485331877667_0036_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":72,"bucketid":1,"rowid":0}},"value":{"_col0":103601,"_col1":"Customer#000103601","_col2":"3cYSrJtAA36vth35 emuIk","_col3":20,"_col4":"30-526-248-3190","_col5":8047.21,"_col6":"MACHINERY "}}
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":72,"bucketid":1,"rowid":0}},"value":{"_col0":103601,"_col1":"Customer#000103601","_col2":"3cYSrJtAA36vth35 emuIk","_col3":20,"_col4":"30-526-248-3190","_col5":8047.21,"_col6":"MACHINERY "}}
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:284)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:252)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
	... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":72,"bucketid":1,"rowid":0}},"value":{"_col0":103601,"_col1":"Customer#000103601","_col2":"3cYSrJtAA36vth35 emuIk","_col3":20,"_col4":"30-526-248-3190","_col5":8047.21,"_col6":"MACHINERY "}}
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
	... 16 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:780)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
	... 17 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1485331877667_0036_1_01 [Reducer 2] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0

hive> explain extended update customer_acid set c_comment = 'foo bar' where c_custkey % 100 = 1;
OK
ABSTRACT SYNTAX TREE:
  
TOK_UPDATE_TABLE
   TOK_TABNAME
      customer_acid
   TOK_SET_COLUMNS_CLAUSE
      =
         TOK_TABLE_OR_COL
            c_comment
         'foo bar'
   TOK_WHERE
      =
         %
            TOK_TABLE_OR_COL
               c_custkey
            100
         1


STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: cstm-hdfs_20170128012834_4d41e184-1e40-443c-9990-147cfdc6ea15:5
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: 
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: customer_acid
                  filterExpr: ((c_custkey % 100) = 1) (type: boolean)
                  Statistics: Num rows: 25219 Data size: 8700894 Basic stats: COMPLETE Column stats: NONE
                  GatherStats: false
                  Filter Operator
                    isSamplingPred: false
                    predicate: ((c_custkey % 100) = 1) (type: boolean)
                    Statistics: Num rows: 12609 Data size: 4350274 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: ROW__ID (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>), c_custkey (type: int), c_name (type: string), c_address (type: string), c_nationkey (type: int), c_phone (type: char(15)), c_acctbal (type: decimal(15,2)), c_mktsegment (type: char(10))
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
                      Statistics: Num rows: 12609 Data size: 4350274 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
                        sort order: +
                        Statistics: Num rows: 12609 Data size: 4350274 Basic stats: COMPLETE Column stats: NONE
                        tag: -1
                        value expressions: _col1 (type: int), _col2 (type: string), _col3 (type: string), _col4 (type: int), _col5 (type: char(15)), _col6 (type: decimal(15,2)), _col7 (type: char(10))
                        auto parallelism: true
            Path -> Alias:
              hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid [customer_acid]
            Path -> Partition:
              hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid 
                Partition
                  base file name: customer_acid
                  input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                  output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                  properties:
                    bucket_count 8
                    bucket_field_name c_custkey
                    columns c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment
                    columns.comments 
                    columns.types int:string:string:int:char(15):decimal(15,2):char(10):string
                    file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    location hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid
                    name tpch.customer_acid
                    numFiles 12
                    numRows 0
                    rawDataSize 0
                    serialization.ddl struct customer_acid { i32 c_custkey, string c_name, string c_address, i32 c_nationkey, char(15) c_phone, decimal(15,2) c_acctbal, char(10) c_mktsegment, string c_comment}
                    serialization.format 1
                    serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    totalSize 8700894
                    transactional true
                    transient_lastDdlTime 1485548417
                  serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                
                    input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    properties:
                      bucket_count 8
                      bucket_field_name c_custkey
                      columns c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment
                      columns.comments 
                      columns.types int:string:string:int:char(15):decimal(15,2):char(10):string
                      file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                      file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                      location hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid
                      name tpch.customer_acid
                      numFiles 12
                      numRows 0
                      rawDataSize 0
                      serialization.ddl struct customer_acid { i32 c_custkey, string c_name, string c_address, i32 c_nationkey, char(15) c_phone, decimal(15,2) c_acctbal, char(10) c_mktsegment, string c_comment}
                      serialization.format 1
                      serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
                      totalSize 8700894
                      transactional true
                      transient_lastDdlTime 1485548417
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: tpch.customer_acid
                  name: tpch.customer_acid
            Truncated Path -> Alias:
              /tpch.db/customer_acid [customer_acid]
        Reducer 2 
            Needs Tagging: false
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>), VALUE._col0 (type: int), VALUE._col1 (type: string), VALUE._col2 (type: string), VALUE._col3 (type: int), VALUE._col4 (type: char(15)), VALUE._col5 (type: decimal(15,2)), VALUE._col6 (type: char(10)), 'foo bar' (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
                Statistics: Num rows: 12609 Data size: 4350274 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  GlobalTableId: 1
                  directory: hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid/.hive-staging_hive_2017-01-28_01-28-34_547_5091220054599015088-1/-ext-10000
                  NumFilesPerFileSink: 1
                  Statistics: Num rows: 12609 Data size: 4350274 Basic stats: COMPLETE Column stats: NONE
                  Stats Publishing Key Prefix: hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid/.hive-staging_hive_2017-01-28_01-28-34_547_5091220054599015088-1/-ext-10000/
                  table:
                      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                      output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                      properties:
                        bucket_count 8
                        bucket_field_name c_custkey
                        columns c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment
                        columns.comments 
                        columns.types int:string:string:int:char(15):decimal(15,2):char(10):string
                        file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                        file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                        location hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid
                        name tpch.customer_acid
                        numFiles 12
                        numRows 0
                        rawDataSize 0
                        serialization.ddl struct customer_acid { i32 c_custkey, string c_name, string c_address, i32 c_nationkey, char(15) c_phone, decimal(15,2) c_acctbal, char(10) c_mktsegment, string c_comment}
                        serialization.format 1
                        serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
                        totalSize 8700894
                        transactional true
                        transient_lastDdlTime 1485548417
                      serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                      name: tpch.customer_acid
                  TotalFiles: 1
                  GatherStats: true
                  MultiFileSpray: false

  Stage: Stage-2
    Dependency Collection

  Stage: Stage-0
    Move Operator
      tables:
          replace: false
          source: hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid/.hive-staging_hive_2017-01-28_01-28-34_547_5091220054599015088-1/-ext-10000
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              properties:
                bucket_count 8
                bucket_field_name c_custkey
                columns c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment
                columns.comments 
                columns.types int:string:string:int:char(15):decimal(15,2):char(10):string
                file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                location hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid
                name tpch.customer_acid
                numFiles 12
                numRows 0
                rawDataSize 0
                serialization.ddl struct customer_acid { i32 c_custkey, string c_name, string c_address, i32 c_nationkey, char(15) c_phone, decimal(15,2) c_acctbal, char(10) c_mktsegment, string c_comment}
                serialization.format 1
                serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
                totalSize 8700894
                transactional true
                transient_lastDdlTime 1485548417
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: tpch.customer_acid

  Stage: Stage-3
    Stats-Aggr Operator
      Stats Aggregation Key Prefix: hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid/.hive-staging_hive_2017-01-28_01-28-34_547_5091220054599015088-1/-ext-10000/

Time taken: 0.422 seconds, Fetched: 189 row(s)

Attachments

Issue Links

relates to

HIVE-15844 Make ReduceSinkOperator independent of Acid

Resolved

Activity

People

Assignee:: Eugene Koifman

Reporter:: Kavan Suresh

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 30/Jan/17 21:01

Updated:: 18/Aug/17 11:03

Resolved:: 21/Feb/17 18:14