Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-23688

Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

Log workAgile BoardRank to TopRank to BottomBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      start and length are empty arrays in MapColumnVector.values(BytesColumnVector) when values in map contain null

      reproduce in master branch:

      set hive.vectorized.execution.enabled=true; 
      
      CREATE TABLE parquet_map_type (id int,stringMap map<string, string>) 
      stored as parquet; 
      
      insert overwrite table parquet_map_typeSELECT 1, MAP('k1'null'k2''bar'); 
      
      select id, stringMap['k1'] from parquet_map_type group by 1,2;
      

      query explain:

      Stage-0
        Fetch Operator
          limit:-1
          Stage-1
            Reducer 2 vectorized
            File Output Operator [FS_12]
              Group By Operator [GBY_11] (rows=5 width=2)
                Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
              <-Map 1 [SIMPLE_EDGE] vectorized
                SHUFFLE [RS_10]
                  PartitionCols:_col0, _col1
                  Group By Operator [GBY_9] (rows=10 width=2)
                    Output:["_col0","_col1"],keys:_col0, _col1
                    Select Operator [SEL_8] (rows=10 width=2)
                      Output:["_col0","_col1"]
                      TableScan [TS_0] (rows=10 width=2)
                        temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
      

      runtime error:

      Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1592040015150_0001_3_00_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row 
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
      	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
      	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
      	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
      	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
      	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
      	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
      	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
      	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
      	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row 
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:403)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
      	... 16 more
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row 
      	at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:970)
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
      	... 19 more
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating id
      	at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:149)
      	at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:918)
      	at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:172)
      	at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809)
      	at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:842)
      	... 20 more
      Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
      	at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:506)
      	at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexBaseScalar.evaluate(VectorUDFMapIndexBaseScalar.java:83)
      	at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
      	... 24 more
      

      Attachments

        1. HIVE-23688.patch
          10 kB
          Yizhen Fan

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            abstractdog László Bodor Assign to me
            sparksfyz Yizhen Fan
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - Not Specified
              Not Specified
              Remaining:
              Remaining Estimate - 0h
              0h
              Logged:
              Time Spent - 4h 50m
              4h 50m

              Slack

                Issue deployment