  1. Hive
  2. HIVE-21966

Llap external client - Arrow Serializer throws ArrayIndexOutOfBoundsException in some cases


Details

    Description

      When a query is submitted through llap-ext-client, the Arrow serializer throws an ArrayIndexOutOfBoundsException if all three of the following conditions hold:

      1) hive.vectorized.execution.filesink.arrow.native.enabled=true, so that the Arrow serializer code path is taken.
      2) The query contains a filter or limit clause, which forces VectorizedRowBatch#selectedInUse=true.
      3) The projection includes a column backed by a MultiValuedColumnVector (for example, an array column as in the reproduction below).
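
      The issue does not spell out the root cause, so the following is only an illustrative sketch of how conditions 2) and 3) interact. It is plain Java rather than Hive code: the class SelectedRowsSketch, its Batch type and the sample data are hypothetical, with fields named after VectorizedRowBatch (selected, size, selectedInUse) and the list column vector (offsets, lengths, child). When selectedInUse is true, logical row i lives at physical row selected[i], and a multi-valued column must look up its offset and length at that physical row.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java stand-in for a vectorized batch holding one list column (hypothetical, not Hive code). */
final class SelectedRowsSketch {

    static final class Batch {
        // Row selection: when selectedInUse is true, only selected[0..size) are live rows.
        boolean selectedInUse;
        int[] selected;
        int size;
        // List column: offsets/lengths are indexed by physical row and point into child.
        long[] offsets;
        long[] lengths;
        String[] child;
    }

    /** Reads the list value of every live row, resolving the physical row through selected[]. */
    static List<List<String>> readLists(Batch b) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < b.size; i++) {
            int row = b.selectedInUse ? b.selected[i] : i; // logical -> physical row
            int start = (int) b.offsets[row];
            int len = (int) b.lengths[row];
            List<String> elems = new ArrayList<>();
            for (int j = 0; j < len; j++) {
                elems.add(b.child[start + j]);
            }
            out.add(elems);
        }
        return out;
    }

    /** Three physical rows [a, b], [c], [d, e, f]; a filter or limit kept only physical row 2. */
    static Batch example() {
        Batch b = new Batch();
        b.offsets = new long[] {0, 2, 3};
        b.lengths = new long[] {2, 1, 3};
        b.child = new String[] {"a", "b", "c", "d", "e", "f"};
        b.selectedInUse = true;
        b.selected = new int[] {2};
        b.size = 1;
        return b;
    }

    public static void main(String[] args) {
        System.out.println(readLists(example())); // prints [[d, e, f]]
    }
}
```

      In this sketch, ignoring selected[] would return [a, b] for the single live row instead of [d, e, f]. Mixing logical and physical indices in this way can also push a child index past the end of an array, which is one plausible route to an exception like the one below; the attached patches contain the actual fix.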

      Sample stack trace:

      Caused by: java.lang.ArrayIndexOutOfBoundsException: 150
      	at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeGeneric(Serializer.java:679)
      	at org.apache.hadoop.hive.ql.io.arrow.Serializer.writePrimitive(Serializer.java:518)
      	at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:276)
      	at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:342)
      	at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:282)
      	at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:365)
      	at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:279)
      	at org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:199)
      	at org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:135)
      	... 30 more
      

      To reproduce, first create and populate a table from beeline:

      CREATE TABLE complex_tbl(c1 array<struct<f1:string,f2:string>>) STORED AS ORC;
      INSERT INTO complex_tbl SELECT ARRAY(NAMED_STRUCT('f1','v11', 'f2','v21'), NAMED_STRUCT('f1','v21', 'f2','v22'));
      

      Then run the following query through llap-ext-client:

      select * from complex_tbl limit 1

      Attachments

        1. HIVE-21966.2.patch
          14 kB
          Shubham Chaurasia
        2. HIVE-21966.1.patch
          14 kB
          Shubham Chaurasia


          People

            Assignee: Shubham Chaurasia (ShubhamChaurasia)
            Reporter: Shubham Chaurasia (ShubhamChaurasia)
            Votes: 0
            Watchers: 2


              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 1.5h
