Hive / HIVE-23022

Arrow deserializer should ensure size of hive vector equal to arrow vector

      Description

      The Arrow deserializer (org.apache.hadoop.hive.ql.io.arrow.Deserializer) does not always set the size of the Hive vector correctly. The Hive vector must be at least as large as the Arrow vector so that it can hold the Arrow vector's full contents.

      The following exception is thrown when reading (using LlapArrowRowInputFormat) a table that contains complex types (specifically, a struct nested in an array) and has more rows than the default batch/vector size (1024).

      Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
          at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440)
          at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143)
          at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394)
          at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137)
          at org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122)
          at org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284)
          at org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75)
          ... 23 more
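
The failure mode described above can be illustrated with a small, self-contained sketch. It does not use the real Hive or Arrow classes: SimpleLongVector, copyWithoutEnsureSize and copyWithEnsureSize are hypothetical stand-ins, and the actual change in HIVE-23022.01.patch may look different. The sketch only shows why a destination vector allocated at the default size of 1024 overflows when the source holds more values, and how growing it to the source size first avoids that.

```java
import java.util.Arrays;

public class VectorSizeSketch {

    /** Hypothetical stand-in for a Hive column vector backed by a fixed-size array. */
    static final class SimpleLongVector {
        static final int DEFAULT_SIZE = 1024; // default batch/vector size from the report
        long[] vector = new long[DEFAULT_SIZE];

        /** Grows the backing array when it is smaller than the requested size. */
        void ensureSize(int size, boolean preserveData) {
            if (size <= vector.length) {
                return; // already large enough
            }
            long[] old = vector;
            vector = new long[size];
            if (preserveData) {
                System.arraycopy(old, 0, vector, 0, old.length);
            }
        }
    }

    /** Copies without resizing: overflows when the source has more than 1024 values. */
    static void copyWithoutEnsureSize(long[] arrowValues, SimpleLongVector hiveVector) {
        for (int i = 0; i < arrowValues.length; i++) {
            hiveVector.vector[i] = arrowValues[i];
        }
    }

    /** Sizes the destination to at least the source length before copying. */
    static void copyWithEnsureSize(long[] arrowValues, SimpleLongVector hiveVector) {
        hiveVector.ensureSize(arrowValues.length, false);
        for (int i = 0; i < arrowValues.length; i++) {
            hiveVector.vector[i] = arrowValues[i];
        }
    }

    public static void main(String[] args) {
        // Assumed example input: 1500 values, i.e. more than the default size.
        long[] arrowValues = new long[1500];
        Arrays.setAll(arrowValues, i -> (long) i);

        try {
            copyWithoutEnsureSize(arrowValues, new SimpleLongVector());
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Copy without resizing failed: " + e.getMessage());
        }

        SimpleLongVector resized = new SimpleLongVector();
        copyWithEnsureSize(arrowValues, resized);
        System.out.println("Copy with resizing succeeded, capacity = " + resized.vector.length);
    }
}
```

In this sketch the resize is done once per copy, before the loop, rather than being checked per element. Passing false for preserveData reflects the assumption that the destination is about to be fully overwritten.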
      

        Attachments

        1. HIVE-23022.01.patch
          7 kB
          Shubham Chaurasia

              People

              • Assignee: Shubham Chaurasia
              • Reporter: Shubham Chaurasia
              • Votes: 0
              • Watchers: 3

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h