Hive: HIVE-23022

Arrow deserializer should ensure size of Hive vector is equal to Arrow vector


Details

    Description

      The Arrow deserializer (org.apache.hadoop.hive.ql.io.arrow.Deserializer) in some cases does not set the size of the Hive vector correctly. The Hive vector should be sized at least as large as the Arrow vector so that it can accommodate all of the Arrow vector's values.
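      To make the sizing rule concrete, here is a minimal Java sketch. It does not use Hive's classes: `VectorSizingSketch`, `LongVector`, and its `ensureSize(int, boolean)` method are hypothetical stand-ins, loosely modelled on the resize-before-write pattern of Hive's column vectors, and the default capacity of 1024 mirrors the default batch/vector size mentioned in this issue. The only point is that the destination is grown to the Arrow value count before any values are copied into it.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of a Hive column vector: a fixed-capacity
 * array that defaults to 1024 entries. This is NOT Hive's ColumnVector class;
 * it only illustrates the "size to the Arrow vector first" rule.
 */
public class VectorSizingSketch {

    static final int DEFAULT_SIZE = 1024;

    static final class LongVector {
        long[] vector = new long[DEFAULT_SIZE];

        // Grow the backing array so it holds at least `size` entries,
        // optionally keeping the values already written. Never shrinks.
        void ensureSize(int size, boolean preserveData) {
            if (size > vector.length) {
                long[] old = vector;
                vector = new long[size];
                if (preserveData) {
                    System.arraycopy(old, 0, vector, 0, old.length);
                }
            }
        }
    }

    /** Copy all values from an Arrow-like source array into a Hive-like vector. */
    static LongVector read(long[] arrowValues) {
        LongVector hive = new LongVector();
        // Size the Hive vector to the Arrow vector's value count before copying,
        // instead of relying on the default capacity of 1024.
        hive.ensureSize(arrowValues.length, false);
        for (int i = 0; i < arrowValues.length; i++) {
            hive.vector[i] = arrowValues[i];
        }
        return hive;
    }

    public static void main(String[] args) {
        long[] arrow = new long[1500];
        Arrays.fill(arrow, 7L);
        LongVector hive = read(arrow);
        System.out.println("Hive vector capacity: " + hive.vector.length);
    }
}
```

      Growing only when the requested size exceeds the current capacity keeps the common case (at most 1024 values) allocation-free.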

      The following exception occurs when reading, via LlapArrowRowInputFormat, a table that contains complex types (specifically, a struct nested in an array) and has more rows than the default batch/vector size of 1024.

        Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
            at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440)
            at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143)
            at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394)
            at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137)
            at org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122)
            at org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284)
            at org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75)
            ... 23 more
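      One plausible reading of the trace above (readList calling into readStruct) is that the overflow happens in the child vector of the array column: Arrow stores all elements of a list column in one flat child vector addressed through an offsets buffer, so the child can hold more than 1024 entries even when the number of rows in the batch does not. The hypothetical sketch below models an `array<struct<a:bigint>>` column with plain arrays (none of the names are Hive or Arrow APIs) and assumes the row count of a single batch fits the default size. It shows the copy failing at index 1024 without sizing and succeeding once the child is grown to `offsets[rowCount]`.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of reading an Arrow-style list<struct<a:bigint>> column
 * into a Hive-style list vector. Class and method names are illustrative only.
 */
public class NestedListSizingSketch {

    static final int DEFAULT_SIZE = 1024;

    /** Simplified struct vector with a single bigint field. */
    static final class StructVector {
        long[] fieldA = new long[DEFAULT_SIZE];

        void ensureSize(int size) {
            if (size > fieldA.length) {
                fieldA = Arrays.copyOf(fieldA, size);
            }
        }
    }

    /** Simplified list vector: per-row offset/length into one shared child. */
    static final class ListVector {
        long[] offsets = new long[DEFAULT_SIZE]; // assumes rowCount <= DEFAULT_SIZE
        long[] lengths = new long[DEFAULT_SIZE];
        final StructVector child = new StructVector();
    }

    /**
     * @param arrowOffsets Arrow list offsets, length rowCount + 1
     * @param arrowChildA  flat struct field values, length arrowOffsets[rowCount]
     * @param sizeChild    whether to grow the child vector before copying
     */
    static ListVector readList(int[] arrowOffsets, long[] arrowChildA, boolean sizeChild) {
        int rowCount = arrowOffsets.length - 1;
        ListVector hive = new ListVector();
        // Total number of child elements across all rows of the batch.
        int childCount = arrowOffsets[rowCount];
        if (sizeChild) {
            hive.child.ensureSize(childCount);
        }
        for (int row = 0; row < rowCount; row++) {
            hive.offsets[row] = arrowOffsets[row];
            hive.lengths[row] = arrowOffsets[row + 1] - arrowOffsets[row];
        }
        // Without the ensureSize call above, this throws
        // ArrayIndexOutOfBoundsException once i reaches 1024.
        for (int i = 0; i < childCount; i++) {
            hive.child.fieldA[i] = arrowChildA[i];
        }
        return hive;
    }

    public static void main(String[] args) {
        // 600 rows with 2 struct elements each: 1200 child elements,
        // more than 1024 even though the row count is below it.
        int rows = 600;
        int[] offsets = new int[rows + 1];
        for (int r = 1; r <= rows; r++) {
            offsets[r] = offsets[r - 1] + 2;
        }
        long[] childA = new long[offsets[rows]];
        try {
            readList(offsets, childA, false);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("without sizing: " + e.getMessage());
        }
        ListVector ok = readList(offsets, childA, true);
        System.out.println("with sizing, child capacity: " + ok.child.fieldA.length);
    }
}
```

      In this model, sizing the child from the last offset rather than from the row count is what keeps the nested read in bounds; a struct nested one level deeper would need the same step at each level of the recursion.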
      

      Attachments

        1. HIVE-23022.01.patch (7 kB, Shubham Chaurasia)


            People

              Assignee: Shubham Chaurasia
              Reporter: Shubham Chaurasia
              Votes: 0
              Watchers: 3


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h