  1. Hive
  2. HIVE-6382

PATCHED_BLOB encoding in ORC will corrupt data in some cases

    Details

      Description

      In PATCHED_BLOB encoding (added in HIVE-4123), gapVsPatchList is an array of long in which each entry stores the gap (g) between patched values together with the patch value (p). The gap between patched values can be as large as 511, and the gap field takes up to 8 bits to encode. Patch values, however, can be wider than 56 bits. When a patch value needs more than 56 bits, the combined width of g and p exceeds 64 bits, so the pair cannot be packed into a single long. The result is data corruption whenever patch values are wider than 56 bits.
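
The overflow described above can be sketched in a few lines of Java. This is only an illustration, not the code from RunLengthIntegerWriterV2: the class name, the helper methods, and the layout (gap in the high bits, patch in the low bits of one long) are assumptions made for the example.

```java
public class PatchPackingSketch {

    // Hypothetical layout: gap stored in the bits above the patch value.
    static long pack(long gap, long patch, int patchWidth) {
        return (gap << patchWidth) | patch;
    }

    // Recover the gap by discarding the patch bits.
    static long unpackGap(long packed, int patchWidth) {
        return packed >>> patchWidth;
    }

    // Recover the patch by masking off everything above patchWidth bits.
    static long unpackPatch(long packed, int patchWidth) {
        long mask = (patchWidth == 64) ? -1L : (1L << patchWidth) - 1;
        return packed & mask;
    }

    // A (gap, patch) pair fits in one long only if the widths sum to <= 64.
    static boolean fitsInLong(int gapWidth, int patchWidth) {
        return gapWidth + patchWidth <= 64;
    }

    public static void main(String[] args) {
        long gap = 200; // needs 8 bits

        // 56-bit patch: 8 + 56 = 64 bits, the pair round-trips.
        long patch56 = (1L << 56) - 1;
        long ok = pack(gap, patch56, 56);
        System.out.println("56-bit patch, gap read back: " + unpackGap(ok, 56));

        // 60-bit patch: 8 + 60 = 68 bits, the top bits of the gap are lost.
        long patch60 = (1L << 60) - 1;
        long bad = pack(gap, patch60, 60);
        System.out.println("60-bit patch, gap read back: " + unpackGap(bad, 60));
    }
}
```

With a 56-bit patch the gap is read back unchanged. With a 60-bit patch the shift pushes the upper bits of the gap out of the long, so a different gap is read back. A width check such as the hypothetical fitsInLong would let a writer detect this case before packing.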

      The stack trace looks like:

      Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
      at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.preparePatchedBlob(RunLengthIntegerWriterV2.java:593)
      at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.determineEncoding(RunLengthIntegerWriterV2.java:541)
      at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.write(RunLengthIntegerWriterV2.java:746)
      at org.apache.hadoop.hive.ql.io.orc.WriterImpl$IntegerTreeWriter.write(WriterImpl.java:744)
      at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.write(WriterImpl.java:1320)
      at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:1849)
      at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:75)
      at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:638)
      at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
      at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
      at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
      at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
      at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
      at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:249)
      ... 7 more
      

        Attachments

        1. HIVE-6382.6.patch
          61 kB
          Prasanth Jayachandran
        2. HIVE-6382.5.patch
          33 kB
          Prasanth Jayachandran
        3. HIVE-6382.4.patch
          33 kB
          Prasanth Jayachandran
        4. HIVE-6382.3.patch
          33 kB
          Prasanth Jayachandran
        5. HIVE-6382.2.patch
          13 kB
          Prasanth Jayachandran
        6. HIVE-6382.1.patch
          12 kB
          Prasanth Jayachandran

              People

              • Assignee:
                prasanth_j Prasanth Jayachandran
                Reporter:
                prasanth_j Prasanth Jayachandran
              • Votes:
                0
                Watchers:
                4