PATCHED_BLOB encoding in ORC will corrupt data in some cases

Details

    Description

      In PATCHED_BLOB encoding (added in HIVE-4123), gapVsPatchList is an array of long in which each entry stores the gap (g) between consecutive patched values together with the patch value (p). The gap is encoded in 8 bits, which covers gaps up to 255 (a gap of 511 would need 9 bits), and patch values can take more than 56 bits. When a patch value takes more than 56 bits, g and p together need more than 64 bits and cannot be packed into a single long. This results in data corruption whenever patch values are wider than 56 bits.
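
      The overflow can be illustrated with a minimal sketch. It is not the code from RunLengthIntegerWriterV2: the class name, the helper methods, and the layout (gap in the high bits, patch in the low patchWidth bits) are assumptions made for illustration only.

```java
public class GapPatchPacking {
    // Hypothetical layout: gap in the high bits, patch in the low patchWidth bits.
    // Assumes 1 <= patchWidth <= 63.
    static long pack(int gap, long patch, int patchWidth) {
        return ((long) gap << patchWidth) | patch;
    }

    static int unpackGap(long packed, int patchWidth) {
        return (int) (packed >>> patchWidth);
    }

    static long unpackPatch(long packed, int patchWidth) {
        return packed & ((1L << patchWidth) - 1);
    }

    // An entry is only safe when both fields fit into the 64 bits of a long.
    static boolean fitsInLong(int gapWidth, int patchWidth) {
        return gapWidth + patchWidth <= 64;
    }

    public static void main(String[] args) {
        int gap = 200; // needs 8 bits

        // 56-bit patch: 8 + 56 = 64 bits, so the entry round-trips.
        long p56 = (1L << 55) | 0x1234L;
        long ok = pack(gap, p56, 56);
        System.out.println(unpackGap(ok, 56) + " " + (unpackPatch(ok, 56) == p56));
        // prints "200 true"

        // 60-bit patch: 8 + 60 = 68 bits, so the top 4 bits of the gap are shifted out.
        long p60 = (1L << 59) | 0x1234L;
        long bad = pack(gap, p60, 60);
        System.out.println(unpackGap(bad, 60) + " " + (unpackPatch(bad, 60) == p60));
        // prints "8 true": the gap comes back as 8 instead of 200
    }
}
```

      In this sketch the patch survives but the gap is silently truncated, so a reader would apply the patch to the wrong position. A check such as fitsInLong(8, patchWidth) before choosing this encoding is one way a writer could guard against the case.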

      The stack trace looks like:

      Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
      at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.preparePatchedBlob(RunLengthIntegerWriterV2.java:593)
      at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.determineEncoding(RunLengthIntegerWriterV2.java:541)
      at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.write(RunLengthIntegerWriterV2.java:746)
      at org.apache.hadoop.hive.ql.io.orc.WriterImpl$IntegerTreeWriter.write(WriterImpl.java:744)
      at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.write(WriterImpl.java:1320)
      at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:1849)
      at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:75)
      at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:638)
      at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
      at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
      at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
      at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
      at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
      at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:249)
      ... 7 more
      

      Attachments

        1. HIVE-6382.6.patch
          61 kB
          Prasanth Jayachandran
        2. HIVE-6382.5.patch
          33 kB
          Prasanth Jayachandran
        3. HIVE-6382.4.patch
          33 kB
          Prasanth Jayachandran
        4. HIVE-6382.3.patch
          33 kB
          Prasanth Jayachandran
        5. HIVE-6382.2.patch
          13 kB
          Prasanth Jayachandran
        6. HIVE-6382.1.patch
          12 kB
          Prasanth Jayachandran

            People

              prasanth_j Prasanth Jayachandran
              prasanth_j Prasanth Jayachandran
              Votes: 0
              Watchers: 4