  1. Hive
  2. HIVE-6382

PATCHED_BLOB encoding in ORC will corrupt data in some cases


Details

    Description

      In PATCHED_BLOB encoding (added in HIVE-4123), gapVsPatchList is an array of longs in which each entry stores the gap (g) between consecutive patched values together with the patch value (p). The gap can be as large as 511 and is encoded with 8 bits, while a patch value can need more than 56 bits. When the patch value needs more than 56 bits, the combined width of p and g exceeds 64 bits, so the pair cannot be packed into a single long. This results in data corruption whenever patch values are wider than 56 bits.
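
      The overflow described above can be illustrated with a minimal sketch. It assumes the gap sits in the high bits and the patch in the low bits of each long; that layout, the class name PatchPackingSketch, and all of its helper methods are hypothetical and are not taken from RunLengthIntegerWriterV2.

```java
// Illustrative sketch only: the packing layout (gap in the high bits, patch in
// the low bits) is an assumption, and every name here is hypothetical.
final class PatchPackingSketch {

    // Pack a gap and a patch value into one long.
    static long pack(long gap, long patch, int patchWidth) {
        return (gap << patchWidth) | patch;
    }

    // Recover the gap from the high bits.
    static long unpackGap(long packed, int patchWidth) {
        return packed >>> patchWidth;
    }

    // Recover the patch from the low patchWidth bits.
    static long unpackPatch(long packed, int patchWidth) {
        long mask = (patchWidth == 64) ? -1L : (1L << patchWidth) - 1;
        return packed & mask;
    }

    // True when both fields survive a pack/unpack round trip.
    static boolean roundTrips(long gap, long patch, int patchWidth) {
        long packed = pack(gap, patch, patchWidth);
        return unpackGap(packed, patchWidth) == gap
                && unpackPatch(packed, patchWidth) == patch;
    }

    public static void main(String[] args) {
        // 8-bit gap + 56-bit patch = 64 bits: both fields fit.
        System.out.println(roundTrips(255L, (1L << 56) - 1, 56));
        // 8-bit gap + 60-bit patch = 68 bits: the top bits of the gap are lost.
        System.out.println(roundTrips(200L, (1L << 60) - 1, 60));
    }
}
```

      With a 60-bit patch only the low 4 bits of the gap survive the shift, so the value read back no longer matches what was written. A width check of the form gapWidth + patchWidth <= 64 before packing is one way to detect this condition.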

      Stack trace will look like:

      Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
      at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.preparePatchedBlob(RunLengthIntegerWriterV2.java:593)
      at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.determineEncoding(RunLengthIntegerWriterV2.java:541)
      at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.write(RunLengthIntegerWriterV2.java:746)
      at org.apache.hadoop.hive.ql.io.orc.WriterImpl$IntegerTreeWriter.write(WriterImpl.java:744)
      at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.write(WriterImpl.java:1320)
      at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:1849)
      at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:75)
      at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:638)
      at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
      at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
      at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
      at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
      at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
      at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:249)
      ... 7 more
      

      Attachments

        1. HIVE-6382.6.patch
          61 kB
          Prasanth Jayachandran
        2. HIVE-6382.5.patch
          33 kB
          Prasanth Jayachandran
        3. HIVE-6382.4.patch
          33 kB
          Prasanth Jayachandran
        4. HIVE-6382.3.patch
          33 kB
          Prasanth Jayachandran
        5. HIVE-6382.2.patch
          13 kB
          Prasanth Jayachandran
        6. HIVE-6382.1.patch
          12 kB
          Prasanth Jayachandran

          People

            Prasanth Jayachandran (prasanth_j)