HIVE-6382

PATCHED_BLOB encoding in ORC will corrupt data in some cases

    Details

      Description

      In PATCHED_BLOB encoding (added in HIVE-4123), gapVsPatchList is an array of longs in which each entry stores the gap (g) between patched values together with the patch value (p). The gap can be as large as 511, and 8 bits are used to encode it, while patch values can take more than 56 bits. When a patch value takes more than 56 bits, the combined width of g and p exceeds 64 bits, so the pair cannot be packed into a single long. This results in data corruption whenever patch values are wider than 56 bits.

      The stack trace looks like this:

      Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
      at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.preparePatchedBlob(RunLengthIntegerWriterV2.java:593)
      at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.determineEncoding(RunLengthIntegerWriterV2.java:541)
      at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.write(RunLengthIntegerWriterV2.java:746)
      at org.apache.hadoop.hive.ql.io.orc.WriterImpl$IntegerTreeWriter.write(WriterImpl.java:744)
      at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.write(WriterImpl.java:1320)
      at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:1849)
      at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:75)
      at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:638)
      at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
      at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
      at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
      at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
      at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
      at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:249)
      ... 7 more
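
The overflow in the description can be illustrated with plain bit arithmetic. The following is a minimal sketch, not the code from RunLengthIntegerWriterV2: the class and method names are hypothetical, and the layout (gap in the bits above the patch inside one long) is an assumption made for the example.

```java
// Hypothetical illustration of packing a gap and a patch value into one long.
class GapPatchPacking {
    static final int LONG_BITS = 64;

    /** Number of bits needed to represent a non-negative value (at least 1). */
    public static int bitsNeeded(long value) {
        return Math.max(1, LONG_BITS - Long.numberOfLeadingZeros(value));
    }

    /** True if a gap of gapWidth bits and a patch of patchWidth bits fit in one long. */
    public static boolean fitsInLong(int gapWidth, int patchWidth) {
        return gapWidth + patchWidth <= LONG_BITS;
    }

    /** Places the gap in the bits directly above the patch (assumed layout). */
    public static long pack(long gap, long patch, int patchWidth) {
        return (gap << patchWidth) | patch;
    }

    /** Reads the gap back from the bits above the patch. */
    public static long unpackGap(long packed, int patchWidth) {
        return packed >>> patchWidth;
    }

    /** Reads the patch back from the low patchWidth bits. */
    public static long unpackPatch(long packed, int patchWidth) {
        long mask = patchWidth >= LONG_BITS ? -1L : (1L << patchWidth) - 1;
        return packed & mask;
    }

    public static void main(String[] args) {
        long gap = 200L;                      // needs 8 bits
        long narrowPatch = (1L << 55) | 12345L; // needs 56 bits: 8 + 56 = 64, fits
        long widePatch = (1L << 59) | 12345L;   // needs 60 bits: 8 + 60 = 68, does not fit

        for (long patch : new long[] {narrowPatch, widePatch}) {
            int patchWidth = bitsNeeded(patch);
            long packed = pack(gap, patch, patchWidth);
            System.out.println("patchWidth=" + patchWidth
                + " fits=" + fitsInLong(bitsNeeded(gap), patchWidth)
                + " gapReadBack=" + unpackGap(packed, patchWidth));
        }
    }
}
```

With the 56-bit patch the gap is read back unchanged. With the 60-bit patch the upper bits of the gap are shifted out of the long, so a different gap is read back, which is the kind of silent corruption the description refers to.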
      
      1. HIVE-6382.1.patch
        12 kB
        Prasanth Jayachandran
      2. HIVE-6382.2.patch
        13 kB
        Prasanth Jayachandran
      3. HIVE-6382.3.patch
        33 kB
        Prasanth Jayachandran
      4. HIVE-6382.4.patch
        33 kB
        Prasanth Jayachandran
      5. HIVE-6382.5.patch
        33 kB
        Prasanth Jayachandran
      6. HIVE-6382.6.patch
        61 kB
        Prasanth Jayachandran

          Activity

          Prasanth Jayachandran added a comment -

          Initial version of patch.

          Prasanth Jayachandran added a comment -

          HIVE-6347 adds the Hive configuration to the ORC reader interface. This patch requires it to determine whether to skip corrupt data or throw an exception.

          Prasanth Jayachandran added a comment -

          Still have to verify if this patch fixes HIVE-6369.

          Prasanth Jayachandran added a comment -

          Marking this as Patch Available so that Hive QA runs the precommit tests.

          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12627290/HIVE-6382.1.patch

          SUCCESS: +1 5039 tests passed

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1220/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1220/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          ATTACHMENT ID: 12627290

          Prasanth Jayachandran added a comment -

          This patch adds a safeguard length of 1 to the gap and patch list array to avoid off-by-one errors or a thrown exception.

          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12627461/HIVE-6382.2.patch

          SUCCESS: +1 5039 tests passed

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1226/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1226/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          ATTACHMENT ID: 12627461

          Prasanth Jayachandran added a comment -

          Added a configuration option to the ORC readers to support skipping corrupted data.

          Prasanth Jayachandran added a comment -

          Addressed Sergey Shelukhin's review comments.

          Sergey Shelukhin added a comment -

          wrt 64 bits that's what I meant by relying on internals of that func... maybe it can check and throw assertion error if it is >56 but not 64... anyway that's nit.
          Otherwise +1

          Prasanth Jayachandran added a comment -

          Gotcha! Fixed it in this patch.

          Prasanth Jayachandran added a comment -

          The earlier patch did not compile cleanly because the conf object was missing from some of the ORC reader interfaces. Fixed it in this patch.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12629962/HIVE-6382.6.patch

          ERROR: -1 due to 1 failed/errored test(s), 5172 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1425/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1425/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 1 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12629962

          Prasanth Jayachandran added a comment -

          The test failure does not seem to be related to this patch.

          Gunther Hagleitner added a comment -

          Committed to trunk. Thanks Prasanth Jayachandran and Sergey Shelukhin!

          Lefty Leverenz added a comment -

          For the record: this adds the configuration parameter hive.exec.orc.skip.corrupt.data to HiveConf.java and hive-default.xml.template.
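
How a reader might act on such a flag can be sketched as follows. This is an illustration only: the class, method, and exception message are hypothetical, and the actual ORC reader's checks may differ. It assumes that a corrupt entry is detected when the combined gap and patch width exceeds 64 bits, and that "skipping" means reporting the entry as unusable instead of raising an error.

```java
import java.io.IOException;

// Hypothetical sketch of a skip-or-throw policy driven by a boolean setting
// such as hive.exec.orc.skip.corrupt.data.
class CorruptDataPolicy {
    /**
     * Returns true if the entry is usable, false if it is corrupt and was skipped.
     * Throws IOException for a corrupt entry when skipCorrupt is false.
     */
    public static boolean checkEntry(int gapWidth, int patchWidth, boolean skipCorrupt)
            throws IOException {
        // Assumed corruption test: gap and patch cannot share one 64-bit long.
        boolean corrupt = gapWidth + patchWidth > 64;
        if (corrupt && !skipCorrupt) {
            throw new IOException("Corrupt entry: gap width + patch width exceeds 64 bits");
        }
        return !corrupt;
    }
}
```

Throwing by default keeps corruption visible, while the opt-in skip lets a job continue past affected data at the cost of dropping it.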

          Lefty Leverenz added a comment -

          hive.exec.orc.skip.corrupt.data is documented in the wiki: Configuration Properties – hive.exec.orc.skip.corrupt.data

            People

            • Assignee:
              Prasanth Jayachandran
              Reporter:
              Prasanth Jayachandran