Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-9235

Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Vectorization
    • Labels:
      None

      Description

      Title was: Make Parquet Vectorization of these data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR

      Support for doing vector column assign is missing for some data types.

      1. HIVE-9235.01.patch
        3 kB
        Matt McCline
      2. HIVE-9235.02.patch
        3 kB
        Matt McCline

        Issue Links

          Activity

          Hide
          brocknoland Brock Noland added a comment -

          Can you describe the issue you see?

          Show
          brocknoland Brock Noland added a comment - Can you describe the issue you see?
          Hide
          mmccline Matt McCline added a comment -

          First issue (vectorization of Parquet):
          Missing cases in VectorColumnAssignFactory.java's public static VectorColumnAssign[] buildAssigners(VectorizedRowBatch outputBatch,
          Writable[] writables) for HiveCharWritable, HiveVarcharWritable, DateWritable, HiveDecimalWriter.

          Example of exception caused:

          Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Unimplemented vector assigner for writable type class org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
          	at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:136)
          	at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:49)
          	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
          	... 21 more
          Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unimplemented vector assigner for writable type class org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
          	at org.apache.hadoop.hive.ql.exec.vector.VectorColumnAssignFactory.buildAssigners(VectorColumnAssignFactory.java:528)
          	at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:127)
          	... 23 more
          

          Added code to fix that.

          Then, I copied a half dozen q vectorized tests using ORC tables and tried converted them to use PARQUET, but encountered another issue in non-vectorized mode. I was trying to establish base query outputs that I could use to verify the vectorized query output. This indicated a basic non-vectorized case of CHAR data type usage wasn't working for PARQUET.

          Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
          	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814)
          	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
          	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
          	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
          	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
          	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
          	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
          	... 10 more
          Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
          	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104)
          	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305)
          	at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
          	at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
          	at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
          	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827)
          	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739)
          	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809)
          	... 16 more
          

          I filed this problem under HIVE-9371: Execution error for Parquet table and GROUP BY involving CHAR data type

          At that point we concluded we should temporarily disable vectorization of PARQUET since there is only one test that doesn't provide complete coverage of data types.

          FYI: Gunther Hagleitner

          Show
          mmccline Matt McCline added a comment - First issue (vectorization of Parquet): Missing cases in VectorColumnAssignFactory.java's public static VectorColumnAssign[] buildAssigners(VectorizedRowBatch outputBatch, Writable[] writables) for HiveCharWritable, HiveVarcharWritable, DateWritable, HiveDecimalWriter. Example of exception caused: Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Unimplemented vector assigner for writable type class org.apache.hadoop.hive.serde2.io.HiveDecimalWritable at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:136) at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:49) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347) ... 21 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unimplemented vector assigner for writable type class org.apache.hadoop.hive.serde2.io.HiveDecimalWritable at org.apache.hadoop.hive.ql.exec.vector.VectorColumnAssignFactory.buildAssigners(VectorColumnAssignFactory.java:528) at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:127) ... 23 more Added code to fix that. Then, I copied a half dozen q vectorized tests using ORC tables and tried converted them to use PARQUET, but encountered another issue in non-vectorized mode. I was trying to establish base query outputs that I could use to verify the vectorized query output. This indicated a basic non-vectorized case of CHAR data type usage wasn't working for PARQUET. Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493) ... 10 more Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305) at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150) at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142) at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809) ... 16 more I filed this problem under HIVE-9371 : Execution error for Parquet table and GROUP BY involving CHAR data type At that point we concluded we should temporarily disable vectorization of PARQUET since there is only one test that doesn't provide complete coverage of data types. FYI: Gunther Hagleitner
          Hide
          brocknoland Brock Noland added a comment -

          Thank you Matt, linking to HIVE-8128 which is related.

          Show
          brocknoland Brock Noland added a comment - Thank you Matt, linking to HIVE-8128 which is related.
          Hide
          hiveqa Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12692387/HIVE-9235.02.patch

          ERROR: -1 due to 2 failed/errored test(s), 7311 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map
          org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2372/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2372/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2372/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 2 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12692387 - PreCommit-HIVE-TRUNK-Build

          Show
          hiveqa Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12692387/HIVE-9235.02.patch ERROR: -1 due to 2 failed/errored test(s), 7311 tests executed Failed tests: org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2372/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2372/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2372/ Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed This message is automatically generated. ATTACHMENT ID: 12692387 - PreCommit-HIVE-TRUNK-Build
          Hide
          mmccline Matt McCline added a comment -

          Test failures are unrelated.

          Show
          mmccline Matt McCline added a comment - Test failures are unrelated.
          Hide
          gopalv Gopal V added a comment -

          I see only 1 test being affected overall.

          +1 - LGTM.

          Show
          gopalv Gopal V added a comment - I see only 1 test being affected overall. +1 - LGTM.
          Hide
          vikram.dixit Vikram Dixit K added a comment -

          +1 for branch 1.0 as well.

          Show
          vikram.dixit Vikram Dixit K added a comment - +1 for branch 1.0 as well.
          Hide
          vikram.dixit Vikram Dixit K added a comment -

          Committed to trunk and branches.

          Show
          vikram.dixit Vikram Dixit K added a comment - Committed to trunk and branches.

            People

            • Assignee:
              mmccline Matt McCline
              Reporter:
              mmccline Matt McCline
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development