Hive / HIVE-4160 Vectorized Query Execution in Hive / HIVE-4989

Consolidate and simplify vectorization code and test generation

    Details

      Description

      The current code generation is unwieldy and error-prone. This change consolidates all code and test generation into a single location and removes the need to place files manually, a step that can lead to missing or incomplete code or tests.
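      As a rough illustration of the idea described above (not the actual Hive CodeGen), a single generator can expand one set of template values into both an expression class and its test, and decide the output paths itself so that nothing has to be placed by hand. Every name below (TemplateCodeGenSketch, the templates, the <ClassName>/<OperandType>/<Operator> placeholders, the "code" and "test" directories) is a hypothetical chosen for the sketch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only. This is NOT Hive's CodeGen; all names are hypothetical.
class TemplateCodeGenSketch {

  // Hypothetical template for a generated expression class.
  static final String CLASS_TEMPLATE =
      "public class <ClassName> {\n"
      + "  public <OperandType> apply(<OperandType> a, <OperandType> b) {\n"
      + "    return a <Operator> b;\n"
      + "  }\n"
      + "}\n";

  // Hypothetical template for the matching generated test.
  static final String TEST_TEMPLATE =
      "public class Test<ClassName> {\n"
      + "  // test body for <ClassName> would go here\n"
      + "}\n";

  // Replace every <Key> placeholder in the template with its value.
  static String expand(String template, Map<String, String> values) {
    String result = template;
    for (Map.Entry<String, String> e : values.entrySet()) {
      result = result.replace("<" + e.getKey() + ">", e.getValue());
    }
    return result;
  }

  // Produce both the class and its test from the same values, keyed by the
  // relative path each file should be written to.
  static Map<String, String> generate(Map<String, String> values) {
    String className = values.get("ClassName");
    Map<String, String> files = new LinkedHashMap<>();
    files.put("code/" + className + ".java", expand(CLASS_TEMPLATE, values));
    files.put("test/Test" + className + ".java", expand(TEST_TEMPLATE, values));
    return files;
  }

  // Write every generated file under a single output root.
  static void writeAll(Path root, Map<String, String> files) throws IOException {
    for (Map.Entry<String, String> e : files.entrySet()) {
      Path target = root.resolve(e.getKey());
      Files.createDirectories(target.getParent());
      Files.write(target, e.getValue().getBytes(StandardCharsets.UTF_8));
    }
  }

  public static void main(String[] args) throws IOException {
    Map<String, String> values = new LinkedHashMap<>();
    values.put("ClassName", "AddLong"); // hypothetical generated class name
    values.put("OperandType", "long");
    values.put("Operator", "+");

    Path root = Files.createTempDirectory("codegen-sketch");
    Map<String, String> files = generate(values);
    writeAll(root, files);
    for (String relativePath : files.keySet()) {
      System.out.println("wrote " + root.resolve(relativePath));
    }
  }
}
```

      Because the class and the test come out of the same generate call, one cannot be produced without the other, which is the failure mode the description attributes to manual file placement.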

      1. HIVE-4989-vectorization.patch
        367 kB
        Tony Murphy
      2. HIVE-4989.revert.patch
        354 kB
        Jitendra Nath Pandey
      3. HIVE-4989.2-vectorization.patch
        756 kB
        Jitendra Nath Pandey
      4. HIVE-4989.1-vectorization.patch
        741 kB
        Tony Murphy

        Activity

        Ashutosh Chauhan added a comment -

        Committed to trunk. Thanks, Tony!
        Jitendra Nath Pandey, let's quickly follow up on HIVE-5226.

        Jitendra Nath Pandey added a comment -

        The tests fail even without the patch. I have filed HIVE-5226 to address this issue. The issue is unrelated to this patch.

        Ashutosh Chauhan added a comment -

        I ran a few tests picked at random from the above list (namely orc_dictionary_threshold.q, orc_create.q, orc_empty_strings.q, orc_ends_with_nulls.q). All of them failed with the following trace:

           [junit] Failed query: orc_dictionary_threshold.q
            [junit] java.lang.AssertionError
            [junit] 	at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:260)
            [junit] 	at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:248)
            [junit] 	at org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:242)
            [junit] 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:170)
            [junit] 	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:439)
            [junit] 	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:519)
            [junit] 	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:495)
            [junit] 	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
            [junit] 	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1487)
            [junit] 	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
            [junit] 	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
            [junit] 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
            [junit] 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
            [junit] 	at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:815)
            [junit] 	at org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:135)
            [junit] 	at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_dictionary_threshold(TestCliDriver.java:111)
            [junit] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            [junit] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
            [junit] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            [junit] 	at java.lang.reflect.Method.invoke(Method.java:597)
            [junit] 	at junit.framework.TestCase.runTest(TestCase.java:154)
            [junit] 	at junit.framework.TestCase.runBare(TestCase.java:127)
            [junit] 	at junit.framework.TestResult$1.protect(TestResult.java:106)
            [junit] 	at junit.framework.TestResult.runProtected(TestResult.java:124)
            [junit] 	at junit.framework.TestResult.run(TestResult.java:109)
            [junit] 	at junit.framework.TestCase.run(TestCase.java:118)
            [junit] 	at junit.framework.TestSuite.runTest(TestSuite.java:208)
            [junit] 	at junit.framework.TestSuite.run(TestSuite.java:203)
            [junit] 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
            [junit] 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
            [junit] 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)
        
        

        I didn't run them on the branch without the patch. If they fail on the branch as well, we can take those up in a separate JIRA.

        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12601466/HIVE-4989.2-vectorization.patch

        ERROR: -1 due to 23 failed/errored test(s), 3716 tests executed
        Failed tests:

        org.apache.hcatalog.pig.TestE2EScenarios.testReadOrcAndRCFromPig
        org.apache.hcatalog.pig.TestOrcHCatLoaderComplexSchema.testTupleInBagInTupleInBag
        org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testEmptyFile
        org.apache.hcatalog.pig.TestOrcHCatStorer.testStoreBasicTable
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_date_serde
        org.apache.hcatalog.pig.TestOrcHCatStorer.testStorePartitionedTable
        org.apache.hcatalog.pig.TestOrcHCatLoader.testReadPartitionedBasic
        org.apache.hcatalog.pig.TestOrcHCatLoader.testReadDataBasic
        org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testDefaultTypes
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ends_with_nulls
        org.apache.hcatalog.pig.TestOrcHCatLoaderComplexSchema.testMapWithComplexData
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_empty_strings
        org.apache.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTable
        org.apache.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask
        org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testMROutput
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_create
        org.apache.hcatalog.pig.TestOrcHCatStorer.testStoreTableMulti
        org.apache.hcatalog.pig.TestOrcHCatLoader.testProjectionsBasic
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input4
        org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testInOutFormat
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_dictionary_threshold
        org.apache.hcatalog.pig.TestOrcHCatLoaderComplexSchema.testSyntheticComplexSchema
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_plan_json
        

        Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/617/testReport
        Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/617/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests failed with: TestsFailedException: 23 tests failed
        

        This message is automatically generated.

        Jitendra Nath Pandey added a comment -

        Re-uploading Tony's original patch rebased against the latest state of the vectorization branch.

        Ashutosh Chauhan added a comment -

        Committed the revert patch to the branch. Thanks, Jitendra! Jitendra, can you also review Tony's latest patch?

        Tony Murphy added a comment -

        Not sure what happened with the last patch, but the formatting looks bad. I've regenerated the patch, manually inspected it, and successfully applied it after the revert patch.

        Jitendra Nath Pandey added a comment -

        The revert patch is attached.

        Jitendra Nath Pandey added a comment -

        The patch seems to have a formatting issue, so it doesn't apply all the changes. I think we should revert it for now. I am uploading a patch that reverts this change. We can get it committed again when the format is fixed.

        Teddy Choi added a comment -

        Ashutosh Chauhan, I could not compile the latest code on the vectorization branch. I have double-checked it. It seems there was an error in applying the patch. Please check it again.

        ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java : the old location.
        ql/src/gen/vectorization/org/apache/hadoop/hive/ql/exec/vector/gen/CodeGen.java : expected location on https://reviews.apache.org/r/13274/diff/
        ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java : actual location in the vectorization branch on https://github.com/apache/hive/commit/e6f59f5d0711c52badc89868e4178a1b2ef54e53
        Ashutosh Chauhan added a comment -

        Committed to branch. Thanks, Tony!

        Tony Murphy added a comment -

        https://reviews.apache.org/r/13274/

        Tony Murphy added a comment -

        This patch should be good to go. HIVE-4971 covers the testVectorUDFUnixTimeStampLong failure.

        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12595673/HIVE-4989-vectorization.patch

        ERROR: -1 due to 1 failed/errored test(s), 3591 tests executed
        Failed tests:

        org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorTimestampExpressions.testVectorUDFUnixTimeStampLong
        

        Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/291/testReport
        Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/291/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests failed with: TestsFailedException: 1 tests failed
        

        This message is automatically generated.

        Tony Murphy added a comment -

        New usage:
        From ql\src\gen\vectorization:
        javac org\apache\hadoop\hive\ql\exec\vector\gen\*.java
        java org.apache.hadoop.hive.ql.exec.vector.gen.CodeGen

        Additionally, I've fixed some incomplete/broken test generation.


          People

          • Assignee: Tony Murphy
          • Reporter: Tony Murphy
          • Votes: 0
          • Watchers: 4
