Hive / HIVE-7292 Hive on Spark / HIVE-7702

Start running .q file tests on spark [Spark Branch]

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: spark-branch
    • Component/s: Spark
    • Labels: None

      Description

      Spark can currently support only a few queries, but some .q file tests will already pass today. The basic idea is to get a small number of these (10-20) working so that we can start testing the project.

      A good starting point might be the udf*, varchar*, or alter* tests:

      https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive

      To generate the output file for test XXX.q, you'd do:

      mvn clean install -DskipTests -Phadoop-2
      cd itests
      mvn clean install -DskipTests -Phadoop-2
      cd qtest-spark
      mvn test -Dtest=TestSparkCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2
      

      This generates XXX.q.out, which we can check in to source control as a "golden file".

      Multiple tests can be run at a given time like so:

      mvn test -Dtest=TestSparkCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2
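
As a minimal sketch of assembling that command line, the hypothetical helper below joins test names into the comma-separated -Dqfile value. The function name qfile_command is an assumption made for illustration; it only prints the command and does not run Maven.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hive): print the mvn command for a set
# of .q tests. It only prints the command; it does not invoke Maven.
qfile_command() {
  qfiles=""
  for name in "$@"; do
    # Append the .q suffix when the caller left it off.
    case "$name" in
      *.q) ;;
      *) name="$name.q" ;;
    esac
    # Join names with commas, without a leading comma.
    qfiles="${qfiles:+$qfiles,}$name"
  done
  echo "mvn test -Dtest=TestSparkCliDriver -Dqfile=$qfiles -Dtest.output.overwrite=true -Phadoop-2"
}

qfile_command groupby1 having.q
```

The printed command can be reviewed and then pasted into a shell in the qtest-spark directory.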
      
      1. HIVE-7702-spark.patch
        109 kB
        Chinna Rao Lalam
      2. HIVE-7702.1-spark.patch
        104 kB
        Chinna Rao Lalam

          Activity

          Brock Noland added a comment -

          FYI -Dqfile= is not usable until HIVE-7739 is resolved. The testconfiguration.properties file can be used instead.

          Brock Noland added a comment -

          After looking at this more, I think we should start with the 100 or so tests that Tez executes:

          https://github.com/apache/hive/blob/spark/itests/src/test/resources/testconfiguration.properties#L49

          Brock Noland added a comment -

          Let's try and add the following tests in this JIRA:

            enforce_order.q,\
            filter_join_breaktask.q,\
            filter_join_breaktask2.q,\
            groupby1.q,\
            groupby2.q,\
            groupby3.q,\
            having.q,\
            insert1.q,\
            insert_into1.q,\
            insert_into2.q,\
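
A list in this continuation style can be flattened into a plain comma-separated value, for example to reuse it with -Dqfile. The sketch below assumes the list arrives on standard input; the function name to_qfile_list is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn a properties-style continuation list on stdin
# (lines such as "  groupby1.q,\") into a comma-separated list of names.
to_qfile_list() {
  # 1. Drop spaces, tabs and the trailing backslashes.
  # 2. Join the lines with commas.
  # 3. Collapse repeated commas and trim leading/trailing ones.
  tr -d ' \t\\' | tr '\n' ',' | sed -e 's/,,*/,/g' -e 's/^,//' -e 's/,$//'
}

printf '  groupby1.q,\\\n  having.q,\\\n' | to_qfile_list
```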
          
          Chinna Rao Lalam added a comment -

          The join-related query files will be handled in HIVE-7816:

          filter_join_breaktask.q,\
          filter_join_breaktask2.q

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12663390/HIVE-7702-spark.patch

          ERROR: -1 due to 4 failed/errored test(s), 5984 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
          org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_insert_into2
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/75/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/75/console
          Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-75/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 4 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12663390

          Brock Noland added a comment -

          Nice work Chinna Rao Lalam!! Looks like insert_into2 fails. Looking at the DIFF I see a bunch of odd characters at the bottom. Thank you!!

          Chinna Rao Lalam added a comment -

          insert_into2.q.out is corrected.

          Brock Noland added a comment -

          Hi Chinna,
          Thank you! Using git and the following command, I was able to compare the results against MR:

          git status | awk '/new file:/ {print $NF}' | xargs -I {} sh -c 'diff {} $(echo {} | perl -pe "s@/spark@@g")'
          

          Do you know if the differences are due to sorting order or correctness?
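
The path mapping used in the command above can be sketched on its own. This is a minimal illustration: the function name mr_path and the example path are assumptions, and sed stands in for the perl substitution.

```shell
#!/bin/sh
# Map a Spark golden-file path to its MR counterpart by dropping "/spark",
# mirroring the perl substitution s@/spark@@g in the command above.
mr_path() {
  printf '%s\n' "$1" | sed 's@/spark@@g'
}

# The example path is an assumption for illustration only.
mr_path ql/src/test/results/clientpositive/spark/groupby1.q.out
```

Note that the substitution removes every "/spark" occurrence in the path, so it assumes no other path component starts with "spark".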

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12663458/HIVE-7702.1-spark.patch

          ERROR: -1 due to 4 failed/errored test(s), 5985 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
          org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_null
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/76/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/76/console
          Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-76/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 4 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12663458

          Chinna Rao Lalam added a comment -

          Hi Brock Noland,

          Compared against MR, most of the differences are due to sorting order only.

          Brock Noland added a comment -

          Thank you Chinna!! I agree. I used the script below, and all of the result differences are due to sorting. Thank you!

          +1

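          #!/bin/bash
          # Reads Spark golden-file paths from stdin, one per line.
          while read -r file
          do
            # Derive the MR path by dropping "/spark" from the Spark path.
            mr=$(echo "$file" | perl -pe "s@/spark@@g")
            spark=$file
            mrSorted=/tmp/$(basename "$mr")-mr.sorted
            sparkSorted=/tmp/$(basename "$spark")-spark.sorted
            # Sort both outputs so that row order no longer matters.
            sort "$mr" > "$mrSorted"
            sort "$spark" > "$sparkSorted"
            # Show the sorted outputs side by side, 150 columns wide.
            diff -y -W 150 "$mrSorted" "$sparkSorted"
          done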
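
For a yes/no answer rather than a side-by-side listing, the same sort-then-compare idea can be wrapped in a small function. This is a sketch only; the name same_ignoring_order is hypothetical and was not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if two files contain the same lines
# in any order, fail (exit 1) if they differ after sorting.
same_ignoring_order() {
  a_sorted=$(mktemp) || return 2
  b_sorted=$(mktemp) || { rm -f "$a_sorted"; return 2; }
  sort "$1" > "$a_sorted"
  sort "$2" > "$b_sorted"
  # cmp -s is silent and reports the result through its exit status.
  cmp -s "$a_sorted" "$b_sorted"
  status=$?
  rm -f "$a_sorted" "$b_sorted"
  return $status
}
```

A caller could run it on each MR/Spark golden-file pair and print only the pairs for which it fails, since those are the ones whose differences are not explained by ordering.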
          
          Brock Noland added a comment -

          Thank you so much Chinna! I have committed this to spark!


            People

            • Assignee: Chinna Rao Lalam
            • Reporter: Brock Noland
            • Votes: 0
            • Watchers: 2
