Bigtop / BIGTOP-1450

Eliminate broken hive test artifacts in favor of smoke-tests.

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: tests
    • Labels:
      None

      Description

      Overall: the Hive tests in test-artifacts are prone to failures from missing data sets and generally need a thorough review.

      While testing the Bigtop 0.8.0 release candidate, I got the following errors:

        --- /dev/fd/63  2014-09-16 10:12:54.579647323 +0000
        +++ /dev/fd/62     2014-09-16 10:12:54.579647323 +0000
        @@ -14,4 +14,4 @@
         INSERT OVERWRITE DIRECTORY '/tmp/count'
         SELECT COUNT(1) FROM u_data
         dfs -cat /tmp/count/*
        -0
        +100000

        err=
        14/09/16 10:12:17 WARN mapred.JobConf: The variable mapred.child.ulimit is no longer used.

        Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
        OK
        Time taken: 2.609 seconds
        OK
        Time taken: 0.284 seconds
        Total jobs = 1
        Launching Job 1 out of 1
        Number of reduce tasks determined at compile time: 1
        In order to change the average load for a reducer (in bytes):
          set hive.exec.reducers.bytes.per.reducer=<number>
        In order to limit the maximum number of reducers:
          set hive.exec.reducers.max=<number>
        In order to set a constant number of reducers:
          set mapreduce.job.reduces=<number>
        Starting Job = job_1410830363557_0019, Tracking URL = http://bigtop1.vagrant:20888/proxy/application_1410830363557_0019/
        Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1410830363557_0019
        Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
        2014-09-16 10:12:38,870 Stage-1 map = 0%,  reduce = 0%
        2014-09-16 10:12:45,516 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.81 sec
        2014-09-16 10:12:53,036 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.73 sec
        MapReduce Total cumulative CPU time: 1 seconds 730 msec
        Ended Job = job_1410830363557_0019
        Moving data to: /tmp/count
        MapReduce Jobs Launched:
        Job 0: Map: 1  Reduce: 1   Cumulative CPU: 1.73 sec   HDFS Read: 272 HDFS Write: 2 SUCCESS
        Total MapReduce CPU Time Spent: 1 seconds 730 msec
        OK
        Time taken: 24.594 seconds

      I know there is a diff failure in here (some kind of diff is going on), but I've forgotten how the actual, output, and filter parts work together.

      In any case, I think these tests could be simplified to just grep for an output string and check the exit code or, failing that, at least include very clear assertions that say what failed.
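A minimal sketch of the simplification suggested above: check the exit code, then grep stdout for an expected pattern, and raise an assertion that names the failure. It is written in Python purely for illustration (the Bigtop tests themselves are Groovy), and `hive_command` and `run_and_check` are hypothetical helper names, not part of Bigtop. It assumes the `hive` CLI's `-e` option and that query results go to stdout while logging goes to stderr, as the `err=` section of the log above suggests.

```python
import re
import subprocess
from typing import List, Sequence


def hive_command(query: str) -> List[str]:
    """Build a Hive CLI invocation; `hive -e` runs the quoted statements."""
    return ["hive", "-e", query]


def run_and_check(cmd: Sequence[str], expected_pattern: str, timeout: float = 600.0) -> str:
    """Run cmd and assert on its exit code and on a grep-style match in stdout.

    Each failure says exactly what went wrong, instead of leaving the reader
    to decode a whole-file diff.
    """
    proc = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    # 1. The exit code tells us whether the query ran at all.
    if proc.returncode != 0:
        raise AssertionError(
            f"{cmd[0]} exited with status {proc.returncode}; stderr:\n{proc.stderr}"
        )
    # 2. Grep stdout for the expected answer; MULTILINE makes ^/$ match per line.
    if re.search(expected_pattern, proc.stdout, re.MULTILINE) is None:
        raise AssertionError(
            f"no stdout line matches {expected_pattern!r}; stdout was:\n{proc.stdout}"
        )
    return proc.stdout
```

On a node with `hive` on the PATH, the COUNT(1) check from the log could become `run_and_check(hive_command("SELECT COUNT(1) FROM u_data"), r"^\d+$")`. That pattern only asserts that a number came back; pinning it to an exact row count is reasonable only when the loaded data set is known.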


          Activity

          rvs Roman Shaposhnik added a comment -

          Great progress! Thanks, jay vyas.

          jayunit100 jay vyas added a comment -

          Okay! Committed. For people monitoring this thread, or anyone interested in testing Hive with Bigtop's test suite:

          • You can now run the Hive tests from smoke-tests by following the Hive instructions in the README.
          • New tests there are very welcome. Keep an eye on HIVE-8553 to see whether the Hive community adds its own integration tests (or works with us to maintain them).
          cos Konstantin Boudnik added a comment -

          Yes, +1

          jayunit100 jay vyas added a comment (edited) -

          Okay, Konstantin Boudnik, is that a +1 to commit?

          Users can still test Hive with our smoke-tests if they want to.

          cos Konstantin Boudnik added a comment -

          Am I reading this right that you're basically removing all existing Hive tests from test-artifacts, along with their references? If so, yes, let's kill the damn thing: it's too complex to manage and always broken.

          jayunit100 jay vyas added a comment -
          • Here's a patch for this! Let the cleanup continue! Less code -> fewer bugs.
          • I've also created HIVE-8553 to start curating a test inside Hive itself for us. I hope to hear some feedback; if not, I'll reach out to their mailing list.
          jayunit100 jay vyas added a comment (edited) -

          Roman Shaposhnik: shall I submit a patch to (1) remove these smoke tests, and then (2) create a HIVE JIRA to see whether the Hive folks want to support smoke tests internally?

          jayunit100 jay vyas added a comment -

          Aha, OK... Thanks, Roman, that makes sense.
          As a step forward, can we split the Hive cleanup into two tasks?

          1) Remove the manually maintained test-artifacts from Bigtop, since they are hard to maintain. For now, just use smoke-tests, which are easy to maintain.

          2) Work within the Hive community (create a HIVE JIRA) to build some easy-to-run smoke tests which will be bundled in the Hive jars after 0.15 (or whatever version is pending).

          Does that sound OK to you? It would let us clean up without waiting on the Hive folks...

          rvs Roman Shaposhnik added a comment -

          I was one of the dudes who implemented the Hive tests in Bigtop. What we did was simple: we took existing tests from Hive and stuck them into Bigtop. That's it. Of course, in Hive the tests get maintained, while in Bigtop they seem to have bitrotted. I actually do like the suggestion of gutting them and replacing them with more representative smoke tests. But the proof is in the puddin'^H^H^H^H^Hpatch.

          Anyway, another thing that would be super cool is to somehow collaborate on making the tests from the Apache Hive project itself able to execute against a real cluster. That's what Pig lets us do, for example. Any takers?

          jayunit100 jay vyas added a comment -

          Great, any other opinions on this?

          jbx josh baer added a comment -

          That looks much simpler. I can definitely start moving some tests that we run regularly over to that framework. I'll talk to my team about it tomorrow.

          jayunit100 jay vyas added a comment (edited) -

          josh baer, agreed... So, shall we just gut them entirely and evolve the Hive tests in *smoke-tests* over time, in favor of the new, easy-to-run smoke-tests framework? After all, we now have a simple Hive validation in there, which can be extended organically.

          The new smoke-tests are here: https://github.com/apache/bigtop/tree/master/bigtop-tests/smoke-tests. I believe I've discussed this with you briefly before, and I would love to migrate entirely over to them (no jar required, no complex directory bifurcation, totally modifiable and hackable, Gradle instead of Maven, and so on).

          jbx josh baer added a comment -

          I think most of the Hive smoke tests are way too in-depth to actually be considered smoke tests: several of them check the output of an EXPLAIN, which differs considerably between Hive versions, instead of just doing the operation and checking the results. They also take wayyyy too long to run compared to other smoke tests.

          Why not drop most of them and stick to validating the most common use cases?
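A rough sketch of "do the operation and check the results" instead of matching EXPLAIN text: parse the rows a query printed and compare them with the expected rows as a multiset, since a query without ORDER BY does not promise a row order. It assumes the Hive CLI's default tab-separated, one-row-per-line output; `parse_rows` and `assert_same_rows` are hypothetical names, and Python is used only for illustration.

```python
from collections import Counter
from typing import Iterable, Sequence


def parse_rows(stdout: str) -> Counter:
    """Turn Hive CLI stdout into a multiset of rows (tuples of column strings).

    Assumes the CLI's default output: one row per line, columns separated by
    tabs. Blank lines are skipped.
    """
    return Counter(
        tuple(line.split("\t")) for line in stdout.splitlines() if line.strip()
    )


def assert_same_rows(stdout: str, expected: Iterable[Sequence[str]]) -> None:
    """Fail unless stdout holds exactly the expected rows, in any order."""
    actual = parse_rows(stdout)
    wanted = Counter(tuple(row) for row in expected)
    missing = wanted - actual  # expected rows that did not come back
    extra = actual - wanted    # rows that came back but were not expected
    if missing or extra:
        raise AssertionError(
            f"missing rows: {sorted(missing.elements())}; "
            f"unexpected rows: {sorted(extra.elements())}"
        )
```

Because only the rows are compared, a change in how a given Hive version plans or formats the query does not break the test, which is the version sensitivity described above for EXPLAIN-based checks.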

          jayunit100 jay vyas added a comment -

          So, regarding this... do we want to:

          • maintain all of these tests?
          • maintain some of them?
          • or simplify them (or possibly just use the super easy-to-run test in smoke-tests as our Hive test framework, and remove these tests entirely)?

          Let me know. If we want to keep these, I will go through them one by one (I don't mind doing so if I need to, but it will be a lengthy exercise)...

          jayunit100 jay vyas added a comment (edited) -

          I just realized there is a lot more work to do with the Hive tests. Let's keep this as an umbrella JIRA as we update them, and track related issues here.

          • The removal of the MovieLens data sets broke at least one test (that is being fixed now in BIGTOP-1392).
          • Debugging other tests that can fail (like index_creation) is quite hard.
          • The smoke-tests need to be insulated from this complexity. We don't want the plethora of precise, diff-based tests in test-artifacts to make it impossible to run a simple validation that Hive is working for a quick, simple query.
          jayunit100 jay vyas added a comment (edited) -

          I've added BIGTOP-1392 as a subtask, as it fixes a critical bug that is kind of hard to trace because these tests are hard to read.

          In the meantime, I think it will be best to add a "simple" test to the Hive smoke-tests which doesn't require 4 different files just to verify that a Hive query ran properly.

          I'll add that right now, while we wait on the completion of BIGTOP-1392 (which is mostly pretty simple; it's just that the patch doesn't seem to apply in its current state).

          jayunit100 jay vyas added a comment -

          And yes, Roman Shaposhnik, I will try to get to this ASAP, maybe tonight or tomorrow!

          jayunit100 jay vyas added a comment -

          Applying BIGTOP-1392 first might make sense, as it's possible the Hive smoke tests have stopped working properly.

          rvs Roman Shaposhnik added a comment -

          jay vyas, any progress would be appreciated. Personally, I'm pretty short on cycles myself: getting 0.8.0 out is my top priority.

          jayunit100 jay vyas added a comment -

          Roman Shaposhnik, sounds like you got the hint. So for this JIRA, let's:

          • make the pattern matching explicit rather than diff'ing a whole file, do it with readable Java/Groovy APIs, and break the diff into a few readable assertions.
          • in the process, migrate these tests into smoke-tests: since the Hive tests don't make API calls, they don't need to be in a jar, and I don't think they need to be in test-artifacts.

          I can try to get to this sometime soon if no one beats me to it (but I'm busy, so it would be great if anyone else is free to lend a hand).
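One way to "break the diff into some readable assertions", sketched in Python under the assumption that the query output is already available as a string: each check pairs a human-readable description with a predicate, and a failing run reports which descriptions failed rather than a raw diff. The checks in `COUNT_CHECKS` are hypothetical examples for the COUNT(1) query from the description, not Bigtop's actual expectations, and `failed_checks` is likewise a hypothetical name.

```python
import re
from typing import Callable, List, Tuple

# A named check: (human-readable description, predicate over the query output).
Check = Tuple[str, Callable[[str], bool]]


def failed_checks(output: str, checks: List[Check]) -> List[str]:
    """Run every check and return the descriptions of the ones that failed."""
    return [description for description, ok in checks if not ok(output)]


# Hypothetical checks replacing one golden-file diff for the COUNT(1) query.
COUNT_CHECKS: List[Check] = [
    ("output is not empty", lambda out: out.strip() != ""),
    ("output is a single integer",
     lambda out: re.fullmatch(r"\d+", out.strip()) is not None),
    ("count is greater than zero",
     lambda out: out.strip().isdigit() and int(out.strip()) > 0),
]
```

With this shape, an empty table shows up as the single failure "count is greater than zero", which points at a missing data set far more directly than a one-line diff hunk.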

          rvs Roman Shaposhnik added a comment -

          I think there's a really good reason to implement the pattern matching natively in Groovy rather than relying on exec'ing external commands for that. Could be worth considering.
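A small sketch of pattern matching done in-process instead of exec'ing `grep`: drop run-to-run noise (timings, timestamped log lines, job ids) with the language's own regex library before any comparison. Python stands in for Groovy here; the `NOISE` patterns are hypothetical examples derived from the log in the description, and `strip_noise` is not an existing Bigtop function.

```python
import re
from typing import Iterable, List

# Hypothetical noise patterns: lines that change from run to run (timings,
# timestamped log lines, job ids) and should not take part in a comparison.
NOISE = (
    re.compile(r"^Time taken: "),
    re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} (?:WARN|INFO) "),
    re.compile(r"job_\d+_\d+"),
)


def strip_noise(lines: Iterable[str], patterns=NOISE) -> List[str]:
    """Keep only lines matching none of the patterns.

    Roughly what piping through `grep -v -E -e PAT1 -e PAT2 ...` does, but
    without spawning an external process.
    """
    return [line for line in lines if not any(p.search(line) for p in patterns)]
```

Keeping the filtering in-process plausibly avoids depending on `grep` being installed, and behaving identically, on every test host, and a mismatch surfaces as an ordinary exception in the test rather than as an external process's exit status.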


            People

            • Assignee:
              jayunit100 jay vyas
              Reporter:
              jayunit100 jay vyas
            • Votes:
              0
              Watchers:
              4
