  1. Bigtop
  2. BIGTOP-1222

Simplify and gradleize a subset of the bigtop smokes

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: build, tests
    • Labels:
      None

      Description

      (Rewritten the description for clarity)

      We need an easier way to run bigtop smoke tests, and gradle provides this:

      1) Easy to script/modify
      2) Human readable
      3) equally oriented towards both groovy and plain old java

      The advantages of this method of running smokes:

      1) No need to compile a jar: this is a costly step that adds little value, and it creates indirection which can make debugging a broken test very hard.

      2) Simple: A smoke test doesn't need to make low-level API calls or be compiled against the right APIs. Rather, it should test the end-user interface ("hive -q ....", "pig -x ....", "hadoop jar ....", and so on).

      3) Customizable: The smoke tests shouldn't require users to write XML, debug environment variables, or grep around for System properties. Rather, a high-level controller should do all that checking for you.

      The initial idea was to write a python/bash wrapper around the test scripts, but that was replaced by the idea of using gradle. The advantage of gradle is that we don't need to manually set the classpath and run groovy commands: Gradle wraps groovy scripts in their native java context quite nicely, but it doesn't add any unnecessary overhead (no xml, no jar files, no complex xml tag wrappers for simple tasks, just plain groovy code).

      So, the goal here is just to create a nice, clean, extensible, non-jar, non-API-dependent gradle runner for the smoke tests which exercises the hadoop cluster the same way a typical end user would.
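
To make the "test the end-user interface" idea concrete, here is a minimal sketch in plain Java. It is not taken from any of the attached patches: the class name CliSmoke and the pass/fail rule (exit status 0 means pass) are assumptions made for illustration only.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Bigtop): runs one command line the way an
// end user would and treats exit status 0 as "smoke test passed".
class CliSmoke {
    static boolean passes(List<String> command) {
        try {
            Process process = new ProcessBuilder(command)
                    .inheritIO()   // let the tool print straight to the console
                    .start();
            return process.waitFor() == 0;
        } catch (IOException e) {
            return false;          // e.g. the binary is not installed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: CliSmoke <command> [args...]");
            return;
        }
        boolean ok = passes(Arrays.asList(args));
        System.out.println((ok ? "PASS " : "FAIL ") + String.join(" ", args));
    }
}
```

Passing something like "hadoop fs -ls /" as the arguments would exercise the installed hadoop client directly, with no jar to build and nothing compiled against hadoop APIs.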

      1. BIGTOP-1222.patch
        49 kB
        jay vyas
      2. BIGTOP-1222.patch
        49 kB
        jay vyas
      3. BIGTOP-1222.patch
        50 kB
        jay vyas
      4. BIGTOP-1222.patch
        51 kB
        jay vyas
      5. BIGTOP-1222.patch
        51 kB
        jay vyas
      6. BIGTOP-1222.patch
        43 kB
        jay vyas
      7. BIGTOP-1222.patch
        47 kB
        jay vyas
      8. BIGTOP-1222.patch
        47 kB
        jay vyas
      9. BIGTOP-1222.patch
        47 kB
        jay vyas
      10. BIGTOP-1222.patch
        64 kB
        jay vyas
      11. BIGTOP-1222.patch
        64 kB
        jay vyas
      12. BIGTOP-1222.patch
        49 kB
        jay vyas
      13. BIGTOP-1222.patch
        5 kB
        Dawson Choong
      14. BIGTOP-1222.patch
        35 kB
        jay vyas
      15. BIGTOP-1222.patch
        26 kB
        jay vyas
      16. BIGTOP-1222.patch
        17 kB
        jay vyas
      17. BIGTOP-1222.patch
        17 kB
        jay vyas
      18. BIGTOP-1222.patch
        17 kB
        jay vyas
      19. BIGTOP-1222-2.patch
        23 kB
        jay vyas
      20. newsmokes.png
        40 kB
        jay vyas

          Activity

          cos Konstantin Boudnik added a comment -

          smokes should be easily runnable as scripts, with no need for jar file intermediates

          I am not very clear about this. Could you please elaborate a bit?

          jayunit100 jay vyas added a comment -

          Well, the entire control structure is based on maven. So, for me, I do funny things like editing XML files to turn tests on/off. Instead, let's take a script-based approach, something like this:

          groovy -classpath /usr/lib/hadoop/hadoop-common-2.0.6-alpha.jar:/root/.m2/repository/org/apache/bigtop/itest/itest-common/0.8.0-SNAPSHOT/itest-common-0.8.0-SNAPSHOT.jar bigtop-tests/test-artifacts/hadoop/src/main/groovy/org/apache/bigtop/itest/hadoop/hdfs/TestFuseDFS.groovy
          

          And generalize it so that we can run the entire suite of smokes just from a gradle task or groovy script. The gradle task will

          • run the groovy scripts directly from source
          • report failures/success

          This is a LOT easier than doing it through maven. Unless I'm missing something...

          This obviates BIGTOP-1195, do you see the connection now?
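
As a rough illustration of "run everything, then report failures/success", the sketch below runs a set of named commands and prints a summary. It is written in plain Java rather than as a gradle task, and the class name SmokeSuite, the report format, and the placeholder commands are all assumptions. In practice each command could be a groovy invocation like the one shown above.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical suite runner: executes each named command and reports results.
class SmokeSuite {
    // Runs every command and records whether it exited with status 0.
    static Map<String, Boolean> runAll(Map<String, List<String>> smokes) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> smoke : smokes.entrySet()) {
            boolean ok;
            try {
                ok = new ProcessBuilder(smoke.getValue()).inheritIO().start().waitFor() == 0;
            } catch (IOException e) {
                ok = false;        // command could not be started
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                ok = false;
            }
            results.put(smoke.getKey(), ok);
        }
        return results;
    }

    // One line per smoke, followed by a pass/fail count.
    static String report(Map<String, Boolean> results) {
        StringBuilder out = new StringBuilder();
        int failed = 0;
        for (Map.Entry<String, Boolean> r : results.entrySet()) {
            if (!r.getValue()) failed++;
            out.append(r.getValue() ? "PASS " : "FAIL ").append(r.getKey()).append('\n');
        }
        out.append(results.size() - failed).append(" passed, ").append(failed).append(" failed");
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> smokes = new LinkedHashMap<>();
        // Placeholder commands; a real suite would list hadoop/pig/hive invocations.
        smokes.put("always-passes", Arrays.asList("sh", "-c", "exit 0"));
        smokes.put("always-fails", Arrays.asList("sh", "-c", "exit 1"));
        System.out.println(report(runAll(smokes)));
    }
}
```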

          cos Konstantin Boudnik added a comment -

          Well, maven provides quite a bit of benefits like properly setting up classpaths and such. Also, the integration into CI is seamless, etc. We have chosen maven for a reason, you know.

          As for your use case: look into the pom.xml files of the common module - there's a number of properties that will allow you to achieve the things you need, like changing the default include/exclude set, etc.

          Gradle has a very deep integration into the maven life-cycle and would provide a far better way of running tests than just relying on a direct call to the groovy interpreter. I think what we need to do is step back and collect the requirements. I think a lot of the use cases are already covered, and trying to patch things sporadically isn't the best way to improve the framework. You see where I am coming from?

          jayunit100 jay vyas added a comment -

          I totally agree that sporadic patches may not be the right way forward.

          Let me try to clarify my thoughts here ... Hopefully without ranting too much....

          The simple issue is this: The complexity of running bigtop smoke tests is limiting adoption, I think. Maybe take a look at Intel's HiBench as an example of a script-based testing framework with no xml editing required.

          ... forcing users to wade through the many pom files in the bigtop smokes, craft regular expressions to filter tests nested within 10s of directories, and then run maven compile commands just to run something that can be done in bash is probably overkill. The main customizable portions of the pom files can easily all be expressed in some kind of wrapper script... Don't you think? Be it gradle or bash or whatever.

          So if we don't want to use bash to wrap smokes... As I suggested in BIGTOP-1195... Maybe we can use gradle to drive the smokes, which brings the benefits of bash (clear, imperative, concise) together with those of maven (java centric, and java aware).

          In any case, the end goal is making the smokes easy to run and customize: I'm not debating that they currently can be customized by maven pom hacking... But I'd like to see the bigtop smokes become more transparent and directly runnable, editable, and easily modifiable. The current accepted way of running them, which involves several maven / compile steps, is just too cumbersome for a typical hadoop tester or newcomer to bother with - they will just run terasort instead...

          We want bigtop's smokes to be a comprehensive alternative to the current ad hoc tests that people run, and that will only happen, I think, if the barrier to adoption is lower.

          cos Konstantin Boudnik added a comment - - edited

          Jay, the point is that you don't need to hack pom files to run tests - the filtering mechanism is built into the execution framework. All you need to do is to set a system property (check the common module for the names, I don't remember them off hand) on the command line. It might be simplified somewhat - there's no question about it.

          Let me shed a bit of light on why something as structural as Maven - or a similar approach that provides tight control over the dependencies - is beneficial in the case of Bigtop. Bigtop guarantees that the tests one runs against a stack have the same versions of dependencies as the stack itself. Thus the guarantee that we are comparing carrots to carrots and not to potatoes. Because Bigtop has been designed with a multiplicity of target stacks in mind, we had to pick a mechanism with a stricter level of control. I admit again - while the Maven model provides great control over the dependencies - it has a long way to go wrt UX. In other words - XML sucks and blows at the same time. But there's a big but here: providing the same level of guarantees of coherent dependency management using something like shell would be challenging (to put it politely).

          So to rephrase: I agree that the Maven thing needs to be improved. No one of sound mind will argue with that. However, I am highly doubtful that shell is the right and/or better alternative. So, it looks like Gradle is a sensible middle ground - as you've mentioned above. Now, let's focus on making it happen. Am I making sense?

          jayunit100 jay vyas added a comment -

          Hi cos: okay, so you've ** almost ** talked me down off the ledge, but not quite yet. ** Thanks for all your feedback **. So to clarify, as this has been a long conversation, can you answer these questions:

          • Can we replace "maven verify" with a gradle equivalent, or not?
          • Can you start a requirements page somewhere with the finer points here (i.e. about bigtop guarantees) for the smoke tests? That way I can match my JIRAs to the requirements before proposing one-off patches which are contrary to the overall goals of bigtop.
          cos Konstantin Boudnik added a comment -

          Yes, for both. The second one will come later today or tomorrow. Perhaps we can do a call to iron out the wrinkles. I think we are on the same page, but there's a bit of miscommunication happening.

          jayunit100 jay vyas added a comment -

          yeah I agree, that's a good idea. let's definitely have a call about it. How about Friday at 2 EST, which is like 11 in California?

          cos Konstantin Boudnik added a comment - - edited

          Sure, I will send you an invite via email

          Anyone else want to be on the call? Lemme know so I can add you.

          cos Konstantin Boudnik added a comment -

          To summarize the result of today's call between jay vyas, Roman Shaposhnik, and myself:

          • yes, we absolutely need to provide a better way of managing tests and their execution. Essentially, it means exposing tons of the tests' configurations and properties at the top level, so a user doesn't need to dig up all these details
          • however, keeping this info in an aux. config file will inevitably increase the amount of test maintenance, as all configuration parameters will have to be recorded twice: in the test and in the config file. This issue can be worked around by making tests more declarative. One of the ways to do so is to use annotations in the tests, so their intentions and expectations are available to anyone who cares to ask (see BIGTOP-685 and BIGTOP-874)
          • Gradle - being deeply integrated into the JVM platform - would provide a seamless way of querying test contracts at runtime, effectively providing a way of creating a dynamic set of default configurations. As a next step, these defaults can be overridden/merged with user-provided custom sets of parameters. A shell script - while a possible solution - seems to be a more difficult choice from a JVM-integration standpoint.
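
A small sketch may help show what "declarative tests whose contracts are queried at runtime" could look like. Everything named here is hypothetical: the annotations SmokeContract and Param, the example test class, and the parameter TEST_USER are invented for illustration and are not the annotations proposed in BIGTOP-685 or BIGTOP-874.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotation: one configuration parameter and its default value.
@Retention(RetentionPolicy.RUNTIME)
@interface Param { String name(); String defaultValue(); }

// Hypothetical annotation: the full set of parameters a test expects.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SmokeContract { Param[] value(); }

// Example test class that declares its expectations instead of hiding them.
@SmokeContract({
    @Param(name = "HADOOP_HOME", defaultValue = "/usr/lib/hadoop"),
    @Param(name = "TEST_USER", defaultValue = "bigtop")   // invented parameter
})
class ExampleFuseSmoke { }

class ContractReader {
    // Reads declared defaults via reflection, then applies user overrides on top.
    static Map<String, String> effectiveConfig(Class<?> test, Map<String, String> overrides) {
        Map<String, String> config = new LinkedHashMap<>();
        SmokeContract contract = test.getAnnotation(SmokeContract.class);
        if (contract != null) {
            for (Param p : contract.value()) {
                config.put(p.name(), p.defaultValue());
            }
        }
        config.putAll(overrides);
        return config;
    }

    public static void main(String[] args) {
        Map<String, String> none = Collections.emptyMap();
        System.out.println(effectiveConfig(ExampleFuseSmoke.class, none));
    }
}
```

Because the defaults live on the test itself, there is no second config file to keep in sync, which is the maintenance concern raised in the second bullet.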
          mbukatov Martin Bukatovic added a comment -

          Since I find this issue important, I'm adding my point of view here:

          I agree with jay vyas that we need a better way to execute integration tests, while I understand why the bigtop project decided to pick maven for this task.

          Why I don't like test execution via maven?

          • It's harder to integrate with other tests.
            I would like to use bigtop tests when checking the integration of hadoop in a specific environment, and I need to run particular test cases from bigtop while doing something else at the same time. With maven this is hard, because one has to configure maven properly to run just a particular test case and then fight maven to produce reasonable output... It can be done somehow, but it's hardly optimal for this use case.
          • Maven creates a higher entry barrier: you can run the full test suite as it is, but when you decide to dig deeper and e.g. modify and run just a single test case, you need to recompile it and then set up maven to run just this case.

          That said, maven is really useful for preparing the test environment: we need it to manage dependencies (resolve and download jars into the local maven repository), create the classpath for the tests to run, and so on. Even if I decided to run some bigtop tests via a shell script, I would need maven to do this setup. But when it comes to test execution, I can imagine easier ways to do this.

          So if we decide to go with gradle, I would like to propose a simple, plain mode of execution where:

          • maven/gradle sets up the environment (jars, classpath) for the case (so we will be sure that we are not comparing carrots with potatoes)
          • you will run just a single case easily (ideally just by specifying a groovy script name)
          • without additional logging and maven plumbing during execution
          • groovy scripts run directly, without compilation

          This approach would be great for just looking around, experimenting and integrating with other environments. I have no knowledge of gradle, but I'm willing to learn about it and help implement this if the bigtop project decides to go with gradle.
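
The "plain mode" described above boils down to assembling a groovy command line from a prepared classpath and a script name. The following sketch shows only that assembly step. The class name GroovyCommand is an assumption; the jar and script paths in main are copied from the groovy command quoted earlier in this thread.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the command line for running one groovy test
// script directly from source, given jars resolved by maven/gradle beforehand.
class GroovyCommand {
    static List<String> forScript(List<String> classpathJars, String scriptPath) {
        List<String> command = new ArrayList<>();
        command.add("groovy");
        if (!classpathJars.isEmpty()) {
            command.add("-classpath");
            command.add(String.join(File.pathSeparator, classpathJars));
        }
        command.add(scriptPath);
        return command;
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList(
            "/usr/lib/hadoop/hadoop-common-2.0.6-alpha.jar",
            "/root/.m2/repository/org/apache/bigtop/itest/itest-common/0.8.0-SNAPSHOT/itest-common-0.8.0-SNAPSHOT.jar");
        String script = "bigtop-tests/test-artifacts/hadoop/src/main/groovy/org/apache/bigtop/itest/hadoop/hdfs/TestFuseDFS.groovy";
        // Prints the command only; it does not execute it.
        System.out.println(String.join(" ", forScript(jars, script)));
    }
}
```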

          cos Konstantin Boudnik added a comment -

          Hi Martin. Thanks for listing the requirements! A few comments:

          Maven creates a higher entry barrier: you can run the full test suite as it is, but when you decide to dig deeper and e.g. modify and run just a single test case, you need to recompile it and then set up maven to run just this case

          Re-compilation isn't specific to Maven. The reason it is done is because

          • tests and their execution are separated
          • tests are written as Java or Groovy programs, not scripts

          maven/gradle sets up the environment (jars, classpath) for the case (so we will be sure that we are not comparing carrots with potatoes)

          That's exactly what maven does (and gradle shall seamlessly step into this later, hopefully without a disruptive change in the pom structure).

          you will run just a single case easily (ideally just by specifying a groovy script name)

          As I've pointed out a couple of times already - there's already a way to do this by using the -Dorg.apache.maven-failsafe-plugin.testInclude sysprop. I guess the name could be shorter/cleaner.
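
For readers unfamiliar with this kind of filtering, here is a rough sketch of selecting tests from an include pattern passed as a system property. Only the property name comes from the comment above. The class name TestIncludeFilter, the glob-style matching, and the example paths are assumptions, and the real maven setup may interpret the value differently.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical filter: keeps only the test paths matching an include pattern.
class TestIncludeFilter {
    // Property name as quoted in the comment; glob semantics are assumed here.
    static final String PROPERTY = "org.apache.maven-failsafe-plugin.testInclude";

    static List<String> select(List<String> testPaths, String includeGlob) {
        if (includeGlob == null || includeGlob.isEmpty()) {
            return new ArrayList<>(testPaths);   // no filter set: run everything
        }
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + includeGlob);
        List<String> selected = new ArrayList<>();
        for (String path : testPaths) {
            if (matcher.matches(Paths.get(path))) {
                selected.add(path);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Illustrative relative paths, not the real source layout.
        List<String> all = Arrays.asList("hdfs/TestFuseDFS.groovy", "hdfs/TestNode.groovy");
        System.out.println(select(all, System.getProperty(PROPERTY)));
    }
}
```

Under these assumptions, launching the sketch with the property set to '**/TestFuseDFS*' would select only the FUSE test.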

          without additional logging and maven plumbing during execution

          I am not really sure what you're referring to. One of the benefits of plumbing is to have a nice plug into CI infrastructure. If you are talking about having a different entry-point into the test system to make ad-hoc experiments easier - I am all for it. And I think gradle is the right way to go.

          groovy scripts run directly, without compilation

          This is clearly possible only if tests are written as scripts, not as classes. As an example look at what has been done in BIGTOP-952 with the provisioning script.

          I got a little bit oversubscribed with a couple of timeboxed things, but I should be spending more time on this Gradle stuff starting next week. In the meantime, it would be a great starting point if someone wants to drop in some proof-of-concept patches. Gradle is pretty easy - I learned its sophistications in less than a day working on BIGTOP-1201.

          mbukatov Martin Bukatovic added a comment -

          Re-compilation isn't specific to Maven. The reason it is done is because

          • tests and their execution are separated
          • tests are written as Java or Groovy programs, not scripts

          I'm not a groovy expert, so maybe I'm missing something, but when I was experimenting with it, I was able to modify TestNode.groovy and run it via a shell wrapper without explicit maven recompilation. That said, your objections confirm my feeling that this is not quite the correct approach, right?

          As I've pointed out a couple of times already - there's already a way to do this by using the -Dorg.apache.maven-failsafe-plugin.testInclude sysprop.

          Thanks for pointing this out, I need to recheck how this one works.

          If you are talking about having a different entry-point into the test system to make ad-hoc experiments easier - I am all for it. And I think gradle is the right way to go.

          Yes, that's the use case I have in mind. Another one would be a situation where you are running a particular bigtop test case from other test scripts for integration testing.

          Anyway, I will try to look at your gradle work on BIGTOP-1201 later this week to better understand it.

          cos Konstantin Boudnik added a comment -

          BIGTOP-1201 is mostly about replacing the packaging functionality currently covered by make.
          I have done some primitive stuff to wrap maven into the gradle build, so all test compilation/installation can be done from one command without the need to run through multiple steps. It is already in the workspace under build.gradle. There's a long way to go with it, apparently, but at least we've started.

          jayunit100 jay vyas added a comment - - edited

          (rehashing my above question: still need a wiki page or something about this)

          hi cos! can you help us understand what carrots and potatoes you're talking about?

          for the smoke tests, as I think about it: maven, groovy, itest, gradle are all only simulating what a data scientist or analyst will enter into the terminal when trying to solve a real world problem...

          I still don't fully understand why, for smoke tests, we care about maven dependencies at all. To me, they are just a means to an end (we need ITest on the classpath to run hadoop commands and leverage the awesome bigtop smoke tests, so we go ahead and find a way to add it - but we don't care much about having the exact right version of it)....

          Smoke tests should test the command line hadoop invocations like

           hadoop fs -ls
           hbase shell -d create 't1','f1' 
           pig -x ....
          

          These should gather their runtime info from /usr/lib/hadoop/lib, /usr/pig/lib, etc.

          Regarding carrots and potatoes: the end user's invocation of a hadoop command, which delivers them some real insight into their big data problem space, is the "real potatoes" that the smoke tests should be testing. Right?
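
          A minimal sketch of the kind of check described above, in Groovy: run the command exactly as an end user would type it and judge it only by its exit status. The helper names runCommand and smokeCheck are hypothetical and are not part of iTest or Bigtop.

```groovy
// Hypothetical helpers for illustration; not part of iTest or Bigtop.

// Run a command as an end user would and capture its exit code and output.
def runCommand(List<String> command) {
    def process = new ProcessBuilder(command).redirectErrorStream(true).start()
    // Drain the output before waiting so a full pipe cannot block the child.
    def output = process.inputStream.text
    def exitCode = process.waitFor()
    return [exitCode: exitCode, output: output]
}

// A smoke check passes when the command exits with status 0.
def smokeCheck(List<String> command) {
    def result = runCommand(command)
    assert result.exitCode == 0 : "'${command.join(' ')}' failed:\n${result.output}"
    return result.output
}

// Example (needs a configured hadoop client on the PATH):
// smokeCheck(['hadoop', 'fs', '-ls', '/'])
```

          Because the check only shells out, it needs no Hadoop jars on the test classpath; whatever the installed client resolves at runtime is what gets tested.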

          mbukatov Martin Bukatovic added a comment - - edited

          I agree with jay vyas that we should use the hadoop instance already installed on the system (which is pretty much a requirement for this kind of test anyway) instead of having maven fetch "some" hadoop libraries. I know that some hadoop dependencies are specified in the smoke execution pom files, which is something I don't like but forgot to mention here, so thanks Jay for pointing this out.

          "I still don't fully understand why, for smoke tests, we care about maven dependencies at all. To me, they are just a means to an end (we need ITest on the classpath to run hadoop commands and leverage the awesome bigtop smoke tests, so we go ahead and find a way to add it, but we don't care much about having the exact right version of it)..."

          What I mean by dependencies is all the libraries needed to run groovy and the testing framework. E.g. the following list shows the maven-generated runtime dependencies for the groovy runtime on one of my machines:

          /opt/m2-local-repo/xmlpull/xmlpull/1.1.3.1/xmlpull-1.1.3.1.jar
          /opt/m2-local-repo/org/apache/ant/ant-launcher/1.8.2/ant-launcher-1.8.2.jar
          /opt/m2-local-repo/junit/junit/4.10/junit-4.10.jar
          /opt/m2-local-repo/org/apache/ant/ant-antlr/1.8.2/ant-antlr-1.8.2.jar
          /opt/m2-local-repo/org/codehaus/jsr166-mirror/jsr166y/1.7.0/jsr166y-1.7.0.jar
          /opt/m2-local-repo/org/hamcrest/hamcrest-core/1.1/hamcrest-core-1.1.jar
          /opt/m2-local-repo/com/thoughtworks/xstream/xstream/1.4.1/xstream-1.4.1.jar
          /opt/m2-local-repo/org/codehaus/gpars/gpars/0.12/gpars-0.12.jar
          /opt/m2-local-repo/org/apache/ivy/ivy/2.2.0/ivy-2.2.0.jar
          /opt/m2-local-repo/commons-logging/commons-logging/1.1.1/commons-logging-1.1.1.jar
          /opt/m2-local-repo/org/codehaus/jsr166-mirror/extra166y/1.7.0/extra166y-1.7.0.jar
          /opt/m2-local-repo/bsf/bsf/2.4.0/bsf-2.4.0.jar
          /opt/m2-local-repo/org/apache/ant/ant/1.8.2/ant-1.8.2.jar
          /opt/m2-local-repo/org/fusesource/jansi/jansi/1.7/jansi-1.7.jar
          /opt/m2-local-repo/org/apache/ant/ant-junit/1.8.2/ant-junit-1.8.2.jar
          /opt/m2-local-repo/javax/servlet/servlet-api/2.4/servlet-api-2.4.jar
          /opt/m2-local-repo/jline/jline/0.9.94/jline-0.9.94.jar
          /opt/m2-local-repo/javax/servlet/jsp-api/2.0/jsp-api-2.0.jar
          /opt/m2-local-repo/org/codehaus/groovy/groovy-all/1.8.6/groovy-all-1.8.6.jar
          /opt/m2-local-repo/commons-cli/commons-cli/1.2/commons-cli-1.2.jar
          

          And the testing framework requires this:

          /opt/m2-local-repo/commons-logging/commons-logging/1.1/commons-logging-1.1.jar
          /opt/m2-local-repo/org/codehaus/groovy/groovy-all/1.8.6/groovy-all-1.8.6.jar
          /opt/m2-local-repo/junit/junit/4.11/junit-4.11.jar
          /opt/m2-local-repo/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar
          /opt/m2-local-repo/org/apache/ant/ant-launcher/1.8.2/ant-launcher-1.8.2.jar
          /opt/m2-local-repo/org/apache/ant/ant/1.8.2/ant-1.8.2.jar
          /opt/m2-local-repo/log4j/log4j/1.2.14/log4j-1.2.14.jar
          /opt/m2-local-repo/javax/servlet/servlet-api/2.3/servlet-api-2.3.jar
          /opt/m2-local-repo/logkit/logkit/1.0.1/logkit-1.0.1.jar
          /opt/m2-local-repo/org/apache/ant/ant-junit/1.8.2/ant-junit-1.8.2.jar
          /opt/m2-local-repo/avalon-framework/avalon-framework/4.1.3/avalon-framework-4.1.3.jar
          /opt/m2-local-repo/org/apache/bigtop/itest/itest-common/0.8.0-SNAPSHOT/itest-common-0.8.0-SNAPSHOT.jar
          

          Looking at those lists, I don't think it's a good idea to construct them by hand, so I'm OK with maven (or some other tool) generating them for me. Actually I find the manual way kind of scary, considering the number of dependencies and how java handles multiple versions of the same library. (Also, it seems I misused the potato/carrot analogy for this, sorry about that.)
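
          To illustrate the multiple-versions concern: the two lists above already disagree on junit (4.10 vs 4.11), hamcrest-core (1.1 vs 1.3), commons-logging (1.1.1 vs 1.1) and servlet-api (2.4 vs 2.3). A rough Groovy sketch for spotting such clashes in a list of jar paths follows. The helper names are hypothetical, and it assumes the standard local-repository layout .../<artifact>/<version>/<artifact>-<version>.jar.

```groovy
// Hypothetical helpers for illustration only.

// Extract artifact name and version from a Maven-repository-style jar path.
// Assumes the layout .../<artifact>/<version>/<artifact>-<version>.jar
def parseJar(String path) {
    def parts = path.tokenize('/')
    if (parts.size() < 3) {
        return null
    }
    return [artifact: parts[-3], version: parts[-2]]
}

// Return a map of artifact -> versions for artifacts present in more than one version.
def findConflicts(List<String> jarPaths) {
    def parsed = jarPaths.collect { parseJar(it) }.findAll { it != null }
    def versionsByArtifact = parsed.groupBy { it.artifact }.collectEntries { artifact, jars ->
        [(artifact): jars.collect { it.version }.unique().sort()]
    }
    return versionsByArtifact.findAll { artifact, versions -> versions.size() > 1 }
}

def sample = [
    '/opt/m2-local-repo/junit/junit/4.10/junit-4.10.jar',
    '/opt/m2-local-repo/junit/junit/4.11/junit-4.11.jar',
    '/opt/m2-local-repo/org/apache/ant/ant/1.8.2/ant-1.8.2.jar',
]
println findConflicts(sample)
```

          The versions are sorted as plain strings, which is good enough for reporting but is not a real version comparison.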

          jayunit100 jay vyas added a comment -

          OK, so we are all in agreement... Cos is just saying
          "maven grabs a bunch of random jars for test automation/logging", etc.

          So what are the next steps in this dizzying (but nevertheless very informative) jira?

          cos Konstantin Boudnik added a comment -

          Guys, I think we need to make a distinction here between what we currently call 'smoke' tests and what you're envisioning as a proxy for a customer's (a data scientist's, in your example) use case. Let's call those functional tests for now (not really true, but still). The existing smoke tests are really of an integration kind, or integration smoke tests. The reason all the dependencies have to be resolved to create a test artifact of the integration kind is that some of the tests (not all, though) are using low level APIs. HBase tests are the most illustrative in this sense.

          Most of the Hadoop integration tests, on the other hand, are implemented using CLI calls into the Hadoop cluster, hence they might look like functional tests. For functional tests we don't need as many dependencies resolved at compile time: all one needs is to be able to locate certain installed jar files (jobclient, etc.) and driver scripts like hadoop or hdfs. And that is where the confusion is coming from, I believe. For example, look at how the pig smokes pick up additional jars from the executor's pom.xml. In other words, these tests are of the second type: true smokes. And they can be simplified for sure.

          I think the second type of tests (functional tests) can be rewritten as Groovy scripts, which would provide a twofold benefit:

          • they can be compiled and run in the same fashion as now, with the whole Maven install/verify life cycle
          • they could be quickly fired up from a Gradle environment without most of the tribal dancing required by Maven

          Does this reconcile all our points or not?

          jayunit100 jay vyas added a comment - - edited

          Great point: functional smoke tests don't need maven to run! w00t.

          So Konstantin Boudnik: shall we have the codebase reflect this by creating smokes/functional tests, which are non-maven, raw groovy scripts, splitting the smokes like this:

          1) smokes/integration (tests that call APIs directly, like what happens in test-artifacts/..../hbase/smoke/TestImportTsv.groovy). This can stay exactly the same, use maven, and keep its directory structure.

          2) smokes/functional (iTest-based tests which don't make direct API calls, for example TestMahoutExamples.groovy and TestHadoopExamples.groovy). This can be a new folder or subproject in bigtop?

          The target of this JIRA could be the latter. How does that sound? If you agree, we can come up with a plan for how this can be done.

          cos Konstantin Boudnik added a comment -

          I think I am pretty much on board, with a small addendum to point 2) above: it would be great to retain the ability to run functional smokes as part of the CI. What do you think?

          And yes, let's push it into the Backlog, as it might be a bit too much for the 0.8 release.

          jayunit100 jay vyas added a comment -

          Are you saying you want the functional smokes to stay buried in the maven packages? If so, I guess I can write and maintain a gradle wrapper which calls them from outside of any maven context. I'm okay with that...

          Sounds like we're converging on a very simple and elegant idea for easy-to-run functional smoke tests.

          cos Konstantin Boudnik added a comment -

          I am not married to the idea of keeping the tests inside the maven structure. I just want to be able to run all the tests from the CI and have their results represented in the same place. At the same time, it'd be great not to duplicate the tests if possible (e.g., one copy for integration and another one for functional). Makes sense?

          jayunit100 jay vyas added a comment -

          Update on this: I guess we have a plan! As per conversation w/ Konstantin Boudnik:

          • I'm going to dig around and separate the classes into "functional" versus "integration" smoke tests.
          • Then we can craft a patch which uses gradle to (1) import iTest and some other simple jars and (2) run the "functional" smokes directly.
          jayunit100 jay vyas added a comment -

          After working some more on this, it looks like the "unpackTestResources" feature in the tests means that compiled jars are needed to run the bigtop smokes.

          I would really like to modify the tests so that the unpack functionality is not required (if we know where the bigtop source directory is, we can just as easily do an fs.put).

          Thoughts?

          jayunit100 jay vyas added a comment - - edited

          Made some progress with a raw gradle test runner which runs pure groovy scripts, no jar required. A couple of hacks were needed: commenting out the jar copy resource directive and replacing it with a manual copy of the input test files for TestHadoopExamples.groovy. Those will be easy to solve with a try/catch block update to TestHadoopExamples.groovy, which I can put in the final patch.

          Very raw, just a prototype of how it might work.

          This snippet runs the hadoop examples tests, but you have to manually copy over data from the resources first and "comment out" the jar-oriented build utility hooks which extract the files from the jar.

          apply plugin: 'groovy'
          
          repositories {
              mavenCentral()
          }
          
          dependencies {
          
              //needed to avoid groovy not on classpath error.
              testCompile module('org.codehaus.groovy:groovy:1.8.0')
              testCompile group:'org.apache.bigtop.itest', name:'itest-common', version:'0.7.0',transitive:'true'
              testCompile group:'org.apache.hadoop',name:'hadoop-common', version:'2.0.6-alpha',transitive:'true'
              testCompile group:'org.apache.hadoop',name:'hadoop-hdfs', version:'2.1.0-beta',transitive:'true'
          }
          
          
          def doExclude(filename) {
              print("Exclude? ${filename} ... ")
              def tests_to_include = [
                  "TestHadoopExamples.groovy"
              ];
              def keep_this_test=false;
              tests_to_include.each() {
                if(filename.contains(".groovy")){
                  if(filename.contains(it))
                     keep_this_test=true
                  else
                     keep_this_test=false
                }
                else
                  keep_this_test=true;
              };
          
              println("Keep = ${keep_this_test} "+filename);
          
              return !keep_this_test ;
          }
          
          sourceSets {
           test {
                  /**
                  * This will put the input files into the jar.
                  * TODO: We should refactor tests, over time, not to require running from jar.
                  */
          
                  resources {
                        srcDirs =
                        [
                         //'/opt/bigtop/bigtop-tests/test-artifacts/hadoop/src/main/resources/',
                         //In here, we put the log4j.properties, so we have fine grained logging control.
                         'conf/']
                  }
          
                  groovy {
                    srcDirs = ['/opt/bigtop/bigtop-tests/test-artifacts/hadoop/']
                    exclude 'src/main/groovy/org/apache/bigtop/itest/hadoop/hdfs/**'
                    //exclude 'src/main/groovy/org/apache/bigtop/itest/hadoop/hcfs/**'
                    //exclude 'src/main/groovy/org/apache/bigtop/itest/hadoop/yarn/**'
                    exclude { FileTreeElement elem -> ( doExclude(elem.getName()) ) }
                 }
           }
          }
          
          

          (Pasting an update to the code above; will make a patch soon. This allows for gradle-style filtering of tests that is easy for any user to configure: you just enter the tests you want to run, and gradle handles the rest.)

          jayunit100 jay vyas added a comment - - edited

          (Good news!) Update on this: I now have the gradleized smoke tests working for TestHadoopExamples. The key was to add a BIGTOP_HOME environment variable so that, if not running from a jar, the groovy scripts can directly copy the stuff in resources/ to the DFS.

          Will post a patch once I validate that this approach works for at least one other ecosystem component (e.g. pig or hive).

          https://gist.github.com/jayunit100/a0fa4e70b69151aa7151

          jayunit100 jay vyas added a comment - - edited

          FINALLY! A patch so we can run bigtop tests DIRECTLY from source, no jar required!

          To run this, just do:

          gradle clean compileGroovy test --info
          

          The results are quite readable, and it's super easy to include new tests! Just add the groovy scripts to the array collection at the top. This patch DOESN'T require jars to run! There are a couple of VERY MINOR modifications to the TestHadoopExamples and TestHive smoke tests: they try the jar method and, if it doesn't work, simply fall back to using a BIGTOP_HOME environment variable, which points to the bigtop source code.
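
          A minimal sketch of that fallback, not the code from the patch: the helper name locateResources and the unpackFromJar closure (standing in for the jar-based extraction) are hypothetical. The resources path is the one that appears in the build.gradle snippet earlier in this thread.

```groovy
// Hypothetical helper for illustration; not the actual patch.
// 'unpackFromJar' stands in for the jar-based resource extraction.
def locateResources(Closure unpackFromJar, Map<String, String> env = System.getenv()) {
    try {
        // Preferred path: running from a jar, resources get unpacked.
        return unpackFromJar()
    } catch (Exception e) {
        // Fallback: read the resources straight from the bigtop source tree.
        def home = env['BIGTOP_HOME']
        if (!home) {
            throw new IllegalStateException('Not running from a jar and BIGTOP_HOME is not set', e)
        }
        return new File(home, 'bigtop-tests/test-artifacts/hadoop/src/main/resources')
    }
}
```

          Passing the environment in as a map keeps the fallback easy to exercise without touching the real process environment.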

          jayunit100 jay vyas added a comment -

          minor update

          jayunit100 jay vyas added a comment - - edited

          Adding a simple pig smoke: right now, I don't know how to run the "real" pig smokes from a jar in gradle.

          Obviously this patch is a little rough (3 commits, has a workaround for pig), but basically it's an easy way to validate:

          -mapreduce
          -pig
          -hive
          -mahout

          all in one pass.

          Also, these tests are "certified" as HCFS compliant, API neutral, and not dependent on any binaries. Over time we can continue patching the other bigtop tests to make them HCFS compatible and portable (for right now Flume depends on HDFS, and Sqoop depends on MySQL, so there is more work to do on that front).

          Any thoughts on the major points? After that I'll refine it (remove hardcoded paths, add a few more defensive checks).

          cos Konstantin Boudnik added a comment -

          btw, if you keep using the same name for all the attachments JIRA will sort them for you properly.

          kaiyzen Nate DAmico added a comment -

          As discussed in the meetup today at Red Hat, created a new item: https://issues.apache.org/jira/browse/BIGTOP-1333

          This is for work to have the build.gradle file read params/inputs from an external parametrized file.

          jayunit100 jay vyas added a comment -

          Okay, attached is a patch for the FlumeNG test, along with mahout, pig, mapreduce, and hive.

          Very close! I think I'll put in a sqoop smoke test also (HSQL based) for the first iteration of these light functional smokes.

          jayunit100 jay vyas added a comment - - edited

          Okay! I'm (finally) done testing the first iteration of these functional smokes.

          0) They use gradle for compiling standalone classes and adding some light dependencies, but REQUIRE NO JAR files, so they are totally hackable and extensible by the community.
          1) They work for mahout, sqoop, flume, hive, pig, mapreduce.
          2) And they are 100% HCFS compliant.
          3) Did I say they require NO JAR FILES?
          4) They create an EMBEDDED database for Sqoop ETL (no more MySQL dependency), by leveraging the HSQL libs.
          5) They generate data for the pig phase, and are a proper integration test (the existing pig integration tests from pig.apache.org don't seem to be cluster-scale smokes).
          6) They check for all environment variables upfront in the gradle script.

          If anyone wants to play around with this patch and leave feedback, let me know. Some cleanup is required... thanks.
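
          As an illustration of point 6, an upfront check could look roughly like the sketch below. This is not the code from the patch: the helper names are hypothetical, and of the variable names used here only BIGTOP_HOME is mentioned in this thread.

```groovy
// Hypothetical helpers for illustration; not the code from the patch.

// List the required variables that are unset or empty.
def missingEnvVars(List<String> required, Map<String, String> env = System.getenv()) {
    return required.findAll { !env[it] }
}

// Fail fast with a single message naming every missing variable.
def requireEnvVars(List<String> required, Map<String, String> env = System.getenv()) {
    def missing = missingEnvVars(required, env)
    if (missing) {
        throw new IllegalStateException("Missing environment variables: ${missing.join(', ')}")
    }
}

// Example: BIGTOP_HOME comes from this thread; HADOOP_CONF_DIR is only an assumed example.
// requireEnvVars(['BIGTOP_HOME', 'HADOOP_CONF_DIR'])
```

          Reporting all missing variables at once saves a round of fix-and-rerun for each one.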

          cos Konstantin Boudnik added a comment -

          Still going through, a few catches:

          • hadoop dep versions seem to be out of order: 2.0.6 vs 2.1.0. Should they be parametrized?
          • commenting out list of the tests to execute looks too heavy. Shall all be enabled with an option to exclude some by a mask or otherwise?
          • perhaps you meant endsWith in filename.contains(".groovy") ?
          • hardcoded paths like /opt/bigtop/bigtop-tests/test-artifacts shouldn't be used
          • 2-4 space indentation
            I will keep poking around
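
          To illustrate the endsWith point from the list above, here is a minimal Groovy sketch. The file names are made up for the example and are not taken from the patch; they only show where the two checks can disagree.

```groovy
// Hypothetical file names, chosen only to show where the two checks disagree.
def names = ['TestPig.groovy', 'TestPig.groovy.bak', 'notes.groovy.txt', 'TestHive.java']

// contains() matches the substring anywhere in the name.
def byContains = names.findAll { it.contains('.groovy') }

// endsWith() matches only names whose extension really is .groovy.
def byEndsWith = names.findAll { it.endsWith('.groovy') }

println "contains: ${byContains}"
println "endsWith: ${byEndsWith}"
```

          With contains, backup or renamed files would be picked up as tests; endsWith keeps only the real .groovy sources.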
          jayunit100 jay vyas added a comment - - edited

          thanks cos.

          commenting out list of the tests to execute .........
          Sure - we can do that, or else, we can just publish it side by side as the Reactor-8 folks ( working on BIGTOP-1333 ) have suggested.

          And regarding the other comments - I agree with all of them. It's still a little raw, so I'll attach a cleaned-up patch shortly.

          jayunit100 jay vyas added a comment -

          Hi guys. Has anyone had another look at this?

          I'm about to roll another patch this week with the cleanups cos suggests....

          But I realize people will be busy with bigtop-0.8.0, so I'll avoid creating too much harassment.

          dawson.choong Dawson Choong added a comment -

          The patch looks promising so far. However, if I run "gradle clean compileGroovy test" in the bigtop/bigtop-smoke-tests directory, I am getting an exception "undeclared env variable: PIG_HOME," despite the fact that I may not want to test pig. It would be nice if the user could write which tests they want to run on the command line (ex: $ gradle hadoop cnode) and gradle will do the rest of the work as opposed to manually selecting tests in build.gradle. Does this seem feasible? I would be happy to work on this with you jay vyas

          kaiyzen Nate DAmico added a comment -

          Regarding Dawson Choong's comment, and my previous one: we have this item to start moving test running to a more parameterized manner: https://issues.apache.org/jira/browse/BIGTOP-1333

          jay vyas, would you want to tackle externalizing which tests are run, and their params, post 0.8.0 release?

          jayunit100 jay vyas added a comment - - edited

          build.gradle expects all the usual suspects (PIG_HOME, HADOOP_CONF_DIR and so on) to be declared, or it won't run.

          I can probably make it a little smarter, so that it only requires PIG_HOME if pig tests are actually being run, and so on.

          Dawson Choong are you interested in helping me on the patch? That would be awesome! I'm busy right now again, and won't be able to finish this patch until later in the week, but if you're ready - maybe you can pull the patch down, add a few extensions to it, and create a second patch of your own. Then to complete the JIRA, we can just commit both patches at once in the proper order.
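
          The "only require PIG_HOME if pig tests are being run" idea could look roughly like the plain Groovy sketch below. This is not the code from the patch: the requiredEnv map, the missingEnv helper and the HIVE_HOME entry are assumptions made up for illustration; only PIG_HOME and HADOOP_CONF_DIR are named in this thread.

```groovy
// Hypothetical mapping from component name to the environment variables it needs.
// Only PIG_HOME and HADOOP_CONF_DIR come from the thread; HIVE_HOME is an assumption.
def requiredEnv = [
    pig      : ['PIG_HOME', 'HADOOP_CONF_DIR'],
    hive     : ['HIVE_HOME', 'HADOOP_CONF_DIR'],
    mapreduce: ['HADOOP_CONF_DIR'],
]

// Returns the sorted, de-duplicated variables that the selected components need
// but that are missing (or empty) in the given environment map.
def missingEnv = { Collection<String> selected, Map<String, String> env ->
    selected.collectMany { requiredEnv[it] ?: [] }
            .unique()
            .findAll { !env[it] }
            .sort()
}

// Only hive is selected, so an undeclared PIG_HOME is not reported.
def missing = missingEnv(['hive'], [HADOOP_CONF_DIR: '/etc/hadoop/conf'])
println "missing: ${missing}"
```

          In a real build script the second argument would presumably be System.getenv(), and a non-empty result would be turned into a build failure.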

          jayunit100 jay vyas added a comment -

          Nate DAmico I think you can use the patch as is and start your BIGTOP-1333 work ... or else, you can wait until post 0.8.0 release. Either way I think the basics are pretty stable, so your parameterization won't change much. It's up to you. Thanks again for jumping in on this.

          cos Konstantin Boudnik added a comment -

          build.gradle expects all the usual suspects ( PIG_HOME,HADOOP_CONF_DIR and so on) to be declared, or it wont run.

          I think it should be possible to make the build modular, in the sense of having a tiny build.gradle per component test directory, which will enforce all the env. requirements needed to run the test - much like the current maven build does. These requirements can be gathered (I don't know how yet) by the top level build.gradle to express stack-wise test env. requirements.

          Does it make sense?

          jayunit100 jay vyas added a comment -

          Yes. We can always have a separate build.gradle for subtests. I assume that is a normal idiom in gradle projects?

          dawson.choong Dawson Choong added a comment -

          jay vyas sounds good. I'll work on a patch that focuses on parametrization.

          jayunit100 jay vyas added a comment - - edited

          Awesome. To be clear on the plan (let me know if I'm mistaken):
          1) We will apply the latest patch here, which has the basic gradle functionality.
          2) We will apply your extension, which allows for separate build.gradle file for each test directory, something like

          bigtop-smoke-tests
            build.gradle
            pig/
              build.gradle
            mahout/
              build.gradle
            hive/
            ...
          

          3) Review the commits of 1 and 2 - which will be a two-commit patch - and test it. Then commit it as the resolution of this jira.
          4) Then Nate will work on BIGTOP-1333.

          dawson.choong Dawson Choong added a comment -

          Sounds like a plan. Let me finish up some other work and I'll start on the patch today. Thanks Jay

          jayunit100 jay vyas added a comment -

          hi Dawson Choong. are you still planning on taking this on?

          dawson.choong Dawson Choong added a comment - - edited

          Yes, I've been working on a parametrization patch. Basically what I did was write a function testSelector() that grabs the available directories to be tested, such as the ones listed in bigtop-smoke-tests. testSelector will then generate tasks called "test-artifact" based on the user's command line input. You can run individual tests such as: gradle clean compileGroovy test-pig test-pig test-sqoop. Here's a snippet of what I got so far:

          def testSelector = {
            File srcDir
            srcDir = file("${BASE_DIR}")
            def artifactFiles = files {srcDir.listFiles()}
            def artifactList = []
          
            artifactFiles.each { File file ->
              artifactList.add(file.name)
            }
            artifactList.each { artifact ->
              task "test-${artifact}" << {
                artifactsToTest.add("${artifact}")
              }
            }
            printTestEnv()
          }
          

          The function then adds all the artifacts specified on the command line to an array, artifactsToTest. This array can be used for anything. Right now I'm using it for printTestEnv(), which is a function that does the same thing as jay vyas's test(), except it prints the environments of each object in artifactsToTest. This way you won't have to comment/uncomment the objects you want/don't want each time you test. The only issue with my implementation is that the only things you can test are the directories that are available in bigtop-smoke-tests (i.e. conf, flume, pig, sqoop). Therefore, as of now, objects like "HADOOP_MAPRED_HOME" can't be used.
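
          The directory-to-task-name step can be tried outside gradle with a plain Groovy sketch like the one below. It is an illustration under assumptions, not the patch itself: taskNamesFor is a hypothetical helper, the temporary directory stands in for bigtop-smoke-tests, and, unlike the snippet above, plain files such as build.gradle are skipped so that only directories produce a task name.

```groovy
import java.nio.file.Files

// Returns one task name per immediate subdirectory of baseDir, skipping plain files.
// The "test-" prefix mirrors the naming used in the snippet above.
def taskNamesFor = { File baseDir ->
    ((baseDir.listFiles() ?: []) as List)
        .findAll { it.isDirectory() }
        .collect { "test-${it.name}".toString() }
        .sort()
}

// Throwaway directory layout, purely for illustration.
def base = Files.createTempDirectory('bigtop-smoke-tests').toFile()
['pig', 'sqoop', 'flume'].each { new File(base, it).mkdir() }
new File(base, 'build.gradle').text = '// placeholder'

println taskNamesFor(base)
```

          Inside a build script, each of these names would then be used to create a task, as the snippet above does.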

          Let me know your thoughts.

          dawson.choong Dawson Choong added a comment -

          Uploaded patch with the features I mentioned above.

          jayunit100 jay vyas added a comment -

          hi Dawson Choong . looking into this now will let you know how far i get

          jayunit100 jay vyas added a comment - - edited

          (( UPDATE)) My first question is: does your patch build off of mine? It says you created a new build.gradle. So I'm wondering how we can apply both patches?

          Ah nvm, I see - yes, your patch applies mine first, and then deletes the build.gradle and replaces it with your own. Carrying on.

          dawson.choong Dawson Choong added a comment -

          Yes, it does build off of yours.

          jayunit100 jay vyas added a comment -

          got it. playing w/ it now.

          jayunit100 jay vyas added a comment -

          Hi Dawson!

          When I applied it, it actually seemed to have duplicate methods. Maybe I applied the patch wrong. I did

          git am BIGTOP-1222.patch # mine
          patch -p0 < BIGTOP-1222.1.patch # yours 
          

          Will look again in the morning. but in any case, I see there is still work to be done :

          • see how to make tests run properly after the new task generator based on your parsed
          • need to add the directories
            bigtop-smoke-tests
              build.gradle
              pig/
                build.gradle
              mahout/
                ....
            

          And so on. I'm thinking I can finish this patch up in the next couple of days, unless you were planning on it?
          Thanks again for helping me push this thing along. We'll get there soon.

          jayunit100 jay vyas added a comment - - edited

          Okay. So I found that I somehow had duplicate code after applying the patch. I manually deleted the bottom lines, and it looks like it works. Will add in the directories etc. now.

          dawson.choong Dawson Choong added a comment -

          I'll leave it up to you to finish up the patch, but definitely let me know if you need help with anything for the remainder of it

          jayunit100 jay vyas added a comment -
          • implements the directory structure cos suggested using subprojects.
          • parses args in settings.gradle in a similar way to Dawson's, but I reimplemented it so that it is all handled in settings (necessary when using gradle subprojects, I think)
          • abstracts some core functionality (env variable asserts, for example) into parent build.gradle, and leaves specifics to subprojects.
          • one subproject per ecosystem component.

          So this is Very close .

          jayunit100 jay vyas added a comment -

          FINALLY

          • added mapreduce/ tests
          • made it so all tests by default log to stdout using a log directory
          • tested Flume,mapreduce,sqoop,hive,etc... All seem to do good and work normally.

          I'd really like to push this in soon. I'm going to pull the patch down and reformat it - right now there is a mix of tabs/spaces etc... but otherwise, given the interest from nate, doug, and others, I'd like to get this in so others can pitch in and help maintain and adopt it as the "new", easy way to run bigtop tests. Also, there is a README file in here as well, which should be useful.

          jayunit100 jay vyas added a comment -

          formatted and cleaned up patch above

          jayunit100 jay vyas added a comment -

          Dawson Choong Can you take a look and try to apply this patch? It's finally ready to go! Then Nate DAmico and the other folks can move forward iterating against these new, super easy to use tests!

          dawson.choong Dawson Choong added a comment -

          Hi Jay. Am I using the same testing commands as you showed me before?

          jayunit100 jay vyas added a comment -

          fixed my name on the patch

          jayunit100 jay vyas added a comment - - edited

          Dawson Choong here is the new way to run the tests:

          gradle compileGroovy test -Dsmoke-tests=flume,hive --info 
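
          As a rough idea of how a -Dsmoke-tests=flume,hive value might be turned into a list of subprojects in settings.gradle, here is a plain Groovy sketch. The parseSmokeTests helper is hypothetical and is not the code from the patch; the include call is left as a comment because it only resolves inside a gradle settings script.

```groovy
// Splits a comma-separated component list such as "flume,hive" into clean names.
// A null or blank value yields an empty list.
def parseSmokeTests = { String raw ->
    (raw ?: '').split(',').collect { it.trim() }.findAll { it }
}

// The -Dsmoke-tests=... flag sets a JVM system property, read here by name.
def selected = parseSmokeTests(System.getProperty('smoke-tests'))
println "selected subprojects: ${selected}"

// Inside settings.gradle, each name could then be registered as a subproject:
// selected.each { include it }
```

          Trimming and dropping empty entries means that stray spaces or a trailing comma on the command line would not produce a bogus subproject name.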
          
          cos Konstantin Boudnik added a comment -

          Can we fix 4-vs-2 indent thingy?

          cos Konstantin Boudnik added a comment -
          • inlined versions

            + testCompile group: 'org.apache.bigtop.itest', name: 'itest-common', version: '0.7.0', transitive: 'true'

            Shouldn't it be at 0.8.0? Also, would it make sense to have these versions defined somewhere at the top, instead of being hard-coded?

          • also it seems that dependencies section is being propagated through all the build files and the versions are being inlined everywhere. Can it be defined at the top-level build.gradle and reused downstream?
          • I see some weird stuff with the same files having different content and different paths. Like
            diff --git a/bigtop-smoke-tests/flume/conf/flume.conf b/bigtop-smoke-tests/flume/conf/flume.conf
            diff --git a/bigtop-smoke-tests/flume/flume.conf b/bigtop-smoke-tests/flume/flume.conf
            

            What's the purpose of it?

          • do you really need empty lines in the Runtime exception throw like
            +    throw new RuntimeException(
            +            """
            +
            +      Oops! You forgot to define some tests!
            
          • I don't like the fact that we need to manually list the tests in the build.gradle file. The case in point
            +def tests_to_include() {
            +    return [
            +            "TestSqoopETLHsql.groovy"
            +    ];
            +}
            

            Can't it be done dynamically?

          • is the wording change relevant to the scope of the JIRA
            -        assertTrue("Could not create input directory to HDFS", sh.getRet() == 0);
            +        assertTrue("Could not create input directory to the DFS", sh.getRet() == 0);
            

            or has it been done because "we were at it anyway". Can we keep only relevant changes so it is easy to track them later?

          • shall the logic inside of the "catch ... Throwable" (class TestHadoopExamples) be moved into TestUtils.unpackTestResources or a new one? Do you think this is a common retry pattern that will be commonly used elsewhere?
          • is it a relevant change? Looks like it changes the defaults:
            -  public static String pi_samples = System.getProperty("pi_samples", "1000");
            -
            +  public static String pi_samples = System.getProperty("pi_samples", "2");
            
          • if this code isn't used anymore - just remove it
            -    TestUtils.unpackTestResources(TestHadoopSmoke.class, "${testDir}/cachefile", inputFiles, null);
            +    //TestUtils.unpackTestResources(TestHadoopSmoke.class, "${testDir}/cachefile", inputFiles, null);
            

          The indentation varies widely, so please fix it as well.
          Also there are some occasional doubled empty-lines, which makes sense to trim off.

          Great job overall! Thank you!

          jayunit100 jay vyas added a comment - - edited

          Hi cos. Thanks for looking at it. I just ran a code formatter and it only found one tab, and replaced it with spaces. I think the spaces/tabs are all fixed (I'll double check tomorrow morning - it's late here and my eyeballs are a little weary).

          1) Removed the extra flume.conf, good catch. That's generated by the framework actually.
          2) Cleaned up the blank lines in the multiline runtime exception.
          3) I actually would like to keep the explicit test files for now. Maybe we can add the dynamic stuff later on? For now, all tests are implemented by explicitly specifying them.
          I agree it's inelegant - but it's uniform. Over time, we can add looser regexes etc. for the tests which are in directories that don't require filters.
          4) Regarding pi and the DFS comment changes: Out of scope? Sort of... but part of this JIRA is to clean up the tests to make them easy to run - and that means having accurate comments / reasonable defaults.
          5) I've added back in the uncommented code: It is still used when we run tests from jars, and I think we should support that functionality for a while as folks migrate over to the new framework.

          so, above is a quick patch which fixes some but not all of the stuff you mentioned. I'll take another look in the morning and test it to make sure it works and also re-review your comments.
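
          For reference, the "dynamic stuff" discussed above (discovering test files instead of listing them by hand, with an optional exclude mask) could be sketched in plain Groovy as follows. This is only an illustration: discoverTests is a hypothetical helper, the Test*.groovy naming convention is inferred from TestSqoopETLHsql.groovy, and TestSqoopSlow.groovy is an invented file name.

```groovy
import java.nio.file.Files

// Lists file names in dir that look like Test*.groovy, minus any that fully match
// the optional exclude regex. The naming convention is an assumption.
def discoverTests = { File dir, String excludeRegex = null ->
    ((dir.listFiles() ?: []) as List)
        .collect { it.name }
        .findAll { it.startsWith('Test') && it.endsWith('.groovy') }
        .findAll { !excludeRegex || !(it ==~ excludeRegex) }
        .sort()
}

// Throwaway directory with one real name from the thread and two invented ones.
def dir = Files.createTempDirectory('sqoop-smokes').toFile()
['TestSqoopETLHsql.groovy', 'TestSqoopSlow.groovy', 'README.md'].each {
    new File(dir, it).text = ''
}

println discoverTests(dir)
println discoverTests(dir, '.*Slow.*')
```

          The explicit list keeps the build uniform, as argued above; a helper like this would trade that for not having to edit build.gradle when a test file is added.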

          cos Konstantin Boudnik added a comment -

          I am ok with most of the answers but will wait until tomorrow to look at the new version of the patch, for the same reason as you have.
          One thing on the #4 above: I don't think changing the message to mention DFS instead of HDFS is about a command accuracy: after all it is just a comment. Also, reducing the number of the Pi samples from 1000 to 2 won't have any performance effect as far as I know, but rather will lower the accuracy of the calculation. Please correct me if I am wrong on the last one.
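
          The accuracy point can be seen with a small stand-alone Groovy sketch. Note that this swaps in plain pseudo-random Monte Carlo sampling with a fixed seed; it is not the sampling scheme used by the Hadoop pi example, and estimatePi is a hypothetical helper written only for this illustration.

```groovy
// Plain pseudo-random Monte Carlo estimate of pi: the fraction of points in the
// unit square that fall inside the quarter circle, times four.
// The fixed seed only makes runs repeatable.
def estimatePi = { long samples, long seed = 42L ->
    def rng = new Random(seed)
    long inside = 0
    for (long i = 0; i < samples; i++) {
        double x = rng.nextDouble()
        double y = rng.nextDouble()
        if (x * x + y * y <= 1.0d) inside++
    }
    4.0d * inside / samples
}

[2L, 1000L, 1000000L].each { n ->
    println "${n} samples -> ${estimatePi(n)}"
}
```

          With only 2 samples the estimate can only be 0, 2 or 4, so a default of 2 cannot give a meaningful value, whatever it does to run time.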

          jayunit100 jay vyas added a comment - - edited
          • ok regarding pi, you're right! The 1000 is for the number of samples, which doesn't result in any real extra compute time. Updated. I had confused that w/ the number of maps.
          • Comment improvements: I can remove them if you really feel it's necessary, but I'd rather keep them in. Here is my argument for keeping them: they are a non-functional change which improves the understandability of the tests... and that is part of the overall goal of the patch.
          • Dependencies inheritance: I actually tried this but it failed. I think it could be gradle version dependent, and that scares me also. So I think the right thing to do is actually implement this JIRA: BIGTOP-1384.

          If it's necessary to put in these changes I will do so, otherwise let's push it in so others can help improve it.

          jayunit100 jay vyas added a comment - - edited

          Hi Konstantin Boudnik: should we push these in after the last round of modifications? If not, I can iterate again this weekend and clean up some more. I'm fine either way.

          Also, I realize it's a big patch, so if you want me to revalidate that it all works from zero, I can maybe do that later in the week as well.

          cos Konstantin Boudnik added a comment -

          I think the patch is almost there. Two comments:

          • there are some unrelated changes like (may be more)
            -  String arg = "${nn}/user/${System.properties['user.name']}/${testDir}/cachefile/cachedir.jar#testlink"
            +  String arg =
            +          "${nn}/user/${System.properties['user.name']}/${testDir}/cachefile/cachedir.jar#testlink"
             
          • a number of files - including most if not all Gradle files - are still using 4/8 indentation.
            Could you please fix these before committing? Thanks.
          jayunit100 jay vyas added a comment -

          Okay, will look at it this weekend and shore up the final bits. Thanks again for the review, I know this is a big one!

          jayunit100 jay vyas added a comment - - edited

          The groovy/gradle files should now be formatted correctly. Let me know if it's looking clean now.

          cos Konstantin Boudnik added a comment - - edited

          I applied the patch and ran gradle compileGroovy test -Dsmoke-tests=hive --info and nothing happened: the build just finished without doing anything. Is this how it should work?

          A couple of other comments:

          • shall new tasks be added to the default showHelp task? Or at least be visible via standard groovy tasks call?
          • smoke-tests sysprop name: I believe a more java'ish way of naming it would be smokeTests or perhaps smoke.tests (I personally prefer the former as it is also more Maven-like, but I am not married to it)
          • it seems that some hive tests are exposed to the top-level build file of the new module, i.e.
              println("Now testing...");
              test {
            
                systemProperties['org.apache.bigtop.itest.hivesmoke.TestHiveSmokeBulk.test_include'] = 'basic'
            
                testLogging {
                  events "passed", "skipped", "failed"
                }
            

            Is it intentional?

          • do you have to specify both tasks (compileGroovy test) on the command line? Can the latter simply depend on the former? Also, compileGroovy does sound very broad: will it be used for all groovy compilation down the road?
          • looks like at least some of the new gradle files are missing the ASL boilerplate

          Also, perhaps it makes sense to put the new smoke tests under bigtop-tests to avoid creating top-level modules for everything?

          jayunit100 jay vyas added a comment -

          Konstantin Boudnik thanks for testing!

          1) I will apply this patch from scratch and confirm that it works. Maybe when I refactored it something got lost.

          2) Sure, I can modify the system property to the smokeTests style.

          3) It's intentional for the hive tests to declare the 'basic' value, so that the basic test runs.

          4) I can add in the test -> compileGroovy dependency.

          5) will add in the apache boiler plate as well

          will ping after i test this patch from scratch on a clean system.

          jayunit100 jay vyas added a comment - - edited

          Okidokie , here's the update.

          • prints a help/error message if you run gradle help or any other odd command.
          • uses smoke.tests instead of smoke-tests
          • added ASF boilerplates
          • added the test dependsOn compileGroovy dependency. That was a good idea.
          • made a fresh system and tested gradle clean compileGroovy test -Dsmoke.tests=hive; it seems to work for me, so I assume this just means that the patch is dependent on gradle 2.0.

          Konstantin Boudnik, can you comment on the final remaining issue: people running this need to use Gradle 2.x, not Gradle 1.x. We can just open a jira for the bigtop toolchain to move to 2.x to solve that (simplest solution), or find a way to embed the gradle wrapper (harder solution). I'm okay with either.
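
          For readers unfamiliar with how a -Dsmoke.tests=hive,pig style property can drive which components run, here is a minimal sketch. It is not taken from the patch: it assumes each component (hive, pig, ...) is a Gradle subproject named after the component, and it is written in the Gradle 2.x style used in this thread.

          ```groovy
          // build.gradle at the root of the smoke-tests module (illustrative sketch only).
          // Turn -Dsmoke.tests=hive,pig into a list of requested component names.
          def raw = System.getProperty('smoke.tests') ?: ''
          def requested = raw.split(',').collect { it.trim() }.findAll { it }

          subprojects {
            apply plugin: 'groovy'

            test {
              // Assumption: the subproject name equals the component name.
              // Skip this subproject's tests unless it was requested.
              onlyIf { requested.contains(project.name) }
              testLogging {
                events 'passed', 'skipped', 'failed'
              }
            }
          }
          ```

          With this layout, running gradle test -Dsmoke.tests=hive would execute only the hive subproject's tests and mark the others as skipped.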

          jayunit100 jay vyas added a comment -

          (bump) just in case you missed the above update Konstantin Boudnik ^^

          cos Konstantin Boudnik added a comment -

          we can just open a jira for bigtop toolchain to move to 2.x

          Let's move to 2.0 - there's no reason to stay behind that far. Once we have 2.0 or 2.x in place this fix should be running well, I presume.

          I guess you don't need to call gradle clean compileGroovy test, as there's a dependency between the last two? And I still don't like the compileGroovy name - it implies the compilation of all the groovy files, I guess. Thanks!

          jayunit100 jay vyas added a comment -

          need gradle 2x for this, i think. making it a blocker.

          jayunit100 jay vyas added a comment -

          Hi again old buddy ... Dawson Choong ... I just updated bigtop's toolchain to use gradle 2.x.

          Now possibly you or cos can confirm whether the latest patch works for you... at that point I can push this in and then we can move forward with the new generation of bigtop tests.

          jayunit100 jay vyas added a comment -

          I briefly looked at this just now and found that there are actually some hardcoded paths which should be cleaned up; resubmitting a patch shortly. This could be related to why the hive tests didn't do anything on cos' machine.

          jayunit100 jay vyas added a comment - - edited

          Okay, cleaned up some more and tested on a fresh system. I've pasted a screenshot of what the output looks like when you actually run it... (new patch is also attached, I definitely think it's ready to push in now!!!)

          cos Konstantin Boudnik added a comment -

          Sorry, jay vyas - I don't see it's happening: even with Gradle 2.0 ;(
          Also, a couple more comments:

          • thanks for moving bigtop-smoke-tests/ under bigtop-tests. I think you can safely call it just smoke-tests now - doesn't make much sense of having bigtop-tests/bigtop... subdirectory.
          • I see some of the tests are using println instead of proper logging. Shall it be fixed?
          • I see a number of trailing whitespaces in the patch. From the git am output:
            Applying: BIGTOP-1222: Shiny new test framework for the hadoop ecosystem.
            /home/cos/workspaces/bigtop/.git/rebase-apply/patch:57: trailing whitespace.
                // Unpack resource 
            /home/cos/workspaces/bigtop/.git/rebase-apply/patch:107: trailing whitespace.
            # This is the new smoke testing module for bigtop 
            /home/cos/workspaces/bigtop/.git/rebase-apply/patch:129: trailing whitespace.
                gradle compileGroovy test -Dsmoke-tests=flume,hive --info 
            /home/cos/workspaces/bigtop/.git/rebase-apply/patch:1165: trailing whitespace.
              
            /home/cos/workspaces/bigtop/.git/rebase-apply/patch:1169: trailing whitespace.
              * See BIGTOP-1222 for example. 
            warning: squelched 22 whitespace errors
            warning: 27 lines add whitespace errors.
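
          To illustrate the println point above, a minimal Groovy sketch of routing a test message through a logger instead. The class and method names are hypothetical, and java.util.logging is used only to keep the example free of extra dependencies; which logging library the Bigtop tests should actually use is not stated in this thread.

          ```groovy
          import java.util.logging.Logger

          // Hypothetical test helper; names are illustrative, not from the patch.
          class ExampleSmokeStep {
            private static final Logger LOG = Logger.getLogger(ExampleSmokeStep.name)

            static void unpackResource(String resource) {
              // Instead of: println("Unpacking resource " + resource)
              LOG.info('Unpacking resource ' + resource)
            }
          }

          // Example call from a test or script.
          ExampleSmokeStep.unpackResource('flume.conf')
          ```

          The practical gain is that verbosity and output destination can then be controlled by logging configuration rather than by editing the tests.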
            
          cos Konstantin Boudnik added a comment -

          BTW, I shall take back my comment about dependency test -> compileGroovy. The latter is a part of the standard build life-cycle - as well as the former - and will always be run if needed. Hence, you can update the documentation to just run test, I am sure.
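
          One way to check this on a given Gradle version is to ask Gradle what the test task depends on. The sketch below is only a diagnostic aid and is not part of the patch; the task name showTestDeps is made up for illustration.

          ```groovy
          // build.gradle fragment (illustrative only).
          apply plugin: 'groovy'

          // Hypothetical helper task: prints the names of the tasks that
          // 'test' directly depends on, as computed by Gradle itself.
          task showTestDeps {
            doLast {
              def deps = tasks.test.taskDependencies.getDependencies(tasks.test)
              println deps*.name.sort()
            }
          }
          ```

          Running gradle showTestDeps lists the direct dependencies; the compile tasks are reached through them, which is why invoking test alone is enough.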

          cos Konstantin Boudnik added a comment - - edited

          Ok, I think we are getting somewhere. Jay has pointed out that I need to be in bigtop-tests/bigtop-smoke-tests directory to run the tests. I believe it is a better idea to be able to run the tests from the top-level directory. I am ok with solving it separately in a follow-up JIRA, but I believe this is a reasonable requirement to have.

          Another point, when I am running pig smoke tests I see all sorts of mis-aligned Hadoop dependencies getting pulled in. E.g.

          Download http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-hdfs/2.1.0-beta/hadoop-hdfs-2.1.0-beta.jar
          Download http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-annotations/2.0.6-alpha/hadoop-annotations-2.0.6-alpha.jar
          

          This is clearly caused by the mixed bag of deps. declared in the pig - and all others - smoke build.gradle
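
          One possible way to keep every org.apache.hadoop artifact on a single version, including those pulled in transitively, is a resolution rule in the parent build file. This is a sketch of a general Gradle technique, not what the patch does; the version string is an assumption based on the "2.3" mentioned later in the thread.

          ```groovy
          // Parent build.gradle fragment (illustrative only).
          def alignedHadoopVersion = '2.3.0'  // assumed value

          subprojects {
            configurations.all {
              resolutionStrategy.eachDependency { details ->
                // Pin every Hadoop artifact, direct or transitive, to one version.
                if (details.requested.group == 'org.apache.hadoop') {
                  details.useVersion alignedHadoopVersion
                }
              }
            }
          }
          ```

          This does not replace cleaning up the declared dependencies in each build.gradle, but it prevents a stray transitive dependency from dragging in an alpha or beta Hadoop jar.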

          jayunit100 jay vyas added a comment -

          So at a minimum I'll be doing the following:
          (1) fix the hadoop versions to 2.3
          (2) remove added whitespace
          (3) fix printlns.
          On it ... will attach a patch in a little bit.

          I think it will take some thoughtfulness to properly integrate this with the top level gradle, so we can do that in a separate patch.

          cos Konstantin Boudnik added a comment -

          Sounds good to me. Thanks!

          jayunit100 jay vyas added a comment -

          okay ! Here we go.

          • applies cleanly.
          • moved the tests to "smoke-tests".
          • removed whitespace and trailing newlines.
          • updated tests to use a hadoopVersion variable in submodules.

          Ready for final review now, and just tested.

          cos Konstantin Boudnik added a comment -

          Thanks Jay! I think it's almost there. A couple of comments:

          • Good call on the hadoopVersion. However, it seems to be unused in smoke-tests/build.gradle, where the dependencies still have the wrong version.
          • did you consider using a variable for other dependency versions? E.g. I see that you're using the iTest artifact all over and its version is hardcoded. I am sure you can use a variable instead. And so on... We've been through this in the Maven build - it was painful enough to fix at a later time, so I don't want us to go through this again.
          • iTest version is 0.7.0, whereas the version we are working on is 0.8.0-SNAPSHOT. Shall it be updated? Otherwise the TestUtils change won't be picked up.
          • please make sure that the change in bigtop-tests/test-artifacts/hadoop/src/main/groovy/org/apache/bigtop/itest/hadoop/mapreduce/TestHadoopSmoke.groovy isn't breaking old-fashioned way of the test execution
          • Please make the commit comment to be in the format
            JIRA-number. JIRA-synopsis
            Right now it says something else.
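
          The version-variable suggestion can be sketched roughly as follows. This is an illustration rather than the contents of the patch: the property names mirror those mentioned in this thread, while the artifact coordinates and the Hadoop version value are assumptions.

          ```groovy
          // Parent build.gradle fragment (illustrative only, Gradle 2.x style).
          ext {
            hadoopVersion = '2.3.0'          // assumed value
            itestVersion  = '0.8.0-SNAPSHOT' // the in-development version named above
          }

          subprojects {
            apply plugin: 'groovy'

            repositories {
              mavenLocal()    // a locally installed iTest SNAPSHOT would resolve from here
              mavenCentral()
            }

            dependencies {
              // Assumed coordinates; every version comes from the ext block above.
              testCompile "org.apache.hadoop:hadoop-common:${hadoopVersion}"
              testCompile "org.apache.hadoop:hadoop-hdfs:${hadoopVersion}"
              testCompile "org.apache.bigtop.itest:itest-common:${itestVersion}"
            }
          }
          ```

          Keeping the versions in one ext block means a release bump touches a single place instead of every submodule.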
          jayunit100 jay vyas added a comment - - edited

          sure ! okay I can

          • remove the dependencies in build.gradle (they are unnecessary for now).
          • parameterize the itest.
          • And, I will also confirm that old TestHadoopSmoke still works.
          • fixup the commit message.
            will put another patch in tonite
          cos Konstantin Boudnik added a comment -

          also, it seems that hsqldb dep. is everywhere, but I suspect it is only required by hive tests. Please correct me if I am wrong

          jayunit100 jay vyas added a comment - - edited

          Actually, it's for sqoop. But yup! I can clean that up too! One of the nice things is that it spins up an embedded DB, so no need for a sqoop mysql instance. The sqoop committers actually agreed with that approach.

          jayunit100 jay vyas added a comment -
          • removed extraneous deps in root build.gradle
          • parameterized itest.
          • confirmed that the old way of running tests still works https://gist.github.com/jayunit100/f224cab9da623738bad5 ... although I hope nobody uses that old framework for basic smokes anymore
          • Also tested that the patch still applies cleanly.

          Ready again for "final" review

          cos Konstantin Boudnik added a comment -

          The sqoop commiters actually agreed with that approach

          I am not arguing with the approach - I am arguing about the need for this dependency for every test
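
          Scoping the dependency to the one component that needs it could look roughly like this. It is a sketch under assumptions: the subproject is assumed to be named sqoop, and the hsqldb coordinates and version are illustrative, not taken from the patch.

          ```groovy
          // Parent build.gradle fragment (illustrative only, Gradle 2.x style).
          // Apply the embedded-database dependency only to the sqoop subproject,
          // rather than declaring it for every smoke test.
          configure(subprojects.findAll { it.name == 'sqoop' }) {
            apply plugin: 'groovy'

            repositories {
              mavenCentral()
            }

            dependencies {
              testCompile 'org.hsqldb:hsqldb:2.3.2'  // assumed coordinates and version
            }
          }
          ```

          Declaring it directly in the sqoop subproject's own build.gradle would work equally well; the point is only that the other components do not inherit it.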

          cos Konstantin Boudnik added a comment - - edited
          • ext.itestVersion should be '0.8.0-SNAPSHOT' as 0.8.0 hasn't been released yet
          • The commit message is still wrong as far as I can see. It says
            BIGTOP-1222. New Testing framework with ecosystem test improvements for pig/sqoop support for major hadoop components
            instead of
            BIGTOP-1222. Simplify and gradleize a subset of the bigtop smokes
          • do you think testCompile module('org.codehaus.groovy:groovy:1.8.0') could be also parametrized?
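
          As a sketch of what parametrizing that last line might look like (the property name groovyVersion is hypothetical; only the coordinates and the 1.8.0 value come from the comment above):

          ```groovy
          // build.gradle fragment (illustrative only, Gradle 2.x style).
          apply plugin: 'groovy'

          ext.groovyVersion = '1.8.0'  // hypothetical property name

          repositories {
            mavenCentral()
          }

          dependencies {
            // Same dependency as before, with the version taken from the property.
            testCompile module("org.codehaus.groovy:groovy:${groovyVersion}")
          }
          ```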
          jayunit100 jay vyas added a comment - - edited
          • re sqoop, yup I realize you weren't arguing the approach... just providing context for it (it was a while back when we had that idea).
          • okay, reattached with itest=0.8.0-SNAPSHOT. I have a locally installed version I called 0.8.0, which is why it worked.
          • I added a comment for folks so if itest breaks they can easily use another version.
          • can we push the groovy and other parameter fixes to another patch? The parameterization can be done better in another iteration.
            ...reattached new patch
          cos Konstantin Boudnik added a comment -

          i added a comment for folks so if itest breaks they can easily use another version.

          I think a part of the release process is to grep/sed all -SNAPSHOTs into release version, but a comment won't hurt

          Let's open a JIRA for parametrization then to work on it in 0.9.0? As well let's have a ticket to integrate smokes into the top level build, ok?

          +1 on the patch and thanks for sticking with me on that!

          jayunit100 jay vyas added a comment -

          Committed!
          Thanks cos/roman/martin/doug for helping me review this and design these new tests... finally we have a new testing framework! Hope others will join in and help me to improve and maintain it!
          Making follow up JIRAs now.


            People

            • Assignee:
              cos Konstantin Boudnik
              Reporter:
              jayunit100 jay vyas
            • Votes:
              0
              Watchers:
              9
