BIGTOP-1052

Increase environment configurability/debugging of Mahout Tests

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.7.0
    • Component/s: tests
    • Labels:
      None

      Description

      The TestMahoutExamples.groovy file is very comprehensive but also very long and complex. Other JIRAs suggest that some of its tests fail. Unlike TestHadoopExamples, it does not make it easy to comment out subtests in order to run only a subset.

      I suggest

      1) [TO BE MOVED TO A SEPARATE JIRA] Refactoring the tests into separate Groovy scripts so that they can be filtered in the POM using includes, for example:

      <includes>
        <include>**/Test*MovieLens</include>
      </includes>

      As a further convenience to make these tests easier to run, I suggest

      2) [THIS JIRA: 1052] Using the MAHOUT_HOME environment variable in the tests when it is provided, so that a Mahout installation can be tested directly from tarballs.
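
      A minimal sketch of the MAHOUT_HOME lookup proposed in item 2, written in Java since the tests run on the JVM. This is not the code from the patch: the class name MahoutCommand, the bin/mahout layout under MAHOUT_HOME, and the fallback to a plain "mahout" on the PATH are all assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical helper; names and fallback behaviour are assumptions, not the patch.
class MahoutCommand {
    // Assumed fallback when MAHOUT_HOME is unset: whatever "mahout" is on the PATH.
    static final String DEFAULT_COMMAND = "mahout";

    // Returns the mahout launcher to invoke for the given environment.
    static String resolve(Map<String, String> env) {
        String home = env.get("MAHOUT_HOME");
        if (home == null || home.trim().isEmpty()) {
            return DEFAULT_COMMAND;
        }
        // Drop trailing slashes so the joined path has no doubled separator.
        String base = home.trim().replaceAll("/+$", "");
        // Assumes the tarball layout places the launcher under bin/.
        return base + "/bin/mahout";
    }

    public static void main(String[] args) {
        // Print the launcher that would be used in the current environment.
        System.out.println(resolve(System.getenv()));
    }
}
```

      Passing the environment in as a map, rather than reading System.getenv() inside the method, keeps the lookup easy to exercise with and without the variable set.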

        Activity

        jay vyas added a comment -

        An alternative could be to use "@Category" annotations for some of the tests, and then have each test manage its own download of input data.

        Roman Shaposhnik added a comment -

        Sounds like a good idea to me!

        jay vyas added a comment - edited

        Which one: separating the Groovy scripts and using MAHOUT_HOME (ideas 1 and 2), or using @Category annotations (idea 3)?

        Or just some reasonable combination of all of the above that makes the tests easier to run?

        Roman Shaposhnik added a comment -

        Looks like our comments crossed. Yes – I meant that anything that makes our tests easier to run would be a good thing. If we can get to a point where we can easily run them "from the source" that would be even better. As for @Category – I haven't really used it much so I don't happen to have much feedback here.

        jay vyas added a comment -

        Do you mean running them from source without the jar-building step? I would think that should be doable. I'm not sure how it would affect the Maven lifecycle, though. I'm assuming that running from jars is baked into the whole build, right?

        Roman Shaposhnik added a comment -

        I guess what I have in mind is a script that would maintain the illusion that you could just run them from the source. If you think about it, that's exactly what Groovy scripts do. Suppose you have:

        #!/usr/bin/env groovy
        println("Hello world")
        

        The Groovy runtime doesn't create any jars or class files on disk in this case; the compilation just happens dynamically.

        jay vyas added a comment - edited

        Ah yes, now to clarify:

        I think these tests could run as pure scripts, but you won't get all of the pretty exception and assertion reporting from Failsafe.

        That seems okay, and it makes the Bigtop smoke tests much more usable.

        Of course, for the mvn failsafe goal you could still run them in Failsafe, so that you have the best of both worlds:

        1) Pure Groovy Mahout tests that run as independent scripts

        2) Java/Failsafe/JUnit-friendly tests that integrate with Failsafe so that Maven builds can still report successes and failures on individual tests

        jay vyas added a comment -

        Okay, I have renamed this JIRA to specifically address the first phase of improvements, and tested this file on a four-node cluster. It works. The changes in this patch proxy all jobs through an assertRun method, which logs the exact command and asserts that the return code is 0. The tests also use the MAHOUT_HOME environment variable, and I've confirmed that this works as well (my cluster doesn't have /usr/bin/mahout, so the variable is clearly being picked up). Here is the patch. I also fixed some of the formatting, hence the large diff.
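
        To illustrate the assertRun idea described in this comment (log the exact command, run it, fail unless the exit code is 0), here is a minimal Java sketch. It is not the method from the attached patch: the class name ShellAssert, the use of "sh -c", and the message formats are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative sketch only; not the assertRun from the attached patch.
class ShellAssert {
    // Runs the command through "sh -c" (assumes a POSIX shell) and returns its exit code.
    static int run(String command) {
        // Print the exact command so a failing job can be re-run by hand.
        System.out.println("Running: " + command);
        try {
            Process process = new ProcessBuilder("sh", "-c", command)
                    .inheritIO() // stream the job's output to this process's console
                    .start();
            return process.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException("Could not start: " + command, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while running: " + command, e);
        }
    }

    // Fails with an AssertionError unless the command exits with code 0.
    static void assertRun(String command) {
        int exitCode = run(command);
        if (exitCode != 0) {
            throw new AssertionError("Exit code " + exitCode + " from: " + command);
        }
    }

    public static void main(String[] args) {
        assertRun("echo hello");
    }
}
```

        Routing every job through one such method means the failing command line shows up in the test log next to the assertion failure, which is the debugging aid the comment describes.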

        Roman Shaposhnik added a comment -

        +1. LGTM. Please let me know if the patch is all you want on this JIRA and I can commit it then.

        jay vyas added a comment -

        Yes, this completes this JIRA. Another JIRA/patch is on the way to decouple the tests, once I think of a better way to do so.

        Thanks!

        Roman Shaposhnik added a comment -

        And committed!


          People

          • Assignee: jay vyas
          • Reporter: jay vyas

              Time Tracking

              • Original Estimate: 24h
              • Remaining Estimate: 24h
              • Time Spent: Not Specified