Bigtop / BIGTOP-1128

Fix and modularize Mahout sample data sets

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: tests
    • Labels:
      None

      Description

      The Mahout smoke tests have a lot of external dependencies.

      Concretely, we need to fix the MovieLens sample data, which has moved
      from http://www.grouplens.org/system/files/ml-1m.zip
      to http://files.grouplens.org/papers/ml-1m.zip

      Otherwise the Mahout smoke tests break, because the data set can no longer be downloaded from the old location.

      More generally, consolidating and verifying these download URLs in a single function would make the tests simpler to debug. Otherwise, when a URL goes stale, you end up with an HTML document saved as a .zip file, which causes an error in the tests that is very hard to interpret: an exception saying that the zip file is not formatted correctly.
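      A minimal sketch of such a consolidated download function, written in Java for illustration (the smoke tests themselves are Groovy, which can use the same class). The class name DatasetDownloads, the registry key "movielens-1m", and the zip-signature check are assumptions, not Bigtop's actual code; only the MovieLens URL comes from this issue. It needs Java 11 or later for java.net.http.HttpClient.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Hypothetical helper: one place to list and fetch the sample data sets. */
final class DatasetDownloads {

    // Single registry of download URLs, so a moved file means one edit here.
    // The MovieLens URL is the new location from this issue; the key name is illustrative.
    static final Map<String, String> URLS = Map.of(
        "movielens-1m", "http://files.grouplens.org/papers/ml-1m.zip");

    // A non-empty zip archive starts with a local file header: 'P', 'K', 0x03, 0x04.
    static boolean looksLikeZip(byte[] head) {
        return head.length >= 4
            && head[0] == 'P' && head[1] == 'K'
            && head[2] == 3 && head[3] == 4;
    }

    /** Downloads a registered data set to dest and fails immediately if it is not a zip. */
    static Path download(String name, Path dest) throws IOException, InterruptedException {
        String url = URLS.get(name);
        if (url == null) {
            throw new IllegalArgumentException("Unknown data set: " + name);
        }
        HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();
        HttpResponse<Path> resp = client.send(
            HttpRequest.newBuilder(URI.create(url)).GET().build(),
            HttpResponse.BodyHandlers.ofFile(dest));
        if (resp.statusCode() != 200) {
            throw new IOException("HTTP " + resp.statusCode() + " fetching " + url);
        }
        // A stale URL may still answer 200 with an HTML page, so inspect the bytes too.
        byte[] head = new byte[4];
        int n;
        try (InputStream in = Files.newInputStream(dest)) {
            n = in.readNBytes(head, 0, 4);
        }
        if (n < 4 || !looksLikeZip(head)) {
            throw new IOException(url + " did not return a zip archive (moved URL or HTML page?)");
        }
        return dest;
    }
}
```

      With a check like this, a moved URL fails at download time with a message naming the URL, instead of surfacing later as a malformed-zip exception inside the test.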

      Any other thoughts on how to simplify and isolate the moving parts of the Mahout tests?
      We can bundle them into one patch. It would be a shame if the only thing this JIRA resulted in was a fix to a single URL.

        Activity

        Bruno Mahé added a comment:

        I just committed it, so feel free to make another patch.
        Thanks!

        jay vyas added a comment (edited):

        Sure. Should I re-roll the patch/commit, or is it already in? I don't mind either way.

        Or else I could just do another iteration (a new JIRA/patch on top of this one). Just let me know.

        Thanks for helping to get this patch in. It should make life a lot easier in case Bigtop download targets or algorithm implementations shift around.

        Bruno Mahé added a comment:

        Also, please use the following commit format:
        BIGTOP-XXXX. <one liner description>
        <Anything you want>
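
        The commit format above can be checked mechanically. This is a hypothetical sketch (CommitMessageCheck is not part of Bigtop) that accepts a message whose first line is the issue key, a period, a space, and a one-line summary, and leaves everything after the first line free-form.

```java
import java.util.regex.Pattern;

/** Hypothetical check for the "BIGTOP-XXXX. <one liner description>" convention. */
final class CommitMessageCheck {
    // First line: issue key, a period, a space, then a non-empty one-line summary.
    private static final Pattern FIRST_LINE = Pattern.compile("BIGTOP-\\d+\\. \\S.*");

    static boolean isValid(String message) {
        // Only the first line is constrained; anything after it is free-form.
        String firstLine = message.split("\\R", 2)[0];
        return FIRST_LINE.matcher(firstLine).matches();
    }
}
```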

        Bruno Mahé added a comment:

        +1
        Great!

        I don't see this in the original file, and I am no expert in Groovy, so I won't complain too loudly, but what about:

        • It would be great to use final for constants.
        • Don't mix tabs and spaces. Popular IDEs can automatically format code to a defined set of guidelines. Maybe Apache Bigtop should define one?
        • +        assertEquals("Unable to create work dir in HCFS", 0, sh.getRet());

          -> I think you meant HDFS

        • Some code is not indented, and some parts do not match. It would be great to have uniform indentation.
        • Define all constants at the top of the class (e.g. ITERATIONS).
        jay vyas added a comment:

        This patch is ready for review! It will:

        1) run faster on the MovieLens data
        2) fail faster (if/when URLs become obsolete)
        3) be more maintainable
        4) possibly help people understand the Mahout operations Bigtop supports, thanks to the added comments

        jay vyas added a comment:

        This patch:
        1) Fixes the MovieLens URL, which was recently moved.
        2) Moves all file downloads into a single function so they are easy to debug.
        3) Adds a lot of needed comments to the tests.
        4) Reduces test time for MovieLens by running only 2 iterations instead of …
        5) Parameterizes the iteration count in a variable that can easily be edited locally in the Groovy script. (A next iteration could add configuration support for all of this so the Mahout smoke tests can run faster, e.g. for the clustering jobs.)
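
        Item 5, combined with the review suggestions to make constants final and declare them at the top of the class, could look roughly like this. It is a Java sketch with a hypothetical class name and a hypothetical system property, mahout.smoke.iterations, standing in for the configuration support mentioned as a next step; the default of 2 iterations is the value from this patch.

```java
/** Hypothetical settings holder: final constants declared at the top, one overridable. */
final class MahoutSmokeSettings {
    // Default from this patch: run the MovieLens job for only 2 iterations.
    static final int DEFAULT_ITERATIONS = 2;

    // Hypothetical property name, e.g. -Dmahout.smoke.iterations=5 on the JVM command line.
    static final String ITERATIONS_PROPERTY = "mahout.smoke.iterations";

    // Resolved once, when the class is first loaded.
    static final int ITERATIONS = resolveIterations();

    static int resolveIterations() {
        // Integer.getInteger returns the default if the property is unset or not a number.
        int n = Integer.getInteger(ITERATIONS_PROPERTY, DEFAULT_ITERATIONS);
        if (n < 1) {
            throw new IllegalArgumentException(ITERATIONS_PROPERTY + " must be >= 1, got " + n);
        }
        return n;
    }

    private MahoutSmokeSettings() {} // constants holder, not meant to be instantiated
}
```

        Keeping the default in code means the smoke tests still run unchanged when no property is set, while a developer can raise or lower the iteration count locally without editing the script.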

        jay vyas added a comment:

        Okay, I've got a patch for this that I'm testing now.

        Any initial thoughts?

        https://github.com/jayunit100/bigtop/commit/c83a0d9b3bd76a3db4859fd9107bce27b00e237a


          People

          • Assignee: jay vyas
          • Reporter: jay vyas
          • Votes: 0
          • Watchers: 3
