Mahout
  1. Mahout
  2. MAHOUT-205

Pull Writable (and anything else hadoop dependent) out of the matrix module

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.1
    • Fix Version/s: 0.3
    • Component/s: Math
    • Labels:
      None
    • Environment:

      all

      Description

      Vector and Matrix extend Writable, and while that was merely poorly coupled before, it will be an actual problem now that we have a separate submodule for matrix: this module should not depend on hadoop at all, ideally. Distributed matrix work, as well as simply Writable wrappers can go somewhere else (where? core? yet another submodule which depends on matrix?), but it would be really nice if we could produce an artifact which doesn't require Hadoop which has our core linear primitives.

      1. MAHOUT-205.patch
        203 kB
        Jake Mannix
      2. MAHOUT-205.patch
        215 kB
        Jake Mannix
      3. MAHOUT-205.patch
        226 kB
        Jake Mannix

        Activity

        Hide
        Jake Mannix added a comment -

        Monster monster patch. So many diffs, yikes. mvn clean install passes on my laptop, however.

        I haven't tried to actually remove all hadoop dependencies in the maven world in this patch, I've just ripped Writable off of Vector and Matrix, which was a big enough undertaking, given the number of places that required not-IDE-based refactoring.

        I'd welcome any code review eyeballs on this, for things done which are obviously horrible. I may need to go put back in some tests, because I had to rip out a few IO-related tests and I may have forgotten to put all of them back.

        Show
        Jake Mannix added a comment - Monster monster patch. So many diffs, yikes. mvn clean install passes on my laptop, however. I haven't tried to actually remove all hadoop dependencies in the maven world in this patch, I've just ripped Writable off of Vector and Matrix, which was a big enough undertaking, given the number of places that required not-IDE-based refactoring. I'd welcome any code review eyeballs on this, for things done which are obviously horrible. I may need to go put back in some tests, because I had to rip out a few IO-related tests and I may have forgotten to put all of them back.
        Hide
        Jake Mannix added a comment -

        Ok, this new patch actually removes the hadoop dependency in mahout-math. All tests pass, but because tests had to be refactored a bit, I'm not 100% sure I've got exactly the coverage that was there before.

        I'll leave it up here for a bit, but I'm going to want to commit this pretty soon, because this patch and the one for MAHOUT-206 are pretty conflictory.

        Show
        Jake Mannix added a comment - Ok, this new patch actually removes the hadoop dependency in mahout-math. All tests pass, but because tests had to be refactored a bit, I'm not 100% sure I've got exactly the coverage that was there before. I'll leave it up here for a bit, but I'm going to want to commit this pretty soon, because this patch and the one for MAHOUT-206 are pretty conflictory.
        Hide
        Jake Mannix added a comment -

        It should be noted: the place the dependency went to is just mahout-core (where it was already). Serialization stuff seemed to belong in utils, but we have our dependencies running the other way, so that's a no-go.

        Show
        Jake Mannix added a comment - It should be noted: the place the dependency went to is just mahout-core (where it was already). Serialization stuff seemed to belong in utils, but we have our dependencies running the other way, so that's a no-go.
        Hide
        Sean Owen added a comment -

        Now, it may still be my environment, since I was still seeing compile problems. I completely reset my installation. After this patch I get errors like...

        /Users/srowen/Documents/Development/Mahout/core/src/main/java/org/apache/mahout/math/VectorWritable.java:[42,34] cannot find symbol
        symbol : constructor DenseVector(org.apache.mahout.math.Vector)
        location: class org.apache.mahout.math.DenseVector

        /Users/srowen/Documents/Development/Mahout/core/src/main/java/org/apache/mahout/math/DenseMatrixWritable.java:[30,17] cannot find symbol
        symbol : method rowSize()
        location: class org.apache.mahout.math.DenseMatrixWritable

        Does this make any sense?

        Show
        Sean Owen added a comment - Now, it may still be my environment, since I was still seeing compile problems. I completely reset my installation. After this patch I get errors like... /Users/srowen/Documents/Development/Mahout/core/src/main/java/org/apache/mahout/math/VectorWritable.java: [42,34] cannot find symbol symbol : constructor DenseVector(org.apache.mahout.math.Vector) location: class org.apache.mahout.math.DenseVector /Users/srowen/Documents/Development/Mahout/core/src/main/java/org/apache/mahout/math/DenseMatrixWritable.java: [30,17] cannot find symbol symbol : method rowSize() location: class org.apache.mahout.math.DenseMatrixWritable Does this make any sense?
        Hide
        Jake Mannix added a comment -

        Are you getting those during mvn clean install, or in your IDE, or where? I remember seeing Idea forget that even though VectorWritable and DenseMatrixWritable were in o.a.m.math, since they were in the mahout-core module, not mahout-math, that it ended up confused.

        I'm running the tests in the background on a clean checkout (you're using the newer of the two patches, right?), and no compile problems here (although I didn't blow away all of my .m2 folder... I guess I can try that next...) - Mac OS X, fwiw.

        Show
        Jake Mannix added a comment - Are you getting those during mvn clean install, or in your IDE, or where? I remember seeing Idea forget that even though VectorWritable and DenseMatrixWritable were in o.a.m.math, since they were in the mahout-core module, not mahout-math, that it ended up confused. I'm running the tests in the background on a clean checkout (you're using the newer of the two patches, right?), and no compile problems here (although I didn't blow away all of my .m2 folder... I guess I can try that next...) - Mac OS X, fwiw.
        Hide
        Jake Mannix added a comment -

        Ok, blew away ~/.m2, clean checkout with the later patch applied, and "mvn clean compile" ends with BUILD SUCCESSFUL (after 17m 9s, oof!). Anyone else try this patch to see if you see what Sean saw?

        Show
        Jake Mannix added a comment - Ok, blew away ~/.m2, clean checkout with the later patch applied, and "mvn clean compile" ends with BUILD SUCCESSFUL (after 17m 9s, oof!). Anyone else try this patch to see if you see what Sean saw?
        Hide
        Drew Farris added a comment -

        Works for me, with a clean checkout & latest patch applied.

        Yes, that build does a long time indeed. (for me: 12m6s)

        FWIW, here are some of the worst performing unit tests:

        Running org.apache.mahout.clustering.dirichlet.TestMapReduce
        Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 164.514 sec

        Running org.apache.mahout.clustering.kmeans.TestKmeansClustering
        Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 56.446 sec

        Running org.apache.mahout.clustering.fuzzykmeans.TestFuzzyKmeansClustering
        Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 58.853 sec

        (Seems like it would be reasonable to open another issue the remind one of us to look more closely at these)

        Show
        Drew Farris added a comment - Works for me, with a clean checkout & latest patch applied. Yes, that build does a long time indeed. (for me: 12m6s) FWIW, here are some of the worst performing unit tests: Running org.apache.mahout.clustering.dirichlet.TestMapReduce Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 164.514 sec Running org.apache.mahout.clustering.kmeans.TestKmeansClustering Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 56.446 sec Running org.apache.mahout.clustering.fuzzykmeans.TestFuzzyKmeansClustering Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 58.853 sec (Seems like it would be reasonable to open another issue the remind one of us to look more closely at these)
        Hide
        Sean Owen added a comment -

        Just checking, did this get committed? I know I had problems with the patch but hope someone else didn't. If not, is it still suitable for committing? I can try again.

        Show
        Sean Owen added a comment - Just checking, did this get committed? I know I had problems with the patch but hope someone else didn't. If not, is it still suitable for committing? I can try again.
        Hide
        jakes old (non-committer) account added a comment -

        It's not byet committed, because I was hoping at least one other person besides Drew could verify it builds properly... the patch might be a tiny bit out of date now, I'll check in a bit when I get to work.

        Show
        jakes old (non-committer) account added a comment - It's not byet committed, because I was hoping at least one other person besides Drew could verify it builds properly... the patch might be a tiny bit out of date now, I'll check in a bit when I get to work.
        Hide
        Jake Mannix added a comment -

        up to date patch, with Robin's most recent Vectorizer commits merged in.

        Show
        Jake Mannix added a comment - up to date patch, with Robin's most recent Vectorizer commits merged in.
        Hide
        Jake Mannix added a comment -

        and since tests pass for me with this, I'll commit it in a bit unless I hear an objection.

        Show
        Jake Mannix added a comment - and since tests pass for me with this, I'll commit it in a bit unless I hear an objection.
        Hide
        Jake Mannix added a comment -

        committed in revision 898669

        Show
        Jake Mannix added a comment - committed in revision 898669

          People

          • Assignee:
            Jake Mannix
            Reporter:
            Jake Mannix
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development