Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels: None

      Description

      Provide H2O backend for the Mahout DSL

        Activity

        Anand Avati created issue -
        Sebastian Schelter added a comment -

        Hi Anand,
        great to see this being started. Could you provide a short description of how you see the integration coming together?

        I'd like to get the big picture and understand which pieces of h2o and its algorithms you wish to move to Mahout and which not.

        Anand Avati added a comment -

        Sebastian, at this point I am exploring things (mostly digging around the internals of Mahout to understand what the possible points of integration are). At a high level, the thinking is that Mahout will be "depending" on h2o (like how it "depends on" Spark), and there will be enough infrastructure implementations (like Matrix, Vector, possibly Job) to allow existing algorithms to be easily refactored to use H2O in place of (or along with), say, Hadoop/MR and/or DistributedRowMatrix, etc.

        As I said, I am very much open to hearing feedback and thoughts on integration patterns.

        Dmitriy Lyubimov added a comment -

        ... Mahout will be "depending" on h2o (like how it "depends on" Spark), and there will be enough infrastructure implementations (like Matrix, Vector, possibly Job) to allow existing algorithms to be easily refactored to use H2O in place of (or along with), say, Hadoop/MR and/or DistributedRowMatrix, etc.

        @Anand: Mahout does not "depend on Spark" at the Matrix and Vector API level. Instead, the integration with Spark is at the physical plan operator layer, with a completely separate logical-layer matrix representation (DrmLike, etc.), in order to cleanly separate the "shared memory" and "shared nothing" use cases. And of course, no actual Spark dependencies ever seep into the mahout-math module. We actually spent a lot of effort to decouple that module even from the Hadoop dependencies, IIRC. I expect that to stay the same.

        The o.a.m.math.Matrix and Vector APIs are reserved for in-core operations only, and all algorithms around them are built assuming a "shared memory" model (i.e. they don't see it as a problem to iterate over all non-zeros in a single thread). Dumping "shared nothing" and "shared memory" use cases into a single API makes, in my not so humble opinion, no sense (unless the proposal is to work towards "unholy mess" architectural standards).
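
        To make the "shared memory" assumption concrete, here is a minimal sketch of that traversal style, using plain mahout-math from Scala (SparseRowMatrix, setQuick and nonZeroes are standard mahout-math API; the traversal itself is an illustration, not code from this issue):

            import org.apache.mahout.math.{Matrix, SparseRowMatrix}
            import scala.collection.JavaConverters._

            val m: Matrix = new SparseRowMatrix(1000, 1000)
            m.setQuick(3, 7, 42.0)

            // Typical in-core algorithm style: visit every non-zero in a single thread.
            var sum = 0.0
            for (slice <- m.asScala; el <- slice.vector().nonZeroes().asScala)
              sum += el.get()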

        This would confuse devs to no end. No algorithm, IMO, can be written to be completely agnostic of "shared memory" vs. "shared nothing" issues. Distributed functional code will of course also run on a single machine, but that simply amounts to the rule "write everything as if it were distributed, using FP", so this is not the answer.

        So -1 on this. This is not nearly the same as how Spark was integrated.

        My suggestion is to either integrate with the linear algebra optimizer at the physical layer (which seems quite impossible to me today because of the h2o programming model) or, absent that, to start on yet another, completely separate set of "shared nothing" APIs, just as was done for Spark. Of course, we'd be incoherent here once again, which is why I don't like even this option – it might as well be a happily standalone or contrib project with no common parts.

        Messing with the Job API is less objectionable, I guess, since Job is a shared-nothing API to begin with; however, you are providing too few details to form a sensible opinion, so -0 on this at this point.

        Anand Avati added a comment -

        @Dmitriy: When I said "depends on", I was only replying to the question from Sebastian – "I'd like to get the big picture and understand which pieces of h2o and its algorithms you wish to move to Mahout and which not" – implying that no pieces of h2o were planned to be "moved"; rather, there would be a project-level "dependency" (in the Maven build, distribution, etc.) – just like how the Spark bindings work was done. Hope that clarifies.

        Dmitriy Lyubimov added a comment -

        PS. The Spark integration provides an identical intersection of some algebraic operators (such as slicing, multiplication, elementwise operations, summaries, etc.) in Scala, which amounts to a single domain-specific semantics, but the two sides diverge significantly in the operations that govern lifecycle, persistence, and functional programming.

        Dmitriy Lyubimov added a comment - edited

        ... implying that no pieces of h2o were planned to be "moved"; rather, there would be a project-level "dependency" (in the Maven build, distribution, etc.) – just like how the Spark bindings work was done. Hope that clarifies.

        Yes, I understood that, and I actually don't object to that. But if you read again, you'll see that's not what I was objecting to in your explanation.

        Jira is usually an N-way discussion, but if you don't like me quoting your answers to specific comments, then I object specifically to this approach –

        Start with providing implementations of AbstractMatrix and AbstractVector, and more as we make progress.

        Dmitriy Lyubimov added a comment -

        The o.a.m.math.Matrix and Vector APIs are reserved for in-core operations only, and all algorithms around them are built assuming a "shared memory" model

        In particular, to that end, the cost-optimized approach to in-core math vector operations written by Robin assumes in-core-only cost models. It absolutely isn't compatible with any notion of a distributed vector, and messing with these cost optimizations is the last thing I'd like to suggest here. Not to mention the outer user block algorithms.

        Which is a strong argument as to why those two approaches to cost optimization (in-core vs. distributed) should really be kept strictly apart in two separate wards with cotton blankets on the walls and under 24hr monitoring.

        Anand Avati added a comment -

        Thanks for your feedback, Dmitriy.

        It now seems to me (with my limited exploration of Mahout) that it might actually be viable to provide a "Hadoop alternative" in the form of an alternate implementation of DistributedRowMatrix (instead of AbstractMatrix) and AbstractJob (internally using h2o's Frame/Vec and MRTask2 APIs), thereby allowing a runtime choice of Hadoop vs. H2O. Does this seem like a reasonable first step?

        Dmitriy Lyubimov added a comment - edited

        It now seems to me (with my limited exploration of Mahout) that it might actually be viable to provide a "Hadoop alternative" in the form of an alternate implementation of DistributedRowMatrix (instead of AbstractMatrix)

        Yes, that's what I meant. On the Scala side, this is done by introducing the mix-ins DrmLike, RLikeOps, RLikeDrmOps, RLikeVectorOps, etc. On the Java side, working with mix-ins (functionality-filled traits) is of course not easy, but the important point is that it should be an alternative hierarchy with an identical intersection of optimized linalg operators (operator-oriented semantics in linear algebra).

        I.e., the assumption is that to the end user (developer) it is more important that the notation

        a dot b
        

        means exactly the same regardless of whether a and b are in-core or distributed; it matters significantly less whether a and b descend from different hierarchies (e.g. Matrix or DRM), as long as the operator dot(A, B) is defined for all possible type combinations (sparse, dense, distributed).
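
        Purely as illustration of this point, a minimal sketch, assuming the 0.10-era Samsara Scala DSL and Spark bindings (dense, drmParallelize, mahoutSparkContext and the %*% operators come from those modules, not from this issue's patch):

            import org.apache.mahout.math.scalabindings._
            import org.apache.mahout.math.scalabindings.RLikeOps._
            import org.apache.mahout.math.drm._
            import org.apache.mahout.math.drm.RLikeDrmOps._
            import org.apache.mahout.sparkbindings._

            implicit val ctx = mahoutSparkContext(masterUrl = "local", appName = "dsl-sketch")

            // In-core: a plain mahout-math Matrix on one machine.
            val a = dense((1, 2), (3, 4))
            val inCoreGram = a.t %*% a                  // in-core multiplication

            // Distributed: identical notation over a DRM from a separate hierarchy.
            val drmA = drmParallelize(a, numPartitions = 2)
            val distGram = (drmA.t %*% drmA).collect    // same expression, distributed plan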

        and AbstractJob (internally using h2o's Frame/Vec and MRTask2 APIs), thereby allowing a runtime choice of Hadoop vs. H2O.

        I care significantly less about the Job API, and Hadoop MR in particular. It is my belief that they are non-essential to the math user and therefore should be avoided altogether (and that notion is eliminated in the Spark bindings).

        Does this seem like a reasonable first step?

        Yes – with the caveat that logical mix-ins for distributed and in-core already exist in the Scala and Spark bindings. Like I said, ideally, mapping this logical layer onto a particular physical layer seems to me a far better architecture than creating yet another logical layer specific to a particular backend. However, I see that it would be hard to converge on that, or at least I don't see how. I will extract an architecture slide from my talk and post a link a bit later to illustrate the idea.

        Dmitriy Lyubimov added a comment -

        Link to the component stack in the bindings: https://issues.apache.org/jira/secure/attachment/12638098/BindingsStack.jpg

        It illustrates how the logical level for a distributed matrix is unified across engines.

        Anand Avati added a comment -

        Dmitriy, I am trying to match that diagram to the source code.

        • What is the "Algebraic DSL"? Is that the one that came with the Scala bindings (with the "%*%" operator, etc.)?
        • Today, what distinguishes the "logical translation layer" from the "physical translation layer" in the code? What parts of the code are considered the "logical translation layer"?
        • Is the selection of the "physical translation layer" a run-time decision?

        Just trying to make sure I don't make wrong assumptions.

        Dmitriy Lyubimov added a comment -

        What is the "Algebraic DSL"? Is that the one which came with the scala bindings (with "%*%" operator etc.)?

        There are two sets of operators. The set for mahout-math (in-core) I call the "scala bindings"; it lives in math-scala. It doesn't actually do much beyond providing syntactic sugar for passing things off to the in-core cost-based optimizers (where they are implemented).

        The second set of DSL operators (which looks identical to the in-core set) is for the distributed stuff. (On the diagram the two are not visually separated, other than that part of the DSL sits over the in-core optimizer and part over the distributed optimizer.)

        Today, what distinguishes the "logical translation layer" from the "physical translation layer" in the code? What parts of the code are considered the "logical translation layer"?

        Well, you need to keep in perspective that the distributed optimizer part was done in about three days and is now fairly tightly bound to the Spark code, so the separation at this point is not very clean and will stay that way until we introduce another engine (which is coming). Obviously, when the second engine is introduced, this needs to be abstracted into a separate module without Spark dependencies.

        Logical translation is everything in drm.plan (the operators implementing DrmLike[]).
        Physical translation to Spark is CheckpointedDrm, CheckpointAction, and everything in the blas package (the actual Spark-specific support for the physical plan after the optimization run).

        Is the selection of the "physical translation layer" a run-time decision?

        Yes, it is a run-time optimizer action based on operand types, geometry (size), orientation, and partitioning. (Very similar, in fact, to what happens with a Pig graph, except such graph rewrites are much more elegant in Scala.)
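
        To show where that decision point sits in user code, a sketch under the same assumptions (and imports) as the earlier DSL sketch, with checkpoint() as the optimizer barrier:

            // Nothing runs here: this only builds a logical plan of DrmLike operators.
            val plan = drmA.t %*% drmA

            // checkpoint() hands the plan to the optimizer, which picks physical
            // operators based on operand types, geometry, orientation and partitioning.
            val gram = plan.checkpoint()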

        Dmitriy Lyubimov added a comment -

        PS. In the Java/MR world, DRM also already has a representation – the DistributedRowMatrix class – and it is likewise a totally separate hierarchy from the in-core hierarchy. However, it doesn't have a notion of an optimizer or algebraic expression checkpoints, so it was not coherently usable for that approach.

        Dmitriy Lyubimov added a comment - edited

        @Anand: Bottom line, the core of AbstractMatrix and Vector is elementwise iterators and direct element accessors. Lacking closure (functional) programming, these don't work for distributed stuff.

        There are two ways to go with such an approach. One is to declare the core abstractions unsupported in the distributed implementation, which just proves that AbstractMatrix and Vector are not good abstractions for this work (why would one need an abstraction if its major, core contracts are all of a sudden declared optional or deprecated?).

        Truth be told, there is some Matrix API that uses FP – the two major pieces are aggregate() and assign(). However, this still doesn't get us anywhere, in the sense that we would need to support all core contracts, not just assign() and aggregate().
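
        For reference, a small sketch of those two FP entry points on the in-core API, called from Scala (the example values are illustrative; Functions is org.apache.mahout.math.function.Functions):

            import org.apache.mahout.math.scalabindings._
            import org.apache.mahout.math.function.Functions

            val m = dense((1, 2), (3, 4))
            m.assign(Functions.SQUARE)                 // elementwise map, in place
            // Fold over all elements: map with IDENTITY, combine with PLUS.
            val total = m.aggregate(Functions.PLUS, Functions.IDENTITY)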

        The other way of going about it is to heavily refactor the core abstraction in favor of functional support while deprecating or eliminating direct access. I call this the "nuclear option", because it sends ripple effects not only through Mahout but through any third-party code that uses mahout-math (in my case specifically). It will force people to reconsider using Mahout because of stability issues in areas where stability was promised.

        Extending the DistributedRowMatrix API... I am kind of dubious about that as well, since it is also unusable without a major FP infusion, and frankly is kind of ancient.

        More likely, a completely new FP-laced distributed matrix representation is desired. The Spark bindings went down that path and created the FP-laced DRM API. But that is an entirely Scala-side abstraction, with Scala function literals, etc. So if you are looking to create a Java distributed matrix abstraction, it is not going to be useful at all either.

        So, more likely, you need a completely new FP-oriented Java API interface – something like H2OMatrix.java. This will fragment the project even further, but, all marketing fluff excluded, that's the only realistic option I see that might work.

        I would also question (kinda) the wisdom of a standalone distributed vector abstraction. On both the Hadoop side and the Spark side this abstraction is completely bypassed (it is assumed that a real vector will always fit into a single machine's memory). In situations where a vector is formed as the result of a distributed operation (e.g. A %*% x), the result is simply a distributed single-column matrix, from which the column can always be collected on the front end via the collection/slicing API.
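
        For example, under the same DSL assumptions as the earlier sketches, a distributed matrix-times-vector product comes back as a one-column DRM whose column is then collected on the front end:

            import org.apache.mahout.math.Vector

            val x: Vector = dvec(1.0, 2.0)           // in-core vector (dvec is a scalabindings helper)
            val drmAx = drmA %*% x                   // distributed single-column matrix
            val ax: Vector = drmAx.collect(::, 0)    // collect, then slice out column 0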

        Dmitriy Lyubimov added a comment -

        After reviewing the newly announced https://github.com/tdunning/h2o-matrix and making a willful conjecture that it is what this issue is about (since it is still not explicitly confirmed on this Jira), I am changing my vote to -0.

        Here are the components of my vote.

        (1) +1 Do-ocracy – those who willingly do things and (what is especially important in our case) provide continued support for them deserve a component +1 to begin with.
        (2) Big +1 on using h2o as an external dependency. I don't think we want to be in the business of creating, maintaining, or merging with distributed execution engines; we should just be translating high-level ML semantics to them.
        (3) +0 in-core API stability: this work must not change or deprecate in-core API contracts, thereby forcing existing mahout-math users into unreasonable migration and refactoring steps and/or a performance decline. mahout-math is one of the few still very valuable components, so this is important. (The current state of things does not introduce such changes.)
        (4) +0 in-core API augmentation: this work must not create API duplication (alternatives to existing contracts) or augmented API contracts that are either not adequately backed by the existing multitude of in-core matrix types or do not make sense for in-core structures. (The current state of things does not introduce such changes.)
        (5) -1 I still maintain that the major Matrix and Vector in-core contracts do not provide an adequate basis, nor are a good fit, for building a generic shared-nothing environment. Thus, further partitioning of the Matrix and Vector contract sets is required if distributed structures must share the same hierarchy base with in-core ones. However, doing so would contradict positions (3) and (4) above, which is why I maintain that the least painful way to address those is to create a separate hierarchy base for H2OMatrix, one that would intersect some high-level algebraic contracts with the in-core contracts while bearing identical semantics.

        This concern seems to be shared even by the authors of the code, if I am not misinterpreting the meaning of the comments here.

        "H2OMatrix.java"
        // Single-element accessors.  Calling these likely indicates a huge performance bug.
          @Override public double getQuick(int row, int column) { return _fr.vecs()[column].at(row); }
          @Override public void setQuick(int row, int column, double value) { _fr.vecs()[column].set(row,value); _fr.vecs()[column].
        

        I reserve the right to change my vote if components of my vote are affected by future changes.
        I will not raise objections or add points based on performance.

        Sebastian Schelter added a comment -

        What's the status here?

        Ted Dunning added a comment -

        The h2o integration work has been progressing nicely. It is located at

        https://github.com/tdunning/h2o-matrix

        The rationale for doing the work externally is largely the non-technical opposition from Dmitriy.

        The current status is that there is a reasonably performant implementation of the basic Java math API, sufficient for coding up a basic k-means. This work will progress to integration with the Scala DSL, as well as to basic implementations of other algorithms such as SSVD.

        Once basic integration with the Scala DSL works sufficiently for the test-piece algorithms, it will make sense to bring this work back into Mahout.

        So far, communications have been handled by direct email. This is somewhat unsatisfactory in that the discussions are not publicly visible. I expect that as soon as the work comes back into Mahout itself, this issue will resolve itself.

        Dmitriy Lyubimov added a comment -

        The rationale for doing the work externally is largely the non-technical opposition from Dmitriy.

        I am not sure what is non-technical in my previous post, or in pretty much any post attached to this Jira on my behalf.

        I am glad some GitHub code is finally officially confirmed to be tied to this very M-1500 issue for the first time.

        However, I very much don't want to get pulled into a discussion measuring the height of anyone's moral ground here. This is why this is the last time I post on this issue: it has obviously become pretty toxic for me to touch, since the desire to discredit my position by spin has become so palpable.

        I have measured the technical merit of the arguments given to me so far, privately and publicly, while consciously pushing my objectivity levers to their extreme "max" position; unfortunately, I don't think I found much substance to overcome the problems I have already reported. But this is just a matter of opinion, and I already gave a 0 vote on this. So I don't see why you would want to do anything different w.r.t. submitting this work for further review by the people on this forum based solely on my arguments – even if I have been privy to some additional information about this development before it was announced. I am not significant from the point of view of this work's progress; my arguments might be of some value, though.

        So, for the last time, to recap what it was:

        (A) Critique of the idea of having anything blockwise-distributed under the Matrix API as it exists today.

        As I mentioned above, the h2o-matrix code itself refers to the core contracts as a "performance bug" (here I mean the in-core abstractions of element-wise direct access, element-wise and vector-wise iterators, and in-core-optimizer-specific contracts). If an implementation cannot satisfy the core contracts of an abstraction, it follows directly that the abstraction is not useful for the implementation. In other words, if algorithms using the abstraction need to pay attention to which implementation class actually lies underneath, then, again, the abstraction has failed by definition.

        Concerns like that can be allayed in some (uncommon) cases by declaring operations optionally supported (e.g. as with ByteBuffer#array()). However, in such situations the optional contract is planned from the very beginning rather than introduced by later alteration, which would likely break existing users of the abstraction.

        Optional contracts also do not cover contracts as numerous and as central as this "performance bug" qualifier suggests (like I said, 95% of current Mahout code uses element-wise or vector-wise iterators whenever a Matrix or Vector type is involved). So I don't consider declaring optional support for that family of in-core Matrix and Vector contracts a reconciliation path for this design problem.

        And I haven't heard any solid technical rebuttal to this, from an OOA point of view, that would somehow vindicate this design in my mind.

        End of critique. Alternatives:

        (B) Alternatively, suppose we really wanted to go this way (i.e. marry something like an "h2o-ized variation of DistributedRowMatrix" with AbstractMatrix using common mix-ins). Then, ideally, a solid design would imply reworking the Matrix APIs in order to split them into finer classes of concerns than exist today: algebraic ops, in-core optimizer ops, and element-wise access concerns for the in-core and distributed models (i.e. stuff like getQuick, setQuick, and Iterable vs. mapBlock).

        And then we would say that we have some mix-in (interface) that addresses all algebraic ops regardless of whether the backing is distributed or in-core.

        This sounds kind of right, doesn't it?

        However, this brings us back to the issue of destabilizing the in-core Matrix API, splitting interfaces into ever finer hairs, and hence sending ripple effects of code refactoring throughout, and perhaps even beyond, the Mahout codebase.

        This cost, in my opinion, is not sufficiently outweighed by the benefit of having some algebraic mix-ins common to the distributed and in-core stuff. Instead, the algebraic operator-centric approach in my experience turned out much cleaner pragmatically from the distributed optimizer's point of view, and resulted in a much cleaner separation of in-core and distributed math concerns, even in the end-user algorithms.

        Furthermore, even the purely algebraic stuff is unlikely to be totally common (e.g. slice operators for vectors and elements are not supported in the distributed stuff – instead, the mapBlock operator is implied there for access to in-core iterators over the blocks; in-place operators are generally bad for distributed plans too). This means an even further split of an API that at first seemed fairly identical for both in-core and distributed stuff. That's my pragmatic net takeaway from the Spark bindings work.
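
        A sketch of the mapBlock idiom mentioned here, under the same DSL assumptions as above: distributed code reaches in-core iterators only block by block, instead of via matrix-level slicing:

            val drmB = drmA.mapBlock() {
              case (keys, block) =>
                // 'block' is an ordinary in-core Matrix; elementwise access is fine here.
                block += 1.0
                (keys, block)
            }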

        (C) Another angle of attack on h2o integration, IMO, would be plugging h2o engines into the optimizer, which this work (M-1500) doesn't target. I rate the possibility of this happening as quite tepid at the moment, because the h2o programming model is not rich enough to provide things like zipping identically distributed datasets, a very general shuffle model (e.g. many-to-many shuffle), advanced partition management (shuffle-less resplit/coalesce), and so on. I am not even sure there's a clear concept of a combiner-type operation. That observation leaves very bleak prospects for a physical-layer realization of the DrmLike Scala stuff using H2O.

        So when Ted Dunning speaks of DSL integration, he most probably means the Scala bindings, not the distributed DSL bindings. This will create further fragmentation of approaches and goes against the "write once, run anywhere" concept there. More likely, with this approach there would be "write once for H2O" and "write once for everything else". This is not the end of the world, but it doesn't sound appealing, and it certainly doesn't seem to imply coherent H2O integration – not coherent with the distributed algebra bindings, anyway.

        (D) And a third thought I probably have not yet stated in this Jira: I think the best path to any sort of benefit from h2o integration would be borrowing its compression techniques for columnar in-core data frame blocks – that's where h2o's strength is said to lie above anything else. But at this point, my understanding is that no one has any intention of working this angle either.

        I am not supportive of A and B, as explained.
        I am dubious about alternative C, but I am not sufficiently qualified to judge it.
        I am supportive of alternative D.

        Thank you for reading till the end.

        -d

        Ted Dunning added a comment -

        Dmitriy Lyubimov's comments have several incorrect statements which lead to incorrect conclusions.

        These statements are both explicit and implicit and include, in paraphrased form:

        • A comment about a "performance bug" means that h2o can't implement the Matrix API

        This means that use of some operations may have impacts on performance that could be surprisingly large to some programmers. The comment is intended to warn implementors that these impacts could be large enough to essentially prevent benefit from parallel computation. As such, their use would thwart some of the purpose of using a parallel system. The reference to a "performance bug" does not imply that the operations do not work and, indeed, their availability might be handy during initial implementation of algorithms.

        Section (A) makes points about the validity of the abstractions based on the requirement to modify existing code, but that really doesn't apply, since that isn't the purpose of the current work.

        • It is the intent of the h2o support for the Matrix API that all code that uses the Matrix API should run and get parallel speedup

        This is explicitly not a goal of the current effort. The goal of the current effort is to use a well-understood and stable Mahout API to experiment with implementation techniques for parallel algorithms based on h2o. It is a premise of this effort that the operations used in these hand-built implementations will have roughly the same execution patterns as equivalent programs that use the Scala bindings or the distributed DSL bindings. That premise is unlikely to be massively incorrect, and thus the current effort is useful for determining good h2o idioms for implementing matrix code.

        The pattern of usage of the Matrix API by other Mahout code is completely irrelevant to this effort.

        • The h2o system is not rich enough in capabilities to support things like zipping identically distributed data sets.

        This is simply incorrect and is based on a lack of knowledge of the h2o system. The h2o primitives are different from the Spark primitives. That means different idioms have to be used to generate similar results, but it doesn't mean that h2o lacks these capabilities. In particular, the discord between what Dmitriy Lyubimov thinks h2o can do and what it can actually do is large enough that the entire section (C) of his comments is essentially vacuous, since it is based entirely on false premises.

        The current results indicate that there is considerable promise for h2o in terms of these capabilities. More work is indicated.

        • The current work would require massive revamping of the current Mahout Matrix API

        The current work is a technical exploration of convenient and efficient implementation techniques. It has no implications whatsoever regarding the refactoring of the Mahout Matrix API. The current work does have implications for any h2o shim layers that might ultimately be necessary, but that has nothing to do with the current Mahout in-core APIs. Section (B) is thus also moot.

        The emotional tenor of Dmitriy Lyubimov's comments is exactly what is encouraging the h2o work to be done a bit apart. It simply isn't efficient to have to answer so many off-topic points whenever any report on work in progress is given.

        Dmitriy Lyubimov added a comment -

        These statements are both explicit and implicit and include in paraphrased form:

        A comment about a "performance bug" means that h2o can't implement the Matrix API

        The reference to a "performance bug" does not imply that the operations do not work and, indeed, their availability might be handy during initial implementation of algorithms.

        Paraphrasing me means admitting I did not say that. I am well aware that the APIs in question are naively supported. When a person takes an iterator (regardless of what one iterates over), the general-rule expectation is an O(n) iteration; using it for O(1) work is not the general rule (in fact, I don't know of an example of that in the entire current codebase).

        So we have thus established two things here:
        (1) the abstraction is not useful under the general O(n) rule;
        (2) the abstraction may be useful under the non-general O(1) rule.

        So, according to the general rule, this abstraction is not useful. Saying that a general rule can be overturned by a special case is a rhetorical fallacy called "special pleading", i.e. arguing against a general rule based on an exception.

        entire section (C) in his comments is essentially vacuous since it is based entirely on false premises

        Section (C) actually stated a proposal (implement an optimizer plugin for h2o) along with argumentation for why it may be difficult. I am not sure which is supposed to be vacuous – the proposal, or the reasons it is unfeasible. Either way, it is based on my discussions with the h2o team and their own admission (including on the dev list) that the programming model is not where it needs to be. If they knew these goodies were provided as primitives, they never said so. I was actually very hopeful and positive in the beginning that they were there. You are the first person on record on this topic to advertise H2O as a rich programming model. If so, that's wonderful; I would be happy to re-examine my proposal (C) myself. That said, I already admitted I should not be considered an expert on the exact set of capabilities there. My point with proposal (C) was that providing physical-layer translation for h2o is, in my view, the more consistent integration path – not so much that it is unfeasible.

        The emotional tenor of Dmitriy Lyubimov's comments are exactly what is encouraging the h2o work to be done a bit apart. It simply isn't efficient to have to answer so many off-topic points whenever any reports on work in progress are given.

        I think that has been the off-topic part here: calling my comments "emotional" or "non-technical", or loosely paraphrasing me.

        It is also a well-known rhetorical fallacy: attacking an opponent's character or expertise in the hope that he or she will go into defending it, distracting from the actual issue at hand. This allows mounting even further similar attacks.

        But it wouldn't matter even if I ate children for dinner; discussion of my character (or expertise) is totally irrelevant to this Jira and to the strength of anyone's argumentation. Rhetoric would argue that this actually makes your position look weaker, making people think the rest of your argumentation base is weaker than it is.

        Anyway, I just wanted to make it clear that I don't see it as reasonable to use my name as any sort of pretext to do (or not to do) things differently from how they are normally done in Apache. I am also willing to make it easier: I will not return to this Jira and will not vote negatively on it.

        Actually, quite the opposite: I have always encouraged, and will continue to encourage, bringing things forward for people to look at. Not being clear on intent is what has been causing so much confusion about all this in the first place.

        Saikat Kanjilal added a comment -

        Anand,
        Just following up to make sure I understand: is there a concrete deliverable for this, or is this more of an exploration with some discussion topics to be resolved through the dev mailing list? In the above comments I see this as more of an exploration, and yet I also read that there is work being done offline (in a fork, perhaps) to get h2o integrated?

        Anand Avati made changes -
        Field: Description
        Original Value: Integration with h2o (github.com/0xdata/h2o) in order to exploit its high performance computational abilities. Start with providing implementations of AbstractMatrix and AbstractVector, and more as we make progress.
        New Value: Provide H2O backend for the Mahout DSL
        ASF GitHub Bot added a comment -

        GitHub user dlyubimov opened a pull request:

        https://github.com/apache/mahout/pull/21

        MAHOUT-1500 H20

        Creating a PR just to be able to (re-)view what the current state of the diff is against master in this work.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/avati/mahout MAHOUT-1500

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/mahout/pull/21.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #21


        commit bb23a8b45250379e3c89e0a64325b144cd2aa2e7
        Author: Anand Avati <avati@redhat.com>
        Date: 2014-05-20T02:58:59Z

        MAHOUT-1500: Implement H2O backend for Mahout Scala DSL

        Barebone only, no logic yet. Compiles, tests fail with NotImplementedError

        Signed-off-by: Anand Avati <avati@redhat.com>

        commit 5b3c852a2abb677accbce4e0c6dd605e585f0a04
        Author: Anand Avati <avati@redhat.com>
        Date: 2014-06-12T00:07:10Z

        MAHOUT-1500: Implement non algebraic parts of H2O bindings

        Signed-off-by: Anand Avati <avati@redhat.com>

        commit 757a95fcce2afae14df4c5859c75fb4b8896df15
        Author: Anand Avati <avati@redhat.com>
        Date: 2014-06-17T01:30:13Z

        MAHOUT-1500: Implement Linear Algebra ops in H2O backend

        Signed-off-by: Anand Avati <avati@redhat.com>


        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r13929215

        — Diff: h2o/src/test/scala/org/apache/mahout/math/decompositions/MathSuite.scala —
        @@ -0,0 +1,212 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.math.decompositions
        +
        +import org.scalatest.{Matchers, FunSuite}
        +import org.apache.mahout.h2obindings.test.MahoutLocalContext
        +import org.apache.mahout.math._
        +import drm._
        +import scalabindings._
        +import RLikeOps._
        +import RLikeDrmOps._
        +import org.apache.mahout.h2obindings._
        +import org.apache.mahout.common.RandomUtils
        +import scala.math._
        +
        +class MathSuite extends FunSuite with Matchers with MahoutLocalContext {
        +
        +  test("thin distributed qr") {
        +
        +    val inCoreA = dense(
        +      (1, 2, 3, 4),
        +      (2, 3, 4, 5),
        +      (3, -4, 5, 6),
        +      (4, 5, 6, 7),
        +      (8, 6, 7, 8)
        +    )
        +
        +    val A = drmParallelize(inCoreA, numPartitions = 2)
        +    val (drmQ, inCoreR) = dqrThin(A, checkRankDeficiency = false)
        +
        +    // Assert optimizer still knows Q and A are identically partitioned
        +    drmQ.partitioningTag should equal(A.partitioningTag)
        +
        +    // drmQ.rdd.partitions.size should be(A.rdd.partitions.size)
        +
        +    // Should also be zippable
        +    // drmQ.rdd.zip(other = A.rdd)
        +
        +    val inCoreQ = drmQ.collect
        +
        +    printf("A=\n%s\n", inCoreA)
        +    printf("Q=\n%s\n", inCoreQ)
        +    printf("R=\n%s\n", inCoreR)
        +
        +    val (qControl, rControl) = qr(inCoreA)
        +    printf("qControl=\n%s\n", qControl)
        +    printf("rControl=\n%s\n", rControl)
        +
        +    // Validate with Cholesky
        +    val ch = chol(inCoreA.t %*% inCoreA)
        +    printf("A'A=\n%s\n", inCoreA.t %*% inCoreA)
        +    printf("L:\n%s\n", ch.getL)
        +
        +    val rControl2 = (ch.getL cloned).t
        +    val qControl2 = ch.solveRight(inCoreA)
        +    printf("qControl2=\n%s\n", qControl2)
        +    printf("rControl2=\n%s\n", rControl2)
        +
        +    // Householder approach seems to be a little bit more stable
        +    (rControl - inCoreR).norm should be < 1E-5
        +    (qControl - inCoreQ).norm should be < 1E-5
        +
        +    // Assert identity with in-core Cholesky-based -- this should be tighter.
        +    (rControl2 - inCoreR).norm should be < 1E-10
        +    (qControl2 - inCoreQ).norm should be < 1E-10
        +
        +    // Assert orthogonality:
        +    // (a) Q[,j] dot Q[,j] == 1.0 for all j
        +    // (b) Q[,i] dot Q[,j] == 0.0 for all i != j
        +    for (col <- 0 until inCoreQ.ncol)
        +      ((inCoreQ(::, col) dot inCoreQ(::, col)) - 1.0).abs should be < 1e-10
        +    for (col1 <- 0 until inCoreQ.ncol - 1; col2 <- col1 + 1 until inCoreQ.ncol)
        +      (inCoreQ(::, col1) dot inCoreQ(::, col2)).abs should be < 1e-10
        +
        +  }
        +
        +  test("dssvd - the naive-est - q=0") {
        +    dssvdNaive(q = 0)
        +  }
        +
        +  test("ddsvd - naive - q=1") {
        +    dssvdNaive(q = 1)
        +  }
        +
        +  test("ddsvd - naive - q=2") {
        +    dssvdNaive(q = 2)
        +  }
        +
        +  def dssvdNaive(q: Int) {
        +    val inCoreA = dense(
        +      (1, 2, 3, 4),
        +      (2, 3, 4, 5),
        +      (3, -4, 5, 6),
        +      (4, 5, 6, 7),
        +      (8, 6, 7, 8)
        +    )
        +    val drmA = drmParallelize(inCoreA, numPartitions = 2)
        +
        +    val (drmU, drmV, s) = dssvd(drmA, k = 4, q = q)
        +    val (inCoreU, inCoreV) = (drmU.collect, drmV.collect)
        +
        +    printf("U:\n%s\n", inCoreU)
        +    printf("V:\n%s\n", inCoreV)
        +    printf("Sigma:\n%s\n", s)
        +
        +    (inCoreA - (inCoreU %*%: diagv(s)) %*% inCoreV.t).norm should be < 1E-5
        +  }
        +
        +  test("dspca") {
        +
        +    val rnd = RandomUtils.getRandom
        +
        +    // Number of points
        +    val m = 500
        +    // Length of actual spectrum
        +    val spectrumLen = 40
        +
        +    val spectrum = dvec((0 until spectrumLen).map(x => 300.0 * exp(-x) max 1e-3))
        +    printf("spectrum:%s\n", spectrum)
        +
        +    val (u, _) = qr(new SparseRowMatrix(m, spectrumLen) :=
        +      ((r, c, v) => if (rnd.nextDouble() < 0.2) 0 else rnd.nextDouble() + 5.0))
        +
        +    // PCA rotation matrix -- should also be orthonormal.
        +    val (tr, _) = qr(Matrices.symmetricUniformView(spectrumLen, spectrumLen, rnd.nextInt) - 10.0)
        +
        +    val input = (u %*%: diagv(spectrum)) %*% tr.t
        +    val drmInput = drmParallelize(m = input, numPartitions = 2)
        +
        +    // Calculate just the first 10 principal factors and reduce dimensionality.
        +    // Since we assert just the validity of the s-pca, not stochastic error, we bump
        +    // the p parameter to ensure zero stochastic error and assert only the functional
        +    // correctness of the method's pca-specific additions.
        +    val k = 10
        +
        +    var (drmPCA, _, s) = dspca(A = drmInput, k = 10, p = spectrumLen, q = 1)
        +    // Un-normalized pca data:
        +    drmPCA = drmPCA %*% diagv(s)
        +
        +    val pca = drmPCA.checkpoint(CacheHint.NONE).collect
        +
        +    // Of course, once we have calculated the pca, the spectrum is going to be different
        +    // since our originally generated input was not centered. So here, we'd just
        +    // brute-solve pca to verify.
        +    val xi = input.colMeans()
        +    for (r <- 0 until input.nrow) input(r, ::) -= xi
        +    var (pcaControl, _, sControl) = svd(m = input)
        +    pcaControl = (pcaControl %*%: diagv(sControl))(::, 0 until k)
        +
        +    printf("pca:\n%s\n", pca(0 until 10, 0 until 10))
        +    printf("pcaControl:\n%s\n", pcaControl(0 until 10, 0 until 10))
        +
        +    (pca(0 until 10, 0 until 10).norm - pcaControl(0 until 10, 0 until 10).norm).abs should be < 1E-5
        +
        +  }
        +
        +  test("als") {
        +
        +    val rnd = RandomUtils.getRandom
        +
        +    // Number of points
        +    val m = 500
        +    val n = 500
        +
        +    // Length of actual spectrum
        +    val spectrumLen = 40
        +
        +    // Create singular values with decay
        +    val spectrum = dvec((0 until spectrumLen).map(x => 300.0 * exp(-x) max 1e-3))
        +    printf("spectrum:%s\n", spectrum)
        +
        +    // Create A as an ideal input
        +    val inCoreA = (qr(Matrices.symmetricUniformView(m, spectrumLen, 1234))._1 %*%: diagv(spectrum)) %*%
        +      qr(Matrices.symmetricUniformView(n, spectrumLen, 2345))._1.t
        +    val drmA = drmParallelize(inCoreA, numPartitions = 2)
        +
        +    // Decompose using ALS
        +    val (drmU, drmV, rmse) = als(drmInput = drmA, k = 20).toTuple
        +    val inCoreU = drmU.collect
        +    val inCoreV = drmV.collect
        +
        +    val predict = inCoreU %*% inCoreV.t
        +
        +    printf("Control block:\n%s\n", inCoreA(0 until 3, 0 until 3))
        +    printf("ALS factorized approximation block:\n%s\n", predict(0 until 3, 0 until 3))
        +
        +    val err = (inCoreA - predict).norm
        +    printf("norm of residuals %f\n", err)
        +    printf("train iteration rmses: %s\n", rmse)
        +
        +    err should be < 1e-2
        +
        +  }

        — End diff –

        I think if all these tests are passing, this would be an incredibly cool step forward in this issue.

        ASF GitHub Bot added a comment -

        Github user avati commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r14043524

        — Diff: h2o/src/test/scala/org/apache/mahout/math/decompositions/MathSuite.scala — (quotes the same MathSuite.scala diff reproduced in the previous review comment)

        — End diff –

        With the latest git-push, all these tests are passing

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-46732343

        Getting much closer to completion. Things which still do not work:

        • seqfile format parser to read/write off HDFS
        • String key support in DRM (int and long works)
        • Fill in implementation of Par() (currently it is a passthrough)
        • more test cases
        • more code comments

        Except for the above, the integration is basically working. I have some more performance enhancement changes in mind, but they will happen later. All remaining items are highlighted with a /* XXX: */ code comment.

        I will soon provide details on how others who are interested can run and test this. In the meantime, considering the above caveats, code review and comments are welcome.

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-48712178

        All the points in the previous comments are now completed. This PR is ready for final review.

        ASF GitHub Bot added a comment -

        Github user pferrel commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-48749896

        Are the scalatests from the Spark module that cover math-scala code implemented here somewhere? I'd vote against merging until those are all in place and passing.

        The cf stuff has a rather major bug that I'm working on, so I wouldn't move this into math-scala just yet, although it would make an interesting speed comparison once completed. The cf changes will require DSL additions that will be under separate review. Don't have a PR number yet.

        Also, I may have missed it, but there should be clear instructions for how to build this and run it. This is like a heart transplant: before you release the patient, make sure all systems are working correctly; the DSL is not the whole body. There should at least be some end-to-end pipelines in examples that anyone can run from a local installation.

        Beyond these details I have a bigger issue with merging this. Now every time the DSL is changed, it may break things in h2o-specific code. It already does in cf, for instance, but I've signed up to fix those for Spark. No committer has signed up to fix code in both Spark and H2O. IMO this is untenable.

        To solve this, the entire data prep pipeline must be virtualized to run on either engine so that the tests for things like CF and ItemSimilarity (and the multitude of others to come) pass and are engine independent. As it stands, any DSL change that breaks the build will have to rely on a contributor's fix. Even if one of you were made a committer, we would still have this problem where a needed change breaks one or the other engine's specific code. Unless 99% of the entire pipeline is engine neutral, the build will be unmaintainable.

        Crudely speaking, this means doing away with all references to a SparkContext and any use of it. So it's not just a matter of reproducing the spark module but of reducing the need for one: making it so small that breakages in one or the other engine's code will be infrequent.

        I raised this red flag long ago, but in the heat of other issues it seemed minor. I don't think it can be ignored anymore.

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-48753061

        >
        > Are the scalatests from the Spark module that cover math-scala code
        > implemented here somewhere? I'd vote against merging until those are
        > all in place and passing.
        >
        Yes, those were the first tests to pass. You can find them in
        h2o/src/test/org/apache/mahout/math/.

        > Also, I may have missed it, but there should be clear instructions for how to
        > build this and run it. This is like a heart transplant. Before you release
        > the patient make sure all systems are working correctly, the DSL is not the
        > whole body. There should at least be some end-to-end pipelines in examples
        > that anyone can run from a local installation.
        >
        As mentioned in the email, there is a somewhat simple "how to build and
        test" for both local and distributed mode in h2o/README.md. Larger
        end-to-end pipelines and examples are TBD.

        ASF GitHub Bot added a comment -

        Github user pferrel commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-48754615

        The test issue is with the tests in the spark module that actually test stuff in the math-scala module. Remember our discussion about splitting impl from test for cf? There are several things that cannot be tested without the engine in place.

        I will be vocal about objecting to TBD for pipelines. The build will be unmaintainable unless the spark module is reduced to trivial and tiny bits. Any change to the DSL could break things I do not know how to fix and really don't want to sign up for, namely h2o-specific TBD stuff.

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-48757859

        > The test issue is with the tests in the spark module that actually test
        > stuff in the math-scala module. Remember our discussion about splitting
        > impl from test for cf? There are several things that cannot be tested
        > without the engine in place.
        >
        I think we are talking about the same tests here. Please compare for
        yourself -
        https://github.com/avati/mahout/blob/MAHOUT-1500/h2o/src/test/scala/org/apache/mahout/math/decompositions/MathSuite.scala
        and
        https://github.com/avati/mahout/blob/MAHOUT-1500/spark/src/test/scala/org/apache/mahout/math/decompositions/MathSuite.scala

        ASF GitHub Bot added a comment -

        Github user pferrel commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-48761123

        Exactly, thanks. I see you've done the same for CF; also great.

        But this illustrates the problem. I need to change 50% of the tests in CF cooccurrence because they were not catching a bug. Now the tests live in two places, h2o and spark, and unless I change the tests in both places the build will break. The files look virtually identical except for the imports, which is good. If that's true, I wonder if we could use a Scala macro to keep the code all in one file? We might be able to take the same code and produce two artifacts that are both run at build time. That would reduce the load on devs for this kind of thing.
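
        A minimal sketch of this shared-test idea, using a plain trait rather than a macro (hypothetical names throughout; the actual refactoring landed as the *SuiteBase traits mentioned later in this thread):

        import org.scalatest.FunSuite

        // Engine-neutral assertions live once; each engine module mixes this in
        // together with its own context setup.
        trait MathSuiteBase extends FunSuite {
          def engineName: String // supplied per engine (hypothetical hook)

          test("shared engine-neutral assertion") {
            // A real suite would exercise DSL operators here; this placeholder
            // only demonstrates the single-source-of-truth structure.
            assert(engineName.nonEmpty)
          }
        }

        // Per-engine test classes reduce to one-liners:
        class H2OMathSuite extends MathSuiteBase { def engineName = "h2o" }
        class SparkMathSuite extends MathSuiteBase { def engineName = "spark" }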

        However, currently almost all IO code is spark specific. You must have re-implemented drm.writeDrm for h2o. Until this is *not* a re-implementation but is engine neutral, we are going to have a growing problem. I am the only person currently working in spark-specific land, and only Dmitriy and Sebastian are writing for V2. When other committers get past the Scala barrier and start committing similar stuff, they will immediately face this.

        BTW, I am very interested in seeing how the Spark ItemSimilarityDriver compares to an h2o version. IMO this is the kind of motivation we have to see. If you implemented the driver or the reader/writers, we could compare speed on h2o and spark. We have a large enough dataset to make it interesting.

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-48762753

        On Fri, Jul 11, 2014 at 10:46 AM, Pat Ferrel <notifications@github.com>
        wrote:

        > Exactly, thanks. I see you've done the same for CF also great.
        >
        > But this illustrates the problem. I need to change 50% of the tests in CF
        > cooccurrence because they were not catching a bug. Now the tests live in
        > two places h2o and spark. And unless I change the tests in both places the
        > build will break. The files look virtually identical except for the
        > imports, which is good. If that's true, I wonder if we could use a Scala
        > macro to keep the code all in one file? We might be able to take the same
        > code and produce two artifacts that are both run at build time. That would
        > reduce the load on devs for this kind of thing.
        >
        As we discussed on another email thread, I'm independently working on how
        to move tests back into math-scala. That effort should address this concern
        I think?

        > However currently almost all IO code is spark specific. You must have
        > re-implemented drm.writeDrm for h2o. Until this is not a
        > re-implementation but is engine neutral we are going to have a growing
        > problem.
        >
        Why is this a problem? drm.writeDrm() accepts an engine-neutral path, like "hdfs://.." or "file://...", and the content of what gets written is the well-defined sequencefile format no matter what the runtime backend is. And as long as the path and file content are engine neutral, why should pipeline code worry about how the IO implementation is done? Again, am I missing something?
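
        As a hedged illustration of this point (assuming the DSL imports and an implicit distributed context are in scope, as in the MathSuite tests quoted above; the path is a placeholder):

        // Sketch only: pipeline code sees an engine-neutral path plus the DRM API.
        // The same call compiles against either backend's bindings; only the
        // implicit distributed context differs.
        val inCoreA = dense((1, 2), (3, 4))
        val drmA = drmParallelize(inCoreA, numPartitions = 2)
        drmA.writeDrm("hdfs://namenode/mahout/A.drm") // well-defined sequencefile output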

        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-48766569

        Look at #28. Just spent 30 mins doing a quick refactoring, which should
        help you with test independence. Every engine should run the common
        asserts included in the `*SuiteBase` traits.
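
        A hypothetical sketch of that pattern (all names invented for
        illustration): the asserts live once in a math-scala trait, and each
        engine module runs them by mixing in its own context.

            import org.scalatest.{FunSuite, Matchers}

            // in math-scala: engine-neutral asserts, written exactly once
            trait SharedMathSuiteBase extends FunSuite with Matchers {
              def backend: String // supplied by each engine's context trait

              test("common assert runs against every engine") {
                val sum = Seq(1.0, 2.0).sum
                sum shouldBe 3.0 +- 1e-9
                info(s"executed on backend: $backend")
              }
            }

            // in the h2o (or spark) module: only the mixed-in context differs
            class H2OSharedMathSuite extends SharedMathSuiteBase {
              val backend = "h2o"
            }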

        ASF GitHub Bot added a comment -

        Github user pferrel commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-48767998

        So you don't see how changing the drm API or storage format will now break code in two places, written for two different engines? If I make the change to drm I can fix the spark breakage, but not h2o. This bit of code is extremely stable and super simple for spark, so it may be a bad example, but new code will not be so stable; quite the opposite. Each new IO operation (SparkContext dependent) or engine tuning (SparkConf dependent) will grow the problem. The core will become untouchable, or breakage will happen in places one engineer will not be able to fix.

        This is a real issue: I need to change code in math-scala today; I already have, but it isn't pushed. Who knows what that will break in the h2o implementations? I will be changing the cooccurrence tests, so I have to make the changes in two places. Maybe I can do that, but when they diverge further than this example I won't be able to.

        You guys need to address these issues as if you were supporting two engines for all Mahout code, or you will never see what Mahout committers' problems will be.

        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-48769459

        @pferrel (in case you are talking to me) sorry, don't have time to read the whole discussion. if you can point me to concrete places in the code and say what you think needs to be done and why, i may be able to figure it out. But as far as the h2o issue goes, independent tests involve nothing really new that @avati hasn't already done (except that he cut-and-pasted them, and now he just needs to remove all the cut-and-paste and pull in a trait from math-scala).

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-48771908

        On Fri, Jul 11, 2014 at 11:46 AM, Pat Ferrel <notifications@github.com>
        wrote:

        > So you don't see how changing the drm API or storage format will now break
        > code in two places written for two different engines?
        >
        Changing the DRM API? Yes, of course - that is the nature of the beast of
        supporting multiple implementations behind a single abstraction. A change
        in the abstraction API will need a corresponding change in all backends.
        That's the reason why APIs must be designed carefully, so that future
        changes to them are kept to a minimum. I don't see how this by itself
        qualifies as an objection.

        Storage format? Neither spark nor h2o is defining any storage formats. The
        current APIs read and write sequence files whose format is very well
        defined and standardized. As long as they both read and write that common
        format from engine-neutral locations, I don't see any problems at all.
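
        To make the on-disk contract concrete, a hedged sketch of reading that
        common format back with plain Hadoop APIs (the key type is assumed to be
        IntWritable here; actual DRMs may key rows with other Writables):

            import org.apache.hadoop.conf.Configuration
            import org.apache.hadoop.fs.Path
            import org.apache.hadoop.io.{IntWritable, SequenceFile}
            import org.apache.mahout.math.VectorWritable

            // any backend (or none at all) can read the rows back, because the
            // format is defined by the sequence file, not by the engine that
            // wrote it
            def dumpDrm(path: String): Unit = {
              val reader = new SequenceFile.Reader(new Configuration(),
                SequenceFile.Reader.file(new Path(path)))
              val key = new IntWritable()
              val value = new VectorWritable()
              while (reader.next(key, value))
                println(s"row ${key.get}: ${value.get}")
              reader.close()
            }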

        > If I make the change to drm I can fix spark breakage but not h2o. This bit
        > of code is extremely stable and super simple for spark so may be a bad
        > example but new code will not be so stable just the opposite. For each new
        > IO operation (SparkContext dependent) or engine tuning (SparkConf
        > dependent) we will grow the problem. The core will become untouchable or
        > breakage will happen in places one engineer will not be able to fix.
        >
        Can you please provide a more concrete example of both "make the change to
        drm" and "new IO operation (SparkContext dependent)"? It is hard for me to
        visualize the problems you are foreseeing without more specifics.

        > This is a real issue, I need to change code in math-scala today, already
        > have but it isn't pushed. Who knows what that will break in h2o
        > implementations? I will be changing cooccurrence tests, so have to make
        > them in two places. Maybe I can do that but when they diverge further than
        > this example I won't be able to.
        >
        Well, as long as you are fixing a bug in the cf logic, that should be
        engine independent. However, if you are adding a new DRM API or modifying
        an existing DRM API, that will need corresponding changes in all the
        engines. There's no getting around that. That's something we all have to
        live with, no matter what project it is.

        > You guys need to address these issues as if you were supporting two
        > engines for all Mahout code or you will never see what Mahout committers
        > problems will be.
        >

        As I said before, please provide a concrete example of what the issues are.
        I don't know what to fix yet.

        Thanks

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/28#issuecomment-48945841

        @dlyubimov do you intend to merge this soon? I plan to rebase MAHOUT-1500 on top of this.

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49679780

        Please note this PR is fully "working" now that #29 and #28 are merged. Please consider this for merge.

        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r15203732

        — Diff: h2o/src/test/scala/org/apache/mahout/math/decompositions/MathSuite.scala —
        @@ -0,0 +1,212 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.math.decompositions
        +
        +import org.scalatest.{Matchers, FunSuite}
        +import org.apache.mahout.h2obindings.test.MahoutLocalContext
        +import org.apache.mahout.math._
        +import drm._
        +import scalabindings._
        +import RLikeOps._
        +import RLikeDrmOps._
        +import org.apache.mahout.h2obindings._
        +import org.apache.mahout.common.RandomUtils
        +import scala.math._
        +
        +class MathSuite extends FunSuite with Matchers with MahoutLocalContext {
        — End diff –

        hm. i thought this was not part of the distributed decompositions suite and had been moved out to math-scala?

        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49680647

        will it merge with the rbind() code?

        ASF GitHub Bot added a comment -

        Github user avati commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r15203859

        — Diff: h2o/src/test/scala/org/apache/mahout/math/decompositions/MathSuite.scala —
        @@ -0,0 +1,212 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.math.decompositions
        +
        +import org.scalatest.{Matchers, FunSuite}
        +import org.apache.mahout.h2obindings.test.MahoutLocalContext
        +import org.apache.mahout.math._
        +import drm._
        +import scalabindings._
        +import RLikeOps._
        +import RLikeDrmOps._
        +import org.apache.mahout.h2obindings._
        +import org.apache.mahout.common.RandomUtils
        +import scala.math._
        +
        +class MathSuite extends FunSuite with Matchers with MahoutLocalContext {
        — End diff –

        Ah, I forgot to git-rm this. Let me do that right away.

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49680913

        rbind() is not yet added; I wasn't even sure the DRM API would be accepted before I implemented it for H2O. I plan to submit a separate PR for rbind().

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49681182

        Removed MathSuite. Re-ran mvn test and everything is passing.

        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49681458

        > rbind() is not yet added, I wasn't even sure if the DRM api would be accepted before I
        > implemented for H2O. I plan to submit a separate PR for rbind().

        ok. contingent on this promise, +1 on merging.

        given the magnitude of this review, i suggest 2 more votes/reviewers. Additional non-binding reviews/sign-offs from 0xdata members are also IMO desirable.

        And IMO we need to resolve whatever concerns Pat may have with this PR.

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49681859

        On Mon, Jul 21, 2014 at 4:49 PM, Dmitriy Lyubimov <notifications@github.com>
        wrote:

        > rbind() is not yet added, I wasn't even sure if the DRM api would be accepted before I
        > implemented for H2O. I plan to submit a separate PR for rbind().
        >
        > ok. contingent on this promise, +1 on merging.
        >
        > given magnitude of this review, i suggest 2 more votes/reviewers.
        > Additional non-binding reviews/sign-offs from 0xdata members are also IMO
        > desirable.
        >

        I will ping 0xdata members.

        > And IMO we need to resolve whatever concerns Pat may have with this PR.
        >

        I assumed the concerns were resolved on the dev@ email list (ref: "Call for
        vote on integrating h2o")

        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49682101

        I would like @pferrel to sign off here.

        ASF GitHub Bot added a comment -

        Github user pferrel commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49684400

        Call me the loyal opposition. I'd rather merge math with h2o than h2o with Mahout, but I will bow to the majority, and I count the vote at two to one (the one being me).

        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49685224

        On Mon, Jul 21, 2014 at 5:31 PM, Pat Ferrel <notifications@github.com>
        wrote:

        > Call me the loyal opposition. I'd rather merge math with h2o than h2o with
        > Mahout but will bow to the majority and I count the vote at 2 to one (me).
        >
        is it +0? http://www.apache.org/foundation/voting.html

        There is a subtle danger that introducing new DrmLike operations will then
        require a symmetric H2O implementation. So if there's a lot still expected,
        i'd say a -0 is justified. It is important that you tell us that, because
        as it stands you are the only one working on a method at the moment. (well,
        i do some internally as well, but my additions are strictly minor; i don't
        need anything earth-shattering).


        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49685806

        FYI, I am volunteering to keep h2obindings up to date as new DRM APIs are
        added. I don't think any R-like or MATLAB-like operators are fundamentally
        impossible on the h2o backend.

        ASF GitHub Bot added a comment -

        Github user pferrel commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49738832

        If you are asking me whether there is a likelihood of significant additions to the DSL or core operations that will require "symmetric" implementations in two engines, the answer is yes. Look at Ted's wishlist. Getting cooccurrence data prep working has brought up two issues, and this is one of the simplest algos.

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49760146

        We already discussed this model to exhaustion on the mailing list: its
        advantages, and how it has been working successfully in other projects. We
        even agreed that if a new API is added, we just add an empty stub in the
        h2o bindings which throws unimplemented. I am also volunteering to keep
        the bindings up to date.

        If you still do not feel like working together again, I shall rest my case
        at this.
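
        A hypothetical sketch of that stub arrangement (object and method names
        invented for illustration; H2ODrm is the holder type from the h2o
        bindings): a newly added logical operator gets an h2o binding that fails
        fast until someone ports it.

            import org.apache.mahout.h2obindings.drm.H2ODrm

            // pipelines that avoid the operator keep working; pipelines that
            // hit it fail with a clear message instead of breaking the build
            object RbindStub {
              def exec(a: H2ODrm, b: H2ODrm): H2ODrm =
                throw new UnsupportedOperationException(
                  "rbind is not yet implemented in the h2o bindings")
            }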

        ASF GitHub Bot added a comment -

        Github user pferrel commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49791289

        Don't overreact here. Dmitriy asked a question so I answered it. I have no intention of further debate on this. I wouldn't block this if I could. It would take a lot more committers making a fuss to do that and I don't see it. I'll be happy to live with the majority view and try to constructively keep the project on track.

        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49803586

        @avati Anand, i would like to try and squeeze #33 in ahead of this.

        #33 changes DrmLike by adding a new lazily evaluated attribute to plans (canHaveMissingRows) to track the potentially-missing-rows condition throughout DAGs if it was ever (lazily) detected in the original sources.

        it also fixes the A+1 case on the spark side. The spark side is fairly agnostic of other engines; it is really up to them whether they allow missing implied rows or not. The spark engine chooses to allow that and performs lazy evaluation whenever required.
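
        A hedged sketch of how such a flag can propagate (types heavily
        simplified; the real DrmLike carries much more): each logical operator
        derives canHaveMissingRows from its operands, so the condition flows
        through the DAG lazily, without forcing evaluation.

            trait LogicalDrm {
              def canHaveMissingRows: Boolean
            }

            // a source may be row-sparse, e.g. loaded from a sparse input
            case class Source(rowSparseInput: Boolean) extends LogicalDrm {
              lazy val canHaveMissingRows: Boolean = rowSparseInput
            }

            // A + 1 must materialize implied-but-missing rows (0 + 1 != 0), so
            // an elementwise-scalar op fixes them up and clears the flag
            case class AewScalar(a: LogicalDrm, s: Double) extends LogicalDrm {
              lazy val canHaveMissingRows: Boolean = false
            }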

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49813616

        Hi @avati, is there something Java 1.7 specific in the dependencies here? I'm getting a test failure in the h2o module:

        Discovery starting.
        *** RUN ABORTED ***
          java.lang.UnsupportedClassVersionError: water/MRTask : Unsupported major.minor version 51.0
        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49813755

        some of the bytecode classes are of a higher version than the JRE you are
        running on. I suppose, yes, there are some 1.7-specific dependency jars
        there

        On Tue, Jul 22, 2014 at 4:07 PM, Andrew Palumbo <notifications@github.com>
        wrote:

        > Hi @avati <https://github.com/avati>, is there something Java 1.7
        > specific in the dependencies here? I'm getting a test failure in the h2o
        > module:
        >
        > Discovery starting.
        > *** RUN ABORTED ***
        > java.lang.UnsupportedClassVersionError: water/MRTask : Unsupported
        > major.minor version 51.0
        >

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49813969

        Andrew - I think I have been testing on Java 1.7 (can't say for sure until
        i get to my workstation).

        On Tue, Jul 22, 2014 at 4:07 PM, Andrew Palumbo <notifications@github.com>
        wrote:

        > Hi @avati <https://github.com/avati>, is there something Java 1.7
        > specific in the dependencies here? I'm getting a test failure in the h2o
        > module:
        >
        > Discovery starting.
        > *** RUN ABORTED ***
        > java.lang.UnsupportedClassVersionError: water/MRTask : Unsupported
        > major.minor version 51.0
        >

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49816008

        @andrewpalumbo - yes, please use 1.7 JRE. Please let me know how your testing goes.

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49817944

        Will do. I've been running 1.6 on this machine because I think that's what we're officially stuck at.

        ASF GitHub Bot added a comment -

        Github user cliffclick commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49894450

        This is a very basic port, focused on correctness & completeness, with no effort spent on performance.
        Expectation setting: there are easy 2x to 10x speedups available in most of the operator inner loops. The HDFS sequence-file readers/writers are single-threaded and single-node; H2O's internal CSV reader will easily be 100x faster.
        Performance work should come in later commits.

        Minor comments:
        In lots of places, especially reduce() calls, the code could/should call ArrayUtils.add(this, that) instead of looping over the arrays being added.

        H2OHelper.empty_frame looks a lot like it should call "Vec.makeZero()" in a loop instead of hand-rolling Vecs of zeros; there's a version which will take a hand-rolled layout. This call should probably move into the Frame class directly.

        The technique for row labeling seems... awkward at best. Or at least I'm reading that to be the purpose of using Tuple2. I think this design needs more exploring - e.g. insert a row-label column in front of the "normal" Frame columns, and teach the follow-on code to skip the first column. Note that many datasets have non-numeric columns (e.g. name, address) that cannot participate in math ops, so most H2O algos already carry forward the notion of a set of columns being worked on.

        Cliff
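
        A hedged illustration of the first suggestion (ArrayUtils.add is the
        helper named above; its package and exact semantics are assumptions here,
        not verified against the h2o source):

            import water.util.ArrayUtils

            // accumulate partial sums in a reduce() with an elementwise array
            // add instead of a hand-rolled index loop
            def reduce(mine: Array[Double], theirs: Array[Double]): Array[Double] = {
              // hand-rolled version being replaced:
              //   for (i <- mine.indices) mine(i) += theirs(i)
              ArrayUtils.add(mine, theirs)
            }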

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49899401

        Thanks for the comments, @cliffclick. I'll work on the ArrayUtils and Vec.makeZero() usage.

        Regarding row labeling, I wanted to keep the operator inner loop free of the if()/else needed to optionally skip the label vec (i.e., keep the inner loop focused on just the math). However, now that I think about it, it should be possible to filter out the label vec before entering the MRTask, and have both the matrix and the row labels within the same Frame.

        I'll work on these comments and re-post. Thanks!

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49946698

        @cliffclick I have updated with the review comments. Note that even though I did away with Tuple2, I am using a new H2ODrm in its place. Having the optional row labels in the same Frame made it very confusing for a reviewer to instantly identify whether a given Frame was with row labels or without. H2ODrm has potential future uses (extra members) as well.

        I have also made the drmfromHdfs() API fall back to the H2O parser (csv etc.) if a given file is not in sequence file format. This opens up the possibility of tweaking the job pipeline to use csv files instead of seqfiles and gaining in performance and compression.
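
        For context, a rough sketch of the holder shape being described (field
        names inferred from the diff excerpts later in this thread; treat them
        as assumptions):

            import water.fvec.{Frame, Vec}

            // the numeric matrix lives in a Frame; row labels, when present,
            // live in a separate keys Vec, so operator inner loops never have
            // to special-case a label column
            class H2ODrm(val frame: Frame, val keys: Vec) // keys may be null for int-keyed DRMs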

        ASF GitHub Bot added a comment -

        Github user cliffclick commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-49948794

        Looks good to me.

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-50363968

        Ping. Requesting some review/merge attention from the committers.

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-50531008

        Implemented canHaveMissingRows(). All tests are passing. Please let me know if anything else is required for merge.

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-50553289

        @dlyubimov Thanks for merging #30. I have now added Rbind operator and refreshed the PR. All tests are passing. Let me know if this is sufficient for merge.

        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-50554340

        As i indicated, i am waiting on 2 more votes. We have what i have no
        choice but to interpret as a +0 from Pat, and a +1 from me.

        On Tue, Jul 29, 2014 at 4:24 PM, Anand Avati <notifications@github.com>
        wrote:

        > @dlyubimov <https://github.com/dlyubimov> Thanks for merging #30
        > <https://github.com/apache/mahout/pull/30>. I have now added Rbind
        > operator and refreshed the PR. All tests are passing. Let me know if this
        > is sufficient for merge.
        >

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-50572257

        All tests are passing for me now, running Java 1.7.

        ASF GitHub Bot added a comment -

        Github user gcapan commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-50625655

        Tests pass for me for various profiles, and the code looks good. I am a supporter of an engine-agnostic architecture and the separation of actual algorithms from backends; multiple backends (in addition to both Spark and H2O being very promising platforms) would force us to implement generic solutions for data preprocessing, vectorization, machine learning and big data mining. In summary, my vote is +1 for this contribution.

        PS: Not H2O specific, but I wanted to add it here: I believe the next step should be standardizing the minimal Matrix I/O capability (i.e. a couple of file formats other than [row_id, VectorWritable] SequenceFiles) required of a distributed computation engine, and adding data-frame-like structures that allow text columns.

        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-50654169

        (1) No review from the most vocal backers?

        (2) m-1500 is unassigned. Whoever wishes to commit this issue, please take over m-1500 and continue.

        ASF GitHub Bot added a comment -

        Github user pferrel commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-50664882

        So to be clear, this will require 1.7 on all machines from now on? Not just for building and running h2o?

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-50665695

        > So to be clear, this will require 1.7 on all machines from now on? Not
        > just build and running h2o?
        >

        It requires 1.7 only if you are running h2o (because the h2o-core artifact
        is a 1.7 binary). You can build on 1.6 with or without h2obindings.

        ASF GitHub Bot added a comment -

        Github user pferrel commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-50668509

        Cool. BTW, I read Ted and Suneel as +1 in the email thread.

        Ted Dunning added a comment -

        I am out of town but will get on this when I get back. Connectivity is too bad here to use seriously.

        Sent from my iPhone

        Ted Dunning added a comment -

        I plan to vote +1 unless I see something horrible. I can't believe I will.

        I don't understand why we need three votes here in any case.

        Sent from my iPhone

        Pat Ferrel added a comment -

        Even without a vote someone needs to self-assign this issue, merge, and close it. Dmitriy has said he won't.

        Dmitriy Lyubimov added a comment -

        The reason additional review on github is needed is that i only spent ~5 minutes eyeballing less than 10% of the code, which spawned 5 or so notes and subsequent fixes. In that sense, help is needed. Surely the benefits of cross-peer code review should require no specific justification. This is code review on github (which is our reviewboard replacement): you just need to look at the code, poke it if possible, leave suggestions, or just say "ship it". I know reviewboard was a pain to use, but github PRs are so easy that there's practically no pretext left not to do it as much as possible.

        Which begs the question: all these people who apparently said "ship it" on github, excluding the 0xdata review, produced exactly 0 code notes. Either the reviews were even more superficial than mine, or this is the most impeccable code patch in the history of the project.

        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r15654236

        — Diff: h2o/src/main/java/org/apache/mahout/h2obindings/ops/ABt.java —
        @@ -0,0 +1,63 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.h2obindings.ops;
        +
        +import org.apache.mahout.h2obindings.H2OHelper;
        +import org.apache.mahout.h2obindings.drm.H2ODrm;
        +
        +import water.MRTask;
        +import water.fvec.Frame;
        +import water.fvec.Vec;
        +import water.fvec.Chunk;
        +import water.fvec.NewChunk;
        +
        +public class ABt {
        + /* Calculate AB' */
        — End diff –

        Well, one fundamental thing that i surely missed is that this module's physical operators are written in java whereas all the tests and APIs are in scala. So the maven module is fragmented between java and scala code, something i intentionally tried to avoid (a module should be either 100% java or 100% scala).

        I suppose it is not going to stop this from being committed now, but it just shows how superficial my initial review was.

        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r15654554

        — Diff: h2o/src/main/java/org/apache/mahout/h2obindings/ops/AtB.java —
        @@ -0,0 +1,66 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.h2obindings.ops;
        +
        +import org.apache.mahout.h2obindings.H2OHelper;
        +import org.apache.mahout.h2obindings.drm.H2ODrm;
        +
        +import water.MRTask;
        +import water.fvec.Frame;
        +import water.fvec.Vec;
        +import water.fvec.Chunk;
        +import water.fvec.NewChunk;
        +
        +public class AtB {
        + /* Calculate A'B */
        + public static H2ODrm AtB(H2ODrm DrmA, H2ODrm DrmB) {
        + final Frame A = DrmA.frame;
        + final Frame B = DrmB.frame;
        +
        + /* First create an empty frame of the required dimensions */
        + Frame AtB = H2OHelper.empty_frame(A.numCols(), B.numCols(), -1, -1);
        +
        + /* Execute MRTask on the new Frame, and fill each cell (initially 0) by
        + computing appropriate values from A and B.
        +
        + chks.length == B.numCols()
        + */
        + new MRTask() {
        + public void map(Chunk chks[]) {
        + int chunk_size = chks[0].len();
        + long start = chks[0].start();
        + long A_rows = A.numRows();
        + Vec A_vecs[] = A.vecs();
        + Vec B_vecs[] = B.vecs();
        +
        + for (int c = 0; c < chks.length; c++) {
        + for (int r = 0; r < chunk_size; r++) {
        + double v = 0;
        + for (long i = 0; i < A_rows; i++) {
        + v += (A_vecs[(int)(start+r)].at(i) * B_vecs[c].at(i));
        — End diff –

        Here and elsewhere: operator spacing style. Please use the autoformatting features in IDEA.
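        For illustration, here is roughly what that inner loop looks like with Sun-style spacing applied (a sketch; the `(i)` argument to `.at` is reconstructed from the A'B math, since the JIRA markup swallowed it in the quoted diff):

            for (int c = 0; c < chks.length; c++) {
              for (int r = 0; r < chunk_size; r++) {
                double v = 0;
                for (long i = 0; i < A_rows; i++) {
                  // cell (start + r, c) of A'B accumulates A[i][start + r] * B[i][c]
                  v += A_vecs[(int) (start + r)].at(i) * B_vecs[c].at(i);
                }
                // storing v into the output chunk is elided here, as in the quoted diff
              }
            }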

        Hide
        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r15654625

        — Diff: h2o/src/main/java/org/apache/mahout/h2obindings/ops/Atx.java —
        @@ -0,0 +1,76 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.h2obindings.ops;
        +
        +import org.apache.mahout.math.Vector;
        +import org.apache.mahout.math.DenseVector;
        +import org.apache.mahout.math.Matrix;
        +import org.apache.mahout.math.DenseMatrix;
        +import org.apache.mahout.h2obindings.H2OHelper;
        +import org.apache.mahout.h2obindings.drm.H2OBCast;
        +import org.apache.mahout.h2obindings.drm.H2ODrm;
        +
        +import water.MRTask;
        +import water.fvec.Frame;
        +import water.fvec.Vec;
        +import water.fvec.Chunk;
        +import water.fvec.NewChunk;
        +import water.util.ArrayUtils;
        +
        +public class Atx {
        + /* Calculate A'x (where x is an in-core Vector) */
        + public static H2ODrm Atx(H2ODrm DrmA, Vector x) {
        + Frame A = DrmA.frame;
        + final H2OBCast<Vector> bx = new H2OBCast<Vector>(x);
        +
        + /* A'x is computed into _atx[] with an MRTask on A (with
        + x available as a Broadcast)
        +
        + x.size() == A.numRows()
        + _atx.length == chks.length == A.numCols()
        + */
        + class MRTaskAtx extends MRTask<MRTaskAtx> {
        + double _atx[];
        — End diff –

        Mahout doesn't use underscore prefixes for class attributes. We follow standard Sun style conventions as far as Java code is concerned.
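        In other words, the Sun-style version of that field declaration would be something like this sketch:

            class MRTaskAtx extends MRTask<MRTaskAtx> {
              // no underscore prefix; camelCase field name
              double[] atx;
            }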

        Hide
        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r15654715

        — Diff: h2o/src/main/java/org/apache/mahout/h2obindings/ops/RowRange.java —
        @@ -0,0 +1,83 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.h2obindings.ops;
        +
        +import scala.collection.immutable.Range;
        +
        +import water.MRTask;
        +import water.fvec.Frame;
        +import water.fvec.Vec;
        +import water.fvec.Chunk;
        +import water.fvec.NewChunk;
        +import water.parser.ValueString;
        +
        +import org.apache.mahout.h2obindings.drm.H2ODrm;
        +
        +public class RowRange {
        + /* Filter operation */
        + public static H2ODrm RowRange(H2ODrm DrmA, final Range R) {
        + Frame A = DrmA.frame;
        + Vec keys = DrmA.keys;
        +
        + /* Run a filtering MRTask on A. If row number falls within R.start() and
        + R.end(), then the row makes it into the output
        + */
        + Frame Arr = new MRTask() {
        + public void map(Chunk chks[], NewChunk ncs[]) {
        + int chunk_size = chks[0].len();
        + long chunk_start = chks[0].start();
        +
        + /* First check if the entire chunk even overlaps with R */
        + if (chunk_start > R.end() || (chunk_start + chunk_size) < R.start())
        + return;
        +
        + /* This chunk overlaps, filter out just the overlapping rows */
        + for (int r = 0; r < chunk_size; r++) {
        + if (!R.contains (chunk_start + r))
        — End diff –

        spacing

        Hide
        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r15654963

        — Diff: h2o/src/test/scala/org/apache/mahout/math/decompositions/DistributedDecompositionsSuite.scala —
        @@ -0,0 +1,34 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.math.decompositions
        +
        +import org.apache.mahout.math._
        +import drm._
        +import scalabindings._
        +import RLikeOps._
        +import RLikeDrmOps._
        +import org.apache.mahout.h2obindings._
        +import org.apache.mahout.common.RandomUtils
        +import scala.math._
        +import org.scalatest.{Matchers, FunSuite}

        +import org.apache.mahout.h2obindings.test.DistributedH2OSuite
        +
        +class DistributedDecompositionsSuite extends FunSuite with DistributedH2OSuite with DistributedDecompositionsSuiteBase {
        +
        — End diff –

        An empty body should not be specified with {}

        Hide
        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r15655025

        — Diff: h2o/src/test/scala/org/apache/mahout/h2obindings/ops/AtASuite.scala —
        @@ -0,0 +1,50 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.h2obindings.ops
        +
        +import org.scalatest.FunSuite
        +import org.apache.mahout.h2obindings.test.DistributedH2OSuite
        +import org.apache.mahout.math.scalabindings._
        +import org.apache.mahout.math.drm._
        +import org.apache.mahout.h2obindings._
        +import org.apache.mahout.h2obindings.drm._
        +import RLikeOps._
        +import RLikeDrmOps._
        +import org.apache.mahout.math.drm._
        +
        +/** Tests for {@link XtX} */
        +class AtASuite extends FunSuite with DistributedH2OSuite {
        +
        + test("AtA slim")

        { + + val inCoreA = dense((1, 2), (2, 3)) + val drmA = drmParallelize(inCoreA) + + val M = drmA.t %*% drmA + val inCoreAtA = M.collect + println(inCoreAtA) + + val expectedAtA = inCoreA.t %*% inCoreA + println(expectedAtA) + + assert(expectedAtA === inCoreAtA) + + }

        +
        +
        — End diff –

        remove extra line

        Hide
        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r15655056

        — Diff: h2o/src/test/scala/org/apache/mahout/h2obindings/ops/AtSuite.scala —
        @@ -0,0 +1,46 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.h2obindings.ops
        +
        +import org.scalatest.FunSuite
        +import org.apache.mahout.h2obindings.test.DistributedH2OSuite
        +import org.apache.mahout.math.scalabindings._
        +import org.apache.mahout.math.drm._
        +import org.apache.mahout.h2obindings._
        +import org.apache.mahout.h2obindings.drm._
        +import RLikeOps._
        +import RLikeDrmOps._
        +import org.apache.mahout.math.drm._
        +
        +/** Tests for A' algorithms */
        +class AtSuite extends FunSuite with DistributedH2OSuite {
        +
        + test("At") {
        + val inCoreA = dense((1, 2, 3), (2, 3, 4), (3, 4, 5))
        + val A = drmParallelize(m = inCoreA, numPartitions = 2)
        +
        + val AtDrm = A.t
        + val inCoreAt = AtDrm.collect
        + val inCoreControlAt = inCoreA.t
        +
        + println(inCoreAt)
        + assert((inCoreAt - inCoreControlAt).norm < 1E-5)
        +
        — End diff –

        remove extra line please (probably in prototype of this test, too)

        Hide
        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r15655171

        — Diff: pom.xml —
        @@ -110,6 +110,7 @@
        <scala.major>2.10</scala.major>
        <scala.version>2.10.3</scala.version>
        <spark.version>0.9.1</spark.version>
        + <h2o.version>0.1.1-SNAPSHOT</h2o.version>
        — End diff –

        This -SNAPSHOT dependency should be fixed ASAP.

        First, we cannot even release with snapshot dependencies. Second, the artifact commit level is undefined with snapshots, which may cause problems for people trying to compile later with different commit levels of this artifact in their cache.
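        As a later comment in this thread notes, the fix was to depend on a published h2o-core release instead. In pom.xml terms that just means pinning the property to a fixed version; the version number below is illustrative only:

            <!-- illustrative: pin to a published release, not a -SNAPSHOT -->
            <h2o.version>0.1.1</h2o.version>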

        Hide
        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r15657066

        — Diff: h2o/src/main/java/org/apache/mahout/h2obindings/ops/AewScalar.java —
        @@ -0,0 +1,68 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.h2obindings.ops;
        +
        +import org.apache.mahout.h2obindings.H2OHelper;
        +import org.apache.mahout.h2obindings.drm.H2ODrm;
        +
        +import water.MRTask;
        +import water.fvec.Frame;
        +import water.fvec.Vec;
        +import water.fvec.Chunk;
        +import water.fvec.NewChunk;
        +
        +public class AewScalar {
        + /* Element-wise DRM-scalar operation */
        + public static H2ODrm AewScalar(H2ODrm DrmA, final double s, final String op) {
        — End diff –

        Here and elsewhere: camelCase for variables is standard in Sun style conventions, i.e. `drmA`, not `DrmA`.
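        Applied to the quoted signature (and as the RowRange diff later in this thread shows after the fix was made), that reads:

            public static H2ODrm AewScalar(H2ODrm drmA, final double s, final String op) {
              // ...
            }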

        Hide
        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-50795818

        Is there a code style doc used by Mahout? I don't use IDEA (just Emacs).

        Hide
        ASF GitHub Bot added a comment -

        Github user dlyubimov commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-50797941

        There used to be a guide on the old website, but I guess it got axed. It
        basically said it is Sun style plus a 120-character line width constraint. I
        think there were also templates for Eclipse somewhere.

        There's also a checkstyle Maven plugin tuned up to report those (I remember
        people making me run it and make sure every one of those warnings went away),
        but my take is that checkstyle is currently tuned very aggressively. My minimum
        hygiene list is line width, indentation, naming conventions and operator
        spacing (all of which an IDE can take care of). With Sean's
        departure, our standards on style are nowhere near where they used to be.

        In Scala, style is still evolving, but I have pushed a few things in particular.
        The baseline is to follow the Spark code style (and btw they are very strict
        about it; e.g. they insist that comments start with a capital
        letter and wrap at the 100th character, which an IDE
        cannot do automatically). We also discussed closure styles elsewhere.
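        (For reference, the checkstyle plugin mentioned above can be run on demand with the standard Maven goal below, assuming the plugin is configured in the project's pom:)

            mvn checkstyle:check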

        Hide
        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-50800742

        Where can I find the Sun coding style? Is it this - http://www.oracle.com/technetwork/java/codeconventions-150003.pdf ?

        Hide
        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-50802726

        There's a link to the Sun coding conventions on the How to Contribute page:

        http://mahout.apache.org/developers/how-to-contribute.html

        It's the "conventions" link under "Making Changes", item 4.

        Hide
        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-50847925

        @dlyubimov the previous batch of commits addresses the review comments. The dependency on the h2o-core SNAPSHOT has been replaced with a published release, and all of the code changes have been made.

        Hide
        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-51826713

        ping.

        Any progress?

        Hide
        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-51835301

        Resolved the pom.xml merge conflict (Spark/Scala version update).

        Hide
        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r16274064

        — Diff: h2o/src/main/java/org/apache/mahout/h2obindings/ops/RowRange.java —
        @@ -0,0 +1,83 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.h2obindings.ops;
        +
        +import scala.collection.immutable.Range;
        +
        +import water.MRTask;
        +import water.fvec.Frame;
        +import water.fvec.Vec;
        +import water.fvec.Chunk;
        +import water.fvec.NewChunk;
        +import water.parser.ValueString;
        +
        +import org.apache.mahout.h2obindings.drm.H2ODrm;
        +
        +public class RowRange {
        + /* Filter operation */
        + public static H2ODrm RowRange(H2ODrm drmA, final Range R) {
        + Frame A = drmA.frame;
        + Vec keys = drmA.keys;
        +
        + /* Run a filtering MRTask on A. If row number falls within R.start() and
        + R.end(), then the row makes it into the output
        + */
        + Frame Arr = new MRTask() {
        + public void map(Chunk chks[], NewChunk ncs[]) {
        + int chunk_size = chks[0].len();
        + long chunk_start = chks[0].start();
        +
        + /* First check if the entire chunk even overlaps with R */
        + if (chunk_start > R.end() || (chunk_start + chunk_size) < R.start())
        + return;
        +
        + /* This chunk overlaps, filter out just the overlapping rows */
        + for (int r = 0; r < chunk_size; r++) {
        + if (!R.contains(chunk_start + r))
        + continue;
        — End diff –

        The if statement needs braces.
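        Concretely, the braced form of the quoted lines would be (a sketch using only the calls already present in the diff):

            for (int r = 0; r < chunk_size; r++) {
              if (!R.contains(chunk_start + r)) {
                continue;
              }
              for (int c = 0; c < chks.length; c++) {
                ncs[c].addNum(chks[c].at0(r));
              }
            }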

        Hide
        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r16274105

        — Diff: h2o/src/main/java/org/apache/mahout/h2obindings/ops/RowRange.java —
        @@ -0,0 +1,83 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.h2obindings.ops;
        +
        +import scala.collection.immutable.Range;
        +
        +import water.MRTask;
        +import water.fvec.Frame;
        +import water.fvec.Vec;
        +import water.fvec.Chunk;
        +import water.fvec.NewChunk;
        +import water.parser.ValueString;
        +
        +import org.apache.mahout.h2obindings.drm.H2ODrm;
        +
        +public class RowRange {
        + /* Filter operation */
        + public static H2ODrm RowRange(H2ODrm drmA, final Range R) {
        + Frame A = drmA.frame;
        + Vec keys = drmA.keys;
        +
        + /* Run a filtering MRTask on A. If row number falls within R.start() and
        + R.end(), then the row makes it into the output
        + */
        + Frame Arr = new MRTask() {
        + public void map(Chunk chks[], NewChunk ncs[]) {
        + int chunk_size = chks[0].len();
        + long chunk_start = chks[0].start();
        +
        + /* First check if the entire chunk even overlaps with R */
        + if (chunk_start > R.end() || (chunk_start + chunk_size) < R.start())
        + return;
        +
        + /* This chunk overlaps, filter out just the overlapping rows */
        + for (int r = 0; r < chunk_size; r++) {
        + if (!R.contains(chunk_start + r))
        + continue;
        +
        + for (int c = 0; c < chks.length; c++)
        + ncs[c].addNum(chks[c].at0(r));
        + }

        — End diff –

        The for loop needs braces.

        Hide
        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r16274130

        — Diff: h2o/src/main/java/org/apache/mahout/h2obindings/ops/RowRange.java —
        @@ -0,0 +1,83 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.h2obindings.ops;
        +
        +import scala.collection.immutable.Range;
        +
        +import water.MRTask;
        +import water.fvec.Frame;
        +import water.fvec.Vec;
        +import water.fvec.Chunk;
        +import water.fvec.NewChunk;
        +import water.parser.ValueString;
        +
        +import org.apache.mahout.h2obindings.drm.H2ODrm;
        +
        +public class RowRange {
        + /* Filter operation */
        + public static H2ODrm RowRange(H2ODrm drmA, final Range R) {
        + Frame A = drmA.frame;
        + Vec keys = drmA.keys;
        +
        + /* Run a filtering MRTask on A. If row number falls within R.start() and
        + R.end(), then the row makes it into the output
        + */
        + Frame Arr = new MRTask() {
        + public void map(Chunk chks[], NewChunk ncs[]) {
        + int chunk_size = chks[0].len();
        + long chunk_start = chks[0].start();
        +
        + /* First check if the entire chunk even overlaps with R */
        + if (chunk_start > R.end() || (chunk_start + chunk_size) < R.start())
        + return;
        +
        + /* This chunk overlaps, filter out just the overlapping rows */
        + for (int r = 0; r < chunk_size; r++) {
        + if (!R.contains(chunk_start + r))
        + continue;
        +
        + for (int c = 0; c < chks.length; c++)
        + ncs[c].addNum(chks[c].at0(r));
        + }

        + }
        + }.doAll(A.numCols(), A).outputFrame(null, null);
        +
        + Vec Vrr = (keys == null) ? null : new MRTask() {
        + /* This is a String keyed DRM. Do the same thing as above,
        + but this time just one column of Strings.
        + */
        + public void map(Chunk chk, NewChunk nc) {
        + int chunk_size = chk.len();
        + long chunk_start = chk.start();
        + ValueString vstr = new ValueString();
        +
        + if (chunk_start > R.end() || (chunk_start + chunk_size) < R.start())
        + return;
        — End diff –

        The if statement needs braces.

        Hide
        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r16274188

        — Diff: h2o/src/main/java/org/apache/mahout/h2obindings/ops/RowRange.java —
        @@ -0,0 +1,83 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.h2obindings.ops;
        +
        +import scala.collection.immutable.Range;
        +
        +import water.MRTask;
        +import water.fvec.Frame;
        +import water.fvec.Vec;
        +import water.fvec.Chunk;
        +import water.fvec.NewChunk;
        +import water.parser.ValueString;
        +
        +import org.apache.mahout.h2obindings.drm.H2ODrm;
        +
        +public class RowRange {
        + /* Filter operation */
        + public static H2ODrm RowRange(H2ODrm drmA, final Range R) {
        + Frame A = drmA.frame;
        + Vec keys = drmA.keys;
        +
        + /* Run a filtering MRTask on A. If row number falls within R.start() and
        + R.end(), then the row makes it into the output
        + */
        + Frame Arr = new MRTask() {
        + public void map(Chunk chks[], NewChunk ncs[]) {
        + int chunk_size = chks[0].len();
        + long chunk_start = chks[0].start();
        +
        + /* First check if the entire chunk even overlaps with R */
        + if (chunk_start > R.end() || (chunk_start + chunk_size) < R.start())
        + return;
        +
        + /* This chunk overlaps, filter out just the overlapping rows */
        + for (int r = 0; r < chunk_size; r++) {
        + if (!R.contains(chunk_start + r))
        + continue;
        +
        + for (int c = 0; c < chks.length; c++)
        + ncs[c].addNum(chks[c].at0(r));
        + }
        + }
        + }.doAll(A.numCols(), A).outputFrame(null, null);
        +
        + Vec Vrr = (keys == null) ? null : new MRTask() {
        + /* This is a String keyed DRM. Do the same thing as above,
        + but this time just one column of Strings.
        + */
        + public void map(Chunk chk, NewChunk nc) {
        + int chunk_size = chk.len();
        + long chunk_start = chk.start();
        + ValueString vstr = new ValueString();
        +
        + if (chunk_start > R.end() || (chunk_start + chunk_size) < R.start())
        + return;
        +
        + for (int r = 0; r < chunk_size; r++) {
        + if (!R.contains(chunk_start + r))
        + continue;
        — End diff –

        if statement needs braces... a few more occurrences in other files as well, I believe
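
        For illustration only, the braced form being requested would look roughly like this (a sketch adapted from the RowRange filter loop in the diff above, not the committed code; chunk_size, chunk_start, chks, ncs and R come from the enclosing map()):

        /* Filter rows of the chunk that fall inside R, with braces
           on every control statement as requested in the review. */
        for (int r = 0; r < chunk_size; r++) {
          if (!R.contains(chunk_start + r)) {
            continue;
          }
          for (int c = 0; c < chks.length; c++) {
            ncs[c].addNum(chks[c].at0(r));
          }
        }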

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r16274291

        — Diff: h2o/src/main/java/org/apache/mahout/h2obindings/ops/Cbind.java —
        @@ -0,0 +1,94 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.h2obindings.ops;
        +
        +import water.MRTask;
        +import water.fvec.Frame;
        +import water.fvec.Vec;
        +import water.fvec.Chunk;
        +import water.fvec.NewChunk;
        +
        +import org.apache.mahout.h2obindings.H2OHelper;
        +import org.apache.mahout.h2obindings.drm.H2ODrm;
        +
        +public class Cbind {
        + /* R's cbind like operator, on drmA and drmB */
        + public static H2ODrm Cbind(H2ODrm drmA, H2ODrm drmB) {
        + Frame fra = drmA.frame;
        + Vec keysa = drmA.keys;
        + Frame frb = drmB.frame;
        + Vec keysb = drmB.keys;
        +
        + /* If A and B are similarly partitioned, .. */
        + if (fra.anyVec().group() == frb.anyVec().group())
        + /* .. then, do a light weight zip() */
        + return zip(fra, keysa, frb, keysb);
        + else
        + /* .. else, do a heavy weight join() which involves moving data over the wire */
        + return join(fra, keysa, frb, keysb);
        + }
        +
        + /* Light weight zip(), no data movement */
        + private static H2ODrm zip(final Frame fra, final Vec keysa, final Frame frb, final Vec keysb) {
        + /* Create a new Vec[] to hold the concatenated list of A and B's column vectors */
        + Vec vecs[] = new Vec[fra.vecs().length + frb.vecs().length];
        + int d = 0;
        + /* fill A's column vectors */
        + for (Vec vfra : fra.vecs())
        + vecs[d++] = vfra;
        — End diff –

        for loop needs braces ... here and in a few other places
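
        For illustration, the braced form of that loop would be (a fragment based on the zip() code in the diff above; vecs, d and fra are from zip()):

        /* Copy A's column vectors into the combined array,
           braces added per the review note. */
        for (Vec vfra : fra.vecs()) {
          vecs[d++] = vfra;
        }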

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r16274324

        — Diff: h2o/src/main/java/org/apache/mahout/h2obindings/drm/H2OBCast.java —
        @@ -0,0 +1,93 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.h2obindings.drm;
        +
        +import org.apache.mahout.math.drm.BCast;
        +import org.apache.mahout.math.Matrix;
        +import org.apache.mahout.math.Vector;
        +import org.apache.mahout.math.MatrixWritable;
        +import org.apache.mahout.math.VectorWritable;
        +
        +import org.apache.hadoop.io.Writable;
        +
        +import java.io.Serializable;
        +import java.io.ByteArrayOutputStream;
        +import java.io.ByteArrayInputStream;
        +import java.io.ObjectOutputStream;
        +import java.io.ObjectInputStream;
        +
        +/* Handle Matrix and Vector separately so that we can live with
        + just importing MatrixWritable and VectorWritable.
        +*/
        +
        +public class H2OBCast<T> implements BCast<T>, Serializable {
        + transient T obj;
        + byte buf[];
        + boolean is_matrix;
        +
        + public H2OBCast(T o) {
        + obj = o;
        +
        + if (o instanceof Matrix) {
        + buf = serialize(new MatrixWritable((Matrix)o));
        + is_matrix = true;
        + } else if (o instanceof Vector) {
        + buf = serialize(new VectorWritable((Vector)o));
        + } else {
        + throw new IllegalArgumentException("Only Matrix or Vector supported for now");
        + }
        + }
        +
        + public T value() {
        + if (obj == null)
        + obj = deserialize(buf);
        + return obj;
        + }
        +
        + private byte[] serialize(Writable w) {
        + ByteArrayOutputStream bos = new ByteArrayOutputStream();
        + try {
        + ObjectOutputStream oos = new ObjectOutputStream(bos);
        + w.write(oos);
        + oos.close();
        + } catch (java.io.IOException e) {
        + return null;
        + }
        + return bos.toByteArray();
        + }
        +
        + private T deserialize(byte buf[]) {
        + T ret = null;
        + ByteArrayInputStream bis = new ByteArrayInputStream(buf);
        + try {
        + ObjectInputStream ois = new ObjectInputStream(bis);
        + if (is_matrix) {
        + MatrixWritable w = new MatrixWritable();
        + w.readFields(ois);
        + ret = (T) w.get();
        + } else {
        + VectorWritable w = new VectorWritable();
        + w.readFields(ois);
        + ret = (T) w.get();
        + }
        + } catch (java.io.IOException e) {
        + System.out.println("Caught exception: " + e);
        — End diff –

        indentation after catch
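
        One common Java layout for the catch being discussed keeps the catch on the same line as the try block's closing brace, with the handler body indented one level, e.g. (illustrative fragment only, using the names from the diff above):

        try {
          ObjectInputStream ois = new ObjectInputStream(bis);
          // ... read the Writable fields here ...
        } catch (java.io.IOException e) {
          // handler body indented one level under the catch
          System.out.println("Caught exception: " + e);
        }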

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r16274365

        — Diff: h2o/src/main/java/org/apache/mahout/h2obindings/drm/H2OBCast.java —
        @@ -0,0 +1,93 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.h2obindings.drm;
        +
        +import org.apache.mahout.math.drm.BCast;
        +import org.apache.mahout.math.Matrix;
        +import org.apache.mahout.math.Vector;
        +import org.apache.mahout.math.MatrixWritable;
        +import org.apache.mahout.math.VectorWritable;
        +
        +import org.apache.hadoop.io.Writable;
        +
        +import java.io.Serializable;
        +import java.io.ByteArrayOutputStream;
        +import java.io.ByteArrayInputStream;
        +import java.io.ObjectOutputStream;
        +import java.io.ObjectInputStream;
        +
        +/* Handle Matrix and Vector separately so that we can live with
        + just importing MatrixWritable and VectorWritable.
        +*/
        +
        +public class H2OBCast<T> implements BCast<T>, Serializable {
        + transient T obj;
        + byte buf[];
        + boolean is_matrix;
        +
        + public H2OBCast(T o) {
        + obj = o;
        +
        + if (o instanceof Matrix) {
        + buf = serialize(new MatrixWritable((Matrix)o));
        + is_matrix = true;
        + } else if (o instanceof Vector) {
        + buf = serialize(new VectorWritable((Vector)o));
        + } else {
        + throw new IllegalArgumentException("Only Matrix or Vector supported for now");
        + }
        + }
        +
        + public T value() {
        + if (obj == null)
        + obj = deserialize(buf);
        + return obj;
        + }
        +
        + private byte[] serialize(Writable w) {
        + ByteArrayOutputStream bos = new ByteArrayOutputStream();
        + try {
        + ObjectOutputStream oos = new ObjectOutputStream(bos);
        + w.write(oos);
        + oos.close();
        + } catch (java.io.IOException e) {
        + return null;
        — End diff –

        indentation after catch, here and in a few more places as well
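
        To make the intent of H2OBCast concrete, a minimal usage sketch might look like the following (a hypothetical example; it assumes only the constructor and value() shown in the diff, with DenseVector as the in-core Mahout vector type, imported from org.apache.mahout.math):

        // Serialization happens eagerly in the constructor; deserialization
        // is deferred until value() is first called, e.g. on a remote node.
        Vector v = new DenseVector(new double[] {1.0, 2.0, 3.0});
        H2OBCast<Vector> bcast = new H2OBCast<Vector>(v);
        // ... capture bcast in an MRTask closure and ship it to the cluster ...
        Vector onRemote = bcast.value();  // rebuilds the Vector from buf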

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r16274418

        — Diff: h2o/pom.xml —
        @@ -0,0 +1,252 @@
        +<?xml version="1.0" encoding="UTF-8"?>
        +
        +<!--
        + Licensed to the Apache Software Foundation (ASF) under one or more
        + contributor license agreements. See the NOTICE file distributed with
        + this work for additional information regarding copyright ownership.
        + The ASF licenses this file to You under the Apache License, Version 2.0
        + (the "License"); you may not use this file except in compliance with
        + the License. You may obtain a copy of the License at
        +
        + http://www.apache.org/licenses/LICENSE-2.0
        +
        + Unless required by applicable law or agreed to in writing, software
        + distributed under the License is distributed on an "AS IS" BASIS,
        + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + See the License for the specific language governing permissions and
        + limitations under the License.
        +-->
        +
        +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
        + <modelVersion>4.0.0</modelVersion>
        +
        + <parent>
        + <groupId>org.apache.mahout</groupId>
        + <artifactId>mahout</artifactId>
        + <version>1.0-SNAPSHOT</version>
        + <relativePath>../pom.xml</relativePath>
        + </parent>
        +
        + <artifactId>mahout-h2o</artifactId>
        + <name>Mahout H2O backend</name>
        + <description>
        + H2O Backend for Mahout DSL
        + </description>
        +
        + <packaging>jar</packaging>
        +
        + <repositories>
        + <repository>
        + <id>oss.sonatype.org</id>
        + <url>http://oss.sonatype.org/content/repositories</url>
        + <releases>
        + <enabled>true</enabled>
        + </releases>
        + <snapshots>
        + <enabled>true</enabled>
        + </snapshots>
        + </repository>
        + </repositories>
        +
        + <!-- this is needed for scalatest plugin until they publish it to central -->
        + <pluginRepositories>
        + <pluginRepository>
        + <id>sonatype</id>
        + <url>https://oss.sonatype.org/content/groups/public</url>
        + <releases>
        + <enabled>true</enabled>
        + </releases>
        + </pluginRepository>
        + </pluginRepositories>
        +
        + <build>
        + <defaultGoal>install</defaultGoal>
        +
        + <plugins>
        +
        + <plugin>
        + <groupId>org.codehaus.mojo</groupId>
        + <artifactId>build-helper-maven-plugin</artifactId>
        + <executions>
        + <execution>
        + <id>add-source</id>
        + <phase>generate-sources</phase>
        + <goals>
        + <goal>add-source</goal>
        + </goals>
        + <configuration>
        + <sources>
        + <source>${project.build.directory}/generated-sources/mahout</source>
        + </sources>
        + </configuration>
        + </execution>
        + <execution>
        + <id>add-test-source</id>
        + <phase>generate-sources</phase>
        + <goals>
        + <goal>add-test-source</goal>
        + </goals>
        + <configuration>
        + <sources>
        + <source>${project.build.directory}/generated-test-sources/mahout</source>
        + </sources>
        + </configuration>
        + </execution>
        + </executions>
        + </plugin>
        +
        + <!-- create test jar so other modules can reuse the math test utility classes. -->
        + <plugin>
        + <groupId>org.apache.maven.plugins</groupId>
        + <artifactId>maven-jar-plugin</artifactId>
        + <executions>
        + <execution>
        + <goals>
        + <goal>test-jar</goal>
        + </goals>
        + <phase>package</phase>
        + </execution>
        + </executions>
        + </plugin>
        +
        + <plugin>
        + <artifactId>maven-assembly-plugin</artifactId>
        + <configuration>
        + <descriptorRefs>
        + <descriptorRef>jar-with-dependencies</descriptorRef>
        + </descriptorRefs>
        + <archive>
        + <manifest>
        + <mainClass>water.H2O</mainClass>
        + </manifest>
        + </archive>
        + </configuration>
        + <executions>
        + <execution>
        + <phase>package</phase>
        + <goals>
        + <goal>single</goal>
        + </goals>
        + </execution>
        + </executions>
        + </plugin>
        +
        + <plugin>
        + <artifactId>maven-javadoc-plugin</artifactId>
        + </plugin>
        +
        + <plugin>
        + <artifactId>maven-source-plugin</artifactId>
        + </plugin>
        +
        + <plugin>
        + <groupId>org.scala-tools</groupId>
        + <artifactId>maven-scala-plugin</artifactId>
        + <executions>
        + <execution>
        + <id>scala-compile-first</id>
        + <phase>process-resources</phase>
        + <goals>
        + <goal>add-source</goal>
        + <goal>compile</goal>
        + </goals>
        + </execution>
        + <execution>
        + <goals>
        + <goal>compile</goal>
        + <goal>testCompile</goal>
        + </goals>
        + </execution>
        + </executions>
        + <configuration>
        + <sourceDir>src/main/scala</sourceDir>
        + <jvmArgs>
        + <jvmArg>-Xms64m</jvmArg>
        + <jvmArg>-Xmx1024m</jvmArg>
        + </jvmArgs>
        + </configuration>
        + </plugin>
        +
        + <!--this is what scalatest recommends to do to enable scala tests -->
        +
        + <!-- disable surefire -->
        + <!--<plugin>-->
        + <!--<groupId>org.apache.maven.plugins</groupId>-->
        + <!--<artifactId>maven-surefire-plugin</artifactId>-->
        + <!--<version>2.7</version>-->
        + <!--<configuration>-->
        + <!--<skipTests>true</skipTests>-->
        + <!--</configuration>-->
        + <!--</plugin>-->
        + <!-- enable scalatest -->
        + <plugin>
        + <groupId>org.scalatest</groupId>
        + <artifactId>scalatest-maven-plugin</artifactId>
        + <version>1.0-M2</version>
        + <configuration>
        + <reportsDirectory>${project.build.directory}/scalatest-reports</reportsDirectory>
        + <junitxml>.</junitxml>
        + <filereports>WDF TestSuite.txt</filereports>
        + </configuration>
        + <executions>
        + <execution>
        + <id>test</id>
        + <goals>
        + <goal>test</goal>
        + </goals>
        + </execution>
        + </executions>
        + </plugin>
        +
        + </plugins>
        + </build>
        +
        + <dependencies>
        +
        + <dependency>
        + <groupId>org.apache.mahout</groupId>
        + <artifactId>mahout-math-scala</artifactId>
        — End diff –

        need to update this to new mahout-math-scala artifact name

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r16274443

        — Diff: h2o/pom.xml —
        @@ -0,0 +1,252 @@
        +<?xml version="1.0" encoding="UTF-8"?>
        +
        +<!--
        + Licensed to the Apache Software Foundation (ASF) under one or more
        + contributor license agreements. See the NOTICE file distributed with
        + this work for additional information regarding copyright ownership.
        + The ASF licenses this file to You under the Apache License, Version 2.0
        + (the "License"); you may not use this file except in compliance with
        + the License. You may obtain a copy of the License at
        +
        + http://www.apache.org/licenses/LICENSE-2.0
        +
        + Unless required by applicable law or agreed to in writing, software
        + distributed under the License is distributed on an "AS IS" BASIS,
        + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + See the License for the specific language governing permissions and
        + limitations under the License.
        +-->
        +
        +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
        + <modelVersion>4.0.0</modelVersion>
        +
        + <parent>
        + <groupId>org.apache.mahout</groupId>
        + <artifactId>mahout</artifactId>
        + <version>1.0-SNAPSHOT</version>
        + <relativePath>../pom.xml</relativePath>
        + </parent>
        +
        + <artifactId>mahout-h2o</artifactId>
        + <name>Mahout H2O backend</name>
        + <description>
        + H2O Backend for Mahout DSL
        + </description>
        +
        + <packaging>jar</packaging>
        +
        + <repositories>
        + <repository>
        + <id>oss.sonatype.org</id>
        + <url>http://oss.sonatype.org/content/repositories</url>
        + <releases>
        + <enabled>true</enabled>
        + </releases>
        + <snapshots>
        + <enabled>true</enabled>
        + </snapshots>
        + </repository>
        + </repositories>
        +
        + <!-- this is needed for scalatest plugin until they publish it to central -->
        + <pluginRepositories>
        + <pluginRepository>
        + <id>sonatype</id>
        + <url>https://oss.sonatype.org/content/groups/public</url>
        + <releases>
        + <enabled>true</enabled>
        + </releases>
        + </pluginRepository>
        + </pluginRepositories>
        +
        + <build>
        + <defaultGoal>install</defaultGoal>
        +
        + <plugins>
        +
        + <plugin>
        + <groupId>org.codehaus.mojo</groupId>
        + <artifactId>build-helper-maven-plugin</artifactId>
        + <executions>
        + <execution>
        + <id>add-source</id>
        + <phase>generate-sources</phase>
        + <goals>
        + <goal>add-source</goal>
        + </goals>
        + <configuration>
        + <sources>
        + <source>${project.build.directory}/generated-sources/mahout</source>
        + </sources>
        + </configuration>
        + </execution>
        + <execution>
        + <id>add-test-source</id>
        + <phase>generate-sources</phase>
        + <goals>
        + <goal>add-test-source</goal>
        + </goals>
        + <configuration>
        + <sources>
        + <source>${project.build.directory}/generated-test-sources/mahout</source>
        + </sources>
        + </configuration>
        + </execution>
        + </executions>
        + </plugin>
        +
        + <!-- create test jar so other modules can reuse the math test utility classes. -->
        + <plugin>
        + <groupId>org.apache.maven.plugins</groupId>
        + <artifactId>maven-jar-plugin</artifactId>
        + <executions>
        + <execution>
        + <goals>
        + <goal>test-jar</goal>
        + </goals>
        + <phase>package</phase>
        + </execution>
        + </executions>
        + </plugin>
        +
        + <plugin>
        + <artifactId>maven-assembly-plugin</artifactId>
        + <configuration>
        + <descriptorRefs>
        + <descriptorRef>jar-with-dependencies</descriptorRef>
        + </descriptorRefs>
        + <archive>
        + <manifest>
        + <mainClass>water.H2O</mainClass>
        + </manifest>
        + </archive>
        + </configuration>
        + <executions>
        + <execution>
        + <phase>package</phase>
        + <goals>
        + <goal>single</goal>
        + </goals>
        + </execution>
        + </executions>
        + </plugin>
        +
        + <plugin>
        + <artifactId>maven-javadoc-plugin</artifactId>
        + </plugin>
        +
        + <plugin>
        + <artifactId>maven-source-plugin</artifactId>
        + </plugin>
        +
        + <plugin>
        + <groupId>org.scala-tools</groupId>
        + <artifactId>maven-scala-plugin</artifactId>
        + <executions>
        + <execution>
        + <id>scala-compile-first</id>
        + <phase>process-resources</phase>
        + <goals>
        + <goal>add-source</goal>
        + <goal>compile</goal>
        + </goals>
        + </execution>
        + <execution>
        + <goals>
        + <goal>compile</goal>
        + <goal>testCompile</goal>
        + </goals>
        + </execution>
        + </executions>
        + <configuration>
        + <sourceDir>src/main/scala</sourceDir>
        + <jvmArgs>
        + <jvmArg>-Xms64m</jvmArg>
        + <jvmArg>-Xmx1024m</jvmArg>
        + </jvmArgs>
        + </configuration>
        + </plugin>
        +
        + <!--this is what scalatest recommends to do to enable scala tests -->
        +
        + <!-- disable surefire -->
        + <!--<plugin>-->
        + <!--<groupId>org.apache.maven.plugins</groupId>-->
        + <!--<artifactId>maven-surefire-plugin</artifactId>-->
        + <!--<version>2.7</version>-->
        + <!--<configuration>-->
        + <!--<skipTests>true</skipTests>-->
        + <!--</configuration>-->
        + <!--</plugin>-->
        + <!-- enable scalatest -->
        + <plugin>
        + <groupId>org.scalatest</groupId>
        + <artifactId>scalatest-maven-plugin</artifactId>
        + <version>1.0-M2</version>
        + <configuration>
        + <reportsDirectory>${project.build.directory}/scalatest-reports</reportsDirectory>
        + <junitxml>.</junitxml>
        + <filereports>WDF TestSuite.txt</filereports>
        + </configuration>
        + <executions>
        + <execution>
        + <id>test</id>
        + <goals>
        + <goal>test</goal>
        + </goals>
        + </execution>
        + </executions>
        + </plugin>
        +
        + </plugins>
        + </build>
        +
        + <dependencies>
        +
        + <dependency>
        + <groupId>org.apache.mahout</groupId>
        + <artifactId>mahout-math-scala</artifactId>
        + <version>${project.version}</version>
        + </dependency>
        +
        + <dependency>
        + <!-- for MatrixWritable and VectorWritable -->
        + <groupId>org.apache.mahout</groupId>
        + <artifactId>mahout-mrlegacy</artifactId>
        + <version>${project.version}</version>
        + </dependency>
        +
        + <dependency>
        + <groupId>org.apache.mahout</groupId>
        + <artifactId>mahout-math-scala</artifactId>
        + <classifier>tests</classifier>
        — End diff –

        need to update to new mahout-math-scala artifact name

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r16274540

        — Diff: h2o/src/test/scala/org/apache/mahout/h2obindings/test/LoggerConfiguration.scala —
        @@ -0,0 +1,13 @@
        +package org.apache.mahout.h2obindings.test
        +
        +import org.scalatest.Suite
        +import org.apache.log4j.{Level, Logger, BasicConfigurator}

        +
        +trait LoggerConfiguration extends org.apache.mahout.test.LoggerConfiguration {
        + this: Suite =>
        +
        + override protected def beforeAll(): Unit = {
        + super.beforeAll()
        — End diff –

        need to update this to:

        override protected def beforeAll(configMap: ConfigMap) {
        super.beforeAll(configMap)

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-52261287

        As far as I can tell, this is just waiting to be merged. The vote has passed in favor of merging. I've been looking at this a bit over the past week. I wrote some simple tests for Naive Bayes from M-1493 on top of this and found that it integrated very easily (as far as writing tests in Math-Scala and then extending them in h2o and spark test suites).

        I'm not familiar with the inner workings of h2o and am new to Scala and the DSL, but the code looks good to me. From what I can see, there are a couple more (very minor) style points that I've noticed (see the comments above), and a couple of updates that need to be made to get this working against the current master.

        My issue has been with the Java 1.7 h2o-core artifact. I've brought it up a couple of times, and it seems that it's not as much of a problem as I'd originally thought. I am still a little concerned that tests will fail for someone running 1.6. Is there a way to get a 1.6 artifact in here? Please let me know if I'm being overly cautious.

        Long story short: looks good to me. +1 from me on merging if we can get that artifact issue solved (or confirm that it is really a non-issue).

        Looking back at the email archive over the past few months, I share many of the concerns that have been brought up, especially regarding documentation of the Spark/h2o supported algorithms, and I think we need to get that up quickly.

        Someone with a better working knowledge of h2o and Scala/the DSL may want to assign this, review it further, and merge. If it's simply a question of needing someone to assign this to and merge it, I can do it.

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-52412203

        Addressed the review comments from Andrew, except the one on the indentation of catch. Even though the indentation around catch is not the "Java standard", it is consistent with the style of the rest of the Mahout code. Let me know if you still want it changed in just the h2o module.

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-52860273

        PING.

        Requesting some attention to this PR.

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-53345246

        The latest push makes this PR runtime compatible with Java 1.6 (it depends on h2o-core 0.1.5, which introduces Java 6 backward compatibility).

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-53359715

        Looks good. All my tests pass here with 1.6 and 1.7.

        Pat Ferrel added a comment -

        Andrew Palumbo, are you planning to assign this to yourself and do the merge?

        Andrew Palumbo added a comment -

        Pat Ferrel, I do have a couple of comments for Anand and need to look it over again, but yes, I can merge it. I could use some guidance, though, as far as pushing a new module goes. If someone could look over the h2o/pom.xml

        https://github.com/avati/mahout/blob/MAHOUT-1500/h2o/pom.xml

        for me, I'd appreciate it.

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-53455542

        @avati - it looks like some of your changes from Dmitriy's style reviews have dropped out of this branch - could you please reapply those?

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-53476236

        @andrewpalumbo re-applied the commit. Not sure how it got missed! Thanks for pointing it out.

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r16739580

        — Diff: h2o/src/main/java/org/apache/mahout/h2obindings/ops/AewScalar.java —
        @@ -0,0 +1,68 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.h2obindings.ops;
        +
        +import org.apache.mahout.h2obindings.H2OHelper;
        +import org.apache.mahout.h2obindings.drm.H2ODrm;
        +
        +import water.MRTask;
        +import water.fvec.Frame;
        +import water.fvec.Vec;
        +import water.fvec.Chunk;
        +import water.fvec.NewChunk;
        +
        +public class AewScalar {
        + /* Element-wise DRM-DRM operations */
        + public static H2ODrm AewScalar(H2ODrm DrmA, final double s, final String op) {
        — End diff –

        Possibly one more commit missing?

        ASF GitHub Bot added a comment -

        Github user avati commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/21#discussion_r16741328

        — Diff: h2o/src/main/java/org/apache/mahout/h2obindings/ops/AewScalar.java —
        @@ -0,0 +1,68 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.h2obindings.ops;
        +
        +import org.apache.mahout.h2obindings.H2OHelper;
        +import org.apache.mahout.h2obindings.drm.H2ODrm;
        +
        +import water.MRTask;
        +import water.fvec.Frame;
        +import water.fvec.Vec;
        +import water.fvec.Chunk;
        +import water.fvec.NewChunk;
        +
        +public class AewScalar {
        + /* Element-wise DRM-DRM operations */
        + public static H2ODrm AewScalar(H2ODrm DrmA, final double s, final String op) {
        — End diff –

        Oops, pushed the camelcase styling as well. It looks like I accidentally overwrote a couple of commits when switching between my workstation and laptop.

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on the pull request:

        https://github.com/apache/mahout/pull/21#issuecomment-53492639

        Tests pass in distributed mode for me. Could someone please double-check the h2o/pom.xml for me? I'm not sure if there's anything that needs to be added to avoid breaking the nightly build.
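
        (For a new Maven module, the usual thing to verify is that the root pom.xml declares it in its <modules> section; the sketch below is illustrative, not a quote from the PR, and the other module names are assumed:)

        <!-- root pom.xml -->
        <modules>
          <!-- existing modules such as math, math-scala, spark ... -->
          <module>h2o</module>
        </modules>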

        Andrew Palumbo added a comment -

        Pat Ferrel barring any problems with the h2o/pom.xml, I think this is good to go. I'd like to merge it. I'm unable to assign JIRA issues. Could you assign this to me?

        Ted Dunning made changes -
        Assignee Ted Dunning [ tdunning ]
        Ted Dunning added a comment -

        Andrew,

        Go ahead and do the merge without the assignment. I can't assign this to you for some JIRA config reason. I successfully assigned this to me, though, so I will chase down the config problem.

        Ted Dunning made changes -
        Assignee Ted Dunning [ tdunning ] Andrew Palumbo [ andrew_palumbo ]
        Andrew Palumbo added a comment -

        Thanks Ted Dunning - it looks like it's assigned to me now and I can change the assignee. Any thoughts on the h2o/pom.xml? I only ask because I remember the nightly build breaking around the time the spark module was added, and having to run `mvn clean package install` for a few days while it was fixed. I'm not sure whether that had anything to do with adding a new module or not - I just wanted to double-check.

        Appreciate it!

        Anand Avati added a comment -

        Andrew Palumbo, is it not possible to do a mock run to verify that definitively?

        Anand Avati added a comment -

        Or optimistically merge, and fix up if things break with a specific error?

        Ted Dunning added a comment -


        Optimism seems warranted. Worst case is a revert.

        Andrew Palumbo added a comment -

        Sounds good.

        ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/mahout/pull/21

        Hudson added a comment -

        SUCCESS: Integrated in Mahout-Quality #2763 (See https://builds.apache.org/job/Mahout-Quality/2763/)
        MAHOUT-1500: H2O Integration (Anand Avati via apalumbo) closes apache/mahout#21 (ap.dev: rev f870a630291bd9d623b32c21f087ba19e69eb1fc)

        • h2o/src/main/scala/org/apache/mahout/h2obindings/package.scala
        • h2o/src/main/scala/org/apache/mahout/h2obindings/ops/MapBlockHelper.scala
        • h2o/pom.xml
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/AtB.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/At.java
        • h2o/src/test/scala/org/apache/mahout/h2obindings/drm/DrmLikeOpsSuite.scala
        • h2o/src/main/java/org/apache/mahout/h2obindings/drm/H2ODrm.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Ax.java
        • CHANGELOG
        • h2o/src/test/scala/org/apache/mahout/h2obindings/test/DistributedH2OSuite.scala
        • h2o/src/main/java/org/apache/mahout/h2obindings/H2OHdfs.java
        • h2o/src/test/scala/org/apache/mahout/math/decompositions/DistributedDecompositionsSuite.scala
        • h2o/src/test/scala/org/apache/mahout/h2obindings/drm/DrmLikeSuite.scala
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Par.java
        • h2o/src/test/scala/org/apache/mahout/h2obindings/test/LoggerConfiguration.scala
        • h2o/src/test/scala/org/apache/mahout/h2obindings/drm/RLikeDrmOpsSuite.scala
        • h2o/src/main/java/org/apache/mahout/h2obindings/H2OContext.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/AtA.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/MapBlock.java
        • bin/mahout
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/AewB.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/RowRange.java
        • pom.xml
        • h2o/src/main/scala/org/apache/mahout/h2obindings/H2ODistributedContext.scala
        • h2o/src/main/java/org/apache/mahout/h2obindings/H2OHelper.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/drm/H2OBCast.java
        • h2o/src/test/scala/org/apache/mahout/h2obindings/ops/ABtSuite.scala
        • h2o/src/main/java/org/apache/mahout/h2obindings/H2OBlockMatrix.java
        • h2o/src/main/scala/org/apache/mahout/h2obindings/drm/CheckpointedDrmH2O.scala
        • h2o/src/main/scala/org/apache/mahout/h2obindings/H2OEngine.scala
        • h2o/src/test/scala/org/apache/mahout/h2obindings/ops/AewBSuite.scala
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Cbind.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Atx.java
        • h2o/README.md
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/AewScalar.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/TimesRightMatrix.java
        • h2o/src/test/scala/org/apache/mahout/h2obindings/ops/AtSuite.scala
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/ABt.java
        • h2o/src/test/scala/org/apache/mahout/h2obindings/ops/AtASuite.scala
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Rbind.java
        Hudson added a comment -

        SUCCESS: Integrated in Mahout-Quality #2765 (See https://builds.apache.org/job/Mahout-Quality/2765/)
        MAHOUT-1500: H2O Integration - style revisions (ap.dev: rev c96498680df551f2cbd2a4735d9408044a0c7bc3)

        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Cbind.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/TimesRightMatrix.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/AewScalar.java
        • h2o/src/main/scala/org/apache/mahout/h2obindings/H2OEngine.scala
        • h2o/src/main/scala/org/apache/mahout/h2obindings/drm/CheckpointedDrmH2O.scala
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/AewB.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/AtB.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/At.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/ABt.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Atx.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/H2OHdfs.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/H2OHelper.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/AtA.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Par.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Rbind.java
        Andrew Palumbo added a comment -

        Merged, and everything looks good!

        I'm trying to clean up some high-priority warnings that Jenkins is complaining about right now.

        Java style issues, e.g. method names should not start with capital letters:

        public static H2ODrm At(H2ODrm drmA)
        public static H2ODrm AtA(H2ODrm drmA)

        etc.

        Any thoughts on a naming convention here?

        Anand Avati added a comment -

        One option might be to rename all operator methods to a generic name like "exec" (as done in the Spark module) and let the operator-specific class name carry the meaning.

        I will create a PR with that change, unless someone has a different suggestion.
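
        (Sketched out with stub types, for illustration only; the real operator classes live in org.apache.mahout.h2obindings.ops:)

        // Before: the operator method mirrored its class name, e.g. At.At(...),
        // which trips the "method names should not start with capital letters" check.
        // After: each operator class exposes a generic exec(), as in the Spark module.
        class H2ODrm { }

        class At {
          // Placeholder body; the real exec() computes the transpose A'.
          static H2ODrm exec(H2ODrm drmA) {
            return drmA;
          }
        }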

        Andrew Palumbo added a comment -

        Thanks Anand - we also need to change the method names with underscores. I got most of them, but didn't go through TimesRightMatrix.java.

        ASF GitHub Bot added a comment -

        GitHub user avati opened a pull request:

        https://github.com/apache/mahout/pull/48

        MAHOUT-1500: function name fixes

        • rename operators to "exec"
        • remove underscore from method names

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/avati/mahout MAHOUT-1500a

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/mahout/pull/48.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #48


        commit ae766d34d4475f5ea5d9faddf861158587929b7f
        Author: Anand Avati <avati@redhat.com>
        Date: 2014-08-27T22:35:33Z

        MAHOUT-1500: rename operator methods per standards

        Signed-off-by: Anand Avati <avati@redhat.com>

        commit b6e8f31be71c1a2a5d0ee0c7f86b7050b18e2fdc
        Author: Anand Avati <avati@redhat.com>
        Date: 2014-08-27T22:39:14Z

        MAHOUT-1500: remove underscore in method names

        Signed-off-by: Anand Avati <avati@redhat.com>


        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/48#issuecomment-53772200

        @andrewpalumbo - does this look good?

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on the pull request:

        https://github.com/apache/mahout/pull/48#issuecomment-53788269

        Looks good Anand, thanks.

        ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/mahout/pull/48

        Hudson added a comment -

        SUCCESS: Integrated in Mahout-Quality #2769 (See https://builds.apache.org/job/Mahout-Quality/2769/)
        MAHOUT-1500: H2O Integration - more style revisions closes apache/mahout#48 (ap.dev: rev 03a5bb61ed56daccd207d7a255956e21612cf995)

        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/TimesRightMatrix.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/RowRange.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/At.java
        • h2o/src/main/scala/org/apache/mahout/h2obindings/H2OEngine.scala
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/AewB.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/AtB.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/AtA.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Cbind.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Ax.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/ABt.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Atx.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/AewScalar.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Rbind.java
        Andrew Palumbo added a comment -

        Anand Avati - Looks good! Thanks for cleaning that up. I think the only thing left to do is add some Javadoc/Scaladoc comments, and then we can close this up.

        ASF GitHub Bot added a comment -

        GitHub user avati opened a pull request:

        https://github.com/apache/mahout/pull/50

        MAHOUT-1500: Code cleanup

        • Add javadoc and scaladoc comments.
        • Fix code comment style per standards.
        • Fix some more camelCase naming.

        Signed-off-by: Anand Avati <avati@redhat.com>

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/avati/mahout MAHOUT-1500doc

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/mahout/pull/50.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #50


        commit fdacd68c86b4b26e0d9affac500f35840ac99e8d
        Author: Anand Avati <avati@redhat.com>
        Date: 2014-09-03T01:04:10Z

        MAHOUT-1500: Code cleanup

        • Add javadoc and scaladoc comments.
        • Fix code comment style per standards.
        • Fix some more camelCase naming.

        Signed-off-by: Anand Avati <avati@redhat.com>


        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/50#issuecomment-54366692

        @andrewpalumbo - Added Scaladoc and Javadoc comments. I have also included some variable renaming to replace underscores with camel casing in the same commit (because the Javadoc had to use the right parameter names, etc.).

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on the pull request:

        https://github.com/apache/mahout/pull/50#issuecomment-54626604

        @avati thanks!

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/50#discussion_r17184390

        — Diff: h2o/src/main/java/org/apache/mahout/h2obindings/H2OContext.java —
        @@ -19,13 +19,21 @@

        import water.H2O;

        +/**
        + * Context to an H2O Cloud.
        + */
        public class H2OContext {
        + /** Stores the name of the H2O Cloud. Typically a free form string. */
        String masterURL;
        — End diff –

        @avati - is there any need to store the masterURL? I don't see any usage of it.

        ASF GitHub Bot added a comment -

        Github user avati commented on a diff in the pull request:

        https://github.com/apache/mahout/pull/50#discussion_r17191513

        — Diff: h2o/src/main/java/org/apache/mahout/h2obindings/H2OContext.java —
        @@ -19,13 +19,21 @@

        import water.H2O;

        +/**
        + * Context to an H2O Cloud.
        + */
        public class H2OContext {
        + /** Stores the name of the H2O Cloud. Typically a free form string. */
        String masterURL;
        — End diff –

        @andrewpalumbo - probably no use in storing it. I think I wasn't sure about that while coding. Let me remove it.
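
        (A sketch of what the trimmed class might look like after this change; only the field and the class-level Javadoc come from the diff above, and the constructor shape is an assumption:)

        package org.apache.mahout.h2obindings;

        /**
         * Context to an H2O Cloud.
         */
        public class H2OContext {
          /**
           * @param masterURL name of the H2O Cloud, typically a free-form string;
           *                  used while joining the cloud but no longer stored
           */
          public H2OContext(String masterURL) {
            // Cloud bootstrap would happen here; omitted in this sketch.
          }
        }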

        ASF GitHub Bot added a comment -

        Github user andrewpalumbo commented on the pull request:

        https://github.com/apache/mahout/pull/50#issuecomment-54668342

        @avati - Don't worry about it - I've already made some changes, so I'll just take it out. Thanks.

        ASF GitHub Bot added a comment -

        Github user avati commented on the pull request:

        https://github.com/apache/mahout/pull/50#issuecomment-54669875

        @andrewpalumbo OK thanks

        ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/mahout/pull/50

        Andrew Palumbo added a comment -

        Thanks Anand Avati - committed the MAHOUT-1500doc branch with a few revisions.

        Hudson added a comment -

        SUCCESS: Integrated in Mahout-Quality #2781 (See https://builds.apache.org/job/Mahout-Quality/2781/)
        MAHOUT-1500: Code cleanup and javadocs closes apache/mahout#50 (ap.dev: rev 2d1b0bf632724ceb091035582274201269cfe3e3)

        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Par.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/H2OBlockMatrix.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/drm/H2OBCast.java
        • h2o/src/main/scala/org/apache/mahout/h2obindings/H2OEngine.scala
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Rbind.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Ax.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/H2OHelper.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/RowRange.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Cbind.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/AtA.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/At.java
        • h2o/src/main/scala/org/apache/mahout/h2obindings/drm/CheckpointedDrmH2O.scala
        • h2o/src/main/java/org/apache/mahout/h2obindings/H2OHdfs.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/ABt.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/AewScalar.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/drm/H2ODrm.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/TimesRightMatrix.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/MapBlock.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/H2OContext.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/AewB.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/AtB.java
        • h2o/src/main/java/org/apache/mahout/h2obindings/ops/Atx.java
        Andrew Palumbo added a comment -

        Looks good Anand, we'll probably want to update the scaladocs in the future, but I think this is done for now.

        Andrew Palumbo added a comment -

        Anand Avati Thanks a lot for the great contribution!

        Andrew Palumbo made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Suneel Marthi made changes -
        Fix Version/s 0.10.0 [ 12329709 ]
        Fix Version/s 1.0.0 [ 12316358 ]
        Suneel Marthi made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition         | Time In Source Status | Execution Times | Last Executer  | Last Execution Date
        Open -> Resolved   | 157d 16h 30m          | 1               | Andrew Palumbo | 06/Sep/14 00:33
        Resolved -> Closed | 219d 10h 47m          | 1               | Suneel Marthi  | 13/Apr/15 11:21

          People

          • Assignee: Andrew Palumbo
          • Reporter: Anand Avati
          • Votes: 0
          • Watchers: 11
