MAHOUT-1570: Adding support for Apache Flink as a backend for the Mahout DSL

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 0.11.2
    • Fix Version/s: 0.12.0
    • Component/s: Flink

      Description

      With the finalized abstraction of the Mahout DSL plans from the backend operations (MAHOUT-1529), it should be possible to integrate further backends for the Mahout DSL. Apache Flink would be a suitable candidate for an execution backend.

      With respect to the implementation, the biggest difference between Spark and Flink at the moment is probably the incremental rollout of plans, which is triggered by Spark's actions and which Flink does not support yet. However, the Flink community is working on this issue. For the moment, it should be possible to circumvent this problem by writing intermediate results required by an action to HDFS and reading them from there.
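
      For concreteness, here is a minimal sketch of the kind of backend-agnostic Samsara program a Flink binding would need to execute; this is a sketch assuming the standard math-scala imports, and only the implicit distributed context is engine-specific:

      ```scala
      import org.apache.mahout.math.Matrix
      import org.apache.mahout.math.drm._
      import org.apache.mahout.math.drm.RLikeDrmOps._

      // The logical plan (A' * A) is built identically for every backend; the
      // DistributedContext bound to drmA decides which engine (Spark, H2O, or
      // Flink) runs it when collect() forces execution.
      def gramian(drmA: DrmLike[Int]): Matrix = (drmA.t %*% drmA).collect
      ```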

          Activity

          Dmitriy Lyubimov added a comment -

          +1 with !

          Suneel Marthi added a comment -

          Sebastian, Till: Do we keep this open? It would be good to have Flink integration in Mahout, IMO.

          Suneel Marthi added a comment -

          Till Rohrmann: Is the Jira description still valid? It still has references to the old Stratosphere; please feel free to update it.

          Till Rohrmann added a comment -

          I would really like to see Flink support for the Mahout DSL, too. I'm optimistic that Flink in its current state has everything needed to fully support the Mahout DSL. Some time ago I started an implementation, but due to other tasks I haven't made much progress.

          I know that TU Berlin wants to hire a master's student who is supposed to take care of this implementation, but I don't know how long this will take.

          Alexey Grigorev added a comment -

          Hey all, I'm the master's student that TU Berlin hired to take care of this implementation.
          I've already started working on this; my fork is at https://github.com/alexeygrigorev/mahout

          Sebastian Schelter added a comment -

          Great to see this finally happening.

          Suneel Marthi added a comment -

          Welcome Alexey, great to see this happening; I was told that you were already on this.

          Dmitriy Lyubimov added a comment -

          I think the code is very nice and streamlines some of the by-now convoluted history of the general algebraic operators we did for Spark.

          There are a few comments I have there, e.g. AewScalar (i.e. the A + 5 kind of stuff) is now deprecated in favor of the more generic AUnaryFunc, which is part of #135.

          #135 is going in this week, so it would be nice to adapt to some changes on the logical level and make a PR; without a PR it is hard to make suggestions.

          If there are any Flink-specific instructions on how to try this out (apart from the unit tests, which I suppose run actual Flink in local mode), it would be very welcome to have that added to the Mahout website. If there's more than one page, we can add a series of menus there.

          Thank you for doing this, this is great.

          Alexey Grigorev added a comment -

          Are you saying that I should do a pull request with the code we have so far, even though it's not complete?

          Yes, we'll also take care of documentation, after finishing the I/O implementation (reading from and writing to HDFS), because otherwise it won't be very useful. But I think one page should be enough for a start.

          Sebastian Schelter added a comment -

          I don't think it makes sense to issue pull requests with unfinished code.

          Dmitriy Lyubimov added a comment -

          I would like to emphasize that this work gets us all in Mahout totally excited.

          Yes, with the current state of the code I personally would encourage you to do a PR.

          I think Sebastian Schelter is missing the point that a PR is a tool of collaboration, not just a "request to commit". Making suggestions on a PR is no different from making suggestions on Jira, except it is more powerful, since one can "point and elaborate" on the code. Nobody will rush to merge your work prematurely until you say so, but a PR enables collaboration on the issue to a greater extent than a published GitHub branch or Jira comments – IMO.

          Ultimately it is up to you if you want to limit the scope of external input and collaboration before you present things to the public. This is a totally valid approach if you want to stay 100% focused on your work.

          However, since you have published some ongoing work on this Jira, it seems you are at least somewhat ready to interact with reviews and suggestions. If so, a PR would facilitate that to a greater extent than this Jira.

          Another reason to do a PR is that if we eventually all agree that something needs to be changed, it is less work to do it while things are still fluid rather than when they are all nailed down.

          Alexey Grigorev added a comment -

          OK, I see, thanks for your input.

          I can definitely do that, but first I'd like to finish MAHOUT-1734 (I/O) so the DSL is usable for more than just parallelizing in-core matrices. In a day or two there'll be a commit in Flink which should make it possible.

          ASF GitHub Bot added a comment -

          GitHub user alexeygrigorev opened a pull request:

          https://github.com/apache/mahout/pull/137

          MAHOUT-1570: Flink backend for Mahout DSL

          Implementation for MAHOUT-1570: Adding support for Apache Flink as a backend for the Mahout DSL

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/alexeygrigorev/mahout flink-binding

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/mahout/pull/137.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #137


          commit 37df8551c9c1b71eba251d545cc03b0dc37cd50a
          Author: Alexey Grigorev <alexey.s.grigoriev@gmail.com>
          Date: 2015-04-09T09:54:01Z

          MAHOUT-1570: initial skeleton for Mahout DSL on Apache Flink

          commit e4b059346f59fccb9c1e7ffa3c0e7059802358b9
          Author: Alexey Grigorev <alexey.s.grigoriev@gmail.com>
          Date: 2015-06-16T14:46:15Z

          MAHOUT-1712: Flink: Ax, At, Atx operators

          commit 685566a6f701d29775c685200db999edcc773776
          Author: Alexey Grigorev <alexey.s.grigoriev@gmail.com>
          Date: 2015-05-05T18:05:21Z

          MAHOUT-1701: Flink: AtB implemented, ABt and AtA expressed via AtB

          commit 8152d4b853dc14f5963a85eac9f00430e59f1716
          Author: Alexey Grigorev <alexey.s.grigoriev@gmail.com>
          Date: 2015-05-12T16:01:08Z

          MAHOUT-1702: Flink: AewScalar and AewB

          commit 679ffd9a1287410dcd6ec0cc73f7eb89f99267fd
          Author: Alexey Grigorev <alexey.s.grigoriev@gmail.com>
          Date: 2015-05-19T15:47:39Z

          MAHOUT-1703: Flink: cbind, rbind and mapBlock

          commit cdb8b2ffde7e3fb8c097114926d8622653aaf890
          Author: Alexey Grigorev <alexey.s.grigoriev@gmail.com>
          Date: 2015-05-26T13:43:26Z

          MAHOUT-1709: Flink: slicing operator

          commit 2310cfba4eb4846f8d10df2ae88edead36818d04
          Author: Alexey Grigorev <alexey.s.grigoriev@gmail.com>
          Date: 2015-05-26T13:51:36Z

          MAHOUT-1710: Flink: A times incoreB operator

          commit 5350f52c1cf85e89d5881207f9e302b32496e19a
          Author: Alexey Grigorev <alexey.s.grigoriev@gmail.com>
          Date: 2015-05-26T14:17:14Z

          MAHOUT-1570: Flink: calculating ncol, nrow; colSum, colMean, norm methods

          commit fbbfe9b73608e1c92d8fc1f178ca62cec83bcb72
          Author: Alexey Grigorev <alexey.s.grigoriev@gmail.com>
          Date: 2015-05-28T13:03:17Z

          MAHOUT-1570: rebased to latest upstream

          commit f338f13ce2c5f4c7ed056e6aeff904afd0f72f7e
          Author: Alexey Grigorev <alexey.s.grigoriev@gmail.com>
          Date: 2015-05-28T13:18:32Z

          NOJIRA: added .cache to gitignore

          commit f9aff54a262747cd6e3421877541a1489243b934
          Author: Alexey Grigorev <alexey.s.grigoriev@gmail.com>
          Date: 2015-06-02T12:25:16Z

          MAHOUT-1570: upgraded to flink 0.9-SNAPSHOT for IO

          commit c0c07f695f776ec805f1749d3b5bce4b9e95c35d
          Author: Alexey Grigorev <alexey.s.grigoriev@gmail.com>
          Date: 2015-06-02T14:04:47Z

          MAHOUT-1734: Flink: DRM IO

          commit c83e441adede177350ad4b5e1e35ff9f3bbb85f9
          Author: Alexey Grigorev <alexey.s.grigoriev@gmail.com>
          Date: 2015-06-16T13:18:42Z

          MAHOUT-1570: Flink: added headers, comments and aknowledgements


          Alexey Grigorev added a comment -

          Now that IO is working, I am ready to do a pull request, and here it is: https://github.com/apache/mahout/pull/137

          There are still tons of things to do and improve, but I hope it's a good start.

          Alexey Grigorev added a comment -

          How should I go about creating a documentation page for this?

          ASF GitHub Bot added a comment -

          Github user dlyubimov commented on a diff in the pull request:

          https://github.com/apache/mahout/pull/137#discussion_r35591432

          — Diff: flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpAewScalar.scala —
          @@ -0,0 +1,105 @@
          +/**
          — End diff –

          The elementwise-scalar operator is deprecated now; better to use the unary-func operator. Then things like A + 5 can be represented as

          OpUnaryFunc(A, x => x + 5).

          The unary-function mechanism is more generic and allows for unary-function fusion optimizations.
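
          For illustration, a hedged sketch of how such an elementwise op can be expressed through the DSL's block-wise mapping API; `applyUnaryFunc` is a hypothetical helper for this example, not code from the PR:

          ```scala
          import org.apache.mahout.math.drm._
          import org.apache.mahout.math.drm.RLikeDrmOps._
          import org.apache.mahout.math.scalabindings.RLikeOps._

          // Hypothetical helper: apply f elementwise by mapping over the DRM's
          // blocks; A + 5 would then be applyUnaryFunc(drmA, _ + 5).
          def applyUnaryFunc(drm: DrmLike[Int], f: Double => Double): DrmLike[Int] =
            drm.mapBlock(ncol = drm.ncol) { case (keys, block) =>
              block := ((_: Int, _: Int, v: Double) => f(v)) // in place, elementwise
              keys -> block
            }
          ```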

          ASF GitHub Bot added a comment -

          Github user dlyubimov commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-125352778

          Also on the topic of test suite coverage: we need to pass our standard tests. The base classes for them are:

          https://github.com/apache/mahout/blob/master/math-scala/src/test/scala/org/apache/mahout/math/decompositions/DistributedDecompositionsSuiteBase.scala

          https://github.com/apache/mahout/blob/master/math-scala/src/test/scala/org/apache/mahout/math/drm/DrmLikeOpsSuiteBase.scala

          https://github.com/apache/mahout/blob/master/math-scala/src/test/scala/org/apache/mahout/math/drm/DrmLikeSuiteBase.scala

          https://github.com/apache/mahout/blob/master/math-scala/src/test/scala/org/apache/mahout/math/drm/RLikeDrmOpsSuiteBase.scala

          The technique here is to use these test cases as a base class for a distributed test case (you may want to see how it was done for Spark and H2O). This is our basic assertion that our main algorithms pass on a toy problem for a given backend.
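
          As a hedged illustration of that wiring for Flink (`DistributedFlinkSuite` is a hypothetical fixture name standing in for whatever provides the local distributed test context, analogous to the Spark and H2O test fixtures):

          ```scala
          import org.apache.mahout.math.drm.RLikeDrmOpsSuiteBase
          import org.scalatest.FunSuite

          // The shared math-scala test cases run unchanged; only the mixed-in
          // fixture decides which backend executes them.
          class RLikeDrmOpsSuite extends FunSuite
            with DistributedFlinkSuite
            with RLikeDrmOpsSuiteBase
          ```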

          ASF GitHub Bot added a comment -

          Github user smarthi commented on a diff in the pull request:

          https://github.com/apache/mahout/pull/137#discussion_r36376711

          — Diff: flink/pom.xml —
          @@ -0,0 +1,187 @@
          +<?xml version="1.0" encoding="UTF-8"?>
          +
          +<!--
          + Licensed to the Apache Software Foundation (ASF) under one or more
          + contributor license agreements. See the NOTICE file distributed with
          + this work for additional information regarding copyright ownership.
          + The ASF licenses this file to You under the Apache License, Version 2.0
          + (the "License"); you may not use this file except in compliance with
          + the License. You may obtain a copy of the License at
          +
          + http://www.apache.org/licenses/LICENSE-2.0
          +
          + Unless required by applicable law or agreed to in writing, software
          + distributed under the License is distributed on an "AS IS" BASIS,
          + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + See the License for the specific language governing permissions and
          + limitations under the License.
          +-->
          +
          +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
          + <modelVersion>4.0.0</modelVersion>
          +
          + <parent>
          + <groupId>org.apache.mahout</groupId>
          + <artifactId>mahout</artifactId>
          + <version>0.11.0-SNAPSHOT</version>
          — End diff –

          Update this to 0.11.1-SNAPSHOT, since we will be releasing 0.11.0 this week.

          ASF GitHub Bot added a comment -

          Github user alexeygrigorev commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-143259429

          I rebased and cleaned up the commit history a bit to make sure every commit has a Jira ID. The documentation and the current status can be found at https://github.com/alexeygrigorev/mahout/wiki/Samsara-Flink-Bindings

          ASF GitHub Bot added a comment -

          Github user dlyubimov commented on a diff in the pull request:

          https://github.com/apache/mahout/pull/137#discussion_r40705610

          — Diff: math-scala/src/main/scala/org/apache/mahout/math/drm/logical/CheckpointAction.scala —
          @@ -44,5 +44,6 @@ abstract class CheckpointAction[K: ClassTag] extends DrmLike[K]
               { case Some(cp) => cp }
          +    val classTag = implicitly[ClassTag[K]]
          — End diff –

          Maybe we should expose that in DrmLike.
          So far I've had a need for this in the logical operators (AbstractBinaryOp$classTagK, AbstractUnaryOp$classTagK) for the optimizer's needs, so maybe it needs to be promoted up if it turns out every implementation must keep an implicit ClassTag inside.

          ASF GitHub Bot added a comment -

          Github user alexeygrigorev commented on a diff in the pull request:

          https://github.com/apache/mahout/pull/137#discussion_r40706926

          — Diff: math-scala/src/main/scala/org/apache/mahout/math/drm/logical/CheckpointAction.scala —
          @@ -44,5 +44,6 @@ abstract class CheckpointAction[K: ClassTag] extends DrmLike[K]
               { case Some(cp) => cp }
          +    val classTag = implicitly[ClassTag[K]]
          — End diff –

          I remember trying to attach it to DrmLike, but I had some problems because it's a trait.

          ASF GitHub Bot added a comment -

          Github user dlyubimov commented on a diff in the pull request:

          https://github.com/apache/mahout/pull/137#discussion_r40707550

          — Diff: math-scala/src/main/scala/org/apache/mahout/math/drm/logical/CheckpointAction.scala —
          @@ -44,5 +44,6 @@ abstract class CheckpointAction[K: ClassTag] extends DrmLike[K]
               { case Some(cp) => cp }
          +    val classTag = implicitly[ClassTag[K]]
          — End diff –

          Yes, you can't put the implementation into the trait; we'd have to make it optional for the trait (`= ???`) and override it in the concrete things. The concrete things are of course just of two types: the logical operators based on the classes I already mentioned, which already have this support, and the checkpoint implementation, which probably does not yet. Let me play with your branch a little to see what we can do.
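
          A hedged sketch of the promotion being discussed (names and members are illustrative, not the actual Mahout definitions): since a trait cannot take a `[K: ClassTag]` context bound, the trait would declare the tag abstractly, and each concrete class, which does have the bound, would supply it.

          ```scala
          import scala.reflect.ClassTag

          trait DrmLike[K] {
            // Abstract here; every concrete implementation must supply it.
            def keyClassTag: ClassTag[K]
          }

          abstract class CheckpointAction[K: ClassTag] extends DrmLike[K] {
            // The context bound puts an implicit ClassTag[K] in scope.
            val keyClassTag: ClassTag[K] = implicitly[ClassTag[K]]
          }
          ```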

          ASF GitHub Bot added a comment -

          Github user dlyubimov commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-144181773

          Hm, it doesn't compile for me with Maven 3.3.3 and JDK 1.8:

          [INFO] --- maven-assembly-plugin:2.4.1:single (dependency-reduced) @ mahout-flink_2.10 ---
          [INFO] Reading assembly descriptor: src/main/assembly/dependency-reduced.xml
          [INFO] ------------------------------------------------------------------------
          [INFO] Reactor Summary:
          [INFO]
          [INFO] Mahout Build Tools ................................. SUCCESS [ 1.782 s]
          [INFO] Apache Mahout ...................................... SUCCESS [ 0.039 s]
          [INFO] Mahout Math ........................................ SUCCESS [ 7.477 s]
          [INFO] Mahout HDFS ........................................ SUCCESS [ 1.487 s]
          [INFO] Mahout Map-Reduce .................................. SUCCESS [ 12.867 s]
          [INFO] Mahout Integration ................................. SUCCESS [ 2.695 s]
          [INFO] Mahout Examples .................................... SUCCESS [ 13.358 s]
          [INFO] Mahout Math Scala bindings ......................... SUCCESS [ 25.349 s]
          [INFO] Mahout H2O backend ................................. SUCCESS [ 16.332 s]
          [INFO] Mahout Spark bindings .............................. SUCCESS [ 26.458 s]
          [INFO] Mahout Spark bindings shell ........................ SUCCESS [ 4.863 s]
          [INFO] Mahout Release Package ............................. SUCCESS [ 0.663 s]
          [INFO] Mahout Flink bindings .............................. FAILURE [06:09 min]
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD FAILURE
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 08:03 min
          [INFO] Finished at: 2015-09-29T13:24:29-07:00
          [INFO] Final Memory: 80M/859M
          [INFO] ------------------------------------------------------------------------
          [ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.4.1:single (dependency-reduced) on project mahout-flink_2.10: Error reading assemblies: Error locating assembly descriptor: src/main/assembly/dependency-reduced.xml
          [ERROR]
          [ERROR] [1] [INFO] Searching for file location: /TB/dmitriy/projects/mahout/flink/src/main/assembly/dependency-reduced.xml
          [ERROR]
          [ERROR] [2] [INFO] File: /TB/dmitriy/projects/mahout/flink/src/main/assembly/dependency-reduced.xml does not exist.
          [ERROR]
          [ERROR] [3] [INFO] File: /TB/dmitriy/projects/mahout/src/main/assembly/dependency-reduced.xml does not exist.
          [ERROR] -> [Help 1]
          [E

          ASF GitHub Bot added a comment -

          Github user alexeygrigorev commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-144183839

          I also remember having problems with building Mahout with JDK 1.8. Have you tried 1.7? Unfortunately I cannot test it myself at the moment...

          ASF GitHub Bot added a comment -

          Github user alexeygrigorev commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-144621492

          I tried it with 1.7 and had the same problem. I'll look into it.

          ASF GitHub Bot added a comment -

          Github user alexeygrigorev commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-145770261

          So I fixed it, and now I can run `mvn clean package -DskipTests`. But since some of the tests don't pass for the Flink backend, the build fails if I remove `-DskipTests`. What do you think, should I disable the failing tests?

          ASF GitHub Bot added a comment -

          Github user smarthi commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-145775984

          I think it's best to disable the failing tests; otherwise it's going to break the CI build.

          ASF GitHub Bot added a comment -

          Github user dlyubimov commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-145934646

          If they are standard algorithm tests coming from the abstract test suites in math-scala, they currently cannot be disabled (in any way I know of, anyway) without disabling them for the other backends too.

          Thanks Alexey, I will take a look again when I have time!

          ASF GitHub Bot added a comment -

          Github user alexeygrigorev commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-145945199

          > If they are standard algorithm tests coming from the abstract test suites in math-scala, they currently cannot be disabled (in any way i know anyway) without disabling them for other backends too.

          Yes, they are the standard tests; unfortunately I also don't know of a way to disable a specific test.

          Probably I can just disable the suites with failing tests completely, with a note that these tests should be fixed (and run when developing or changing something). I hope that my tests give enough coverage to detect errors if they are introduced into the existing code.
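
          If suite-level disabling is the route taken, one hedged option is ScalaTest's `@DoNotDiscover` annotation, which keeps a suite out of automatic discovery without touching the shared base classes (`DistributedFlinkSuite` is again a hypothetical local-Flink fixture):

          ```scala
          import org.apache.mahout.math.decompositions.DistributedDecompositionsSuiteBase
          import org.scalatest.{DoNotDiscover, FunSuite}

          // Excluded from automatic test discovery until the failures are fixed;
          // the suite can still be run explicitly while developing.
          @DoNotDiscover
          class DistributedDecompositionsSuite extends FunSuite
            with DistributedFlinkSuite
            with DistributedDecompositionsSuiteBase
          ```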

          ASF GitHub Bot added a comment -

          Github user dlyubimov commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-149045074

          OK, here is what I suggest.

          Since the code is still a bit raw (in the sense of passing tests), let's indeed pull Alexey's work into a dev Mahout branch and then create a new request to merge it to master, to track merge conflicts (GitHub will be quick to note those).

          The reason is that I probably want to make some small changes to it, and the assumption is that it would be more visible (and easier for me) to do it that way than to actually send a PR to Alexey's GitHub project to enter changes into the PR.

          If it is OK, I can do it.

          -d

          Alexey Grigorev added a comment -

          Github user alexeygrigorev commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-149103398

          Sure, please let me know if there's anything I can do (rebase, look into something, etc.).

          Suneel Marthi added a comment -

          +1 to creating a new dev branch for Alexey's work. I can take care of this today if someone hasn't already gotten to it.

          ASF GitHub Bot added a comment -

          Github user andrewpalumbo commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-149339970

          I looked at this briefly last night and was able to get it down to only 2 failing tests (of 6 total) in the DistributedDecompositionsSuite: dspca and dals.

          It looks like `FlinkDrm.blockify()` is creating or adding at least one empty partition to the `BlockifiedFlinkDrm[K]` (possibly some idiosyncrasy with the backing Flink DataSet? just a quick guess).

          It's probably something deeper than that causing the problems, but by adding the check below for an empty partition, all DSSVD tests pass.

          The DSPCA test runs without exception but fails with an incorrect final value. The DALS test runs out of heap space even with -Xmx4g set.

          FlinkDrm.scala:
          ```
          def blockify(): BlockifiedFlinkDrm[K] = {
            val ncolLocal = ncol
            val classTag = implicitly[ClassTag[K]]

            val parts = ds.mapPartition(new MapPartitionFunction[DrmTuple[K], (Array[K], Matrix)] {
              def mapPartition(values: Iterable[DrmTuple[K]], out: Collector[(Array[K], Matrix)]): Unit = {
                val it = values.asScala.seq
                val (keys, vectors) = it.unzip

                if (vectors.nonEmpty) { // <-- adding this check makes the DSSVD tests pass

                  val isDense = vectors.head.isDense

                  if (isDense) {
                    val matrix = new DenseMatrix(vectors.size, ncolLocal)
                    vectors.zipWithIndex.foreach { case (vec, idx) => matrix(idx, ::) := vec }
                    out.collect((keys.toArray(classTag), matrix))
                  } else {
                    val matrix = new SparseRowMatrix(vectors.size, ncolLocal, vectors.toArray)
                    out.collect((keys.toArray(classTag), matrix))
                  }
                }
              }
            })

            new BlockifiedFlinkDrm(parts, ncol)
          }
          ```

          A comment from @dlyubimov on this partitioning issue:
          >
          > Apparently Flink plans (product splitting) may produce empty partitions. Spark physical operators were careful to avoid this when deciding on the product parallelism, so with Spark this is not known to happen (nor with h2o, for that matter).
          >
          > Technically, of course, the code must survive empty-partition situations. However, it has been patched only on a need-to-do basis, so obviously there are spots like blockify() that are not yet fixed, simply because the need never arose. Since, as I said, the Spark optimizer is careful to avoid this, it apparently never happened before (and I have run a lot of Spark code on this).
          >
          > So... of course we should patch the blockify code. However, the real problem IMO is a careful review of the splitting logic in the Flink physical layer. Empty splits generally should not happen more often than is absolutely unavoidable (especially if this happens after a drmParallelize that explicitly asked for 2 partitions out of 500-odd rows, give or take).
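
          For illustration only (not code from this thread): a minimal, self-contained sketch of how an empty partition can reach user code, assuming a local Flink Scala environment. With parallelism 4 but only 2 input elements, at least two mapPartition subtasks see an empty iterator, which is exactly the situation the blockify() guard above has to survive. The object name and expected output are assumptions for the demo.

          ```
          import org.apache.flink.api.scala._

          // Hypothetical demo, not Mahout code: redistributing 2 elements across 4
          // subtasks leaves at least 2 subtasks with empty partitions.
          object EmptyPartitionDemo {
            def main(args: Array[String]): Unit = {
              val env = ExecutionEnvironment.getExecutionEnvironment
              env.setParallelism(4)

              val partitionSizes = env
                .fromCollection(Seq(1, 2))
                .mapPartition { (it: Iterator[Int]) => Iterator(it.size) }
                .collect()

              // Expect something like Seq(1, 1, 0, 0): the zeros are empty partitions.
              println(partitionSizes)
            }
          }
          ```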

          githubbot ASF GitHub Bot added a comment -

          Github user dlyubimov commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-149438804

          OK, I pushed that branch (which is very nicely rebased, by the way, thank you!) to the Apache repo. I may want to make a few changes. When I do, I will submit PRs against this branch in the Mahout repo (and commit if OK'd).

          githubbot ASF GitHub Bot added a comment -

          GitHub user dlyubimov opened a pull request:

          https://github.com/apache/mahout/pull/161

          MAHOUT-1570, sub-PR: a suggestion: let's unify all key class tag extractors.

          Unifying "keyClassTag" of checkpoitns and "classTagK" of logical operators and

          elevating "keyClassTag" into DrmLike[] trait. No more logical forks any more .

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/dlyubimov/mahout flink-binding

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/mahout/pull/161.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #161


          commit 0df6e08f27c76c6eeb4a714e0087722bef166970
          Author: Dmitriy Lyubimov <dlyubimov@apache.org>
          Date: 2015-10-20T06:26:43Z

          Unifying "keyClassTag" of checkpoitns and "classTagK" of logical operators and
          elevating "keyClassTag" into DrmLike[] trait. No more logical forks any more .


          githubbot ASF GitHub Bot added a comment -

          Github user dlyubimov commented on the pull request:

          https://github.com/apache/mahout/pull/161#issuecomment-149451874

          Spark backend tests pass, but in h2o I get:

          ```
          10-19 23:45:46.002 192.168.11.4:54321 13168 #onsSuite INFO: Cloud of size 1 formed [/192.168.11.4:54321]

          *** RUN ABORTED ***
            java.lang.StackOverflowError:
            at org.apache.mahout.math.drm.DistributedEngine$.org$apache$mahout$math$drm$DistributedEngine$$pass1(DistributedEngine.scala:142)
            at org.apache.mahout.math.drm.DistributedEngine$.org$apache$mahout$math$drm$DistributedEngine$$pass1(DistributedEngine.scala:182)
            at org.apache.mahout.math.drm.DistributedEngine$class.optimizerRewrite(DistributedEngine.scala:44)
          ```

          Well, the h2o build is known for this; it never builds for me anyway.

          githubbot ASF GitHub Bot added a comment -

          Github user dlyubimov commented on the pull request:

          https://github.com/apache/mahout/pull/161#issuecomment-149454965

          OK, so yes: dspca just produces a wrong result, and the dssvd test fails with an out-of-memory error.

          ```
          java.lang.OutOfMemoryError: Java heap space
            at java.util.Arrays.copyOf(Arrays.java:2271)
            at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
            at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
            at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
          ```

          I would start by tracing the dssvd problems first. It looks like some memory management setting for Flink. Any ideas? Should we just bump up the JVM memory for the Maven tests?

          githubbot ASF GitHub Bot added a comment -

          Github user hsaputra commented on a diff in the pull request:

          https://github.com/apache/mahout/pull/137#discussion_r42465218

          — Diff: pom.xml —
          @@ -121,6 +121,8 @@
          <scala.compat.version>2.10</scala.compat.version>
          <scala.version>2.10.4</scala.version>
          <spark.version>1.3.1</spark.version>
          + <!-- TODO: Remove snapshot dependency when Flink 0.9.1 is released -->
          + <flink.version>0.9-SNAPSHOT</flink.version>
          — End diff –

          Flink 0.9.1 is out, so we could remove the SNAPSHOT label, I suppose.

          githubbot ASF GitHub Bot added a comment -

          Github user hsaputra commented on the pull request:

          https://github.com/apache/mahout/pull/161#issuecomment-149467395

          Hi @dlyubimov, is there any more information on the stack trace for the failure?

          smarthi Suneel Marthi added a comment -

          I am seeing this OOM too; I will dig into this today. Possibly some Flink settings that we could be missing.
          We should be going off of Flink 0.10-SNAPSHOT, since 0.10 will be released any time now.

          smarthi Suneel Marthi added a comment -

          I think we should change that to 0.10-SNAPSHOT.

          hsaputra Henry Saputra added a comment -

          Staying closer to the latest HEAD is always desirable.

          githubbot ASF GitHub Bot added a comment -

          Github user dlyubimov commented on the pull request:

          https://github.com/apache/mahout/pull/161#issuecomment-149629763

          Full test report (with all failures)

          https://gist.github.com/dlyubimov/bfe0c7356ff775a8852b

          githubbot ASF GitHub Bot added a comment -

          Github user dlyubimov commented on the pull request:

          https://github.com/apache/mahout/pull/161#issuecomment-149630375

          Aha, so a lot of those are actually failures associated with empty partitions, it looks like: just what Andrew reported.
          The optimizer should not (willingly) produce empty splits.

          smarthi Suneel Marthi added a comment -

          Other than the OOM from -dspca i.e

          dlyubimov Dmitriy Lyubimov added a comment -

          Sorry, here is the gist link to the full test report: https://gist.github.com/dlyubimov/bfe0c7356ff775a8852b

          githubbot ASF GitHub Bot added a comment -

          Github user andrewpalumbo commented on the pull request:

          https://github.com/apache/mahout/pull/161#issuecomment-149637044

          With the memory bumped up to 4G (inside IntelliJ), the stack trace shows FlinkOpAtA as the source of the OOM error in dals:
          ```
          Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
          at java.util.Arrays.copyOf(Arrays.java:2271)
          at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
          at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
          at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
          at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
          at java.io.ObjectOutputStream$BlockDataOutputStream.flush(ObjectOutputStream.java:1821)
          at java.io.ObjectOutputStream.flush(ObjectOutputStream.java:718)
          at java.io.ObjectOutputStream.close(ObjectOutputStream.java:739)
          at org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:319)
          at org.apache.flink.util.InstantiationUtil.writeObjectToConfig(InstantiationUtil.java:268)
          at org.apache.flink.runtime.operators.util.TaskConfig.setStubWrapper(TaskConfig.java:273)
          at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.createDataSourceVertex(JobGraphGenerator.java:864)
          at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:260)
          at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:103)
          at org.apache.flink.optimizer.plan.SourcePlanNode.accept(SourcePlanNode.java:87)
          at org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
          at org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
          at org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
          at org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
          at org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
          at org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
          at org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
          at org.apache.flink.optimizer.plan.OptimizedPlan.accept(OptimizedPlan.java:127)
          at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:170)
          at org.apache.flink.client.LocalExecutor.executePlan(LocalExecutor.java:176)
          at org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:54)
          at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:789)
          at org.apache.flink.api.java.DataSet.collect(DataSet.java:408)
          at org.apache.mahout.flinkbindings.blas.FlinkOpAtA$.slim(FlinkOpAtA.scala:57)
          at org.apache.mahout.flinkbindings.blas.FlinkOpAtA$.at_a(FlinkOpAtA.scala:39)
          at org.apache.mahout.flinkbindings.FlinkEngine$.flinkTranslate(FlinkEngine.scala:124)
          at org.apache.mahout.flinkbindings.FlinkEngine$.toPhysical(FlinkEngine.scala:91)
          ```

          githubbot ASF GitHub Bot added a comment -

          Github user alexeygrigorev commented on the pull request:

          https://github.com/apache/mahout/pull/161#issuecomment-149637584

          Thanks for looking into it! It may be my bug, but I'll let the Flink team know about it anyway.

          githubbot ASF GitHub Bot added a comment -

          Github user dlyubimov commented on a diff in the pull request:

          https://github.com/apache/mahout/pull/137#discussion_r42560808

          — Diff: pom.xml —
          @@ -121,6 +121,8 @@
          <scala.compat.version>2.10</scala.compat.version>
          <scala.version>2.10.4</scala.version>
          <spark.version>1.3.1</spark.version>
          + <!-- TODO: Remove snapshot dependency when Flink 0.9.1 is released -->
          + <flink.version>0.9-SNAPSHOT</flink.version>
          — End diff –

          @alexey: BTW, is there such a thing as a Flink shell?


          githubbot ASF GitHub Bot added a comment -

          Github user hsaputra commented on a diff in the pull request:

          https://github.com/apache/mahout/pull/137#discussion_r42561078

          — Diff: pom.xml —
          @@ -121,6 +121,8 @@
          <scala.compat.version>2.10</scala.compat.version>
          <scala.version>2.10.4</scala.version>
          <spark.version>1.3.1</spark.version>
          + <!-- TODO: Remove snapshot dependency when Flink 0.9.1 is released -->
          + <flink.version>0.9-SNAPSHOT</flink.version>
          — End diff –

          Flink has a Scala shell:
          https://ci.apache.org/projects/flink/flink-docs-master/apis/scala_shell.html

          githubbot ASF GitHub Bot added a comment -

          Github user smarthi commented on a diff in the pull request:

          https://github.com/apache/mahout/pull/137#discussion_r42565606

          — Diff: pom.xml —
          @@ -121,6 +121,8 @@
          <scala.compat.version>2.10</scala.compat.version>
          <scala.version>2.10.4</scala.version>
          <spark.version>1.3.1</spark.version>
          + <!-- TODO: Remove snapshot dependency when Flink 0.9.1 is released -->
          + <flink.version>0.9-SNAPSHOT</flink.version>
          — End diff –

          There is a Flink Scala shell in the Flink project. It should be available with the Flink project binaries.


          githubbot ASF GitHub Bot added a comment -

          Github user alexeygrigorev commented on a diff in the pull request:

          https://github.com/apache/mahout/pull/137#discussion_r42584189

          — Diff: pom.xml —
          @@ -121,6 +121,8 @@
          <scala.compat.version>2.10</scala.compat.version>
          <scala.version>2.10.4</scala.version>
          <spark.version>1.3.1</spark.version>
          + <!-- TODO: Remove snapshot dependency when Flink 0.9.1 is released -->
          + <flink.version>0.9-SNAPSHOT</flink.version>
          — End diff –

          I also know there were plans to add Mahout DSL support to the shell, but I'm not sure whether it got any further.

          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann commented on the pull request:

          https://github.com/apache/mahout/pull/161#issuecomment-149884474

          The cause of the `OutOfMemoryError` in `dals` of the `DistributedDecompositionSuite` is the following:

          In order to parallelize the input matrix, the `FlinkEngine` creates a `Collection[DrmTuple[Int]]` which is given to Flink's `CollectionInputFormat`. The `CollectionInputFormat` is simply a wrapper around a collection which is serialized and shipped to the corresponding data source at runtime. The problem is that the matrix rows in the collection of `DrmTuple[Int]` are of type `MatrixVectorView`. Instances of this class hold a reference to the original matrix, so serializing an instance of `MatrixVectorView` effectively boils down to serializing the complete input matrix for each row. This means that the `CollectionInputFormat` needs memory on the order of `size(inputMatrix) * rows(inputMatrix)` to serialize the given collection of `DrmTuple[Int]` instances.

          The input size for the dals test case is `500 * 500`, so the input matrix has a size of roughly 2 MB. The serialized `CollectionInputFormat` then has a size of roughly 1 GB, because the matrix is serialized once per row. If you replace `FlinkEngine.scala:231` with `val rows = (0 until m.nrow).map(i => (i, dvec(m(i, ::).asInstanceOf[Vector])))`, the test will pass the serialization point (but fail with another error). The reason why 1 GB of memory is already enough to crash Flink, given that you set the max limit to 4 GB, is that Flink by default reserves 70% of the available memory as managed memory (for sorting and hashing).

          I think it would make sense to convert possible `View`s to their materialized form before distributing the data in the cluster. That way you avoid serializing redundant data; sharing the backing matrix is only fine in the local case.

          I hope this sheds a little bit of light on the `OutOfMemoryError` you've encountered.
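
          For illustration, a minimal sketch of the materialization idea (a hypothetical helper, not the actual FlinkEngine code): copying each row view into a self-contained vector keeps the serialized footprint proportional to the matrix size, rather than quadratic in the number of rows.

          ```
          import org.apache.mahout.math.{DenseVector, Matrix, Vector}

          // Hypothetical helper: copy each row view into a standalone vector so that
          // serializing a row no longer drags the entire backing matrix along with it.
          // (A real implementation would also preserve sparsity instead of always
          // densifying.)
          def materializedRows(m: Matrix): IndexedSeq[(Int, Vector)] =
            (0 until m.numRows()).map { i =>
              val row: Vector = new DenseVector(m.numCols())
              row.assign(m.viewRow(i)) // deep copy of the row view
              (i, row)
            }
          ```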

          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann commented on the pull request:

          https://github.com/apache/mahout/pull/161#issuecomment-149886416

          I forgot to mention that the error is:

          ```
          Caused by: java.util.NoSuchElementException: head of empty list
          at scala.collection.immutable.Nil$.head(List.scala:337)
          at scala.collection.immutable.Nil$.head(List.scala:334)
          at org.apache.mahout.flinkbindings.drm.RowsFlinkDrm$$anon$1.mapPartition(FlinkDrm.scala:68)
          at org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:98)
          at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
          at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
          at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
          at java.lang.Thread.run(Thread.java:745)
          ```

          The error also appears in some of the other test cases; therefore, I think it's not a problem with Flink itself but with the implementation of the Flink bindings.

          githubbot ASF GitHub Bot added a comment -

          Github user smarthi commented on the pull request:

          https://github.com/apache/mahout/pull/161#issuecomment-149890401

          That's been my conclusion too, Till. It's an issue in the Flink bindings implementation. Thanks for your time and feedback.

          githubbot ASF GitHub Bot added a comment -

          Github user dlyubimov commented on the pull request:

          https://github.com/apache/mahout/pull/161#issuecomment-149987779

          On Wed, Oct 21, 2015 at 5:48 AM, Till Rohrmann <notifications@github.com>
          wrote:

          > The problem of the OutOfMemoryError in dals of the
          > DistributedDecompositionSuite is the following:
          >
          > In order parallelize the input matrix, the FlinkEngine creates a
          > Collection[DrmTuple[Int]] which is given to Flink's CollectionInputFormat.
          > The CollectionInputFormat is simply a wrapper for a collection which is
          > serialized and shipped to the corresponding data source at runtime. The
          > problem is that the matrix rows in the collection of DrmTuple[Int] are of
          > type MatrixVectorView. Instances of this class hold a reference to the
          > original matrix. Therefore, serializing an instance of MatrixVectorView
          > effectively boils down to serializing the complete input matrix for each
          > row.
          >
          This has never been a problem for other mappings and should not be here. What you serialize is either (1) a collection of vectors, or (2) a matrix view.

          Java serialization of mahout-math is not supported, so it has to be custom serialization in each case. Mahout core provides two serialization means: (1) Hadoop Writable, (2) Kryo.

          In particular, the Spark backend uses Kryo serialization.

          In this case I guess we are speaking of a collection of vectors, so the classes supporting these types of serialization would be VectorWritable and [1]. Kryo is generally the preferred way. We can move the Kryo support out of the spark package into math-scala if needed.

          If neither Kryo nor Writable fits the bill, then additional thought is required. I vaguely remember talking to Stephan, and he told me that supporting Writable serialization would suffice as a minimum requirement. So serializing vectors in either of these two ways should be acceptable regardless of the actual implementation (view/non-view doesn't matter; they serialize any Vector implementation).

          [1]
          https://github.com/apache/mahout/blob/master/spark/src/main/scala/org/apache/mahout/sparkbindings/io/VectorKryoSerializer.scala
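
          For illustration, a minimal round-trip through the Writable path mentioned above (a hedged sketch; it assumes mahout-math's VectorWritable is on the classpath):

          ```
          import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

          import org.apache.mahout.math.{DenseVector, Vector, VectorWritable}

          // Round-trip a Mahout vector through its Hadoop Writable wrapper -- the
          // minimum serialization contract mentioned above.
          val v: Vector = new DenseVector(Array(1.0, 2.0, 3.0))

          val bytes = new ByteArrayOutputStream()
          new VectorWritable(v).write(new DataOutputStream(bytes))

          val restored = new VectorWritable()
          restored.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray)))

          assert(restored.get().getQuick(1) == 2.0) // contents survive regardless of the Vector implementation
          ```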


          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann commented on the pull request:

          https://github.com/apache/mahout/pull/161#issuecomment-150209353

          I assume the reason this has never been a problem for the Spark and h2o bindings is exactly that you've registered (at least for the sparkbindings) a proper Kryo serializer for the vector types. This is not done for the Flink bindings.

          Let me clarify a little how Flink serializes the collection of `MatrixVectorView`. Flink's type extractor detects that this type is not a POJO and thus assigns it a `GenericTypeInfo`. Types with this type info are serialized using Kryo. Since Kryo has no special serializer registered, it falls back to the default one, which in this case is the `FieldSerializer`. Thus, it will simply serialize all fields, including the reference to the backing matrix.

          That's how the `CollectionInputFormat` can serialize the `MatrixVectorView`. Consequently, the `CollectionInputFormat` itself remains Java-serializable, because it takes care of serializing its data collection.

          However, I've just checked and noticed that Flink does not currently allow you to specify default serializers for Kryo. With that in place, you could do the same as for the sparkbindings. I'll fix this.

          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann commented on the pull request:

          https://github.com/apache/mahout/pull/161#issuecomment-150217075

          Well, I just saw that you're using Flink `0.9-SNAPSHOT`. In `0.10-SNAPSHOT`, you can register default serializers for Kryo. I just tested bumping Flink's version to `0.10-SNAPSHOT` and it worked without problems. If you move `VectorKryoSerializer` and `GenericMatrixKryoSerializer` to a module which is accessible from the flink bindings module, then adding

          ```
          env.addDefaultKryoSerializer(classOf[Vector], new VectorKryoSerializer())
          env.addDefaultKryoSerializer(classOf[Matrix], new GenericMatrixKryoSerializer())
          ```

          to the constructor of `FlinkDistributedContext` solved the OOM for me. Additionally, you have to make both serializers serializable, otherwise Flink cannot ship them.

          Alternatively, you can give both serializers a zero-arg constructor (a constructor with default parameter values doesn't count as one from Java's perspective). Then the serializers don't have to be serializable, and you can register them via

          ```
          env.addDefaultKryoSerializer(classOf[Vector], classOf[VectorKryoSerializer])
          env.addDefaultKryoSerializer(classOf[Matrix], classOf[GenericMatrixKryoSerializer])
          ```

          githubbot ASF GitHub Bot added a comment -

          Github user alexeygrigorev commented on the pull request:

          https://github.com/apache/mahout/pull/161#issuecomment-150823835

          Thanks, Till!
          I could try to look into it over the weekend, but I'm not sure how it's going to work with two open PRs for the same code. The solution Till suggested doesn't look too complicated, though...

          githubbot ASF GitHub Bot added a comment -

          Github user smarthi commented on the pull request:

          https://github.com/apache/mahout/pull/161#issuecomment-150824127

          Alexey,

          Before we try out Till's suggested fixes, we are redoing some of the backend code to refactor common functionality out into a common module.

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/mahout/pull/161

          githubbot ASF GitHub Bot added a comment -

          Github user dlyubimov commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-150852621

          I guess we can continue using this PR to see the current diff if Alexey pulls our Mahout branch here (it turns out, of course, that there's no way to make a PR off the GitHub repo mirror).

          githubbot ASF GitHub Bot added a comment -

          Github user smarthi commented on the pull request:

          https://github.com/apache/mahout/pull/161#issuecomment-150876512

          FYI, we are running off of Flink 0.10-SNAPSHOT.

          githubbot ASF GitHub Bot added a comment -

          Github user alexeygrigorev commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-150895829

          @dlyubimov just to clarify: you're suggesting I pull from `mahout:flink-binding` to my `:flink-binding`, so all new commits from that branch appear here?

          githubbot ASF GitHub Bot added a comment -

          Github user dlyubimov commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-150896721

          Maybe. We need something to review the diff of the flink branch vs. master; we can't create a PR since we do not own GitHub's mahout repo.


          githubbot ASF GitHub Bot added a comment -

          Github user smarthi commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-150896824

          @alexey, yes, please update your local flink-bindings repo from
          mahout:flink-binding

          githubbot ASF GitHub Bot added a comment -

          Github user smarthi commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-150896852

          Nevertheless, we have pushed some changes up to mahout:flink-binding
          that @alexey needs to pull from.

          githubbot ASF GitHub Bot added a comment -

          Github user smarthi commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-150896909

          @alexey, the VectorKryo and GenericMatrixKryo classes have been refactored
          out to a common math-scala package.

          After you update your local branch with the latest from mahout:flink-binding,
          you could try Till's suggested fixes for the OOM (and, if possible, the other
          failing tests) and create a PR.
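
          For anyone following along, a minimal sketch of what registering those serializers with Flink's Kryo configuration could look like. Assumptions: VectorKryo and GenericMatrixKryo (the class names from this thread) extend com.esotericsoftware.kryo.Serializer and are importable after the move to math-scala; everything else below is illustrative, not the actual Mahout code.

          import org.apache.flink.api.scala.ExecutionEnvironment
          import org.apache.mahout.math.{Matrix, Vector}

          val env = ExecutionEnvironment.getExecutionEnvironment

          // Route Mahout's Vector and Matrix types through the custom Kryo
          // serializers instead of Flink's generic fallback path. VectorKryo and
          // GenericMatrixKryo are assumed to be in scope; their post-refactor
          // package isn't shown in this thread, so those imports are omitted.
          env.getConfig.registerTypeWithKryoSerializer(classOf[Vector], classOf[VectorKryo])
          env.getConfig.registerTypeWithKryoSerializer(classOf[Matrix], classOf[GenericMatrixKryo])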

          githubbot ASF GitHub Bot added a comment -

          Github user alexey commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-150897111

          Guys, it's all very interesting, but please mention the correct user in your
          comments, e.g. @alexeygrigorev, not @alexey.

          githubbot ASF GitHub Bot added a comment -

          Github user smarthi commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-150897455

          Fixed all references to the correct @alexeygrigorev.

          githubbot ASF GitHub Bot added a comment -

          Github user alexeygrigorev commented on the pull request:

          https://github.com/apache/mahout/pull/137#issuecomment-150899535

          The new commits have been pulled.

          githubbot ASF GitHub Bot added a comment -

          Github user andrewpalumbo closed the pull request at:

          https://github.com/apache/mahout/pull/187

          githubbot ASF GitHub Bot added a comment -

          GitHub user andrewpalumbo reopened a pull request:

          https://github.com/apache/mahout/pull/187

          MAHOUT-1570: Flink binding b: to review.. not for merge

          Still possibly needs to be rebased on top of the Mahout 0.11.2 master.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/andrewpalumbo/mahout flink-binding-b

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/mahout/pull/187.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #187


          commit b86115c44a5ef998cc073f003ce79472a6b9005e
          Author: Till Rohrmann <trohrmann@apache.org>
          Date: 2015-11-16T16:58:36Z

          Add type information to flink-bindings

          commit 6006e7730723f910bbf3ae7b4cfcdb732411a3ce
          Author: Till Rohrmann <trohrmann@apache.org>
          Date: 2015-11-16T17:14:47Z

          Remove flinkbindings.blas package object

          commit e465b5e68faa3a7549b69e6df566019ea2f01624
          Author: Suneel Marthi <smarthi@apache.org>
          Date: 2015-11-16T17:39:55Z

          Merge pull request #1 from tillrohrmann/flink-binding

          Add type information to flink bindings

          commit a41d0bda894224b238da246ef222c1ca1ce101a2
          Author: smarthi <smarthi@apache.org>
          Date: 2015-11-12T06:10:54Z

          NOJira: Add missing license header

          commit 1c6f73bb747e2072c0af05aa105ca080a998c0f7
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-01-10T19:09:39Z

          fix for FlinkOpAtB

          commit 1ae78e334b85ff90278d153f925539a8651487c7
          Author: smarthi <smarthi@apache.org>
          Date: 2016-01-10T22:47:36Z

          Modified to use Flink Scala Api, fixed FlinkOpABt

          commit 1ec2047cb0f1e400f3211cec6f65cb7354ac3bb6
          Author: smarthi <smarthi@apache.org>
          Date: 2016-01-10T23:18:43Z

          Implemented Sampling methods, other minor fixes

          commit f1437460899374648eb3f8fd96c19003c43bcd53
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-01-11T01:10:43Z

          Merge branch 'flink-binding' of https://github.com/smarthi/mahout into flink-binding-suneel-2016

          checked out --theirs

          Conflicts:
          flink/src/main/scala/org/apache/mahout/flinkbindings/FlinkEngine.scala
          flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpAtA.scala
          flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpAtB.scala

          commit f457ad44d1cc4670d969ccb771fde30b90525952
          Author: smarthi <smarthi@apache.org>
          Date: 2016-01-16T03:31:20Z

          Code reformatting, added TypeInformation check

          commit a351af12f0f0e9381504cae9969f3ffe349d5d6f
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-01-16T03:36:08Z

          Merge branch 'flink-binding' of https://github.com/smarthi/mahout into flink-bindings-b

          commit dcfea65fe3ddb6b438558ad1d48d3fb1c8ce04b9
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-01-16T23:15:09Z

          Allow 'Any' key type information for generateTypeInformation[K: ClassTag]: TypeInformation[K]

          commit b94c045160b29b55e08ec1f7025d4d60fb8ae82c
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-01-17T20:51:17Z

          wip

          commit a0a319643964cdf0fdded31348104014bd161c36
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-02-06T19:56:37Z

          (1) Refactored Flink's Hadoop1Utils into a Hadoop2Utils object to reflect the retirement of the Hadoop 1 API.
          (2) wip: Flink's checkpoint action seems not to be working; some trial and error, and a note added to the math-scala base tests to record this problem, i.e. what should be deterministic tests are not.
          (3) This could possibly be an error due to Flink rebalancing before the checkpoint action, though unlikely.
          (4) Does not compile with today's Flink 1.0-SNAPSHOT due to some changes in extracting type information made in the last week or so.

          commit cbac921171deef7ea559581adae31ed27ccc65d2
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-02-06T20:54:56Z

          Use Object instead of Any

          commit 957e9f194f9fbd728c36f2f715f62561b2471a2a
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-02-06T23:10:59Z

          upgrade kryo dep in math-scala to match that used by flink. still getting OOM errors in tests

          commit c82ed88812dfe324a6829ebdf3c25a8c58b67b13
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-02-07T02:08:11Z

          clean up,
          make sure flink module is using correct kryo version,
          use 'ProblemTestSuite' as example of odd problems

          commit 22f456613aa18f16ab35f6861ed279dc8bb1b020
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-02-07T06:30:47Z

          replace accidentally deleted line

          commit e0da8bba715d38b87f1d08353c857161dee75c47
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-02-09T02:27:55Z

          wip: drmDfsWrite: explicitly define all Writable Subclasses for ds.map() keys

          commit 759d024f2f45b232e333b9fd742902d8401ff16b
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-03-02T21:10:11Z

          Include the java.io.Serializable interface in the Matrix and Vector Kryo serializers. Update the flink pom for the Flink 1.0 release module renaming. wip: failing StackOverflowError on FlinkOpTimesRightMatrix

          commit 44a36d7b1c0350b6cda731f1b35f5593e5a31cad
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-03-02T21:44:11Z

          Hack to fix a Kryo StackOverflowError when using native Flink broadcasting of a Mahout matrix in FlinkOpTimesRight

          commit dc9104ca064c1f35f4d31dfa1a9c338308566f2a
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-03-03T02:14:32Z

          set the environment's parallelism to 10 in the DistributedFlinkTestSuite so that tests like dsqDist(X, Y) do not fail with not enough slots

          commit b14f38880be9407395b3d3009b831066ef605c65
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-03-03T02:19:34Z

          remove all instances of @RunWith[JUnitRunner] from tests

          commit 1079d32f78d8cdaabd8e10d366347e576a962838
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-03-03T02:22:21Z

          Bump flink to 1.1-SNAPSHOT

          commit 935e9eb0f743c1170913fa7090a203af6c7c993e
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-03-03T03:36:31Z

          wip, move all 6 failing tests into FailingTestSuite

          commit dfb1fae5dea753e9bfc62f3f84fc20cf3c12387d
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-03-03T04:37:54Z

          wip: IntWritable expected to be LongWritable

          commit f9c39ea355aaf5b81d3c14f3e9c1d81bc7f3c48e
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-03-03T22:26:35Z

          isolate a single key type to show the error with drm.dfsWrite()

          commit 73f8a70a6ba356427f85bc31c74d1a8499fa947a
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-03-11T00:47:02Z

          some clean up

          commit f05418cef87ff8f2d9efa59ce3a71f9f84181fda
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-03-11T01:14:07Z

          set flink version to 1.0.0

          commit a06f02d426d96a36800d72206692d03c68f8805f
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-03-11T03:09:52Z

          write a simple (Int,Int) unit test for flink hadoop output

          commit 29fe274eeaf6f2cb8d01bd6b75627ed70b624291
          Author: Andrew Palumbo <apalumbo@apache.org>
          Date: 2016-03-14T01:30:41Z

          fix for failing Writables


          githubbot ASF GitHub Bot added a comment -

          Github user smarthi commented on a diff in the pull request:

          https://github.com/apache/mahout/pull/187#discussion_r56041224

          — Diff: spark/src/main/scala/org/apache/mahout/drivers/TrainNBDriver.scala —
          @@ -48,33 +47,33 @@ object TrainNBDriver extends MahoutSparkDriver {

              // default trainComplementary is false
              opts = opts + ("trainComplementary" -> false)
          -   opt[Unit]("trainComplementary") abbr ("c") action { (_, options) =>
          +   opt[Unit]("trainComplementary") abbr "c" action { (_, options) =>
                options + ("trainComplementary" -> true)
          -   } text ("Train a complementary model, Default: false.")
          +   } text "Train a complementary model, Default: false."

              // Laplace smoothing paramater default is 1.0
              opts = opts + ("alphaI" -> 1.0)
          -   opt[Double]("alphaI") abbr ("a") action { (x, options) =>
          +   opt[Double]("alphaI") abbr "a" action { (x, options) =>
                options + ("alphaI" -> x)
          -   } text ("Laplace soothing factor default is 1.0") validate { x =>
          +   } text "Laplace soothing factor default is 1.0" validate { x =>

          — End diff —

          It's 'smoothing', not 'soothing'.
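
          For reference, the corrected option definitions in scopt's 3.x DSL would read roughly as below. This is a sketch, not the actual driver code: the Config case class, the parser name, and the validation condition are placeholders (the real driver accumulates options in a Map).

          import scopt.OptionParser

          case class Config(alphaI: Double = 1.0, trainComplementary: Boolean = false)

          val parser = new OptionParser[Config]("TrainNBDriver") {
            opt[Unit]("trainComplementary") abbr "c" action { (_, c) =>
              c.copy(trainComplementary = true)
            } text "Train a complementary model, Default: false."

            opt[Double]("alphaI") abbr "a" action { (x, c) =>
              c.copy(alphaI = x)
            } text "Laplace smoothing factor, default is 1.0" validate { x =>
              if (x >= 0) success else failure("alphaI must be non-negative")
            }
          }

          // Usage: parser.parse(args, Config()) returns Some(config) on success.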

          githubbot ASF GitHub Bot added a comment -

          Github user andrewpalumbo commented on the pull request:

          https://github.com/apache/mahout/pull/187#issuecomment-196952291

          Merged to, and now working from, apache/flink-binding.

          githubbot ASF GitHub Bot added a comment -

          Github user andrewpalumbo closed the pull request at:

          https://github.com/apache/mahout/pull/187

          smarthi Suneel Marthi added a comment -

          Marking this as 'Resolved' since all the referenced JIRAs have been implemented.

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/mahout/pull/137

          hudson Hudson added a comment -

          FAILURE: Integrated in Mahout-Quality #3324 (See https://builds.apache.org/job/Mahout-Quality/3324/)
          MAHOUT-1570: initial skeleton for Mahout DSL on Apache Flink (alexey.s.grigoriev: rev bb4c4bcaf452d67b75b3c8d7c500cca6aeb31036)

          • pom.xml
          • flink/src/main/scala/org/apache/mahout/flinkbindings/package.scala
          • flink/pom.xml
          • flink/src/main/scala/org/apache/mahout/flinkbindings/FlinkEngine.scala
          • math-scala/src/main/scala/org/apache/mahout/math/drm/DistributedContext.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/FlinkDistributedContext.scala
            MAHOUT-1570: Flink: calculating ncol, nrow; colSum, colMean, norm (alexey.s.grigoriev: rev df1db7cc775ed5e10c6416e033e25e430ffdd171)
          • flink/src/main/scala/org/apache/mahout/flinkbindings/drm/CheckpointedFlinkDrm.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/FlinkEngine.scala
            MAHOUT-1570: rebased to latest upstream (alexey.s.grigoriev: rev de732d4bc7fae9d4719ecba54decd14376551344)
          • flink/src/main/scala/org/apache/mahout/flinkbindings/FlinkEngine.scala
            MAHOUT-1570: upgraded to flink 0.9-SNAPSHOT for IO (alexey.s.grigoriev: rev 1806ca809beb2312c62e83b3dbab314b90172118)
          • flink/src/test/scala/org/apache/mahout/flinkbindings/RLikeOpsSuite.scala
          • flink/src/test/scala/org/apache/mahout/flinkbindings/blas/LATestSuit.scala
          • pom.xml
          • flink/src/main/resources/log4j.properties
          • flink/pom.xml
            MAHOUT-1570: Flink: added headers, comments and acknowledgements (alexey.s.grigoriev: rev 58f79489129c73b9a4bd7cd5a503cfaf723ccaef)
          • flink/src/main/scala/org/apache/mahout/flinkbindings/package.scala
          • flink/src/test/scala/org/apache/mahout/flinkbindings/RLikeOpsSuite.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/FlinkDistributedContext.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/io/DrmMetadata.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpCBind.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/io/HDFSPathSearch.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/FlinkEngine.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpTimesRightMatrix.scala
          • flink/src/test/scala/org/apache/mahout/flinkbindings/examples/ReadCsvExample.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/drm/CheckpointedFlinkDrm.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/drm/FlinkDrm.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/package.scala
          • flink/src/test/scala/org/apache/mahout/flinkbindings/blas/LATestSuit.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpAewB.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/io/Hadoop1HDFSUtil.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpAt.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpMapBlock.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/DataSetOps.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpRowRange.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpRBind.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpAtB.scala
          • flink/src/test/scala/org/apache/mahout/flinkbindings/DistributedFlinkSuit.scala
          • flink/src/test/scala/org/apache/mahout/flinkbindings/UseCasesSuite.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpAewScalar.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpAx.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/io/HDFSUtil.scala
            MAHOUT-1570: rebased to latest upstream (alexey.s.grigoriev: rev 8de8b798f0ffa374c2b18e00b8130ac0a0d8e918)
          • flink/src/test/scala/org/apache/mahout/flinkbindings/DistributedFlinkSuit.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/FlinkEngine.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/FlinkByteBCast.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/package.scala
            MAHOUT-1570: Flink: imports cleaned (alexey.s.grigoriev: rev 851eebcb648752f1a3bff08191d19ce3d6f66ab5)
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpAt.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/drm/FlinkDrm.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/FlinkEngine.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpTimesRightMatrix.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpMapBlock.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpRowRange.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpAewB.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpCBind.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/drm/CheckpointedFlinkDrm.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpRBind.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpAtB.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/blas/FlinkOpAx.scala
            MAHOUT-1570: Flink: nrow and ncol optimized (alexey.s.grigoriev: rev be815fb25f4c008bf3809bb444e3d8562dea96fa)
          • flink/src/main/scala/org/apache/mahout/flinkbindings/drm/CheckpointedFlinkDrm.scala
            MAHOUT-1570: rebased to latest upstream (alexey.s.grigoriev: rev 4bafc76276ff3d6a3eb1687d4d1c3f834671a482)
          • flink/pom.xml
            MAHOUT-1570: Flink: drmParallelizeEmpty and extra tests (alexey.s.grigoriev: rev d13f48849ad5658cc03acea57f7c5d9d14bad137)
          • flink/src/test/scala/org/apache/mahout/flinkbindings/DrmLikeOpsSuite.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/FlinkEngine.scala
            MAHOUT-1570: Flink: numNonZeroElementsPerColumn (alexey.s.grigoriev: rev 35426a96b98bb74538466318c5347d2f90415e97)
          • flink/src/test/scala/org/apache/mahout/flinkbindings/DrmLikeOpsSuite.scala
          • flink/src/main/scala/org/apache/mahout/flinkbindings/FlinkEngine.scala
            MAHOUT-1570: fixed maven issue (a.grigorev: rev c72698aca16368c4f16d5181fbdd3db6314c04e3)
          • flink/pom.xml
          smarthi Suneel Marthi added a comment -

          Closing issues following the Mahout 0.12.0 release on April 11, 2016.


            People

            • Assignee:
              smarthi Suneel Marthi
            • Reporter:
              till.rohrmann Till Rohrmann
            • Votes:
              1
            • Watchers:
              11
