MAHOUT-2019

SparseRowMatrix assign ops use for loops instead of iterateNonZero and so can be optimized

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.13.1
    • Component/s: Math
    • Labels:
      None

      Description

      DRMs get blockified into SparseRowMatrix instances if the density is low. But SparseRowMatrix (SRM) inherits its implementation of methods like "assign" from AbstractMatrix, which uses nested for loops to traverse every row and column. When multiplying two matrices that are extremely sparse, the kind of data you see in collaborative filtering, this is extremely wasteful of execution time. It would be better to use a sparse vector's iterateNonZero iterator for some function types.
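
To illustrate the idea only, here is a minimal self-contained sketch. It does not use Mahout's classes: `SparseRow`, `assignDense`, and `assignSparse` are hypothetical names invented for this example, and the map-backed row is an assumption standing in for a sparse vector. It contrasts a per-column loop with a traversal that visits only the non-zero entries of the other operand. The sparse path is only correct for functions where f(x, 0) == x (such as plus), which is presumably why the description says "some function types".

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.DoubleBinaryOperator;

public class SparseAssignSketch {

    /** Hypothetical sparse row (column index -> non-zero value); not a Mahout class. */
    public static final class SparseRow {
        public final int size;
        public final Map<Integer, Double> nonZeros = new TreeMap<>();

        public SparseRow(int size) {
            this.size = size;
        }

        public double get(int col) {
            return nonZeros.getOrDefault(col, 0.0);
        }

        public void set(int col, double value) {
            // Store only non-zero values so the map stays sparse.
            if (value == 0.0) {
                nonZeros.remove(col);
            } else {
                nonZeros.put(col, value);
            }
        }
    }

    /** Dense traversal: applies f at every column. Returns the number of cells visited. */
    public static long assignDense(SparseRow target, SparseRow other, DoubleBinaryOperator f) {
        long visited = 0;
        for (int col = 0; col < target.size; col++) {
            target.set(col, f.applyAsDouble(target.get(col), other.get(col)));
            visited++;
        }
        return visited;
    }

    /**
     * Sparse traversal: applies f only where 'other' is non-zero.
     * Assumes f(x, 0) == x for all x, and that target and other are distinct objects.
     */
    public static long assignSparse(SparseRow target, SparseRow other, DoubleBinaryOperator f) {
        long visited = 0;
        for (Map.Entry<Integer, Double> e : other.nonZeros.entrySet()) {
            int col = e.getKey();
            target.set(col, f.applyAsDouble(target.get(col), e.getValue()));
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        int size = 1_000_000;

        SparseRow a1 = new SparseRow(size);
        a1.set(3, 1.0);
        a1.set(10, 2.0);
        SparseRow a2 = new SparseRow(size);
        a2.set(3, 1.0);
        a2.set(10, 2.0);

        SparseRow b = new SparseRow(size);
        b.set(10, 5.0);
        b.set(999_999, 7.0);

        long denseVisited = assignDense(a1, b, Double::sum);
        long sparseVisited = assignSparse(a2, b, Double::sum);

        System.out.println("dense cells visited:  " + denseVisited);
        System.out.println("sparse cells visited: " + sparseVisited);
        System.out.println("same result: " + a1.nonZeros.equals(a2.nonZeros));
    }
}
```

For a row with a million columns and two non-zero entries in the other operand, the dense loop touches every column while the sparse loop touches two cells, yet both produce the same row for plus. A function such as multiplication would need a different rule (iterating the target's own non-zeros), since f(x, 0) is then 0 rather than x.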

          Activity

          hudson Hudson added a comment -

          FAILURE: Integrated in Jenkins build Mahout-Quality #3511 (See https://builds.apache.org/job/Mahout-Quality/3511/)
          MAHOUT-2019 SparkRow Matrix Speedup and fixing change to scala 2.11 made (pat: rev 800a9ed6d7e015aa82b9eb7624bb441b71a8f397)

          • (edit) math/src/main/java/org/apache/mahout/math/SparseRowMatrix.java
          githubbot ASF GitHub Bot added a comment -

          Github user pferrel commented on the issue:

          https://github.com/apache/mahout/pull/342

          Yes, will merge today

          githubbot ASF GitHub Bot added a comment -

          Github user pferrel closed the pull request at:

          https://github.com/apache/mahout/pull/342

          githubbot ASF GitHub Bot added a comment -

          Github user rawkintrevo commented on the issue:

          https://github.com/apache/mahout/pull/342

          Is this ready to merge?

          githubbot ASF GitHub Bot added a comment -

          Github user rawkintrevo commented on the issue:

          https://github.com/apache/mahout/pull/342

          Status on this?

          githubbot ASF GitHub Bot added a comment -

          GitHub user pferrel opened a pull request:

          https://github.com/apache/mahout/pull/342

          MAHOUT-2019 Sparse speedup

              1. Purpose of PR:
                to review an apparent speedup of spark-itemsimilarity and the underlying SimilarityAnalysis.cooccurrence by using an iterateNonZero instead of the previous for loops in SparseRowMatrix.

          For discussion only at present

          MAHOUT-2019
          https://issues.apache.org/jira/projects/MAHOUT/issues/MAHOUT-2019?filter=allopenissues&orderby=priority+DESC%2C+updated+DESC

              1. Important ToDos
                Please mark each with an "x"
          • [x] A JIRA ticket exists (if not, please create this first) https://issues.apache.org/jira/browse/ZEPPELIN/
          • [x] Title of PR is "MAHOUT-XXXX Brief Description of Changes" where XXXX is the JIRA number.
          • [ ] Created unit tests where appropriate
          • [ ] Added licenses correct on newly added files
          • [ ] Assigned JIRA to self
          • [ ] Added documentation in scala docs/java docs, and to website
          • [ ] Successfully built and ran all unit tests, verified that all tests pass locally.

          If all of these things aren't complete, but you still feel it is
          appropriate to open a PR, please add [WIP] after MAHOUT-XXXX before the
          descriptions- e.g. "MAHOUT-XXXX [WIP] Description of Change"

          Does this change break earlier versions?

          Is this the beginning of a larger project for which a feature branch should be made?

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/pferrel/mahout sparse-speedup

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/mahout/pull/342.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #342


          commit 26a2efa65e9f09df358e1021ebf45e3735e2ec6c
          Author: pferrel <pat@occamsmachete.com>
          Date: 2017-10-02T18:39:54Z

          minimum speedup fix

          commit 9330a2ed6d1211459c57863a5d664377c55aa747
          Author: pferrel <pat@occamsmachete.com>
          Date: 2017-10-02T19:27:47Z

          minimum speedup fix with cast exception check

          commit 722bd11f01e7250f99f21f17ec7211bf5abb2089
          Author: pferrel <pat@occamsmachete.com>
          Date: 2017-10-02T20:33:07Z

          added cast exception logging to SparseRowMatrix

          commit 02700ef13c44e403cba58288dcbab5cfabed8585
          Author: pferrel <pat@occamsmachete.com>
          Date: 2017-10-02T20:35:14Z

          Merge branch 'master' into sparse-speedup



            People

            • Assignee:
              pferrel Pat Ferrel
              Reporter:
              pferrel Pat Ferrel