Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.9
    • Fix Version/s: 1.3.0
    • Labels:
      None

      Description

      At the moment, predict() and fit() require DataSet[(Int, Int)] and DataSet[(Int, Int, Double)], respectively.
      The ID type should be changed to Long to accept a larger range of values, or to something more general.
      See also http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Apache-Flink-0-9-ALS-API-td6424.html
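
      To illustrate why Int IDs are limiting, here is a minimal sketch in plain Scala (no Flink dependency). The object and method names are hypothetical and not part of the Flink API. It only shows what happens when an ID above 2^31 - 1 is forced into an Int.

```scala
object IdOverflowSketch {
  // Hypothetical helper: narrows a Long ID to Int, as callers of an
  // Int-only API would have to do before passing their data in.
  def narrow(id: Long): Int = id.toInt

  // True if the ID can be represented as an Int without loss.
  def fitsInInt(id: Long): Boolean =
    id >= Int.MinValue.toLong && id <= Int.MaxValue.toLong

  def main(args: Array[String]): Unit = {
    val smallId = 42L
    val largeId = Int.MaxValue.toLong + 1L // 2^31, one past the Int range

    println(s"$smallId fits: ${fitsInInt(smallId)}, narrowed: ${narrow(smallId)}")
    // The narrowed value wraps around, so it silently becomes a different ID.
    println(s"$largeId fits: ${fitsInInt(largeId)}, narrowed: ${narrow(largeId)}")
  }
}
```

      With IDs typed as Long, no such narrowing step is needed.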

          Activity

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3265

          till.rohrmann Till Rohrmann added a comment -

          Fixed via 43d2fd23a75a5ac7769d37cb5c2559803bd65800

          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann commented on the issue:

          https://github.com/apache/flink/pull/3265

          Travis passed. Merging this PR.

          githubbot ASF GitHub Bot added a comment -

          GitHub user tillrohrmann opened a pull request:

          https://github.com/apache/flink/pull/3265

          FLINK-2211 [ml] Generalize ALS API

          This generalizes the ALS API by allowing the `users` and `items` to be of type `Long` instead of `Int`.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/tillrohrmann/flink generalizeALSAPI

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3265.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3265


          commit 2bbc8a4efd5831fa55d11537db5c375bffca4e68
          Author: Till Rohrmann <trohrmann@apache.org>
          Date: 2017-02-03T17:22:13Z

          FLINK-2211 [ml] Generalize ALS API

          Allows the user and items to be of type Long


          till.rohrmann Till Rohrmann added a comment -

          If you have more than 2^31 - 1 items, then you clearly need Long IDs for them. A single item block, however, cannot contain more than 2^31 - 1 item vectors, because they are stored in an array. By increasing the number of item blocks, one can reduce the number of items per block so that no block contains more than 2^31 - 1 items. I think this is a fair restriction, since you are usually not able to keep an array of #itemsPerBlock * #latentFactors * sizeOfDouble bytes with #itemsPerBlock >> 2^31 - 1 in memory anyway. Furthermore, it's safe to assume that #latentFactors < 2^31 - 1, IMO.
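
          The arithmetic behind this argument can be sketched as follows. This is plain Scala with hypothetical names, not code from the Flink ALS implementation. It assumes 8 bytes per Double and an item count well below Long.MaxValue, so the ceiling division itself does not overflow.

```scala
object BlockSizingSketch {
  // Largest number of elements a JVM array can be asked to hold: 2^31 - 1.
  val MaxArrayLength: Long = Int.MaxValue.toLong

  // Smallest number of blocks such that no block exceeds MaxArrayLength items.
  def minBlocks(numItems: Long): Long =
    math.max(1L, (numItems + MaxArrayLength - 1L) / MaxArrayLength)

  // Items in the largest block when numItems are spread evenly over `blocks`.
  def itemsPerBlock(numItems: Long, blocks: Long): Long =
    (numItems + blocks - 1L) / blocks

  // Bytes needed for the factor vectors of one block:
  // itemsPerBlock * latentFactors * sizeOfDouble (assumed to be 8).
  def blockBytes(itemsPerBlock: Long, latentFactors: Int): BigInt =
    BigInt(itemsPerBlock) * latentFactors * 8

  def main(args: Array[String]): Unit = {
    val numItems = 5000000000L // more items than fit in an Int
    val blocks = minBlocks(numItems)
    val perBlock = itemsPerBlock(numItems, blocks)
    println(s"blocks=$blocks, itemsPerBlock=$perBlock, " +
      s"bytesPerBlock=${blockBytes(perBlock, 10)}")
  }
}
```

          Even at the minimum block count, a block with 10 latent factors already needs well over 100 GB for its factors alone, so in practice one would use many more blocks and stay far below the array limit.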

          till.rohrmann Till Rohrmann added a comment -

          I thought we were talking about making the user IDs and item IDs a Long. Why do you want to make the number of latent factors also a Long?

          rbraeunlich Ronny Bräunlich added a comment -

          I guess then we can close this issue as "won't resolve".

          till.rohrmann Till Rohrmann added a comment -

          The number of factors should never exceed the range of an Int. If you want to calculate models with more than 2^31 - 1 features, then you cannot do it with Flink.

          rbraeunlich Ronny Bräunlich added a comment -

          Hey people,
          I have a small (big, major (pick one)) problem and hope you can help me.
          While trying to convert every Int to Long, I found that Scala/Java puts some limitations on us.
          The ALS implementation uses Arrays internally, e.g.

          val matrix = Array.fill(triangleSize)(0.0)
          val fullMatrix = Array.fill(factors * factors)(0.0)
          

          and Arrays can only be sized with an Int. Alternatives such as ArrayBuffer or ArrayList have the same limitation.
          Do you have any ideas how to solve this? Converting the Longs to Ints would bring us back to the initial problem.
          Cheers,
          Ronny
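
          The limitation described above can be sketched in plain Scala. The names are hypothetical, and the sketch assumes that triangleSize stands for the f * (f + 1) / 2 entries of a symmetric matrix's upper triangle, which the snippet above does not spell out. Array lengths are Ints, so the size has to be computed in Long arithmetic and checked before allocating.

```scala
object ArraySizeSketch {
  // Size of a full factors x factors matrix, if it fits into an Int array length.
  def fullMatrixSize(factors: Int): Option[Int] = {
    val size = factors.toLong * factors.toLong // Long arithmetic avoids Int overflow
    if (size <= Int.MaxValue.toLong) Some(size.toInt) else None
  }

  // Size of the upper triangle including the diagonal (assumed meaning of triangleSize).
  def triangleSize(factors: Int): Option[Int] = {
    val size = factors.toLong * (factors.toLong + 1L) / 2L
    if (size <= Int.MaxValue.toLong) Some(size.toInt) else None
  }

  def main(args: Array[String]): Unit = {
    // A small case allocates fine, using the same Array.fill call as in the comment.
    val small = fullMatrixSize(100).map(n => Array.fill(n)(0.0))
    println(small.map(_.length))

    // 46341 * 46341 exceeds Int.MaxValue, so the plain Int product wraps around.
    println(46341 * 46341)
    println(fullMatrixSize(46341))
  }
}
```

          This also shows that the limit bites at a far smaller number of factors than 2^31 - 1, because the matrix size grows quadratically.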

          rbraeunlich Ronny Bräunlich added a comment -

          Sure, why not

          till.rohrmann Till Rohrmann added a comment -

          Good idea Ronny Bräunlich, do you want to take the lead here?


            People

            • Assignee:
              till.rohrmann Till Rohrmann
              Reporter:
              rbraeunlich Ronny Bräunlich
            • Votes:
              0
              Watchers:
              4
