  1. Spark
  2. SPARK-9003

Add map/update function to MLlib/Vector

    Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: MLlib
    • Labels:
      None

      Description

      MLlib/Vector only supports the foreachActive function and lacks map/update, which is inconvenient for some Vector operations.
      For example:
      val a = Vectors.dense(...)
      If we want to compute math.log for each element of a and get a Vector as the return value, we can only write:
      val b = Vectors.dense(a.toArray.map(math.log))
      or we can use "toBreeze" and "fromBreeze" to do the transformation with the Breeze API.
      Neither snippet is elegant; we would like to be able to write:
      val c = a.map(math.log)
      Also, MLlib/Matrix already implements the map/update/foreachActive functions. I think Vector should have map/update as well.
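
      Until something like this exists on Vector itself, the requested call syntax can be approximated on the user side. The following is only a minimal sketch: the names VectorSyntax, RichVector and MapExample are hypothetical and not part of MLlib, and the wrapper simply delegates to the toArray-based workaround shown above, so it always returns a dense vector, even for sparse input.

      ```scala
      import org.apache.spark.mllib.linalg.{Vector, Vectors}

      // Hypothetical user-side helper, NOT part of MLlib.
      object VectorSyntax {
        implicit class RichVector(val v: Vector) extends AnyVal {
          // Applies f to every element (including zeros of a sparse vector)
          // and returns a new dense Vector; the input is left unchanged.
          def map(f: Double => Double): Vector =
            Vectors.dense(v.toArray.map(f))
        }
      }

      object MapExample {
        import VectorSyntax._

        def main(args: Array[String]): Unit = {
          val a = Vectors.dense(1.0, 2.0, 4.0)
          val c = a.map(math.log) // same result as Vectors.dense(a.toArray.map(math.log))
          println(c)
        }
      }
      ```

      Going through toArray keeps the sketch correct for functions where f(0) is not 0 (math.log is one), at the cost of losing sparsity. A built-in implementation could make a different trade-off.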

          Activity

          srowen Sean Owen added a comment -

          I think the idea was that this is not supposed to become yet another vector/matrix library, and that you can manipulate the underlying breeze vector if needed. I don't know how strong that convention is. The use case you show doesn't really benefit except for maybe saving a method call; is there a case where this would be a bigger win?

          yanboliang Yanbo Liang added a comment - - edited

          Yes, I agree that this is not supposed to become yet another vector/matrix library. But I think the map/update functions are important enough to be part of the Vector interface, just like foreachActive, which is already supported.
          I can also provide an example that would benefit from these functions.
          For example:
          val originalPrediction = Vectors.dense(Array(1.0, 2.0, 3.0))
          val expected = Vectors.dense(Array(10.0, 20.0, 30.0))

          In some cases, we can use "~==", which is defined in org.apache.spark.mllib.util.TestingUtils, to compare two Vectors or Matrices.

          Currently we can only write the following:
          val prediction = Vectors.dense(originalPrediction.toArray.map(x => x*10))
          assert(prediction ~== expected absTol 0.01, "prediction error")

          If Vector supported map/update, we could write:
          assert(originalPrediction.map(x => x*10) ~== expected absTol 0.01, "prediction error")

          MLlib/Matrix, by contrast, already supports the map/update/foreachActive functions, so we can compare two Matrices with ~== effortlessly.
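
          For readers without access to Spark's test utilities, the comparison above can be sketched with a hand-written tolerance check. This is an illustration only: approxEqual and PredictionCheck are hypothetical names standing in for the "~== ... absTol" syntax from TestingUtils, and the exact semantics of that utility may differ.

          ```scala
          import org.apache.spark.mllib.linalg.{Vector, Vectors}

          object PredictionCheck {
            // Hypothetical stand-in for `x ~== y absTol eps`: true when both
            // vectors have the same size and every pair of elements differs
            // by at most absTol.
            def approxEqual(x: Vector, y: Vector, absTol: Double): Boolean =
              x.size == y.size &&
                x.toArray.zip(y.toArray).forall { case (a, b) => math.abs(a - b) <= absTol }

            def main(args: Array[String]): Unit = {
              val originalPrediction = Vectors.dense(1.0, 2.0, 3.0)
              val expected = Vectors.dense(10.0, 20.0, 30.0)

              // Current workaround: round-trip through an Array to scale each element.
              val prediction = Vectors.dense(originalPrediction.toArray.map(x => x * 10))
              assert(approxEqual(prediction, expected, 0.01), "prediction error")
            }
          }
          ```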

          apachespark Apache Spark added a comment -

          User 'yanboliang' has created a pull request for this issue:
          https://github.com/apache/spark/pull/7357

          josephkb Joseph K. Bradley added a comment -

          There is discussion about having a full-fledged local linear algebra library. This is set to 1.5 but may not actually make it. It'd be good to synch with others watching that JIRA.

          srowen Sean Owen added a comment -

          Joseph K. Bradley, please, not another one! The world has too many.

          josephkb Joseph K. Bradley added a comment -

          I won't argue with that. But we do need one with a friendly license, a stable API, active development and maintenance, and decent performance. Suggestions?

          srowen Sean Owen added a comment -

          Is this SPARK-6442? Let me comment there.


            People

            • Assignee: Unassigned
            • Reporter: yanboliang Yanbo Liang
            • Votes: 0
            • Watchers: 4