Mahout / MAHOUT-1833

Enhance svec function to accept cardinality as parameter


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.12.1
    • Component/s: Math
    • Labels:
      None
    • Environment:

      Mahout Spark Shell 0.12.0,
      Spark 1.6.0 Cluster on Hadoop Yarn 2.7.1,
      CentOS 7 64-bit

      Description

      It would be nice to enhance the existing svec function in org.apache.mahout.math.scalabindings as follows:

        /**
         * Create a sparse vector out of a list of Tuple2's.
         * @param sdata (index, value) pairs to store
         * @param cardinality size of the vector; if negative, inferred as the largest index + 1
         * @return a RandomAccessSparseVector holding the given entries
         */
        def svec(sdata: TraversableOnce[(Int, AnyVal)], cardinality: Int = -1) = {
          // Smallest cardinality that can hold every supplied index.
          val required = if (sdata.nonEmpty) sdata.map(_._1).max + 1 else 0
          val resolved =
            if (cardinality < 0) required
            else if (cardinality < required)
              throw new IllegalArgumentException(s"Required cardinality $required but got $cardinality")
            else cardinality
          val initialCapacity = sdata.size
          val sv = new RandomAccessSparseVector(resolved, initialCapacity)
          sdata.foreach(t ⇒ sv.setQuick(t._1, t._2.asInstanceOf[Number].doubleValue()))
          sv
        }
      

      This lets the user specify the cardinality of the created sparse vector.
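
The cardinality rule in the svec proposed above can be checked on its own, without Mahout on the classpath. The sketch below is only an illustration: `CardinalitySketch` and `resolveCardinality` are hypothetical names, not part of Mahout, and the function takes just the indices rather than full (index, value) pairs.

```scala
// Hypothetical helper, not part of Mahout: isolates the cardinality rule
// of the proposed svec so it can be exercised without any dependency.
object CardinalitySketch {
  def resolveCardinality(indices: Seq[Int], cardinality: Int = -1): Int = {
    // Smallest size that can hold every index (0 for an empty input).
    val required = if (indices.nonEmpty) indices.max + 1 else 0
    if (cardinality < 0) required // negative means "infer from the data"
    else if (cardinality < required)
      throw new IllegalArgumentException(
        s"Required cardinality $required but got $cardinality")
    else cardinality // caller-supplied size is large enough
  }
}
```

For indices 0, 3 and 7, omitting the cardinality yields 8, passing 20 yields 20, and passing 5 throws an IllegalArgumentException.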

      This is useful when the user wants to create a DRM from many sparse vectors that differ in actual size but share the same logical size (e.g. rows of a sparse matrix).

      The following code demonstrates the case:

      val cardinality = 20
      val rdd = sc.textFile("/some/file.txt").
        map(_.split(",")).
        map(line => (line(0).toInt, Array((line(1).toInt, 1)))).
        reduceByKey((v1, v2) => v1 ++ v2).
        map(row => (row._1, svec(row._2, cardinality)))
      
      val drm = drmWrap(rdd.map(row => (row._1, row._2.asInstanceOf[Vector])))
      
      // All element-wise operations below fail if the DRM's sparse vectors do not have a consistent cardinality
      val drm2 = drm + drm.t
      val drm3 = drm - drm.t
      val drm4 = drm * drm.t
      val drm5 = drm / drm.t
      

      Note that in the last map, svec accepts an additional cardinality parameter, so the created sparse vectors all have a consistent cardinality.
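
To make the failure mode concrete without a Spark cluster, here is a toy model in plain Scala. It is a sketch under stated assumptions: `ToySparse` and `ToySparseDemo` are hypothetical names and do not reproduce Mahout's Vector or DRM implementation. The model only mimics the behaviour described above, where element-wise operations are rejected for vectors of different declared size.

```scala
// Toy model, not Mahout's Vector: a sparse vector is a declared size plus
// a map from index to non-zero value.
final case class ToySparse(size: Int, values: Map[Int, Double]) {
  // Element-wise addition is only defined for equal declared sizes.
  def plus(that: ToySparse): ToySparse = {
    require(size == that.size, s"Cardinality mismatch: $size vs ${that.size}")
    val keys = values.keySet ++ that.values.keySet
    val summed = keys.map { k =>
      k -> (values.getOrElse(k, 0.0) + that.values.getOrElse(k, 0.0))
    }.toMap
    ToySparse(size, summed)
  }
}

object ToySparseDemo {
  // Size inferred from the largest index, like svec without a cardinality.
  def inferred(entries: (Int, Double)*): ToySparse =
    ToySparse(if (entries.isEmpty) 0 else entries.map(_._1).max + 1, entries.toMap)

  // Size fixed by the caller, like svec with the proposed cardinality parameter.
  def fixed(cardinality: Int, entries: (Int, Double)*): ToySparse = {
    require(entries.forall(_._1 < cardinality), "index exceeds cardinality")
    ToySparse(cardinality, entries.toMap)
  }
}
```

In this model, `inferred(1 -> 1.0)` has size 2 and `inferred(5 -> 1.0)` has size 6, so adding them is rejected, while two vectors built with `fixed(20, ...)` add without error.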

            People

            • Assignee:
              resec Edmond Luo
              Reporter:
              resec Edmond Luo