SPARK-5384

Vectors.sqdist returns inconsistent results for sparse and dense vectors when the vectors have different lengths


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.0
    • Component/s: MLlib
    • Labels: None
    • Environment: CentOS; other platforms should behave similarly

    Description

      For two vectors of different lengths, Vectors.sqdist returns different results depending on whether the vectors are represented as sparse or dense. Sample:
      import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vectors}

      // The same two vectors, first in sparse form, then in dense form
      val s1 = new SparseVector(4, Array(0, 1, 2, 3), Array(1.0, 2.0, 3.0, 4.0))
      val s2 = new SparseVector(1, Array(0), Array(9.0))
      val d1 = new DenseVector(Array(1.0, 2.0, 3.0, 4.0))
      val d2 = new DenseVector(Array(9.0))
      println(s1 == d1 && s2 == d2)    // the sparse and dense forms compare equal
      println(Vectors.sqdist(s1, s2))  // sparse inputs
      println(Vectors.sqdist(d1, d2))  // dense inputs
      Result:
      true
      93.0
      64.0

      More precisely, Vectors.sqdist includes the elements beyond the length of the shorter vector when the inputs are sparse and excludes them when the inputs are dense. I'll send a PR so we can discuss the details there.
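
      The discrepancy can be reproduced without Spark. The sketch below is a minimal illustration, not Spark's actual implementation: SqdistSketch, denseStyle, sparseStyle, and strict are hypothetical names. It assumes the dense path stops at the shorter array, while the sparse path merges the two index arrays and treats a missing index as 0.0. The strict variant shows one possible resolution, rejecting mismatched sizes up front, which may or may not match what the PR ends up doing.

```scala
// Hypothetical stand-ins for the two code paths; not Spark's actual code.
object SqdistSketch {

  // Dense-style: compare element by element up to the shorter length,
  // so trailing elements of the longer vector are silently ignored.
  def denseStyle(a: Array[Double], b: Array[Double]): Double = {
    val n = math.min(a.length, b.length)
    var sum = 0.0
    var i = 0
    while (i < n) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }

  // Sparse-style: merge two sorted index arrays. An index present in only
  // one vector is compared against an implicit 0.0 in the other, so the
  // extra elements of the longer vector do contribute to the sum.
  def sparseStyle(ai: Array[Int], av: Array[Double],
                  bi: Array[Int], bv: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    var j = 0
    while (i < ai.length || j < bi.length) {
      val d =
        if (j >= bi.length || (i < ai.length && ai(i) < bi(j))) {
          val v = av(i); i += 1; v          // index only in the first vector
        } else if (i >= ai.length || bi(j) < ai(i)) {
          val v = bv(j); j += 1; v          // index only in the second vector
        } else {
          val v = av(i) - bv(j); i += 1; j += 1; v  // index in both
        }
      sum += d * d
    }
    sum
  }

  // One possible resolution (an assumption): refuse mismatched sizes.
  def strict(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length,
      s"Vector sizes do not match: ${a.length} vs ${b.length}")
    denseStyle(a, b)
  }
}
```

      With the vectors from the sample, denseStyle yields (1 - 9)^2 = 64.0, while sparseStyle also adds 2^2 + 3^2 + 4^2 = 29 for the extra elements, giving 93.0.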

          People

            Assignee: yuhao yang (yuhaoyan)
            Reporter: yuhao yang (yuhaoyan)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Original Estimate: 24h
                Remaining Estimate: 24h
                Time Spent: Not Specified