Spark / SPARK-4823

rowSimilarities


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Auto Closed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: MLlib

    Description

      RowMatrix has a columnSimilarities method to find cosine similarities between columns.

      A rowSimilarities method would be useful to find similarities between rows.

      This JIRA is to investigate which algorithms are suitable for such a method and do better than brute force. Note that when there are many rows (> 10^6), brute force is unlikely to be feasible, since the output will be of order 10^12 entries.
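
      To make the scaling concern concrete, here is a minimal brute-force sketch in plain Python. It is not Spark code and not a proposed implementation; the helper names (cosine, brute_force_row_similarities, pair_count) are hypothetical and exist only for illustration. It computes the cosine similarity of every pair of rows and counts how many pairs that requires.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def brute_force_row_similarities(rows):
    """Return {(i, j): similarity} for every row pair i < j (hypothetical helper)."""
    return {(i, j): cosine(rows[i], rows[j])
            for i, j in combinations(range(len(rows)), 2)}


def pair_count(n):
    """Number of unordered row pairs that brute force must evaluate for n rows."""
    return n * (n - 1) // 2


# Tiny illustrative matrix: rows 0 and 1 are parallel, row 2 is orthogonal to both.
rows = [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0], [0.0, 3.0, 0.0]]
sims = brute_force_row_similarities(rows)
```

      With n = 10^6 rows, pair_count gives roughly 5 x 10^11 unordered pairs, or about 10^12 entries if the full symmetric matrix is stored, which matches the estimate above.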

      Attachments

        1. SparkMeetup2015-Experiments2.pdf
          56 kB
          Debasish Das
        2. SparkMeetup2015-Experiments1.pdf
          64 kB
          Debasish Das
        3. MovieLensSimilarity Comparisons.pdf
          93 kB
          Debasish Das


            People

              Assignee: Unassigned
              Reporter: Reza Zadeh (rezazadeh)
              Votes: 5
              Watchers: 12

              Dates

                Created:
                Updated:
                Resolved: