Spark / SPARK-24217

Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: ML
    • Labels: None

    Description

      PIC should output an id and a prediction for every node in the graph. Currently, PIC does not return cluster indices for neighbor IDs that do not appear in the ID column.
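      The gap can be checked mechanically: collect every vertex that appears in the input, either as an id or inside a neighbor list, and subtract the ids that received a prediction. This is a plain-Python sketch rather than Spark code; the tuple row layout, the `predictions` dictionary, and the helper names `all_vertices` and `vertices_without_prediction` are hypothetical stand-ins for the DataFrame columns and the PIC output.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical row shape mirroring the (idCol, neighborsCol, similaritiesCol) input layout.
Row = Tuple[int, Sequence[int], Sequence[float]]


def all_vertices(rows: List[Row]) -> Set[int]:
    """Every vertex mentioned in the graph: the ids plus all neighbor ids."""
    vertices: Set[int] = set()
    for vid, neighbors, _ in rows:
        vertices.add(vid)
        vertices.update(neighbors)
    return vertices


def vertices_without_prediction(rows: List[Row], predictions: Dict[int, int]) -> Set[int]:
    """Vertices that appear somewhere in the graph but have no cluster assignment."""
    return all_vertices(rows) - set(predictions)


if __name__ == "__main__":
    rows = [
        (0, [1, 2], [1.0, 1.0]),
        (1, [2], [1.0]),
        (3, [4], [1.0]),
    ]
    # Hypothetical output that, like the behavior reported here, only covers idCol values.
    predictions = {0: 0, 1: 0, 3: 1}
    print(sorted(vertices_without_prediction(rows, predictions)))  # [2, 4]
```

      With the sample rows, vertices 2 and 4 appear only as neighbors, so they are exactly the ones an output restricted to the id column misses.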

      According to the definition of PIC given in the code documentation:

      PIC takes an affinity matrix between items (or vertices) as input. An affinity matrix
      is a symmetric matrix whose entries are non-negative similarities between items.
      PIC takes this matrix (or graph) as an adjacency matrix. Specifically, each input row includes:

      • idCol: vertex ID
      • neighborsCol: neighbors of vertex in idCol
      • similaritiesCol: non-negative weights (similarities) of edges between the vertex
        in idCol and each neighbor in neighborsCol

      "PIC returns a cluster assignment for each input vertex." It appends a new column predictionCol
      containing the cluster assignment in [0,k) for each row (vertex).
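      To make the expected behavior concrete, here is a minimal pure-Python sketch of the power-iteration idea behind PIC that assigns a cluster to every vertex in the graph, including vertices that occur only in neighborsCol. It is not Spark's implementation. It assumes a degree-based starting vector, a fixed number of iterations instead of a convergence test, and a deterministic one-dimensional k-means step. All function names (`build_affinity`, `power_iteration`, `kmeans_1d`, `pic`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical row shape mirroring the (idCol, neighborsCol, similaritiesCol) input layout.
Row = Tuple[int, Sequence[int], Sequence[float]]
Graph = Dict[int, Dict[int, float]]


def build_affinity(rows: List[Row]) -> Graph:
    """Symmetric adjacency map that covers ids *and* neighbor-only vertices."""
    adj: Graph = {}
    for vid, neighbors, sims in rows:
        adj.setdefault(vid, {})
        for nbr, w in zip(neighbors, sims):
            if w < 0:
                raise ValueError("similarities must be non-negative")
            adj[vid][nbr] = w
            adj.setdefault(nbr, {})[vid] = w  # mirror the edge: the matrix is symmetric
    return adj


def power_iteration(adj: Graph, max_iter: int = 20) -> Dict[int, float]:
    """Iterate v <- W v / ||W v||_1 with W = D^-1 A (row-normalized affinities)."""
    vertices = sorted(adj)
    degree = {u: sum(adj[u].values()) for u in vertices}
    vol = sum(degree.values())
    if vol == 0:
        raise ValueError("graph has no positive-weight edges")
    v = {u: degree[u] / vol for u in vertices}  # assumed degree-based starting vector
    for _ in range(max_iter):
        # Zero-degree vertices have an all-zero row in W, so their value becomes 0.0.
        new = {
            u: sum(w * v[n] for n, w in adj[u].items()) / degree[u] if degree[u] > 0 else 0.0
            for u in vertices
        }
        norm = sum(abs(x) for x in new.values())
        if norm == 0:
            break
        v = {u: x / norm for u, x in new.items()}
    return v


def kmeans_1d(values: Dict[int, float], k: int, max_iter: int = 50) -> Dict[int, int]:
    """Lloyd's k-means on scalars; returns labels in [0, k) ordered by center value."""
    xs = sorted(values.values())
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and the number of vertices")
    # Deterministic start: k evenly spaced order statistics of the embedding.
    centers = [xs[round(i * (len(xs) - 1) / max(k - 1, 1))] for i in range(k)]

    def nearest(x: float) -> int:
        return min(range(k), key=lambda c: abs(x - centers[c]))

    for _ in range(max_iter):
        labels = {u: nearest(x) for u, x in values.items()}
        new_centers = []
        for c in range(k):
            members = [values[u] for u, lab in labels.items() if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    labels = {u: nearest(x) for u, x in values.items()}  # final assignment to final centers
    rank = {c: i for i, c in enumerate(sorted(range(k), key=lambda c: centers[c]))}
    return {u: rank[lab] for u, lab in labels.items()}


def pic(rows: List[Row], k: int, max_iter: int = 20) -> Dict[int, int]:
    """Cluster assignment for every vertex that appears anywhere in the input."""
    return kmeans_1d(power_iteration(build_affinity(rows), max_iter), k)


if __name__ == "__main__":
    rows = [
        (0, [1, 2], [1.0, 1.0]),  # vertex 2 only ever appears as a neighbor
        (1, [2], [1.0]),
        (3, [4, 5], [2.0, 2.0]),  # vertex 5 only ever appears as a neighbor
        (4, [5], [2.0]),
    ]
    print(pic(rows, k=2))  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

      In the sample input, vertices 2 and 5 never appear in the id position, yet both still receive a label, which is the output this issue asks for. Labels are ordered by the value of the power-iteration embedding, so which component is numbered 0 is arbitrary, as in any clustering.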

       


People

    • Assignee: Unassigned
    • Reporter: shahid
    • Votes: 0
    • Watchers: 3
