Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-13714

Another ConnectedComponents based on Max-Degree Propagation

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Resolved
    • Minor
    • Resolution: Won't Fix
    • None
    • None
    • GraphX
    • None

    Description

      Current ConnectedComponents algorithm was based on Min-VertexId Propagation, which is sensitive to the place of Min-VertexId.
      While this implementation is based on Max-Degree Propagation.
      First, the degree graph is computed. And in the pregel progress, the vertex with the max degree in a CC is the start point of propagation.
      This new method has advantages over the old one:
      1, The convergence is only determined by the structs of CC, and is robust to the place of vertex with Min-ID.
      2, For spherical CCs in which there may be a concept like 'center', it can accelerate the convergence. For example, GraphGenerators.gridGraph(sc, 3, 3), the old CC need 4 supersteps, while the new one only need 2 supersteps.
      3, If we limit the number of iteration, the new method tend to generate more acceptable results.
      4, The output for each CC is the vertex with max degree in it, which may be more meaningful. And because the vertex-ID is nominal in most cases, the vertex with min-ID in a CC is somewhat meanless.

      But there are still two disadvantages:
      1,The message body grows, from (VID) to (VID, Degree). that is (Long) -> (Long, Int)
      2,For graph with simple CCs, it may be slower than old one. Because it need a extra degree computation.

      The api is the same like ConnectedComponents:

      val graph = ...
      val cc = graph.ConnectedComponentsWithDegree(100)
      or
      val cc = ConnectedComponentsWithDegree.run(graph, 100)

      Attachments

        Activity

          People

            Unassigned Unassigned
            podongfeng Ruifeng Zheng
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: