Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-42856

Inconsistent output from label propagation algorithm in Graph X due to tie-breaking logic in vertexProgram

Rank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsAdd voteVotersWatch issueWatchersCreate sub-taskConvert to sub-taskLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.1.0, 3.3.2
    • None
    • GraphX

    Description

      We are experiencing inconsistent output from the label propagation algorithm in Graph X. When we run the algorithm on the same input, we observe different outputs each time. This behavior is unexpected since the algorithm is not designed to be random, and we should be getting the same output for the same input.

      We suspect that the issue lies in the tie-breaking logic used by the vertexProgram when picking labels. Currently, the vertexProgram chooses the label with the maximum frequency, and in case of a tie, it selects the label that appears first. This logic does not handle tie cases correctly, resulting in different outputs for the same input.

      def vertexProgram(vid: VertexId, attr: Long, message: Map[VertexId, Long]): VertexId = {  
          if (message.isEmpty) attr else message.maxBy(_._2)._1
      }
      

      To solve this issue, we propose changing the tie-breaking logic to something like vertex ID. This change will ensure that the same label is always selected in case of a tie, resulting in consistent output from the algorithm for the same input.

      def vertexProgram(vid: VertexId, attr: Long, message: Map[VertexId, Long]): VertexId = {
          if (message.isEmpty) attr else message.maxBy{ case (key, value) => (value, key) }._1
      }
      

      Thanks!

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            m.haghpanah Mohammadreza Haghpanah

            Dates

              Created:
              Updated:

              Slack

                Issue deployment