Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
1.1.0, 3.3.2
-
None
Description
We are experiencing inconsistent output from the label propagation algorithm in Graph X. When we run the algorithm on the same input, we observe different outputs each time. This behavior is unexpected since the algorithm is not designed to be random, and we should be getting the same output for the same input.
We suspect that the issue lies in the tie-breaking logic used by the vertexProgram when picking labels. Currently, the vertexProgram chooses the label with the maximum frequency, and in case of a tie, it selects the label that appears first. This logic does not handle tie cases correctly, resulting in different outputs for the same input.
def vertexProgram(vid: VertexId, attr: Long, message: Map[VertexId, Long]): VertexId = { if (message.isEmpty) attr else message.maxBy(_._2)._1 }
To solve this issue, we propose changing the tie-breaking logic to something like vertex ID. This change will ensure that the same label is always selected in case of a tie, resulting in consistent output from the algorithm for the same input.
def vertexProgram(vid: VertexId, attr: Long, message: Map[VertexId, Long]): VertexId = { if (message.isEmpty) attr else message.maxBy{ case (key, value) => (value, key) }._1 }
Thanks!
Attachments
Attachments
Issue Links
- links to