Description
While creating a graph with 6B nodes and 12B edges, I noticed that the 'numVertices' API returns an incorrect result; 'numEdges' reports the correct number. A few times (with different datasets of > 2.5B nodes) I have also noticed that numVertices is returned as a negative number, so I suspect there is an overflow somewhere (maybe we are using Int for some field?).
Here are some details of the experiments I have done so far:
1. Input: numNodes=6101995593 ; noEdges=12163784626
Graph returns: numVertices=1807028297 ; numEdges=12163784626
2. Input: numNodes=2157586441 ; noEdges=2747322705
Graph returns: numVertices=-2137380855 ; numEdges=2747322705
3. Input: numNodes=1725060105 ; noEdges=204176821
Graph returns: numVertices=1725060105 ; numEdges=2041768213
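The overflow suspicion can be checked directly against the numbers above. The following is a minimal sketch, not code from GraphX: it assumes the vertex count is truncated to a signed 32-bit integer somewhere, and `to_int32` is a hypothetical helper that emulates that truncation.

```python
def to_int32(n: int) -> int:
    """Wrap an integer to a signed 32-bit value (two's complement)."""
    n &= 0xFFFFFFFF  # keep only the low 32 bits
    return n - 0x100000000 if n >= 0x80000000 else n

# (actual numNodes, numVertices reported by the graph), taken from the experiments
cases = [
    (6101995593, 1807028297),
    (2157586441, -2137380855),
    (1725060105, 1725060105),
]

for actual, reported in cases:
    wrapped = to_int32(actual)
    print(f"actual={actual} wrapped={wrapped} matches_reported={wrapped == reported}")
```

If the assumption holds, every reported numVertices equals the true node count wrapped to 32 bits, including the third case, where the count is below 2^31 and so comes back unchanged.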
You can find the code to generate this bug here:
https://gist.github.com/npanj/92e949d86d08715bf4bf
Note: Nodes are labeled 1...6B.
Issue Links
- is duplicated by SPARK-10228 Integer overflow in VertexRDDImpl.count (Resolved)