Spark / SPARK-3190

Creation of large graph (> 2.15B nodes) seems to be broken: possible overflow somewhere

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.3
    • Fix Version/s: 1.0.3, 1.1.0, 1.2.0, 1.3.2, 1.4.2, 1.5.0
    • Component/s: GraphX
    • Labels:
      None
    • Environment:

Standalone mode running on EC2. Using the latest code from the master branch, up to commit db56f2df1b8027171da1b8d2571d1f2ef1e103b6.

      Description

While creating a graph with 6B nodes and 12B edges, I noticed that the 'numVertices' API returns an incorrect result, while 'numEdges' reports the correct number. A few times (with different datasets of > 2.5B nodes) I have also noticed that numVertices is returned as a negative number, so I suspect there is an overflow somewhere (maybe we are using an Int for some field?).

Here are some details of the experiments I have done so far:
      1. Input: numNodes=6101995593 ; noEdges=12163784626
      Graph returns: numVertices=1807028297 ; numEdges=12163784626

      2. Input : numNodes=2157586441 ; noEdges=2747322705
      Graph Returns: numVertices=-2137380855 ; numEdges=2747322705

      3. Input: numNodes=1725060105 ; noEdges=204176821
      Graph: numVertices=1725060105 ; numEdges=2041768213

      You can find the code to generate this bug here:

      https://gist.github.com/npanj/92e949d86d08715bf4bf

Note: Nodes are labeled 1...6B.


People

• Assignee: Ankur Dave (ankurd)
• Reporter: npanj
• Votes: 0
• Watchers: 5
