Spark / SPARK-19124

GraphX PageRank execution time

Details

    Description

      Hello, I don't know if I'm writing in the right place, but if anyone can help me, that would be great.

      I have to run PageRank on a really big graph (Wikipedia's graph: 400 million edges, 12 million vertices), but I'm hitting an execution time problem: after 10+ iterations of the algorithm, the time per iteration rises abnormally from about 10 minutes to dozens of hours: https://d.pr/svBR.

      My code is really simple and is taken directly from the GraphX documentation.
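
      For reference, a PageRank run in the style of the GraphX programming guide has roughly the shape below. This is only a sketch, not the exact code that was run: the edge-list path is a placeholder, the iteration count of 20 is an assumed value, and the guide's own example calls `pageRank(0.0001)` (run until convergence) rather than `staticPageRank`.

```scala
// Run inside spark-shell, where `sc` (the SparkContext) is already defined.
import org.apache.spark.graphx.GraphLoader

// Placeholder path (hypothetical): a text file with one "srcId dstId" pair per line.
val graph = GraphLoader.edgeListFile(sc, "path/to/edges.txt")

// Run a fixed number of PageRank iterations (20 is an assumed value)
// and keep the (vertexId, rank) pairs.
val ranks = graph.staticPageRank(20).vertices

// Print the ten highest-ranked vertices.
ranks.sortBy(_._2, ascending = false).take(10).foreach(println)
```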

      The machine has two Intel Xeon E5-2697 v3 CPUs, 64 GB of RAM, and a 500 GB hard disk, and it runs Windows Server 2012 R2 Standard.

      I allocated 8 cores and 50 GB of RAM to Spark when launching spark-shell from the command line.

      What could the problem be?

      Thanks for any help!

          People

            Assignee: Unassigned
            Reporter: Elias Bassani (AmenRa666)