Spark / SPARK-1188

GraphX triplets not working properly


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.2, 1.0.0
    • Component/s: GraphX
    • Labels: None

    Description

      I followed the GraphX tutorial at http://ampcamp.berkeley.edu/big-data-mini-course/graph-analytics-with-graphx.html

      on a local standalone cluster (Spark version 0.9.0) with two workers. Somehow, graph.triplets is not returning what it should: instead of one triplet per edge, it returns the same (Ed, Fran) triplet for every edge.

      ```
      scala> graph.edges.toArray
      14/03/04 16:15:57 INFO SparkContext: Starting job: collect at EdgeRDD.scala:51
      14/03/04 16:15:57 INFO DAGScheduler: Got job 5 (collect at EdgeRDD.scala:51) with 1 output partitions (allowLocal=false)
      14/03/04 16:15:57 INFO DAGScheduler: Final stage: Stage 27 (collect at EdgeRDD.scala:51)
      14/03/04 16:15:57 INFO DAGScheduler: Parents of final stage: List()
      14/03/04 16:15:57 INFO DAGScheduler: Missing parents: List()
      14/03/04 16:15:57 INFO DAGScheduler: Submitting Stage 27 (MappedRDD[36] at map at EdgeRDD.scala:51), which has no missing parents
      14/03/04 16:15:57 INFO DAGScheduler: Submitting 1 missing tasks from Stage 27 (MappedRDD[36] at map at EdgeRDD.scala:51)
      14/03/04 16:15:57 INFO TaskSchedulerImpl: Adding task set 27.0 with 1 tasks
      14/03/04 16:15:57 INFO TaskSetManager: Starting task 27.0:0 as TID 11 on executor localhost: localhost (PROCESS_LOCAL)
      14/03/04 16:15:57 INFO TaskSetManager: Serialized task 27.0:0 as 2068 bytes in 1 ms
      14/03/04 16:15:57 INFO Executor: Running task ID 11
      14/03/04 16:15:57 INFO BlockManager: Found block rdd_2_0 locally
      14/03/04 16:15:57 INFO Executor: Serialized size of result for 11 is 936
      14/03/04 16:15:57 INFO Executor: Sending result for 11 directly to driver
      14/03/04 16:15:57 INFO Executor: Finished task ID 11
      14/03/04 16:15:57 INFO TaskSetManager: Finished TID 11 in 13 ms on localhost (progress: 0/1)
      14/03/04 16:15:57 INFO DAGScheduler: Completed ResultTask(27, 0)
      14/03/04 16:15:57 INFO TaskSchedulerImpl: Remove TaskSet 27.0 from pool
      14/03/04 16:15:57 INFO DAGScheduler: Stage 27 (collect at EdgeRDD.scala:51) finished in 0.015 s
      14/03/04 16:15:57 INFO SparkContext: Job finished: collect at EdgeRDD.scala:51, took 0.023602266 s
      res7: Array[org.apache.spark.graphx.Edge[Int]] = Array(Edge(2,1,7), Edge(2,4,2), Edge(3,2,4), Edge(3,6,3), Edge(4,1,1), Edge(5,2,2), Edge(5,3,8), Edge(5,6,3))

      scala> graph.vertices.toArray
      14/03/04 16:16:18 INFO SparkContext: Starting job: toArray at <console>:27
      14/03/04 16:16:18 INFO DAGScheduler: Got job 6 (toArray at <console>:27) with 1 output partitions (allowLocal=false)
      14/03/04 16:16:18 INFO DAGScheduler: Final stage: Stage 28 (toArray at <console>:27)
      14/03/04 16:16:18 INFO DAGScheduler: Parents of final stage: List(Stage 32, Stage 29)
      14/03/04 16:16:18 INFO DAGScheduler: Missing parents: List()
      14/03/04 16:16:18 INFO DAGScheduler: Submitting Stage 28 (VertexRDD[15] at RDD at VertexRDD.scala:52), which has no missing parents
      14/03/04 16:16:18 INFO DAGScheduler: Submitting 1 missing tasks from Stage 28 (VertexRDD[15] at RDD at VertexRDD.scala:52)
      14/03/04 16:16:18 INFO TaskSchedulerImpl: Adding task set 28.0 with 1 tasks
      14/03/04 16:16:18 INFO TaskSetManager: Starting task 28.0:0 as TID 12 on executor localhost: localhost (PROCESS_LOCAL)
      14/03/04 16:16:18 INFO TaskSetManager: Serialized task 28.0:0 as 2426 bytes in 0 ms
      14/03/04 16:16:18 INFO Executor: Running task ID 12
      14/03/04 16:16:18 INFO BlockManager: Found block rdd_14_0 locally
      14/03/04 16:16:18 INFO Executor: Serialized size of result for 12 is 947
      14/03/04 16:16:18 INFO Executor: Sending result for 12 directly to driver
      14/03/04 16:16:18 INFO Executor: Finished task ID 12
      14/03/04 16:16:18 INFO TaskSetManager: Finished TID 12 in 13 ms on localhost (progress: 0/1)
      14/03/04 16:16:18 INFO DAGScheduler: Completed ResultTask(28, 0)
      14/03/04 16:16:18 INFO TaskSchedulerImpl: Remove TaskSet 28.0 from pool
      14/03/04 16:16:18 INFO DAGScheduler: Stage 28 (toArray at <console>:27) finished in 0.015 s
      14/03/04 16:16:18 INFO SparkContext: Job finished: toArray at <console>:27, took 0.027839851 s
      res9: Array[(org.apache.spark.graphx.VertexId, (String, Int))] = Array((4,(David,42)), (2,(Bob,27)), (6,(Fran,50)), (5,(Ed,55)), (3,(Charlie,65)), (1,(Alice,28)))

      scala> graph.triplets.toArray
      14/03/04 16:16:30 INFO SparkContext: Starting job: toArray at <console>:27
      14/03/04 16:16:30 INFO DAGScheduler: Got job 7 (toArray at <console>:27) with 1 output partitions (allowLocal=false)
      14/03/04 16:16:31 INFO DAGScheduler: Final stage: Stage 33 (toArray at <console>:27)
      14/03/04 16:16:31 INFO DAGScheduler: Parents of final stage: List(Stage 34)
      14/03/04 16:16:31 INFO DAGScheduler: Missing parents: List()
      14/03/04 16:16:31 INFO DAGScheduler: Submitting Stage 33 (ZippedPartitionsRDD2[32] at zipPartitions at GraphImpl.scala:60), which has no missing parents
      14/03/04 16:16:31 INFO DAGScheduler: Submitting 1 missing tasks from Stage 33 (ZippedPartitionsRDD2[32] at zipPartitions at GraphImpl.scala:60)
      14/03/04 16:16:31 INFO TaskSchedulerImpl: Adding task set 33.0 with 1 tasks
      14/03/04 16:16:31 INFO TaskSetManager: Starting task 33.0:0 as TID 13 on executor localhost: localhost (PROCESS_LOCAL)
      14/03/04 16:16:31 INFO TaskSetManager: Serialized task 33.0:0 as 3322 bytes in 1 ms
      14/03/04 16:16:31 INFO Executor: Running task ID 13
      14/03/04 16:16:31 INFO BlockManager: Found block rdd_2_0 locally
      14/03/04 16:16:31 INFO BlockManager: Found block rdd_31_0 locally
      14/03/04 16:16:31 INFO Executor: Serialized size of result for 13 is 931
      14/03/04 16:16:31 INFO Executor: Sending result for 13 directly to driver
      14/03/04 16:16:31 INFO Executor: Finished task ID 13
      14/03/04 16:16:31 INFO TaskSetManager: Finished TID 13 in 17 ms on localhost (progress: 0/1)
      14/03/04 16:16:31 INFO DAGScheduler: Completed ResultTask(33, 0)
      14/03/04 16:16:31 INFO TaskSchedulerImpl: Remove TaskSet 33.0 from pool
      14/03/04 16:16:31 INFO DAGScheduler: Stage 33 (toArray at <console>:27) finished in 0.019 s
      14/03/04 16:16:31 INFO SparkContext: Job finished: toArray at <console>:27, took 0.037909394 s
      res10: Array[org.apache.spark.graphx.EdgeTriplet[(String, Int),Int]] = Array(((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3))
      ```
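
      The edges and vertices above are correct, yet every collected triplet is the same (Ed, Fran) element repeated eight times. One plausible cause (my assumption, not stated in this ticket) is the classic iterator pitfall where a single mutable object is reused for every element, so toArray ends up holding eight references to the same instance. A minimal plain-Scala sketch of that pitfall, using hypothetical names (Triplet, reusingIterator) rather than the actual GraphX internals:

      ```scala
      // A mutable element type, standing in for a triplet-like record.
      class Triplet(var src: Int, var dst: Int, var attr: Int)

      val edges = Seq((2, 1, 7), (2, 4, 2), (5, 6, 3))

      // Iterator that fills in and returns the SAME Triplet instance each time.
      def reusingIterator: Iterator[Triplet] = {
        val t = new Triplet(0, 0, 0)
        edges.iterator.map { case (s, d, a) =>
          t.src = s; t.dst = d; t.attr = a
          t // every element of the collected array is a reference to `t`
        }
      }

      // All entries now show the fields of the last edge, (5,6,3),
      // mirroring the repeated (Ed, Fran) triplets in the log above.
      val collected = reusingIterator.toArray
      println(collected.map(t => (t.src, t.dst, t.attr)).mkString(", "))

      // Workaround: copy each element into an immutable value before collecting.
      val safe = reusingIterator.map(t => (t.src, t.dst, t.attr)).toArray
      println(safe.mkString(", "))
      ```

      If this is indeed the cause, the same workaround should apply in the spark-shell session above: map each triplet to an immutable tuple (e.g. of srcAttr, dstAttr, attr) before collecting it.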


People

    • Assignee: Daniel Darabos (darabos)
    • Reporter: Kev Alan (k0alak0der)
    • Votes: 0
    • Watchers: 3
