Giraph / GIRAPH-970

Missing chosen workers on superstep -1

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: bsp
    • Labels:
      None
    • Environment:

      Linux version 3.13.0-37-generic (buildd@kapok) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)), 64-bit
      Hadoop 1.2.1

      Description

      I found a problem with Giraph 1.1.0 while trying to run the SimpleShortestPathsComputation example.

      This is the command I ran:
      $HADOOP_HOME/bin/hadoop jar ~/git/giraph_patched/giraph-examples/target/giraph-examples-1.1.0-for-hadoop-1.2.1-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /users/hadoop/input/tiny_graph.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /users/hadoop/output/shortestpath -w 1

      And this is the output:
      #################################

      Warning: $HADOOP_HOME is deprecated.

      14/12/15 12:07:36 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
      14/12/15 12:07:36 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
      14/12/15 12:07:36 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
      14/12/15 12:07:38 INFO job.GiraphJob: Tracking URL: http://VirtualMINT-H023:50030/jobdetails.jsp?jobid=job_201412151205_0001
      14/12/15 12:07:38 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 2 mappers
      14/12/15 12:08:51 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer virtualmint-h023:22181 --zkNode /_hadoopBsp/job_201412151205_0001/_haltComputation'
      14/12/15 12:08:51 INFO mapred.JobClient: Running job: job_201412151205_0001
      14/12/15 12:08:52 INFO mapred.JobClient: map 100% reduce 0%

      ################################

      The computation hangs here until the timeout is reached. Here is what I found in the first worker's log:

      2014-12-15 12:12:16,303 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
      2014-12-15 12:12:16,314 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
      2014-12-15 12:12:16,332 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
      2014-12-15 12:12:16,341 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
      2014-12-15 12:12:16,344 INFO org.apache.giraph.partition.PartitionUtils: computePartitionCount: Creating 1, default would have been 1 partitions.
      2014-12-15 12:12:16,373 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201412151211_0001/_vertexInputSplitDoneDir
      2014-12-15 12:12:16,375 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [virtualmint-h023_1]
      2014-12-15 12:12:16,393 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
      2014-12-15 12:12:16,464 ERROR org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Missing chosen workers [Worker(hostname=virtualmint-h023, MRtaskID=1, port=30001)] on superstep -1
      2014-12-15 12:12:16,464 ERROR org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with IllegalStateException
      java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported)
      at org.apache.giraph.master.BspServiceMaster.coordinateInputSplits(BspServiceMaster.java:1489)
      at org.apache.giraph.master.BspServiceMaster.coordinateSuperstep(BspServiceMaster.java:1656)
      at org.apache.giraph.master.MasterThread.run(MasterThread.java:124)
      2014-12-15 12:12:16,464 FATAL org.apache.giraph.graph.GraphTaskManager: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg = java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported), exiting...
      java.lang.IllegalStateException: java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported)
      at org.apache.giraph.master.MasterThread.run(MasterThread.java:194)
      Caused by: java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported)
      at org.apache.giraph.master.BspServiceMaster.coordinateInputSplits(BspServiceMaster.java:1489)
      at org.apache.giraph.master.BspServiceMaster.coordinateSuperstep(BspServiceMaster.java:1656)
      at org.apache.giraph.master.MasterThread.run(MasterThread.java:124)
      2014-12-15 12:12:16,464 WARN org.apache.giraph.zk.ZooKeeperManager: logZooKeeperOutput: Dumping up to last 100 lines of the ZooKeeper process STDOUT and STDERR.

      ################################

      The computation does not even reach the first superstep: the master reports the chosen worker as missing on superstep -1, while coordinating the vertex input splits. The GIRAPH-904 patch has been applied to BspServiceMaster.
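
      For triage, the "Missing chosen workers" line can be pulled out of a task log mechanically. The sketch below is a minimal, hypothetical helper (the function names `missing_workers` and `casing_variants` are not part of Giraph). It extracts the worker entries from the log line quoted above and then groups hostnames case-insensitively. In the pasted output the host appears as VirtualMINT-H023 in the tracking URL and as virtualmint-h023 in the worker list; whether that difference is related to the failure is an assumption to verify, not something the logs establish.

```python
import re
from collections import defaultdict

# Matches entries such as Worker(hostname=virtualmint-h023, MRtaskID=1, port=30001)
WORKER_RE = re.compile(r"Worker\(hostname=([^,]+), MRtaskID=(\d+), port=(\d+)\)")


def missing_workers(log_text):
    """Return (hostname, task_id, port) for each worker on 'Missing chosen workers' lines."""
    found = []
    for line in log_text.splitlines():
        if "Missing chosen workers" in line:
            for host, task, port in WORKER_RE.findall(line):
                found.append((host, int(task), int(port)))
    return found


def casing_variants(hostnames):
    """Group hostnames case-insensitively; keep groups spelled in more than one way."""
    groups = defaultdict(set)
    for name in hostnames:
        groups[name.lower()].add(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Log line copied from the report above.
SAMPLE = (
    "2014-12-15 12:12:16,464 ERROR org.apache.giraph.master.BspServiceMaster: "
    "barrierOnWorkerList: Missing chosen workers "
    "[Worker(hostname=virtualmint-h023, MRtaskID=1, port=30001)] on superstep -1"
)

workers = missing_workers(SAMPLE)
# Hostname as spelled in the job tracking URL, plus the hostnames from the worker list.
variants = casing_variants(["VirtualMINT-H023"] + [w[0] for w in workers])
print(workers)
print(variants)
```

      Running the same helper over the complete task logs (rather than the single sample line) would show whether every chosen worker is affected or only some.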

      I am running Hadoop 1.2.1 on a single machine with the configuration suggested in the Giraph Quick Start guide. Hadoop itself works fine (tested with the wordcount example).
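
      The client output says the job waits for "all 2 mappers" although the command passes -w 1. A minimal sketch of that arithmetic follows, assuming Giraph requests one additional map task beyond the workers when the master runs separately from them. The default of one extra task and the `map_slots` parameter are assumptions for illustration, not values taken from this report.

```python
def required_map_tasks(workers, extra_tasks=1):
    """Map tasks the job waits for: the workers plus the assumed extra task(s)."""
    return workers + extra_tasks


def fits_in_slots(workers, map_slots, extra_tasks=1):
    """True if the cluster has enough map slots for the job to start at all."""
    return required_map_tasks(workers, extra_tasks) <= map_slots


# -w 1 from the command above, under the one-extra-task assumption.
needed = required_map_tasks(1)
print(needed)
```

      Since the log shows "map 100%", the tasks were scheduled in this run, so a shortage of map slots does not appear to be what stalls the job.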

            People

            • Assignee:
              Unassigned
              Reporter:
              Alessio Arleo