  Giraph / GIRAPH-970

Missing chosen workers on superstep -1

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: bsp
    • Labels:
      None
    • Environment:

      Linux version 3.13.0-37-generic (buildd@kapok) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) 64 bit
      Hadoop 1.2.1

      Description

      I found a problem with Giraph 1.1.0 while trying to run the SimpleShortestPathsComputation example.

      This is the command given:
      $HADOOP_HOME/bin/hadoop jar ~/git/giraph_patched/giraph-examples/target/giraph-examples-1.1.0-for-hadoop-1.2.1-jar-with-dependencies.jar \
        org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation \
        -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
        -vip /users/hadoop/input/tiny_graph.txt \
        -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
        -op /users/hadoop/output/shortestpath \
        -w 1

      And this is the output:
      #################################

      Warning: $HADOOP_HOME is deprecated.

      14/12/15 12:07:36 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
      14/12/15 12:07:36 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
      14/12/15 12:07:36 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
      14/12/15 12:07:38 INFO job.GiraphJob: Tracking URL: http://VirtualMINT-H023:50030/jobdetails.jsp?jobid=job_201412151205_0001
      14/12/15 12:07:38 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 2 mappers
      14/12/15 12:08:51 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer virtualmint-h023:22181 --zkNode /_hadoopBsp/job_201412151205_0001/_haltComputation'
      14/12/15 12:08:51 INFO mapred.JobClient: Running job: job_201412151205_0001
      14/12/15 12:08:52 INFO mapred.JobClient: map 100% reduce 0%

      ################################

      The computation hangs here until the timeout is reached. Here is what I found in the first worker's log:

      2014-12-15 12:12:16,303 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
      2014-12-15 12:12:16,314 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
      2014-12-15 12:12:16,332 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
      2014-12-15 12:12:16,341 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
      2014-12-15 12:12:16,344 INFO org.apache.giraph.partition.PartitionUtils: computePartitionCount: Creating 1, default would have been 1 partitions.
      2014-12-15 12:12:16,373 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201412151211_0001/_vertexInputSplitDoneDir
      2014-12-15 12:12:16,375 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [virtualmint-h023_1]
      2014-12-15 12:12:16,393 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
      2014-12-15 12:12:16,464 ERROR org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Missing chosen workers [Worker(hostname=virtualmint-h023, MRtaskID=1, port=30001)] on superstep -1
      2014-12-15 12:12:16,464 ERROR org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with IllegalStateException
      java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported)
      at org.apache.giraph.master.BspServiceMaster.coordinateInputSplits(BspServiceMaster.java:1489)
      at org.apache.giraph.master.BspServiceMaster.coordinateSuperstep(BspServiceMaster.java:1656)
      at org.apache.giraph.master.MasterThread.run(MasterThread.java:124)
      2014-12-15 12:12:16,464 FATAL org.apache.giraph.graph.GraphTaskManager: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg = java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported), exiting...
      java.lang.IllegalStateException: java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported)
      at org.apache.giraph.master.MasterThread.run(MasterThread.java:194)
      Caused by: java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported)
      at org.apache.giraph.master.BspServiceMaster.coordinateInputSplits(BspServiceMaster.java:1489)
      at org.apache.giraph.master.BspServiceMaster.coordinateSuperstep(BspServiceMaster.java:1656)
      at org.apache.giraph.master.MasterThread.run(MasterThread.java:124)
      2014-12-15 12:12:16,464 WARN org.apache.giraph.zk.ZooKeeperManager: logZooKeeperOutput: Dumping up to last 100 lines of the ZooKeeper process STDOUT and STDERR.

      ################################

      The computation does not even reach the first superstep: Giraph cannot find the worker. The GIRAPH-904 patch has been applied to BspServiceMaster.

      I am running Hadoop 1.2.1 on a single machine with the configuration suggested in the Giraph Quick Start guide. Hadoop itself works fine (tested with the wordcount example).

        Activity

        Alessio Alessio Arleo added a comment -

        In my case the error was related to the GIRAPH-904 bug (https://issues.apache.org/jira/browse/GIRAPH-904). The first error line* in the log reports missing workers. My hostname contained both lowercase and uppercase letters (VirtualMINT-H023, as it appears in the tracking URL), while the log reports it in lowercase letters only. I worked around the issue by switching to a hostname made up of lowercase characters only. This is just a workaround, though, and the root cause still needs to be fixed. I do not know whether this has been fixed in 1.2.0, but it is definitely an open issue in Giraph 1.1.0. I'll investigate further and try to fix it, but help is appreciated.

        *"Missing chosen workers [Worker(hostname=virtualmint-h023, MRtaskID=1, port=30001)] on superstep -1"
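        The case mismatch described above can be illustrated with a minimal sketch. This is not Giraph's actual code: the class HostnameMatch and its methods are hypothetical names, and the two hostname strings are taken from the tracking URL and the error line in the logs. It only shows why an exact string comparison between a mixed-case hostname and its lowercased form fails, and how normalizing both sides before comparing avoids the mismatch.

```java
import java.util.Locale;

// Hypothetical illustration only; not part of Giraph.
public class HostnameMatch {

    // Normalize a hostname so that comparisons ignore letter case.
    static String normalize(String hostname) {
        return hostname.trim().toLowerCase(Locale.ROOT);
    }

    // Exact, case-sensitive comparison: fails when the two sides
    // differ only in letter case.
    static boolean matchesExact(String a, String b) {
        return a.equals(b);
    }

    // Comparison after normalizing both sides.
    static boolean matchesNormalized(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        String configured = "VirtualMINT-H023"; // as in the tracking URL
        String reported = "virtualmint-h023";   // as in the error line

        System.out.println(matchesExact(configured, reported));      // false
        System.out.println(matchesNormalized(configured, reported)); // true
    }
}
```

        Whether the real fix belongs in the worker registration path, in the health check, or in both is an open question here; the sketch only demonstrates the comparison problem itself.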


          People

          • Assignee: Unassigned
          • Reporter: Alessio Alessio Arleo
          • Votes: 1
          • Watchers: 2
