Giraph (Retired) / GIRAPH-72

Running multiple Giraph jobs on the same cluster can lead to port collisions

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.1.0
    • Fix Version/s: None
    • Component/s: lib, zookeeper
    • Labels: None
    • Environment: production Hadoop cluster, in-process ZK.

    Description

      We had a Giraph mini-hackathon at work today. Many of us launched test jobs simultaneously and often ran into the following collision:

      ------
      startSuperstep: WORKER_ONLY - Attempt=0, Superstep=-1
      2-Nov-2011 23:40:08

      java.net.BindException: Problem binding to <hostname>/<hostIP>:30000 : Address already in use
      at org.apache.hadoop.ipc.Server.bind(Server.java:196)
      at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:259)
      at org.apache.hadoop.ipc.Server.<init>(Server.java:1039)
      at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:492)
      at org.apache.hadoop.ipc.RPC.getServer(RPC.java:454)
      at org.apache.giraph.comm.RPCCommunications.getRPCServer(RPCCommunications.java:99)
      at org.apache.giraph.comm.BasicRPCCommunications.<init>(BasicRPCCommunications.java:362)
      at org.apache.giraph.comm.RPCCommunications.<init>(RPCCommunications.java:71)
      at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:570)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
      at org.apache.hadoop.mapred.Child.main(Child.java:170)
      Caused by: java.net.BindException: Address already in use
      at sun.nio.ch.Net.bind(Native Method)
      at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
      at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
      at org.apache.hadoop.ipc.Server.bind(Server.java:194)
      ... 12 more


      The job then simply hung. At a bare minimum, I'd imagine it should catch this exception and let the task die quickly so it can be retried on another machine. Better yet, it could accept a command-line argument at startup (passed into the Configuration) that decides which ports to use. Best of all would be something automagic that allows multiple GraphMappers on the same machine without manually picking ports (pick one at random and store it in ZooKeeper? But then what about the in-process ZooKeeper...)
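
      To make the "automagic" option concrete, here is a minimal sketch of probing upward from a base port until a bind succeeds. It is not Giraph's code and not necessarily how this issue was eventually addressed: the class name PortPicker, the method bindFirstFree, and the attempt limit are all hypothetical, and a plain java.net.ServerSocket stands in for the Hadoop RPC server seen in the stack trace. A worker that picks its port this way would still have to publish the chosen port to its peers, for example through ZooKeeper as suggested above.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical helper, for illustration only; not part of Giraph or Hadoop.
final class PortPicker {

    /**
     * Tries basePort, basePort + 1, ... and returns the first socket that binds.
     * Throws BindException if none of the maxAttempts candidates is free, so the
     * caller can fail fast instead of hanging.
     */
    static ServerSocket bindFirstFree(int basePort, int maxAttempts) throws IOException {
        BindException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            int candidate = basePort + i;
            try {
                return new ServerSocket(candidate);
            } catch (BindException e) {
                // Port is already in use (e.g. by another job's worker); try the next one.
                last = e;
            }
        }
        BindException failure = new BindException(
            "No free port in [" + basePort + ", " + (basePort + maxAttempts - 1) + "]");
        if (last != null) {
            failure.initCause(last);
        }
        throw failure;
    }

    public static void main(String[] args) throws IOException {
        // Two "workers" on the same host both ask for the same base port.
        try (ServerSocket first = bindFirstFree(30000, 100);
             ServerSocket second = bindFirstFree(30000, 100)) {
            System.out.println("first bound to " + first.getLocalPort());
            System.out.println("second bound to " + second.getLocalPort());
        }
    }
}
```

      If every candidate is taken, the helper throws rather than retrying forever, which matches the "die quickly so it can be retried" fallback. Passing the base port in as a parameter also leaves room for the command-line/Configuration override.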

          People

            Assignee: Unassigned
            Reporter: Jake Mannix (jake.mannix)