  1. Giraph (Retired)
  2. GIRAPH-53

Unable to read additional data from server session, likely server has closed socket

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      I hit an error recently. Everything goes well until it reaches the 103rd superstep.

      2011-10-14 16:23:38,904 INFO org.apache.giraph.comm.BasicRPCCommunications: prepareSuperstep
      2011-10-14 16:23:39,018 WARN org.apache.giraph.graph.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/101/_vertexRangeAssignments, type=NodeDeleted, state=SyncConnected)
      2011-10-14 16:23:39,057 INFO org.apache.giraph.graph.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=103 with /_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/103/_workerHealthyDir/locker-desktop_1 and hostnamePort = ["locker-desktop",30001]
      2011-10-14 16:23:39,057 WARN org.apache.giraph.graph.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/101/_superstepFinished, type=NodeDeleted, state=SyncConnected)
      2011-10-14 16:23:39,529 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1330186cff30001, likely server has closed socket, closing socket connection and attempting reconnect
      2011-10-14 16:23:39,630 ERROR org.apache.zookeeper.ClientCnxn: Error while calling watcher
      java.lang.RuntimeException: process: Disconnected from ZooKeeper, cannot recover.
      at org.apache.giraph.graph.BspService.process(BspService.java:995)
      at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
      2011-10-14 16:23:41,098 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server locker-desktop/10.13.30.90:22181
      2011-10-14 16:23:41,099 WARN org.apache.zookeeper.ClientCnxn: Session 0x1330186cff30001 for server null, unexpected error, closing socket connection and attempting reconnect
      java.net.ConnectException: Connection refused
      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
      at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
      2011-10-14 16:23:41,212 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
      2011-10-14 16:23:41,306 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
      2011-10-14 16:23:41,307 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName dic for UID 1001 from the native implementation
      2011-10-14 16:23:41,318 WARN org.apache.hadoop.mapred.Child: Error running child
      java.lang.RuntimeException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/103/_vertexRangeAssignments
      at org.apache.giraph.graph.BspServiceWorker.startSuperstep(BspServiceWorker.java:836)
      at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:551)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
      at org.apache.hadoop.mapred.Child.main(Child.java:253)
      Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/103/_vertexRangeAssignments
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
      at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
      at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
      at org.apache.giraph.graph.BspServiceWorker.startSuperstep(BspServiceWorker.java:830)
      ... 9 more
      I don't know whether this should be called a bug or not. Any help would be appreciated, thanks.

          People

            Assignee: Unassigned
            Reporter: locker