Project: Giraph (Retired)
Issue: GIRAPH-356

Improve ZooKeeper issues

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      Currently, if the ZooKeeper process fails, we have little information about why it failed or what happened. This patch addresses that by keeping up to the last 100 lines of the ZooKeeper process's STDOUT and STDERR and dumping them to the task log when the map task fails with a RuntimeException.
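
      The idea of retaining only the most recent output lines of a child process can be sketched as follows. This is a minimal illustration, not the code from the attached patches: the class name LastLinesCollector and its snapshot() method are hypothetical, and the real logic lives in the StreamCollector class named in the log output in this issue, whose implementation is not reproduced here. The sketch assumes a single reader thread per stream and a bounded deque as the buffer.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical helper (not from the patch): reads a stream line by line on
 * its own thread and retains only the last maxLines lines.
 */
public class LastLinesCollector extends Thread {
    private final BufferedReader reader;
    private final int maxLines;
    private final Deque<String> lastLines = new ArrayDeque<>();

    public LastLinesCollector(InputStream in, int maxLines) {
        if (maxLines < 1) {
            throw new IllegalArgumentException("maxLines must be >= 1");
        }
        this.reader = new BufferedReader(
            new InputStreamReader(in, StandardCharsets.UTF_8));
        this.maxLines = maxLines;
        // Do not keep the JVM alive just to drain a child's output.
        setDaemon(true);
    }

    @Override
    public void run() {
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                synchronized (lastLines) {
                    // Evict the oldest line once the buffer is full.
                    if (lastLines.size() == maxLines) {
                        lastLines.removeFirst();
                    }
                    lastLines.addLast(line);
                }
            }
        } catch (IOException e) {
            // The stream closed abruptly (for example, the child died);
            // whatever was collected so far is still available.
        }
    }

    /** Returns a copy of the retained lines, oldest first. */
    public List<String> snapshot() {
        synchronized (lastLines) {
            return new ArrayList<>(lastLines);
        }
    }

    /** Writes the retained lines to standard error, e.g. on task failure. */
    public void dump() {
        for (String line : snapshot()) {
            System.err.println(line);
        }
    }
}
```

      One way to use such a collector is to start the child with ProcessBuilder.redirectErrorStream(true), pass Process.getInputStream() to the collector, start the thread, and call dump() from the handler that catches the RuntimeException. Because only a fixed number of lines is held, a long-running process cannot grow the buffer without bound.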

      Here is an example of a master task failure when an invalid JVM argument is passed to ZooKeeper. The error is much more obvious now.

      2012-10-04 15:05:28,916 WARN org.apache.giraph.zk.ZooKeeperManager: logZooKeeperOutput: Dumping up to last 100 lines of the ZooKeeper process STDOUT and STDERR.
      2012-10-04 15:05:28,916 WARN org.apache.giraph.zk.ZooKeeperManager$StreamCollector: Unrecognized option: -BadOpt
      2012-10-04 15:05:28,916 WARN org.apache.giraph.zk.ZooKeeperManager$StreamCollector: Could not create the Java virtual machine.
      2012-10-04 15:05:28,919 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
      2012-10-04 15:05:28,959 WARN org.apache.hadoop.mapred.Child: Error running child
      java.lang.IllegalStateException: run: Caught an unrecoverable exception onlineZooKeeperServers: Failed to connect in 5 tries!
      at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:591)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
      at org.apache.hadoop.mapred.Child.main(Child.java:253)
      Caused by: java.lang.IllegalStateException: onlineZooKeeperServers: Failed to connect in 5 tries!
      at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:721)
      at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:328)
      at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:573)
      ... 7 more
      2012-10-04 15:05:28,963 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

      Attachments

        1. GIRAPH-356.2.patch
          21 kB
          Avery Ching
        2. GIRAPH-356.patch
          15 kB
          Avery Ching

          People

            Assignee: Avery Ching (aching)
            Reporter: Avery Ching (aching)
            Votes: 0
            Watchers: 3
