Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Currently, if the ZooKeeper process fails, we have little information about why it failed or what happened. This patch addresses that by keeping up to the last 100 lines of the ZooKeeper process output (STDOUT and STDERR) and dumping them to the task log when the map task fails with a RuntimeException.
Here is an example of a master task failure when an invalid JVM argument is passed to ZooKeeper. The error is much more obvious now.
2012-10-04 15:05:28,916 WARN org.apache.giraph.zk.ZooKeeperManager: logZooKeeperOutput: Dumping up to last 100 lines of the ZooKeeper process STDOUT and STDERR.
2012-10-04 15:05:28,916 WARN org.apache.giraph.zk.ZooKeeperManager$StreamCollector: Unrecognized option: -BadOpt
2012-10-04 15:05:28,916 WARN org.apache.giraph.zk.ZooKeeperManager$StreamCollector: Could not create the Java virtual machine.
2012-10-04 15:05:28,919 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-10-04 15:05:28,959 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.IllegalStateException: run: Caught an unrecoverable exception onlineZooKeeperServers: Failed to connect in 5 tries!
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:591)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.IllegalStateException: onlineZooKeeperServers: Failed to connect in 5 tries!
at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:721)
at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:328)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:573)
... 7 more
2012-10-04 15:05:28,963 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
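The idea described above, keeping only the most recent lines of the child process output and dumping them on failure, can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class name TailCollector and its methods are hypothetical, and the real ZooKeeperManager$StreamCollector may be structured differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical helper (an assumption for illustration, not Giraph's class):
 * reads a stream line by line and remembers only the last maxLines lines.
 */
class TailCollector extends Thread {
    private final InputStream in;
    private final int maxLines;
    private final Deque<String> lastLines = new ArrayDeque<>();

    TailCollector(InputStream in, int maxLines) {
        if (maxLines <= 0) {
            throw new IllegalArgumentException("maxLines must be positive");
        }
        this.in = in;
        this.maxLines = maxLines;
        // Do not keep the JVM alive just to drain the child's output.
        setDaemon(true);
    }

    @Override
    public void run() {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                add(line);
            }
        } catch (IOException e) {
            // Record the read failure so it shows up in the dump as well.
            add("TailCollector: failed to read stream: " + e);
        }
    }

    /** Appends a line, evicting the oldest one once the buffer is full. */
    private synchronized void add(String line) {
        if (lastLines.size() == maxLines) {
            lastLines.removeFirst();
        }
        lastLines.addLast(line);
    }

    /** Returns a copy of the retained lines, oldest first. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(lastLines);
    }
}
```

In a setup like this, the process would typically be launched with ProcessBuilder.redirectErrorStream(true) so that STDOUT and STDERR arrive on one stream, the collector would be started on Process.getInputStream(), and the failure path would log each line from snapshot() at WARN level before rethrowing. The bounded deque keeps memory use constant no matter how much the child process prints.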