GIRAPH-850: Improve internal zookeeper launching



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: zookeeper
    • Labels:


      With the most up-to-date trunk, internal ZooKeeper launching appears to work only with Hadoop 1.x.x (MR1).

      With Hadoop 2.x.x (MR2), running a job without specifying an external ZooKeeper location results in a failed job with the following in the logs:

      2014-02-12 17:30:30,281 INFO [main] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Attempting to start ZooKeeper server with command [/usr/lib/jvm/java-1.7.0-openjdk-, -Xmx512m, -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp, /tmp/hadoop-yarn/staging/b.ajf/.staging/job_1392221733726_0002/job.jar, org.apache.zookeeper.server.quorum.QuorumPeerMain, /tmp/hadoop-b.ajf/nm-local-dir/usercache/b.ajf/appcache/application_1392221733726_0002/work/_bspZooKeeper/zoo.cfg] in directory /tmp/hadoop-b.ajf/nm-local-dir/usercache/b.ajf/appcache/application_1392221733726_0002/work/_bspZooKeeper
      2014-02-12 17:30:30,285 INFO [main] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to igraph-02.hi.inet:22181 with poll msecs = 3000
      2014-02-12 17:30:30,289 WARN [main] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
      java.net.ConnectException: Connection refused
      2014-02-12 17:30:30,413 INFO [org.apache.giraph.zk.ZooKeeperManager$StreamCollector] org.apache.giraph.zk.ZooKeeperManager$StreamCollector: readLines: Error: Could not find or load main class org.apache.zookeeper.server.quorum.QuorumPeerMain

      ZooKeeper clearly fails to launch because the child JVM cannot find the required class on its classpath. The command used to launch ZooKeeper specifies a classpath of:

      -cp, /tmp/hadoop-yarn/staging/b.ajf/.staging/job_1392221733726_0002/job.jar

      which is an HDFS location.

      It seems that with Hadoop 2.x.x, Job.getJar() returns an HDFS path to the jar instead of the path to the local copy of the jar in the DistributedCache. Hadoop 1.x.x appears to return a correct path, as I didn't detect any problem there.
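
      One way to confirm this kind of diagnosis is to check whether each classpath entry actually exists on the local filesystem before spawning the child JVM. The following is a minimal sketch using only the JDK; the class and method names (ClasspathCheck, missingEntries) are hypothetical and are not part of Giraph or of the attached patches.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not Giraph code).
class ClasspathCheck {
    /**
     * Splits a classpath string on the platform path separator and returns
     * the entries that do not exist on the local filesystem. An HDFS-only
     * path such as the staging job.jar would show up in the result.
     */
    static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (!entry.isEmpty() && !Files.exists(Paths.get(entry))) {
                missing.add(entry);
            }
        }
        return missing;
    }
}
```

      Calling ClasspathCheck.missingEntries with the -cp value from the log above on the task node should report the staging job.jar if it is indeed present only in HDFS.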

      The whole logic of finding the ZooKeeper classpath seems extremely convoluted to me (not to mention broken, as just shown, for both MR2 and YARN). Since the currently running Java process has to have the ZooKeeper classes in its classpath anyway (because some of the classes in Giraph refer to ZooKeeper classes), wouldn't it make more sense to have the child Java process that starts ZooKeeper simply inherit the classpath?
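
      The proposal above can be sketched as follows: build the child command from the parent JVM's own java.home and java.class.path system properties instead of from the job jar path. This is a minimal illustration under stated assumptions, not the code in the attached patches; the class and method names (InheritedClasspathLauncher, buildCommand) are hypothetical, and the JVM flags shown in the log are omitted for brevity.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not Giraph code).
class InheritedClasspathLauncher {
    /**
     * Builds a command line that starts mainClass in a child JVM which
     * reuses the classpath of the currently running JVM.
     */
    static List<String> buildCommand(String mainClass, String... args) {
        // Use the same Java installation as the parent process.
        String javaBin = System.getProperty("java.home")
            + File.separator + "bin" + File.separator + "java";
        // Classpath the parent JVM was started with.
        String classpath = System.getProperty("java.class.path");

        List<String> command = new ArrayList<>();
        command.add(javaBin);
        command.add("-cp");
        command.add(classpath);
        command.add(mainClass);
        for (String arg : args) {
            command.add(arg);
        }
        return command;
    }

    public static void main(String[] unused) {
        List<String> command = buildCommand(
            "org.apache.zookeeper.server.quorum.QuorumPeerMain", "zoo.cfg");
        System.out.println(command);
        // To actually launch the child process (not done in this sketch):
        // new ProcessBuilder(command).redirectErrorStream(true).start();
    }
}
```

      One caveat with this approach: java.class.path only reflects the classpath the JVM was launched with, so classes made available solely through a custom class loader would not be visible to the child process.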


        1. GIRAPH-850.patch
          6 kB
          Alexandre Fonseca
        2. GIRAPH-850-2.patch
          9 kB
          Alexandre Fonseca



            • Assignee:
              AlexJF Alexandre Fonseca
            • Votes:
              1
            • Watchers:
              6


              • Created: