  1. Giraph (Retired)
  2. GIRAPH-920

Dynamic snapshot control via Zookeeper


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: bsp

    Description

      Gephi is well suited to visualizing graphs, including time-dependent ones. With the Gephi-Hadoop-Connector, such time-dependent graphs can be imported from Hadoop into Gephi via a set of node-list and edge-list queries against Hive or Impala. This helps a lot when debugging an algorithm and when presenting its properties.

      We start with an existing Giraph algorithm that uses "snapshots", a built-in Giraph feature, to store the state of the graph from time to time.
      To this we add a "hook" that turns snapshotting on or off at runtime by switching a flag, which tells the algorithm whether or not to take a snapshot during a superstep. The request goes from a client to Zookeeper, which registers all snapshot requests and all snapshot-capable jobs.
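
      The flag-switching part of the hook could look roughly like the following sketch. It is deliberately independent of the ZooKeeper client library: the class name SnapshotSwitch, the method names, and the "on" / "off" payload format are all assumptions made for illustration and are not part of Giraph or of any existing patch. A ZooKeeper watch callback would pass the control znode's data to onControlData, and the job would consult snapshotRequested at the end of a superstep.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: holds the on/off flag that the job checks per superstep. */
final class SnapshotSwitch {
    // Written by the watcher thread, read by the compute thread.
    private final AtomicBoolean enabled = new AtomicBoolean(false);

    /**
     * Intended to be called from a ZooKeeper watch callback with the data of the
     * control znode. The payload format ("on" / "off") is an assumption.
     */
    void onControlData(byte[] data) {
        String value = new String(data, StandardCharsets.UTF_8).trim();
        if ("on".equals(value)) {
            enabled.set(true);
        } else if ("off".equals(value)) {
            enabled.set(false);
        }
        // Any other payload is ignored, so the flag keeps its previous state.
    }

    /** True if the current superstep should finish with a snapshot. */
    boolean snapshotRequested() {
        return enabled.get();
    }
}
```

      Keeping the flag in a small class of its own means the Giraph job only needs one boolean check per superstep, while all ZooKeeper handling stays outside the compute code.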

      We use a command line tool, gctrl, which still has to be created.

      A command line call looks like this:

      gctrl enableSnap $jobID $step0 $stepDist

      gctrl : the tool to interact with a Giraph job via Zookeeper

      enableSnap : command to turn dynamic snapshotting on
      disableSnap : command to turn off dynamic snapshotting
      listSnap : shows all running jobs that are registered with the "snapshot feature"

      $jobID : the id of a Giraph job
      $step0 : first or next superstep that finishes with a snapshot
      $stepDist : number of steps without a snapshot
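
      How $step0 and $stepDist select the snapshot supersteps can be sketched as follows. The description does not pin down the exact semantics, so this sketch assumes that the first snapshot is taken at superstep $step0 and that each snapshot is followed by $stepDist supersteps without one, giving a period of $stepDist + 1. The class name SnapshotSchedule is hypothetical.

```java
/** Hypothetical schedule derived from the enableSnap arguments. */
final class SnapshotSchedule {
    private final long step0;    // first superstep that finishes with a snapshot
    private final long stepDist; // supersteps without a snapshot between two snapshots

    SnapshotSchedule(long step0, long stepDist) {
        if (step0 < 0 || stepDist < 0) {
            throw new IllegalArgumentException("step0 and stepDist must be >= 0");
        }
        this.step0 = step0;
        this.stepDist = stepDist;
    }

    /** True if the given superstep should finish with a snapshot. */
    boolean isSnapshotStep(long superstep) {
        if (superstep < step0) {
            return false; // schedule has not started yet
        }
        // Assumed period: one snapshot step followed by stepDist snapshot-free steps.
        return (superstep - step0) % (stepDist + 1) == 0;
    }
}
```

      Under this assumption, "gctrl enableSnap $jobID 3 2" would snapshot supersteps 3, 6, 9 and so on, and a $stepDist of 0 would snapshot every superstep from $step0 onwards.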

      A basic structure for the Zookeeper part is ready (inspired by the Zookeeper book). We have to change GiraphJob a bit: we introduce a helper class to which all snapshot control logic is delegated.

      Use Case:

      If snapshots are enabled, the state of the current graph is implicitly dumped to HDFS in a layout that allows Hive / Impala queries.
      To that end, the tables are prepared in advance, and each snapshot forms a partition within its table. This allows us to show the graph in Gephi and to do
      dynamic inspection outside of Giraph. In a long running job, one can step from one superstep to the next to study the behaviour at a critical point, e.g.
      in the range around a phase transition.
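
      One possible way to map snapshots to table partitions is sketched below. It relies on Hive's usual convention of one directory per partition, named column=value, and on the HiveQL statement ALTER TABLE ... ADD PARTITION ... LOCATION. The partition columns job_id and superstep, the table name, the base directory, and the class name SnapshotPartitions are assumptions chosen for illustration; the issue does not specify a layout.

```java
/** Hypothetical helper that derives Hive partition locations for snapshots. */
final class SnapshotPartitions {
    private SnapshotPartitions() {}

    /** HDFS directory for one snapshot, using Hive's column=value directory naming. */
    static String partitionDir(String tableDir, String jobId, long superstep) {
        return tableDir + "/job_id=" + jobId + "/superstep=" + superstep;
    }

    /** HiveQL statement that registers the snapshot directory as a partition. */
    static String addPartitionSql(String table, String tableDir, String jobId, long superstep) {
        return "ALTER TABLE " + table
            + " ADD IF NOT EXISTS PARTITION (job_id='" + jobId + "', superstep=" + superstep + ")"
            + " LOCATION '" + partitionDir(tableDir, jobId, superstep) + "'";
    }
}
```

      With such a layout, a node-list or edge-list query for a single superstep only needs a filter on the partition columns, which is what stepping from one superstep to the next in Gephi would rely on.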

      Expected results:

      a) a patch containing the code for Giraph
      b) a demo that presents the feature, in particular how to debug algorithms at scale, using a new algorithm that is still under research.

          People

            Assignee: Unassigned
            Reporter: Mirko Kaempf (kamir1604)

              Time Tracking

                Original Estimate: 504h
                Remaining Estimate: 504h
                Time Spent: Not Specified
