GIRAPH-549

Tinkerpop/Blueprints/Rexster InputFormat

    Details

    • Type: New Feature
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Many people confuse a graph database with a large-scale graph processing engine, and have asked us how to run queries and the like with Giraph. Nonetheless, Giraph can serve as an OLAP engine for graph databases: users who store their data in a graph database might want to load it into Giraph to run analytics.

      We can use the API layer offered by Tinkerpop/Blueprints (https://github.com/tinkerpop/blueprints/wiki), which sits on top of many graph databases. In particular, we can use the REST layer on top of Blueprints offered by Rexster (https://github.com/tinkerpop/rexster).

      Quite opportunistically, one option would be to use the Rexster-based InputFormat for MapReduce (https://github.com/thinkaurelius/faunus/tree/master/src/main/java/com/thinkaurelius/faunus/formats/rexster) from Faunus (https://github.com/thinkaurelius/faunus).
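
To make the idea of a Rexster-backed InputFormat more concrete, here is a minimal sketch of how input splits could map onto paged REST requests. It is not the code from the patch or from Faunus. The class name `RexsterSplitSketch`, the host, and the graph name `toygraph` are hypothetical, and the sketch assumes Rexster's `rexster.offset.start` / `rexster.offset.end` paging parameters on the `/graphs/<graph>/vertices` resource, with the end offset treated as exclusive.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: one paged Rexster vertex URL per input split. */
public class RexsterSplitSketch {

    /**
     * Divides [0, totalVertices) into at most `splits` contiguous ranges
     * and returns the REST URL that would fetch each range.
     */
    static List<String> vertexUrls(String host, int port, String graph,
                                   long totalVertices, int splits) {
        if (splits <= 0) {
            throw new IllegalArgumentException("splits must be positive");
        }
        List<String> urls = new ArrayList<>();
        // Ceiling division so the last range absorbs the remainder.
        long chunk = (totalVertices + splits - 1) / splits;
        for (long start = 0; start < totalVertices; start += chunk) {
            long end = Math.min(start + chunk, totalVertices);
            // Assumption: the end offset is exclusive.
            urls.add(String.format(
                "http://%s:%d/graphs/%s/vertices"
                    + "?rexster.offset.start=%d&rexster.offset.end=%d",
                host, port, graph, start, end));
        }
        return urls;
    }

    public static void main(String[] args) {
        // 8182 is assumed here as the Rexster HTTP port.
        for (String url : vertexUrls("localhost", 8182, "toygraph", 10, 3)) {
            System.out.println(url);
        }
    }
}
```

Each worker would then read only its own range, which is what lets the load run in parallel; an empty graph simply yields no ranges.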

      This is a Google Summer of Code 2013 project, and the specification can be found on our wiki at: https://cwiki.apache.org/confluence/display/GIRAPH/2013#2013-2.Project%3AGiraphintegrationwithTinkerpop

      1. GIRAPH-549.patch
        80 kB
        Armando Miraglia

          Activity

          Nitay Joffe added a comment -

          This looks interesting. Have folks used Tinkerpop stuff with large distributed databases before? I am not familiar with the technology but from cursory browsing it looks cool.

          Claudio Martella added a comment -

          They currently support:
          Neo4j
          OrientDB
          Dex
          ArangoDB
          FluxGraph
          InfiniteGraph
          MongoDB
          Oracle NoSQL

          Of these, only a few are really distributed/sharded, while most are replicated/HA.

          Claudio Martella added a comment -

          Guys, I was thinking about approaching the Tinkerpop community to see if anybody there is interested in contributing this one. Or, as the Nutch people did with their integration with us, we could file it as a GSoC project. What do you think?

          Michael Aro added a comment -

          Hello, I am interested in working on this project. I know I am cutting it close, but I have expressed an interest in another Giraph project.

          Claudio Martella added a comment -

          Great. Please add your proposal to the wiki.

          Claudio Martella added a comment -

          Hi Michael, I see you posted a comment on the wiki. I'm not sure about the precise procedure for students, but shouldn't the proposal also be filed through Google Melange?

          Michael Aro added a comment -

          Thanks Claudio.

          Michael Aro added a comment -

          I am working on the proposal to be submitted through melange. It will be
          completed in 2 - 4 hours. I am trying to gather knowledge from different
          sources and come up with an approach to attack the problems.

          Claudio Martella added a comment -

          good!

          Michael Aro added a comment -

          Hello,

          I have submitted applications for:

          [1] Giraph integration with Tinkerpop
          http://www.google-melange.com/gsoc/proposal/review/google/gsoc2013/maro/3001

          [2] Giraph Implementation of Nutch LinkRank Algorithm
          http://www.google-melange.com/gsoc/proposal/review/google/gsoc2013/maro/1

          Cheers, Mike.

          Armando Miraglia added a comment -

          Hi all,

          After working on this issue for a while, I am happy to provide the first version of the patch that implements this input format. The patch is still missing tests, which I am considering writing with the mock-up approach used for other input formats.
          I have tested the input format with a small toy database using Neo4j and Rexster, running on Hadoop 1.0.2.

          Cheers,
          Armando

          Armando Miraglia added a comment -

          Passed mvn clean verify install on the apache/giraph repository code.

          Claudio Martella added a comment -

          Really cool work Armando. Once the unit tests are set, this can go in.

          Armando Miraglia added a comment -

          Hi guys,

          I finally managed to:
          (a) add support for Gremlin scripts;
          (b) provide a set of test cases: I am testing the handling of an empty db, a test db, and querying Rexster with a very simple Gremlin script;
          (c) write a documentation page explaining what is available and how the API can be used. A compiled example of the page is available at: http://www.slashzero.org/giraph/rexster.html

          If you have any remarks just let me know.

          The patch successfully passed 'mvn clean verify'.

          Cheers,
          A.

          Claudio Martella added a comment -

          This is cool! Can you open a Review Board request? https://reviews.apache.org/groups/giraph/

          Armando Miraglia added a comment -

          Passed mvn clean verify.

          Claudio Martella added a comment -

          Committed. Thanks.


            People

            • Assignee: Unassigned
            • Reporter: Claudio Martella
            • Votes: 0
            • Watchers: 7