  1. Giraph
  2. GIRAPH-170

Workflow for loading RDF graph data into Giraph

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      W3C RDF provides a family of Web standards for exchanging graph-based data. RDF uses sets of simple binary relationships, labeling nodes and links with Web identifiers (URIs). Many public datasets are available as RDF, including the "Linked Data" cloud (see http://richard.cyganiak.de/2007/10/lod/ ). Many such datasets are listed at http://thedatahub.org/

      RDF has several standard exchange syntaxes. The oldest is RDF/XML. A simple line-oriented format is N-Triples. A format aligned with RDF's SPARQL query language is Turtle. Apache Jena and Any23 provide software to handle all of these: http://incubator.apache.org/jena/ and http://incubator.apache.org/any23/
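
To illustrate why N-Triples is the easiest of these syntaxes to feed into a line-based Hadoop job, here is a minimal Python sketch that splits one N-Triples line into subject, predicate and object. The function name and the regular expression are illustrative assumptions, not part of any existing library. It deliberately ignores parts of the N-Triples grammar (escape handling, validation of literals), so a real loader would more likely rely on Jena or Any23.

```python
import re

# Simplified pattern: subject is an IRI or blank node, predicate is an IRI,
# object is whatever precedes the terminating " ." (IRI, blank node or literal).
# This is an approximation, not a full N-Triples grammar.
TRIPLE_RE = re.compile(r'^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$')


def parse_ntriples_line(line):
    """Return (subject, predicate, object) as raw strings, or None.

    None is returned for blank lines, comment lines and anything the
    simplified pattern cannot match.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith('#'):
        return None
    match = TRIPLE_RE.match(stripped)
    if match is None:
        return None
    return match.groups()
```

Terms are returned exactly as written in the file (angle brackets and quotes included), so that IRIs and literals remain distinguishable downstream.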

      This JIRA leaves open the strategy for loading RDF data into Giraph. There are various possibilities, including using intermediate Hadoop-friendly stores, pre-processing the data (e.g. with Pig-based tools) into a more Giraph-friendly form, or writing custom loaders. Even a HOWTO document or implementor notes here would be an advance on the current state of the art. The BluePrints Graph API (Gremlin etc.) has also been aligned with various RDF datasources.
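
As a rough sketch of the pre-processing option, the Python below turns already-parsed triples into one text line per vertex. The output layout (integer vertex id, a tab, then comma-separated neighbour ids) is a hypothetical choice made for this example. It would have to be changed to match whatever input format the Giraph job actually reads. The function name is likewise an assumption.

```python
def triples_to_adjacency_lines(triples):
    """Convert (subject, predicate, object) string triples to adjacency lines.

    Returns (ids, lines): ids maps each term to an integer vertex id assigned
    in order of first appearance; lines holds one "id<TAB>n1,n2,..." string
    per vertex. Triples whose object is a literal (starts with a double
    quote) are skipped entirely, and predicates are dropped, so parallel
    edges between the same pair of vertices collapse into one.
    """
    ids = {}
    adjacency = {}

    def vertex_id(term):
        # Assign the next free integer id on first sight of a term.
        if term not in ids:
            ids[term] = len(ids)
            adjacency[ids[term]] = []
        return ids[term]

    for subject, _predicate, obj in triples:
        if obj.startswith('"'):
            continue  # literal value, not a graph vertex
        src = vertex_id(subject)
        dst = vertex_id(obj)
        if dst not in adjacency[src]:
            adjacency[src].append(dst)

    lines = [
        "{}\t{}".format(vid, ",".join(str(n) for n in neighbours))
        for vid, neighbours in adjacency.items()
    ]
    return ids, lines
```

Dropping the predicate is a lossy simplification. Keeping it as an edge value would be the obvious next step once the target input format is settled.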

      Related topics: multigraphs. https://issues.apache.org/jira/browse/GIRAPH-141 touches on the issue: fully general RDF graphs can't currently be represented easily, because two nodes might be connected by more than one typed edge. Even without multigraphs it ought to be possible to bring RDF-sourced data into Giraph, e.g. when an app is only interested in, say, the Movies + People subset of a big RDF collection.
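
A small Python sketch of both points: restricting a triple set to the predicates an application cares about, and finding the node pairs that would need multigraph support because more than one predicate links them. Both helper names are hypothetical, and triples are assumed to be plain (subject, predicate, object) string tuples.

```python
from collections import defaultdict


def filter_by_predicates(triples, wanted_predicates):
    """Keep only the triples whose predicate is in wanted_predicates."""
    wanted = set(wanted_predicates)
    return [t for t in triples if t[1] in wanted]


def find_multi_edges(triples):
    """Return {(subject, object): set_of_predicates} for every ordered pair
    of nodes linked by more than one distinct predicate."""
    predicates_by_pair = defaultdict(set)
    for subject, predicate, obj in triples:
        predicates_by_pair[(subject, obj)].add(predicate)
    return {
        pair: preds
        for pair, preds in predicates_by_pair.items()
        if len(preds) > 1
    }
```

If find_multi_edges returns an empty dict for the filtered subset, that subset can be loaded as a simple graph with no loss of edges.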

      From Avery in email: "a helper VertexInputFormat (and maybe VertexOutputFormat) would certainly [despite GIRAPH-141] still help"

              People

              • Assignee: Unassigned
              • Reporter: Dan Brickley (danbri)
              • Votes: 1
              • Watchers: 8