  1. Rya
  2. RYA-38

Implement SPARQL queries over multiple Rya instances

    Details

    • Type: Wish
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.10
    • Fix Version/s: None
    • Component/s: None
    • Environment:
      Unix, Java

      Description

      Rya is a scalable RDF triple store that supports SPARQL. Consider a scenario in which multiple Rya instances need to collaborate. Each instance stores its own data, but we want to run a global search (a SPARQL query) over the union of the data in all instances. Implement the operations each instance needs in order to support such a global search, and implement a basic federation coordinator that produces the final answer to the global query.

      Sub-tasks that need to be implemented:

      - Implement a federation coordinator. The federation coordinator will have to:
      1. Parse the SPARQL query. OpenRDF provides utilities for this and Rya already uses them, so the existing code should be reusable.
      2. Generate a federated query execution plan, including source selection (deciding where each triple pattern or sub-graph should be evaluated). OpenRDF has utilities to generate a query plan for a SPARQL query (which Rya currently uses), and might have utilities to generate a federated query plan. For source selection, the initial implementation can assume either that a global catalog exists, or that SPARQL ASK queries can be sent to the federation members to decide the source for each triple pattern in the query.
      3. Optimize the query plan. This part is optional for the initial implementation. If time permits, cardinality-based optimizations should be considered, as well as alternative plan shapes (for example, bushy plans vs. left-deep plans).
      4. Execute the query plan. Implement at least one join algorithm to combine the results received from the federated sources. Rya already implements join algorithms, so that code might be reusable.
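
      Steps 2 and 4 can be illustrated with a small, self-contained sketch. It is not Rya or OpenRDF code: the class FederationSketch, the methods selectSources and hashJoin, and the member names are all hypothetical. The ASK probe is modeled as a plain predicate instead of a network call, and the join assumes that joinVar is the only variable shared by the two inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch: none of these names come from Rya or OpenRDF.
class FederationSketch {

    // Source selection: keep the members whose ASK probe for the pattern succeeds.
    // `ask` stands in for sending "ASK { <pattern> }" to a member (assumption).
    static List<String> selectSources(List<String> members, String triplePattern,
                                      BiPredicate<String, String> ask) {
        List<String> sources = new ArrayList<>();
        for (String member : members) {
            if (ask.test(member, triplePattern)) {
                sources.add(member);
            }
        }
        return sources;
    }

    // Hash join of two solution lists (variable name -> value) on one variable.
    // Assumes joinVar is the only variable the two sides have in common.
    static List<Map<String, String>> hashJoin(List<Map<String, String>> left,
                                              List<Map<String, String>> right,
                                              String joinVar) {
        // Build phase: index the left solutions by their value for joinVar.
        Map<String, List<Map<String, String>>> index = new HashMap<>();
        for (Map<String, String> l : left) {
            String key = l.get(joinVar);
            if (key != null) {
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(l);
            }
        }
        // Probe phase: merge each right solution with the matching left solutions.
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> r : right) {
            String key = r.get(joinVar);
            if (key == null) {
                continue;
            }
            List<Map<String, String>> matches =
                    index.getOrDefault(key, Collections.emptyList());
            for (Map<String, String> l : matches) {
                Map<String, String> merged = new HashMap<>(l);
                merged.putAll(r);
                out.add(merged);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> members = Arrays.asList("rya-a", "rya-b");
        // Stand-in probe: pretend only rya-b holds matching triples.
        List<String> sources = selectSources(members, "?s <urn:knows> ?o",
                (member, pattern) -> member.equals("rya-b"));
        System.out.println(sources);
    }
}
```

      A real coordinator would replace the predicate with an ASK request to each member (or a lookup in the global catalog) and would stream solutions rather than hold both join inputs in memory.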

      -Implement a mediator layer for Rya to mediate between the coordinator and the local instance of Rya. This layer should provide functionality to receive a SPARQL query, or triple pattern, possibly with bindings, and provide an answer. As Rya already provides this functionality, the mediator layer might only need to expose the functionality and communicate with the federation coordinator
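
      One possible shape for the mediator contract is sketched below. The interface Mediator, the class InMemoryMediator, and the array-based triple representation are hypothetical and chosen only for illustration; an actual mediator would delegate to the local Rya instance instead of scanning an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the coordinator-facing contract; not an existing Rya API.
class MediatorSketch {

    interface Mediator {
        // pattern has three terms (subject, predicate, object); a term starting
        // with '?' is a variable. bindings holds values already fixed by the
        // coordinator. Returns one solution map per matching triple.
        List<Map<String, String>> evaluate(String[] pattern, Map<String, String> bindings);
    }

    // Stand-in for a local store: matches the pattern against a list of triples.
    static final class InMemoryMediator implements Mediator {
        private final List<String[]> triples;

        InMemoryMediator(List<String[]> triples) {
            this.triples = triples;
        }

        @Override
        public List<Map<String, String>> evaluate(String[] pattern,
                                                  Map<String, String> bindings) {
            List<Map<String, String>> results = new ArrayList<>();
            for (String[] t : triples) {
                Map<String, String> solution = new HashMap<>(bindings);
                boolean ok = true;
                for (int i = 0; i < 3 && ok; i++) {
                    String term = pattern[i];
                    if (term.startsWith("?")) {
                        // Bind the variable, or check it against its existing value.
                        String bound = solution.putIfAbsent(term, t[i]);
                        ok = bound == null || bound.equals(t[i]);
                    } else {
                        ok = term.equals(t[i]);
                    }
                }
                if (ok) {
                    results.add(solution);
                }
            }
            return results;
        }
    }

    public static void main(String[] args) {
        Mediator m = new InMemoryMediator(Arrays.asList(
                new String[] {":a", ":knows", ":b"},
                new String[] {":b", ":knows", ":c"}));
        Map<String, String> bindings = new HashMap<>();
        bindings.put("?x", ":b");
        System.out.println(m.evaluate(new String[] {"?x", ":knows", "?y"}, bindings));
    }
}
```

      Accepting bindings alongside the pattern lets the coordinator push values from an earlier join step down to the member, so each instance returns only the solutions compatible with what has already been found.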

      -Implement federated inference: this part is likely outside of the scope of this project and requires additional research

        Activity

        tulrich Ulrich Tchuenkam added a comment -

        Hi Adina,

        I am a 4th-year Software Engineering student from Cameroon. The description of the above project captivated me, and I would appreciate it if you could give me some materials that I can go through to get a better understanding of what the project is all about.

        In the meantime, I am very familiar with the working environment needed for this project, i.e. Unix and Java, and I am willing and ready to learn in order to catch up with the other technologies needed for the success of this project. By the time coding starts for GSoC, I will be up to speed with the technologies it requires.

        Thanks,

        adina Adina Crainiceanu added a comment -

        Hi Ulrich,

        Thank you for your interest. For background information, you can read any database textbook that covers database internals, query processing in particular, for example "Database Management Systems" by R. Ramakrishnan and J. Gehrke.
        For federated systems, you can read, for example, "FedX: Optimization Techniques for Federated Query Processing on Linked Data" by Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, and Michael Schmidt: http://www2.informatik.uni-freiburg.de/~mschmidt/docs/iswc11_fedx.pdf


          People

          • Assignee:
            adina Adina Crainiceanu
            Reporter:
            adina Adina Crainiceanu
          • Votes:
            0
            Watchers:
            5
