Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Develop a new API geared towards bridging the gap between local Jena databases and remote SPARQL HTTP endpoints. This would provide a single object to represent a repository, and provide functions to perform querying and modification of the data in the repository. These functions would attempt to be as efficient as possible (e.g. streaming modifications to remote servers), while also promoting safe practices such as parameterizing user-supplied query inputs to prevent SPARQL-injection attacks.

      I've started writing up some use cases at [1] (I would like to move them over to the Confluence wiki shortly), and I've also started a branch [2] (not much there yet). Feedback is greatly appreciated.

      [1] http://people.apache.org/~sallen/jena/ClientUseCases.html
      [2] http://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/branches/ArqClient

        Activity

        Rob Vesse added a comment -

        For parameterizing queries, see http://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/main/java/com/hp/hpl/jena/query/ParameterizedSparqlString.java which I think covers your use case; the code is designed to be used with both queries and updates.
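The snippet below is a stand-alone sketch of the idea behind parameterization (escaping a user-supplied value so it cannot break out of the literal it is spliced into). It is not Jena's ParameterizedSparqlString implementation; the class and method names are invented purely for illustration.

```java
// Simplified, stand-alone illustration of why parameterizing user input
// matters. This is NOT Jena's ParameterizedSparqlString; it only sketches
// the escaping idea behind preventing SPARQL injection.
public class SparqlLiteralEscapeDemo {

    // Escape backslashes and quotes so a user-supplied string cannot
    // terminate the literal and inject extra SPARQL syntax after it.
    static String escapeLiteral(String raw) {
        return raw.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static String queryForLabel(String userInput) {
        return "SELECT ?s WHERE { ?s <http://www.w3.org/2000/01/rdf-schema#label> \""
                + escapeLiteral(userInput) + "\" }";
    }

    public static void main(String[] args) {
        // A malicious input that tries to break out of the literal
        String attack = "x\" } ; DROP GRAPH <http://example/g> ; #";
        // The quote is escaped, so the payload stays inside the literal
        System.out.println(queryForLabel(attack));
    }
}
```

In the real class the substitution is done by named parameters rather than string concatenation, which is the safer pattern a client API should encourage.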

        Rob Vesse added a comment -

        Also, we should be careful here, because this basically sounds like "write the Sesame Repository API in Jena". I'm not saying it is a bad idea; I'm just suggesting that we should try to be clear about what the specific goals of the new feature are, e.g.:

        • What is the proposed high level API?
        • Are we assuming an API that talks only SPARQL, or are we postulating a more general API (e.g. the Storage API from dotNetRDF, which just works in terms of coarse operations like Save Graph, Load Graph, Delete Graph, SPARQL Query, SPARQL Update, etc.) for which we may have multiple implementations?
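As a sketch of what the "coarse operations" style could look like, here is a minimal, hypothetical Java interface with a trivial in-memory implementation. None of these names are proposed Jena APIs; they only illustrate the shape of such a contract.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "coarse operations" storage style mentioned
// above (save/load/delete graph, query, update). All names are invented
// for illustration; this is not a proposed Jena interface.
interface CoarseStorageClient {
    void saveGraph(String graphUri, String serializedGraph);
    String loadGraph(String graphUri);        // null if the graph is absent
    void deleteGraph(String graphUri);
    String sparqlQuery(String queryString);   // results as a string, for brevity
    void sparqlUpdate(String updateString);
}

// Trivial in-memory implementation, just to show the contract in action.
// Real implementations might talk SPARQL over HTTP, or go straight to a
// local store, behind the same interface.
class InMemoryStorageClient implements CoarseStorageClient {
    private final Map<String, String> graphs = new HashMap<>();

    public void saveGraph(String graphUri, String serializedGraph) {
        graphs.put(graphUri, serializedGraph);
    }
    public String loadGraph(String graphUri) { return graphs.get(graphUri); }
    public void deleteGraph(String graphUri) { graphs.remove(graphUri); }
    public String sparqlQuery(String queryString) {
        // Placeholder: a real client would parse and execute the query.
        return "";
    }
    public void sparqlUpdate(String updateString) {
        // Placeholder: a real client would apply the update.
    }
}
```

The design question above is whether the new API stops at this coarse level or exposes finer-grained, SPARQL-only operations.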
        Stephen Allen added a comment -

        Yes, the intention is to build something like JDBC's Connection or Sesame's Repository API for SPARQL-like objects. Those can be remote repositories or local DatasetGraphs.

        I'm still working on defining the high-level API, but the motivating problem was efficiently generating RDF from within my client application and sending it to a remote repository (via SPARQL Update or the Graph Store Protocol). I'd want it to be streaming on both ends, since it could be a very large amount of data. After the data was uploaded, I would then like to be able to query it.

        All of this through an API that provides transaction support while abstracting away the actual location of the RDF store.
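A rough sketch of that begin/stream/commit/query flow, with invented names (this is not a proposed Jena API, just an in-memory illustration of the usage pattern being described):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the JDBC-Connection-like usage described above:
// begin a transaction, stream triples in, commit, then query. The class
// and method names are invented for illustration only.
class RepoConnection {
    private final List<String> committed = new ArrayList<>();
    private List<String> pending = null;

    void begin() { pending = new ArrayList<>(); }

    // In a real client this would stream each triple to the remote
    // endpoint (SPARQL Update / Graph Store Protocol) or the local
    // store rather than buffer everything in memory.
    void add(String triple) { pending.add(triple); }

    void commit() {
        committed.addAll(pending);
        pending = null;
    }

    void abort() { pending = null; }

    // Stand-in for a SPARQL query: count triples containing a substring.
    long countMatching(String substring) {
        return committed.stream().filter(t -> t.contains(substring)).count();
    }
}
```

The point of the sketch is that the caller never learns whether the connection targets a local DatasetGraph or a remote HTTP endpoint; only the transactional contract is visible.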

        Andy Seaborne added a comment (edited)

        This looks good and is good timing.

        The SPARQL feature support has grown organically as new SPARQL features have become available. Pulling these together into a coherent client-facing API would be very useful. There are bits in Fuseki (e.g. DatasetAccessor) as well which have not had time to migrate to the right place. As SPARQL access has evolved, the Dataset interface has become the object around which the various APIs are built. The query execution factory or update execution factory uses the kind of Dataset to route to the right code.
        This abstraction isn't complete (there is no concept of "create Dataset").

        Keeping the new API as a separate module, with better release cycles than core ARQ, seems to me to be the way forward. We can then deliver it in various forms. When we repackage under the org.apache.jena root, we have a chance to make API changes, and one thing to consider is having less of a separation between SPARQL and the Jena core API. Maybe move Dataset, DatasetGraph, and Quad into just two packages (API, SPI).

        I don't have answers to your questions at [1], just some thoughts:

        On "Should the API be based on Graph/Triple or Model/Statement objects?", I'd design the API around Model/Statement while keeping an eye open to the graph level. We could make Graph/Triple/... more of an official API after a bit of clearing up, such as better naming and refactoring out unused stuff.

        On "GSP and quads", I'm inclined to just do the REST thing: GET/PUT/POST at the quads level.

        Final comment: "small steps". A partial API and implementation that covers the natural core of this, made available labelled "experimental - for feedback" (another argument, to me, for keeping it separate for now). How about just the pure SPARQL bit for now?

        Sort of related, but not enough to link this JIRA to them, are JENA-189 (Jena3 technical) and JENA-190 (Jena delivery): JENA-189 because of getting streaming to work end-to-end and miscellaneous consolidation of the Graph API, and JENA-190 because we can bundle this API with other code in a single "development jar" to make it easier to use Jena.


          People

          • Assignee: Stephen Allen
          • Reporter: Stephen Allen
          • Votes: 1
          • Watchers: 2