  1. Apache Jena
  2. JENA-1502

SPARQL extensions for processing CSV, XML, JSON and remote data

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Jena 3.6.0
    • Fix Version/s: None
    • Component/s: ARQ
    • Labels: None

    Description

      Many systems have been built for transforming heterogeneous data (most prominently CSV, XML and JSON) to RDF.
      As it turns out, with a few extensions to ARQ, Jena becomes (at least for me) an extremely convenient tool for this task.

      To illustrate: for a project we have to convert several (open) datasets, and our solution is simply to execute a sequence of SPARQL queries that use our ARQ extensions.

      In this repository there are subfolders with JSON datasets. Converting one is just a matter of running the SPARQL queries in two files: workloads.sparql, which adds triples describing workloads to a Jena in-memory dataset, and process.sparql, which processes all workloads in that dataset and inserts the resulting triples into a (named) result graph. We created a thin command-line wrapper to run these conversions conveniently.
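
The two-file workflow described here can be sketched roughly as follows. This is only an illustration in Python, not the wrapper from the repository: `run_sparql_files` and the `execute` callback are hypothetical names, and `execute` stands in for whatever engine call actually runs a SPARQL request against the shared in-memory dataset.

```python
from pathlib import Path


def run_sparql_files(paths, execute):
    """Run SPARQL request files strictly in the order given.

    `execute` is a hypothetical callback (an assumption, not a Jena API)
    that runs one request string against a dataset. Every call should
    target the same dataset, so that a later file such as process.sparql
    sees the triples inserted by an earlier one such as workloads.sparql.
    Returns the list of file names that were executed.
    """
    executed = []
    for path in paths:
        # Hand over the whole file as one request; a SPARQL Update request
        # may itself contain several operations separated by ';'.
        request = Path(path).read_text(encoding="utf-8")
        execute(request)
        executed.append(str(path))
    return executed
```

Because execution stops at the first exception raised by `execute`, a failing workloads.sparql would prevent process.sparql from running against a half-populated dataset.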

      An example of these extension functions:

      # Add labels of train / bus stops
      INSERT {
        GRAPH eg:result { ?s rdfs:label ?l }
      }
      WHERE {
        # Magic property to fetch the text (at present always a string) of some URL
        <someUrlPointingToALocalOrRemoteDataset> url:text ?src .
        # Parse into a literal of JSON datatype
        BIND(STRDT(?src, xsd:json) AS ?o)
        # Access a JSON array attribute
        BIND(json:path(?o, "$.stopNames") AS ?stopNames)
        # Create bindings for each element in the JSON document
        ?stopNames json:unnest (?l ?i) .
        # An ordinary join with existing data
        GRAPH ?x { ?s eg:stopId ?i }
      }
      
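For readers who do not have the extensions at hand, the data flow of the query above can be approximated in plain Python. This is a sketch under assumptions, not the extensions' implementation: it assumes that json:unnest binds ?l to each array element and ?i to its position, that positions are zero-based (the issue does not say), and that only a top-level attribute lookup such as "$.stopNames" is needed. The sample JSON and the stop identifiers are invented for illustration.

```python
import json


def json_path_key(doc, key):
    """Hypothetical stand-in for json:path(?o, "$.<key>").

    Supports only a top-level attribute lookup, not full JSONPath.
    """
    return doc.get(key) if isinstance(doc, dict) else None


def unnest(array):
    """Stand-in for json:unnest: one (element, index) pair per array item.

    Assumption: indices are zero-based.
    """
    for index, element in enumerate(array):
        yield element, index


def label_stops(src_text, stop_ids):
    """Mimic the WHERE clause: parse, select the array, unnest, then join.

    `stop_ids` maps a stop id to a subject and plays the role of the
    existing `?s eg:stopId ?i` data. Returns (subject, predicate, label)
    tuples for every array index that matches a known stop id.
    """
    doc = json.loads(src_text)
    stop_names = json_path_key(doc, "stopNames") or []
    return [
        (stop_ids[i], "rdfs:label", label)
        for label, i in unnest(stop_names)
        if i in stop_ids  # the join: keep only indices with a matching stop
    ]


# Invented sample input; in the query this text would come from url:text.
SRC = '{"stopNames": ["Central Station", "Market Square", "Harbour"]}'
STOPS = {0: "eg:stop0", 2: "eg:stop2"}
triples = label_stops(SRC, STOPS)
```

With this sample data, only indices 0 and 2 survive the join, so "Market Square" produces no label triple, just as a non-matching ?i would produce no solution in the SPARQL query.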

      In fact, these ARQ SPARQL extensions would enable any Jena-based project to perform such integration tasks. For instance, one could already start a Fuseki server and play around with conversions in a web interface.

      • Is there interest in integrating our ARQ SPARQL extension functions into Jena? If so, what would we have to do, and which existing or new Jena module would be the most appropriate place?
        We are also open to discussion and changes regarding the exact signatures of these extension functions. For instance, right now we use two custom datatypes, xsd:json and xsd:xml, which obviously should be replaced by better IRIs.
      • Maybe the functionality of running files containing sequences of SPARQL queries from the command line could also be added to Jena directly, as I think it involves no magic outside the scope of Jena.

          People

            Assignee: Unassigned
            Reporter: Claus Stadler (Aklakan)
            Votes: 0
            Watchers: 2
