Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Jena 2.10.1
    • Component/s: RIOT
    • Labels:

      Description

      RIOT has the ability to parse TriG (http://www4.wiwiss.fu-berlin.de/bizer/TriG/) files but not to serialize RDF datasets in that format.
      When working with named graphs, people would probably find it easier to read TriG files than N-Quads (just as Turtle is easier to read than N-Triples).

        Activity

        Laurent Pellegrino added a comment -

        I need this feature in order to improve quads serialization (I mean the output size). I will propose a patch.

        Andy Seaborne added a comment -

        If it's to reduce size, then you can do only partial "pretty" TriG and get much, if not all, of the advantage.

        The RDF-WG hasn't defined TriG yet, so things may change. I'm arguing there against restricting a graph name to a single use per file. Instead, I argue, allowing multiple named blocks of triples that all go into the same graph is better, as sometimes quads don't arrive in perfect G-sorted order.
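
        As a rough illustration of the point about graph names (this is not the code in the scratch area), the following minimal Java sketch writes quads in arrival order and simply opens a new block whenever the graph changes, so one graph name may label several blocks. The class `GraphBlockSketch`, its `Quad` type and the `write` method are hypothetical names; terms are assumed to be already formatted as strings.

```java
import java.util.List;
import java.util.Objects;

/** Illustrative only: these names are hypothetical and not part of Jena. */
final class GraphBlockSketch {

    /** A quad as four pre-formatted terms; graph is null for the default graph. */
    static final class Quad {
        final String graph, subject, predicate, object;

        Quad(String graph, String subject, String predicate, String object) {
            this.graph = graph;
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /**
     * Writes quads in the order given. A new block is opened each time the
     * graph differs from the previous quad's graph, so no sorting or
     * buffering by graph is needed.
     */
    static String write(List<Quad> quads) {
        StringBuilder out = new StringBuilder();
        String current = null;
        boolean open = false;
        for (Quad q : quads) {
            if (!open || !Objects.equals(current, q.graph)) {
                if (open) {
                    out.append("}\n");
                }
                // Default graph block has no label.
                out.append(q.graph == null ? "{\n" : q.graph + " {\n");
                current = q.graph;
                open = true;
            }
            out.append("    ").append(q.subject).append(' ')
               .append(q.predicate).append(' ')
               .append(q.object).append(" .\n");
        }
        if (open) {
            out.append("}\n");
        }
        return out.toString();
    }
}
```

        With quads arriving for graphs g1, g2, g1, this produces three blocks, two of them labelled g1. Whether that is acceptable depends on what the working group finally decides.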

        I have some less-than-half finished code for a TriG writer. Well, it's a new Turtle writer that can be called from inside a TriG writer.

        https://svn.apache.org/repos/asf/incubator/jena/Scratch/AFS/Dev/trunk/src/main/java/riot/

        TriGWriter.java
        TurtleWriter2.java
        TurtleWriterBlocks.java
        TurtleWriterFlat.java

        The current Jena Turtle writer is very old code and it shows. It can't easily be made to work embedded so I was rewriting it. TurtleWriter2 is not complete - it does not have list handling, sorted predicates or object lists (although personally I don't like object lists much).

        And datasets don't have prefixes (yet).

        And there's no writer architecture.

        I have been assuming the model.write() style is wrong - it needs to be

        WriterThing.write(OutputStream, syntax, model)
        WriterThing.write(OutputStream, syntax, dataset)

        and have one system wide WriterThing. Only RDF/XML needs very specialised setup and we shouldn't distort things just for RDF/XML.
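
        To make the shape of that proposal concrete, here is a minimal Java sketch of a single entry point that dispatches on syntax name. Everything in it is an assumption for illustration: `WriterThingSketch`, the stand-in `Model` and `Dataset` types, and the syntax names are hypothetical and are not Jena classes.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: every name here is hypothetical, not Jena API. */
final class WriterThingSketch {

    /** Stand-in for a model: one pre-formatted triple per string. */
    static final class Model {
        final List<String> triples;
        Model(List<String> triples) { this.triples = triples; }
    }

    /** Stand-in for a dataset: one pre-formatted quad per string. */
    static final class Dataset {
        final List<String> quads;
        Dataset(List<String> quads) { this.quads = quads; }
    }

    interface ModelWriter {
        void write(OutputStream out, Model model) throws IOException;
    }

    interface DatasetWriter {
        void write(OutputStream out, Dataset dataset) throws IOException;
    }

    private static final Map<String, ModelWriter> MODEL_WRITERS = new HashMap<>();
    private static final Map<String, DatasetWriter> DATASET_WRITERS = new HashMap<>();

    static {
        // Trivial line-per-statement writers registered under a syntax name.
        MODEL_WRITERS.put("N-TRIPLES", (out, m) -> writeLines(out, m.triples));
        DATASET_WRITERS.put("N-QUADS", (out, d) -> writeLines(out, d.quads));
    }

    private static void writeLines(OutputStream out, List<String> lines) throws IOException {
        for (String line : lines) {
            out.write((line + " .\n").getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Single system-wide entry point for models. */
    static void write(OutputStream out, String syntax, Model model) {
        ModelWriter w = MODEL_WRITERS.get(syntax);
        if (w == null) {
            throw new IllegalArgumentException("No model writer for " + syntax);
        }
        try {
            w.write(out, model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Single system-wide entry point for datasets. */
    static void write(OutputStream out, String syntax, Dataset dataset) {
        DatasetWriter w = DATASET_WRITERS.get(syntax);
        if (w == null) {
            throw new IllegalArgumentException("No dataset writer for " + syntax);
        }
        try {
            w.write(out, dataset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

        The caller names the syntax and hands over the data; the writer lookup lives in one place rather than on the model object.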

        The code really is the easier part of the problem I put together and never completed. It's only just been put into SVN during some local cleaning.

        Nice output is hard; there are many aspects of the current Turtle writer that aren't in the new one. Some people care greatly about consistency of output - they store RDF data in version control.

        Hope this helps, but ignore it if not.

        Laurent Pellegrino added a comment -

        Thanks Andy for your comment.

        > If it's to reduce size, then you can do only partial "pretty" TriG and get much, if not all, of the advantage.

        You are right, it will be simpler to manage it as you suggested, even if it would be better to use something more "standardized"; but, as you said, TriG has not yet been defined by the RDF-WG.

        I had started to work on a simple patch before reading your comment: translating a dataset to TriG without taking advantage of Turtle features (prefix abbreviation, group abbreviation, lists, ...). Even if it is too simple, in terms of size and readability this simple version is always better than N-Quads.

        Why not create a TriG writer which has 2 levels?

        • One where no abbreviation is used (something similar to the simple version I described just before)
        • One where node values which are repeated are abbreviated and/or aggregated by using Turtle features (prefixes abbreviation, groups abbreviation through ',' and ';' symbols, lists, ...).

        Thus, level 1 could easily be provided with some minor changes, whereas level 2, which takes advantage of Turtle features, could be implemented later, when the RIOT architecture is ready. Also, with these two kinds of writer, users could in the future choose between a version that uses no abbreviation but is fast and consumes less memory, and a level 2 version that has to buffer and sort nodes.
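
        To show the difference between the two levels for a single term, here is a small Java sketch of prefix abbreviation only (the ',' and ';' grouping and lists are left out). The class `PrefixAbbrevSketch` and its methods are hypothetical names, and the local-name check is deliberately more conservative than what Turtle actually allows.

```java
import java.util.Map;

/** Illustrative only: hypothetical names, not part of RIOT. */
final class PrefixAbbrevSketch {

    /** Level 1: always write the full IRI. */
    static String full(String iri) {
        return "<" + iri + ">";
    }

    /**
     * Level 2: write a prefixed name when a declared namespace matches
     * and the remaining local part is a simple identifier; otherwise
     * fall back to the full IRI.
     */
    static String abbreviated(String iri, Map<String, String> prefixes) {
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            String namespace = e.getValue();
            if (iri.startsWith(namespace)) {
                String local = iri.substring(namespace.length());
                // Conservative check; Turtle local names permit more characters.
                if (local.matches("[A-Za-z_][A-Za-z0-9_]*")) {
                    return e.getKey() + ":" + local;
                }
            }
        }
        return full(iri);
    }
}
```

        Level 1 needs no state at all, whereas level 2 needs at least the prefix map, and more state again once grouping and sorting are added.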

        Andy Seaborne added a comment -

        Laurent,

        I agree - two (or more?) different writers. The old (= current) Turtle writer tries to reuse the same code, but this is actually inefficient, because writing in blocks of same-subject triples does not require any pre-processing of the graph to be written.

        The (new, mostly finished, untested) TurtleWriterBlocks is in this style - it's a streaming write of the data with no special list or nested object forms.
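
        For readers unfamiliar with the block style, the following Java sketch shows the general idea of a streaming, same-subject block writer. It is not TurtleWriterBlocks itself: the class `SubjectBlockSketch` and its `Triple` type are hypothetical, and terms are assumed to be pre-formatted strings.

```java
import java.util.List;

/** Illustrative only: hypothetical names, not the scratch-area code. */
final class SubjectBlockSketch {

    /** A triple as three pre-formatted terms. */
    static final class Triple {
        final String subject, predicate, object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /**
     * Writes triples in arrival order. Consecutive triples that share a
     * subject are joined with ';'. Only the previous subject is remembered,
     * so the graph is never pre-processed or sorted.
     */
    static String write(List<Triple> triples) {
        StringBuilder out = new StringBuilder();
        String current = null;
        for (Triple t : triples) {
            if (t.subject.equals(current)) {
                // Same subject as the previous triple: continue the block.
                out.append(" ;\n    ").append(t.predicate).append(' ').append(t.object);
            } else {
                if (current != null) {
                    out.append(" .\n");
                }
                out.append(t.subject).append(' ')
                   .append(t.predicate).append(' ').append(t.object);
                current = t.subject;
            }
        }
        if (current != null) {
            out.append(" .\n");
        }
        return out.toString();
    }
}
```

        If triples for one subject arrive non-consecutively, the subject is simply written again in a later block; the output is larger but still correct.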

        From the point of view of efficiency, using the proto-TriG with TurtleWriterBlocks might be interesting to you. TriGWriter can choose the Turtle writer. The prefixes need sorting out.

        A non-TriG way of transferring datasets is to use BindingIO streams. These compress by avoiding sending terms in the previous row.

        Iyad Shabani added a comment -

        Hi Andy,
        Do you plan to put the TriGWriter you wrote into a release?
        I need this writer; how could I use it?
        Thank you

        Andy Seaborne added a comment -

        Hi Iyad,

        Yes, the plan is to include it in a Jena release.

        Can you use N-Quads? - there is a writer for NQuads already in ARQ, in the Apache release.

        org.openjena.riot.out.NQuadsWriter

        Otherwise, you can copy the code from the location in the SVN scratch area (see above).

        It's a small matter of finding a block of time to work on the I/O architecture to properly integrate the code into RIOT.

        Andy Seaborne added a comment -

        See the new RDFDataMgr.write(...., Lang.TRIG)


          People

          • Assignee:
            Andy Seaborne
            Reporter:
            Paolo Castagna
          • Votes:
            2
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated: 4h
              Remaining: 4h
              Logged: Not Specified
