Uploaded image for project: 'Apache Any23 (Retired)'
  1. Apache Any23 (Retired)
  2. ANY23-396

Overhaul WriterFactory API

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.2
    • 2.3
    • core
    • None

    Description

      This issue began with Jacek's observation that, in Rover, it is impossible to specify a delegating writer factory, i.e., one that maps/filters/reduces the preliminary extraction output before passing it on to the final outputstream writer. Lack of this ability caused us to have to specify numerous configuration flags in Rover, e.g., "--notrivial", which filters the output of the extractor by removing trivial css triples prior to writing the triples to their final format. Many of these flags could simply be replaced by the ids of delegating writer factories, if we had such a capability. One added advantage of that would be that then, users could specify the order in which these modifications take place. E.g., adding a logging decorator could take place before or after the "notrivial" decorator has been applied (or both before and after!). Which? If we can, we should really let the user decide.

      The most obvious solution to this problem was to extend the WriterFactory interface with a new DelegatingWriterFactory interface that accepts an arbitrary TripleHandler rather than an OutputStream as input.

      In doing so, it was also necessary to deprecate a few methods in WriterFactory and un-deprecate them in an extending TripleWriterFactory class (which takes the place of WriterFactory by creating a TripleHandler from an OutputStream). This deprecation was actually not too painful, first, because some of the methods were redundant in the first place (e.g., getMimeType()), and second, because it presented us with a perfect opportunity to add some much-needed improvements to the new interface.

      The biggest improvement is the addition of Settings as a parameter to the TripleHandler constructor, which will allow users to configure writers as they see fit, rather than forcing, e.g., prettyprint=true on them.

      ANY23-388 perfectly illustrates this current lack of configuration ability. And we fixed that issue by simply giving users protected access to the underlying RDFWriter instances so that they could configure them manually. However, in hindsight, this was a bad idea, as it could lead to backwards compatibility issues down the line if we decide to change the underlying implementation of RDFWriterTripleHandler instances. Luckily, the solution to ANY23-388 was only implemented recently and is still only present in the snapshot version of Any23. In my PR, I've removed that hack and replaced it with Settings, which is extensible ad infinitum and won't pose the same threat to backwards compatibility.

      Another improvement is the removal of RDF4J classes from the public WriterFactory API. (I replaced RDFFormat with our own TripleFormat class.) As I noted in my PR, it's probably better for us to use our own classes in public-facing interfaces rather than RDF4J's so that we can maintain stability in the event that RDF4J changes their API, or (heaven forbid) ceases to exist, or we simply want to modify the implementation. A good rule of thumb for us would probably be to limit usage of RDF4J in our public-facing API to the ubiquitous interfaces found in the org.eclipse.rdf4j:rdf4j-model artifact (e.g. IRI and Literal), since removing those would be virtually impossible without enormous backwards compatibility issues.

      Since this PR is quite large and there are a multitude of new classes and new behaviors (while managing to remain fully backwards-compatible with previous behavior), I'm looking for feedback! Please comment with any concerns, questions, or suggestions you have for improvement.

      PR can be viewed here: https://github.com/apache/any23/pull/122
       

      Attachments

        Activity

          People

            hansbrende Hans Brende
            grzebyta.dev Jacek Grzebyta
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: