Details

    • Type: Brainstorming
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      This is a JIRA to discuss and collect technical changes to Jena that would warrant a "Jena3", whether because a change is incompatible or simply because the changes are substantial enough that bumping the major version number is best.

        Issue Links

          Activity

          Andy Seaborne created issue -
          Andy Seaborne added a comment -

          I am very wary of any change that would take many months to do.

          I am in favour of "smaller" and "simpler". Identify features for flexibility that are never used and remove them, leaving a single implementation which is easier to understand.

          Ian Dickinson added a comment -

          I'd like to merge Resource and OntResource, to reduce some of the discontinuity of swapping between OntAPI and ModelAPI

          Paolo Castagna added a comment - - edited

          A small Jena core/common (I don't know what would be an appropriate name) similar to what we had in:
          http://svn.apache.org/repos/asf/incubator/jena/Import/Jena-SVN/Experimental/Jena3/trunk/JenaCore/
          (see also: JENA-191)

          No RDF/XML, no Model API, no inference, no ontology API.

          Make it as easy as possible for developers to add new readers/writers from/to different data formats.
          Maybe a new/better events system which we could use, for example: to keep third party indexes up-to-date (see: JENA-164) or to replicate updates onto different machines.

          (This might be useful to other Apache projects as well, for example: Any23)
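
          As a rough illustration of the "easy to add new readers/writers" idea above, a pluggable reader registry might look like the sketch below. Every name here (SimpleTriple, TripleReader, ReaderRegistry, TsvTripleReader) and the tab-separated format are hypothetical, invented for this example; none of it is Jena API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical types for illustration only; none of these are Jena classes.
final class SimpleTriple {
    final String subject, predicate, object;

    SimpleTriple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

interface TripleReader {
    // Parse the input and push each triple to the sink.
    void read(Reader in, Consumer<SimpleTriple> sink);
}

final class ReaderRegistry {
    private final Map<String, TripleReader> readers = new ConcurrentHashMap<>();

    // Third-party code registers a reader under a (case-insensitive) format name.
    void register(String format, TripleReader reader) {
        readers.put(format.toLowerCase(Locale.ROOT), reader);
    }

    TripleReader lookup(String format) {
        TripleReader r = readers.get(format.toLowerCase(Locale.ROOT));
        if (r == null) {
            throw new IllegalArgumentException("No reader registered for: " + format);
        }
        return r;
    }
}

// Toy reader for a made-up format: subject<TAB>predicate<TAB>object, one triple per line.
final class TsvTripleReader implements TripleReader {
    @Override
    public void read(Reader in, Consumer<SimpleTriple> sink) {
        try {
            BufferedReader br = new BufferedReader(in);
            String line;
            while ((line = br.readLine()) != null) {
                if (line.trim().isEmpty()) continue; // skip blank lines
                String[] parts = line.split("\t", 3);
                if (parts.length != 3) {
                    throw new IllegalArgumentException("Malformed line: " + line);
                }
                sink.accept(new SimpleTriple(parts[0], parts[1], parts[2]));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

          With this shape, adding a format is a single register call, and the core needs no knowledge of the format itself.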

          Paolo Castagna made changes -
          Field Original Value New Value
          Link This issue is related to ANY23-19 [ ANY23-19 ]
          Andy Seaborne made changes -
          Link This issue is related to JENA-192 [ JENA-192 ]
          Andy Seaborne added a comment - - edited

          Renaming the packages would be a user-visible change and warrants a v3 label.

          Andy Seaborne made changes -
          Link This issue is related to JENA-192 [ JENA-192 ]
          Claude Warren added a comment -

          I would like to see a listener (or other event-type interface) on the model similar to that available on the graph. At its simplest, this is a listener that listens to all the graph events and passes them on to its own listeners.
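
          The simplest form described here, a listener that relays every graph event to its own subscribers, could be sketched as follows. TripleEventListener and ForwardingListener are hypothetical stand-ins written for this example, and triples are plain strings to keep it short; this is not Jena's event API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for a graph-level event interface; not Jena's own API.
interface TripleEventListener {
    void added(String triple);
    void deleted(String triple);
}

// Registered once at the graph level; relays every event to its subscribers.
final class ForwardingListener implements TripleEventListener {
    // Copy-on-write list so subscribers can be added or removed while events are delivered.
    private final List<TripleEventListener> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(TripleEventListener listener) {
        subscribers.add(listener);
    }

    void unsubscribe(TripleEventListener listener) {
        subscribers.remove(listener);
    }

    @Override
    public void added(String triple) {
        for (TripleEventListener l : subscribers) l.added(triple);
    }

    @Override
    public void deleted(String triple) {
        for (TripleEventListener l : subscribers) l.deleted(triple);
    }
}
```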

          Andy Seaborne added a comment -

          Is ModelChangedListener what you are looking for?

          http://jena.apache.org/documentation/notes/event-handler-howto.html

          Claude Warren added a comment -

          I was writing from memory and seem to have gotten the layer wrong. Yes, ModelChangedListener does what I was saying. Is there something similar for the Dataset?

          Rajeev B added a comment -

          Would be great if Jena defined a standard interface to easily use NOSQL/Graph Databases like OrientDB, MongoDB, Neo4J as triple stores.
          Or if it can use any DB supporting BluePrints (https://github.com/tinkerpop/blueprints/wiki/)

          Andy Seaborne added a comment -

          Rajeev - it would be good to be able to do that both ways round:

          1/ Jena API on top of NoSQL storage.
          2/ Graph languages on top of Jena graphs.

          Do you want to have a go and make a contribution to the project?

          Andy Seaborne added a comment -

          A new API should be based on IRI objects, not strings.

          If we have our own IRI object, we can require it (optionally?) to be absolute (unlike java.net.URI) by very simple parsing. If we have a Jena IRI we can change the performance balance between high-grade checking and raw efficiency.

          The RIOT parsers do check IRIs – they use a cache of already-checked strings to avoid parsing every IRI every time.

          See JENA-319
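
          A minimal sketch of the two ideas in this comment: an IRI object that must be absolute, checked by very simple parsing, plus a cache of already-checked strings. SimpleIri is a hypothetical class written for this example, not Jena's IRI library, and "absolute" is taken here to mean only "starts with a syntactically valid scheme followed by a colon", which is a deliberately shallow check.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical IRI value class for illustration; this is not Jena's IRI library.
final class SimpleIri {
    // Strings that already passed the check, so each distinct string is checked once.
    private static final Map<String, SimpleIri> CHECKED = new ConcurrentHashMap<>();

    private final String text;

    private SimpleIri(String text) {
        this.text = text;
    }

    // Returns the cached instance if the string was checked before; otherwise checks it now.
    static SimpleIri of(String s) {
        return CHECKED.computeIfAbsent(s, str -> {
            if (!hasScheme(str)) {
                throw new IllegalArgumentException("Not an absolute IRI: " + str);
            }
            return new SimpleIri(str);
        });
    }

    // Cheap test: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ':'.
    static boolean hasScheme(String s) {
        int colon = s.indexOf(':');
        if (colon <= 0 || !isAsciiLetter(s.charAt(0))) return false;
        for (int i = 1; i < colon; i++) {
            char c = s.charAt(i);
            boolean ok = isAsciiLetter(c) || (c >= '0' && c <= '9')
                    || c == '+' || c == '-' || c == '.';
            if (!ok) return false;
        }
        return true;
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    @Override
    public String toString() {
        return text;
    }
}
```

          The cache is unbounded in this sketch; a real implementation would presumably bound it, and could swap the cheap scheme test for stricter checking where the cost is acceptable.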

          Rajeev B added a comment -

          Pointed in the right direction, I don't mind giving it a shot over the weekend.

          Andy Seaborne added a comment -

          I had an all-too-quick look at the Blueprints javadoc.

          The Graph/Node/Triple level (sometimes called the SPI - System Programming Interface)

          For an "Implementation", implementing the Blueprints API objects backed by Graph/Node/Triple looks like a starting place. Looking at the Sail implementation might help - this Jena SPI is close to the OpenRDF abstractions. One difference: Jena has universal Nodes, not ones created by a per-connection factory.

          For an "Ouplementation", (ARQ on Blueprinted storage) the StageGenerator route is the narrowest point of contact between

          http://jena.apache.org/documentation/query/arq-query-eval.html

          Claude Warren added a comment -

          I'm not sure if this belongs here or perhaps there should be a similar JIRA for Fuseki but I would like to see a RESTful Fuseki where the dataset is identified in the URL and different endpoints for SPARQL queries and RDF downloads. I have code that I would contribute to this end.
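
          To make the "dataset identified in the URL, different endpoints" idea concrete, here is a small routing sketch. The path layout "/{dataset}/{service}" and the service names "query" and "data" are assumptions chosen for illustration, and DatasetRoute is a hypothetical class; this is not Fuseki's dispatch code.

```java
import java.util.Optional;

// Hypothetical route model for illustration; this is not Fuseki's actual dispatch code.
final class DatasetRoute {
    enum Service { QUERY, DATA }

    final String dataset;
    final Service service;

    private DatasetRoute(String dataset, Service service) {
        this.dataset = dataset;
        this.service = service;
    }

    // Parses paths of the assumed form "/{dataset}/{service}", e.g. "/books/query".
    static Optional<DatasetRoute> parse(String path) {
        if (path == null || !path.startsWith("/")) return Optional.empty();
        String[] parts = path.substring(1).split("/");
        if (parts.length != 2 || parts[0].isEmpty()) return Optional.empty();
        switch (parts[1]) {
            case "query": // SPARQL query endpoint (assumed name)
                return Optional.of(new DatasetRoute(parts[0], Service.QUERY));
            case "data":  // RDF download endpoint (assumed name)
                return Optional.of(new DatasetRoute(parts[0], Service.DATA));
            default:
                return Optional.empty();
        }
    }
}
```

          Because both the dataset and the kind of service are visible in the path, a front-end filter can decide what a request targets without inspecting the query string.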

          Andy Seaborne added a comment -

          Claude - yes, a separate JIRA would be easier from my point-of-view.

          I'd like to hear about your ideas for RESTful operation - maybe a discussion on a JIRA or on email.

          Fuseki does have additional code for handling direct naming and operations on the dataset URL itself. This (SPARQL_UberServlet*) is not active by default yet because I also want to put in place an approach to security (looking at Apache Shiro) and it may make it hard to filter for security if the kind of operation (R or W) is within the query string.

          See also JENA-201 (Fuseki delivered as a WAR file)

          Rajeev B added a comment -

          I'll read through and try to proceed in the direction of an "Ouplementation"; this should bring any store supporting Blueprints into the gamut of Jena (without implementing anything on the store side).

          So I'll start looking at the Sail Ouplementation and StageGenerator that you mentioned.

          Andy Seaborne added a comment -

          Remove Capabilities from the Graph interface.

          Replace with a single "isReadOnly()".
          Rethink the Graph.size() in the cases where it is not accurate.
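
          A sketch of what a graph interface with a single isReadOnly() might look like. TinyGraph, MemGraph and ReadOnlyGraph are hypothetical and use strings for triples; this is not Jena's Graph interface. Returning an OptionalLong from size() is just one possible answer to the "not accurate" case (an assumption of this example, not something proposed in the thread): a graph that cannot count exactly returns empty rather than a guess.

```java
import java.util.HashSet;
import java.util.OptionalLong;
import java.util.Set;

// Hypothetical, cut-down graph interface for illustration; not Jena's Graph.
interface TinyGraph {
    void add(String triple);
    boolean contains(String triple);

    // The single capability question that remains.
    boolean isReadOnly();

    // Empty when the implementation cannot report an exact count.
    OptionalLong size();
}

final class MemGraph implements TinyGraph {
    private final Set<String> triples = new HashSet<>();

    @Override public void add(String triple) { triples.add(triple); }
    @Override public boolean contains(String triple) { return triples.contains(triple); }
    @Override public boolean isReadOnly() { return false; }
    @Override public OptionalLong size() { return OptionalLong.of(triples.size()); }
}

// Read-only view over another graph: reads are delegated, writes are rejected.
final class ReadOnlyGraph implements TinyGraph {
    private final TinyGraph base;

    ReadOnlyGraph(TinyGraph base) {
        this.base = base;
    }

    @Override public void add(String triple) {
        throw new UnsupportedOperationException("graph is read-only");
    }
    @Override public boolean contains(String triple) { return base.contains(triple); }
    @Override public boolean isReadOnly() { return true; }
    @Override public OptionalLong size() { return base.size(); }
}
```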

          Claude Warren added a comment -

          I think some of the capabilities in the graph interface are important and that it should be retained but cleaned up.

          Claude Warren added a comment -

          A proposal for changes to Iterator, Lock and Transaction implementations for Jena 3.

          Claude Warren made changes -
          Attachment IteratorLockandTransactionsinJena3.pdf [ 12612845 ]
          Andy Seaborne added a comment - See also http://mail-archives.apache.org/mod_mbox/jena-dev/201311.mbox/%3C527A5574.7060306%40apache.org%3E
          Andy Seaborne added a comment -

          Have you looked at TDB? This note seems to be SDB-centric in expectations.

          As implemented Iterators require that if there is a change to the underlying data store the iterator must fail. Thus any write cancels all reads.

          In TDB, Iterators from different transactions can exist at the same time and continue to iterate. They do not throw exceptions or fail on change outside the transaction - the changes simply aren't seen; writes do not cancel reads.

          The use of iterators block updates making the use of graphs in high frequency read/write environments difficult.

          In TDB, different versions of the database exist at the same time. Iterators do not block later writers or commits.

          4. Serializable requests within this isolation prohibit concurrent read/write, similar to the current situation. Pro: absolute consistency. Cons: lower concurrency

          TDB is fully serializable and allows concurrency, including concurrency between multiple versions of the database. TDB is not 2PL (SS2PL) based; it's MVCC (multi-version concurrency control). (By "MVCC", I mean a path-copying immutable datastructure style, not the multi-entry-version-id style - "MVCC" is used in both contexts.)
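
          The "path-copying immutable datastructure" style can be illustrated with a toy persistent binary search tree: a write copies only the nodes on the path to the change and publishes a new root, so an iterator started on an older root keeps iterating over its own version and never sees the change. PNode and VersionedSet are hypothetical names for this sketch; it is in-memory, unbalanced, and in no way TDB's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: an immutable binary search tree node. Not TDB code.
final class PNode {
    final String key;
    final PNode left, right;

    PNode(String key, PNode left, PNode right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }

    // Returns a new root; only nodes on the path to the insertion point are copied.
    static PNode insert(PNode n, String key) {
        if (n == null) return new PNode(key, null, null);
        int c = key.compareTo(n.key);
        if (c < 0) return new PNode(n.key, insert(n.left, key), n.right);
        if (c > 0) return new PNode(n.key, n.left, insert(n.right, key));
        return n; // already present, version unchanged
    }
}

final class VersionedSet {
    // The current version; older roots stay valid for as long as a reader holds them.
    private final AtomicReference<PNode> root = new AtomicReference<>();

    // A writer publishes a new version without touching any existing node.
    void add(String key) {
        root.updateAndGet(r -> PNode.insert(r, key));
    }

    // In-order iterator bound to the version that was current when it was created.
    Iterator<String> snapshotIterator() {
        final Deque<PNode> stack = new ArrayDeque<>();
        pushLeft(stack, root.get());
        return new Iterator<String>() {
            @Override public boolean hasNext() {
                return !stack.isEmpty();
            }
            @Override public String next() {
                if (stack.isEmpty()) throw new NoSuchElementException();
                PNode n = stack.pop();
                pushLeft(stack, n.right);
                return n.key;
            }
        };
    }

    private static void pushLeft(Deque<PNode> stack, PNode n) {
        while (n != null) {
            stack.push(n);
            n = n.left;
        }
    }
}
```

          An iterator created before a later add() neither fails nor observes the new element, which mirrors the behaviour described above: writes do not cancel reads.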

          Other:

          Locking a subgraph does not stabilize query results. Queries can involve pattern negation (NOT EXISTS, MINUS). OPTIONALs also can depend on the absence of triples. Read-repeatable would also need to track read-repeatable-no-result.

          Claude Warren added a comment -

          I have not looked at SDB nor have I looked at the TDB code. I have looked at various in-memory and the early file based systems. My comments are somewhat SQL centric as I used the terms from SQL database transactions because 1) relational databases have a lot of experience with transactions and how they are used in production environments; and 2) the terms are mostly understood or easy to research.

          My comments on iterators come from the iterator documentation and the way in which in-memory iterators work. It sounds like TDB defines the collection being iterated over as the items within the transaction, not necessarily the graph/model as a whole. This is basically what I wanted to see; perhaps what we need here is a clarification to the documentation describing what the iterators are iterating over. Q: will TDB ever throw a concurrent modification error on an iterator?

          Other:

          Indeed, locking a subgraph does not guarantee that you will be able to stabilise the query results for all queries; however, I think that it would be useful for a fairly large subset of queries. Since pattern negation necessarily opens the door for non-terminating queries, any query with pattern negation would probably be excluded from the set of queries that this would be beneficial for.

          Claude Warren added a comment -

          Jena 3 should use IRI wherever possible. Since we expect Jena 2 -> Jena 3 to require some rework, this seems to be a good time to embrace IRI as a class and use it wherever the string version of URL or URI is currently expected.

          As long as the IRI implementation can convert itself into a valid URL or URI this should be doable. For IRIs that cannot be converted (are there any?) we would probably need an exception thrown.

          Andy Seaborne added a comment - - edited

          There is quite a big difference between, say, needing to change import statements because of repackaging and needing to use IRIs everywhere. This is not to argue against it, but just because some changes require rework does not mean it's all the same amount of work.

          For this to be a good idea, we'd need to understand the implications. The Jena IRI library performs a detailed parsing of the string. Is that an acceptable cost? What if a loop is doing an operation where part of the loop body is using the same string each time - avoiding repeated parsing may be necessary.

          Jena can support multiple APIs - a possibility is to grow this style in parallel with a fairly direct port of the existing API and see which gains traction. It allows for a wide scope for change without forcing it on users just to get access to other improvements that aren't connected to the API.


            People

            • Assignee:
              Unassigned
            • Reporter:
              Andy Seaborne
            • Votes:
              0
            • Watchers:
              3
