Apache Jena / JENA-164

LARQ needs to update the Lucene index when a SPARQL Update request is received

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: Jena 2.11.0
    • Component/s: LARQ
    • Labels:
      None

      Description

      Originally, LARQ did not update the Lucene index as RDF statements were added to or removed from a Jena Model.
      LARQ currently extends StatementListener [1] to keep a Lucene index in sync with a Jena Model. However, this notification mechanism does not apply to SPARQL Update requests.

      LARQ needs to update the Lucene index when a SPARQL Update request is received.

      See also a relevant thread from jena-users mailing list:

      [1] http://svn.apache.org/repos/asf/incubator/jena/Jena2/LARQ/trunk/src/main/java/org/apache/jena/larq/IndexBuilderModel.java

          Activity

          Paolo Castagna added a comment - edited

          Andy's suggestion from jena-dev:

          > And I've suggested LARQ could create a DatasetGraph and catch every
          > add(quad)/delete(quad).
          >
          > A LARQ assembler could simply name the dataset description it wraps.
          > Assembling the LARQ assembler assembles the inner dataset. Fuseki service
          > points to LARQ.
          >
          > Seems quite practical to try to me.

          http://mail-archives.apache.org/mod_mbox/incubator-jena-dev/201202.mbox/%3C4F2A6ECE.6010802%40googlemail.com%3E
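
The wrapping idea quoted above can be illustrated with a minimal sketch. It assumes nothing about the real Jena or LARQ classes: `Quad`, `QuadStore`, `MemoryStore`, `TextIndex` and `IndexingStore` are hypothetical stand-ins, and the toy inverted index only plays the role that Lucene would play. The point is that every add/delete passes through the wrapper, so the index is updated regardless of whether the change came from the Model API or from an update request.

```java
import java.util.*;

/** Sketch only: none of these types are Jena or LARQ classes. */
public class IndexingWrapperSketch {

    /** A quad reduced to plain strings; the object is treated as literal text. */
    record Quad(String graph, String subject, String predicate, String object) {}

    /** Hypothetical stand-in for a dataset that accepts add/delete of quads. */
    interface QuadStore {
        void add(Quad q);
        void delete(Quad q);
    }

    /** Plain in-memory store that knows nothing about indexing. */
    static final class MemoryStore implements QuadStore {
        final Set<Quad> quads = new HashSet<>();
        public void add(Quad q) { quads.add(q); }
        public void delete(Quad q) { quads.remove(q); }
    }

    /** Toy inverted index (token -> quads) standing in for a Lucene index. */
    static final class TextIndex {
        private final Map<String, Set<Quad>> postings = new HashMap<>();

        void index(Quad q) {
            for (String t : tokens(q.object())) {
                postings.computeIfAbsent(t, k -> new HashSet<>()).add(q);
            }
        }

        void unindex(Quad q) {
            for (String t : tokens(q.object())) {
                Set<Quad> hits = postings.get(t);
                if (hits != null) {
                    hits.remove(q);
                    if (hits.isEmpty()) postings.remove(t);
                }
            }
        }

        /** Returns the subjects of all indexed quads whose text contains the term. */
        Set<String> search(String term) {
            Set<String> subjects = new TreeSet<>();
            for (Quad q : postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of())) {
                subjects.add(q.subject());
            }
            return subjects;
        }

        private static List<String> tokens(String text) {
            List<String> out = new ArrayList<>();
            for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!t.isEmpty()) out.add(t);
            }
            return out;
        }
    }

    /** The wrapper: forwards every change to the inner store and mirrors it in the index. */
    static final class IndexingStore implements QuadStore {
        private final QuadStore inner;
        private final TextIndex index;

        IndexingStore(QuadStore inner, TextIndex index) {
            this.inner = inner;
            this.index = index;
        }

        public void add(Quad q) { inner.add(q); index.index(q); }
        public void delete(Quad q) { inner.delete(q); index.unindex(q); }
    }

    public static void main(String[] args) {
        TextIndex index = new TextIndex();
        QuadStore store = new IndexingStore(new MemoryStore(), index);
        Quad q = new Quad("g", "urn:ex:book1", "urn:ex:title", "Semantic Web Primer");
        store.add(q);
        System.out.println(index.search("primer")); // prints [urn:ex:book1]
        store.delete(q);
        System.out.println(index.search("primer")); // prints []
    }
}
```

In this sketch the service would be pointed at the `IndexingStore` rather than at the inner store, which mirrors the "Fuseki service points to LARQ" arrangement in the quoted message.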

          Andy Seaborne added a comment -

          In context:

          This does not trap the TDB bulkloader route. That can be done by being between the parser and the bulk loader (any of them) and still be general purpose, not TDB specific.
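
As a rough illustration of sitting between the parser and the loader, the sketch below tees each parsed quad to an indexer before handing it to the loader. Everything here is an assumption made for illustration: the one-quad-per-line `parse` method, the `Quad` record, and the two lists standing in for the bulk loader and the text index are hypothetical and are not the TDB or RIOT APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch only: a generic tee between a parser and a loader, not TDB-specific code. */
public class LoaderTeeSketch {

    record Quad(String g, String s, String p, String o) {}

    /** Hypothetical parser: one whitespace-separated quad per line, pushed to a sink. */
    static void parse(List<String> lines, Consumer<Quad> sink) {
        for (String line : lines) {
            String[] f = line.trim().split("\\s+", 4);
            if (f.length == 4) {
                sink.accept(new Quad(f[0], f[1], f[2], f[3]));
            }
        }
    }

    public static void main(String[] args) {
        List<Quad> loaded = new ArrayList<>();   // stands in for the bulk loader
        List<Quad> indexed = new ArrayList<>();  // stands in for the text index
        Consumer<Quad> loader = loaded::add;
        Consumer<Quad> indexer = indexed::add;

        // The parser only sees one sink; the tee feeds the indexer, then the loader.
        parse(List.of("g1 s1 p1 hello world", "g1 s2 p1 other"), indexer.andThen(loader));

        System.out.println(loaded.size() + " " + indexed.size()); // prints "2 2"
    }
}
```

Because the tee only depends on the sink interface, the same arrangement would work in front of any loader, which is the "general purpose, not TDB specific" property mentioned above.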

          Paolo Castagna added a comment -

          A couple of related messages from the jena-users mailing list:

          http://markmail.org/message/ui2plmeuvrr4aaxm
          http://markmail.org/message/bowpd2gxsnwulgpl

          laotao added a comment -

          Hi Paolo/Andy,

          I was searching for a solution to this. It is a very important feature for our project. Could you please let me know when this can be implemented? Or is there any workaround that I can try?

          Thanks
          Tao

          Paolo Castagna added a comment -

          Hi Tao, could you share a few more details on your use cases and application scenario? I imagine you are using Fuseki (with TDB) and LARQ for free text searches. Do you submit all your updates via SPARQL Update, or are there other ways your updates come in? Do you use tdbloader for large/incremental updates? How frequent and how big are your updates? Would rebuilding the Lucene index nightly or hourly be a feasible option for you?
          In any case, thanks for posting your comment here. You can answer my questions either here or on the jena-users/jena-dev mailing lists. I prefer these discussions to happen on jena-users, leaving JIRA comments for specific remarks on how to implement/fix the issue. But comments in JIRA are welcome anyway, and they are better than nothing.

          Elli Schwarz added a comment -

          Is any further work planned on this issue? I use Fuseki as a remote endpoint, connecting to it by creating a remote update execution through UpdateExecutionFactory and executing it. I would really like my indexes to be updated after this update has executed. Is there any way to do this?

          I already modified my Fuseki pom file to add the LARQ dependency, so I do have the Lucene index available, but it is very inconvenient to have to take down the server and run the larqbuilder command from the command line to regenerate the index. I'd be happy to work on this extension myself if it's not too difficult and if someone could give me some tips. I've read all the comments here, but I'm not sure how they apply to my setup, which connects to Fuseki via the remote endpoint. I don't necessarily care whether the index is updated automatically via the StatementListener; I just want the index updated after my update script has run (in any case, I don't want the listener invoked for each statement added by the update script, only when the update script is complete).

          Thanks!

          Andy Seaborne added a comment -

          Currently, maintenance of the index is lacking.

          I have been experimenting with a dataset wrapper that tracks changes and emits diffs (adds and deletes of triples/quads), which could be a building block for index maintenance. It would also be useful for replication and backups.

          (If I could find some funding to cover this, at least in part, it would progress faster)
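
To make the change-tracking idea concrete, here is a minimal sketch of a wrapper that records only effective adds and deletes and hands them to a listener as one diff when an update finishes. It is not the experimental wrapper mentioned above; `TrackingStore`, `Change`, `Op` and `finishUpdate` are hypothetical names, and the store is a plain in-memory set.

```java
import java.util.*;
import java.util.function.Consumer;

/** Sketch only: a change-tracking store that emits one diff per finished update. */
public class ChangeLogSketch {

    enum Op { ADD, DELETE }

    record Quad(String g, String s, String p, String o) {}

    record Change(Op op, Quad quad) {}

    static final class TrackingStore {
        private final Set<Quad> quads = new LinkedHashSet<>();
        private final List<Change> pending = new ArrayList<>();
        private final Consumer<List<Change>> listener;

        TrackingStore(Consumer<List<Change>> listener) {
            this.listener = listener;
        }

        /** Records an ADD only if the quad was not already present. */
        void add(Quad q) {
            if (quads.add(q)) pending.add(new Change(Op.ADD, q));
        }

        /** Records a DELETE only if the quad was actually removed. */
        void delete(Quad q) {
            if (quads.remove(q)) pending.add(new Change(Op.DELETE, q));
        }

        boolean contains(Quad q) {
            return quads.contains(q);
        }

        /** Called once per update request; delivers the buffered diff in one batch. */
        void finishUpdate() {
            List<Change> diff = List.copyOf(pending);
            pending.clear();
            if (!diff.isEmpty()) listener.accept(diff);
        }
    }

    public static void main(String[] args) {
        List<List<Change>> batches = new ArrayList<>();
        TrackingStore store = new TrackingStore(batches::add);

        Quad q1 = new Quad("g", "s1", "p", "first");
        Quad q2 = new Quad("g", "s2", "p", "second");
        store.add(q1);
        store.add(q1);      // no effect, so nothing is recorded
        store.add(q2);
        store.delete(q2);
        store.finishUpdate();

        System.out.println(batches.size());        // prints 1
        System.out.println(batches.get(0).size()); // prints 3
    }
}
```

A listener fed this way could update a text index once per update request rather than once per statement, and the same diff could in principle be shipped to a replica or a backup log.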

          Andy Seaborne added a comment -

          This is out-of-date. LARQ is replaced by jena-text, and that does keep the index in sync.


            People

            • Assignee: Andy Seaborne
            • Reporter: Paolo Castagna
            • Votes: 1
            • Watchers: 3

                Time Tracking

                • Original Estimate: 96h
                • Remaining Estimate: 96h
                • Time Spent: Not Specified
