Apache Jena / JENA-953

Text search does not work in Fuseki with In-memory datasets


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: Fuseki 1.1.2, Fuseki 2.0.0, Fuseki 2.3.0
    • Fix Version/s: Fuseki 2.3.0
    • Component/s: Fuseki, Text
    • Labels: None
    • Environment: Ubuntu 14.04 in VM

    Description

      First of all, I apologize for a possible duplicate post. I sent this to the mailing list, but it disappeared from the drafts folder and never showed up in the sent folder either, so I am posting it here before I lose it from my clipboard.

      Here is the copy of the mail:

      Hi Andy,

      I am sorry for such a late response; we were busy with another project during this period. Below I explain, step by step, how I reproduce the error.

      The problem is that text search indexing does not work for in-memory datasets.

      Here is the configuration file I used. It should be basic enough: a server description, a service description, and a text index attached to the dataset that indexes "rdfs:label".

      @prefix : <#> .
      @prefix fuseki: <http://jena.apache.org/fuseki#> .
      @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
      @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
      @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
      @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
      @prefix text: <http://jena.apache.org/text#> .
      @prefix spatial: <http://jena.apache.org/spatial#> .

      [] a fuseki:Server ;
      fuseki:services (
      <#memory>
      ) .

      <#memory> a fuseki:Service ;
      fuseki:name "memory" ;
      fuseki:serviceQuery "sparql" ;
      fuseki:serviceQuery "query" ;
      fuseki:serviceUpdate "update" ; # SPARQL update service – /memory/update
      fuseki:serviceUpload "upload" ; # Non-SPARQL upload service
      fuseki:serviceReadWriteGraphStore "data" ;
      fuseki:serviceReadGraphStore "get" ; # Graph store protocol (read only) – /memory/get
      fuseki:dataset :text_dataset ;
      .

      <#dataset> rdf:type ja:RDFDataset ;
      ja:defaultGraph
      [
      a ja:MemoryModel ;
      ] .

      # Text
      [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
      text:TextDataset rdfs:subClassOf ja:RDFDataset .
      text:TextIndexLucene rdfs:subClassOf text:TextIndex .

      :text_dataset a text:TextDataset ;
      text:dataset <#dataset> ;
      text:index <#textIndexLucene> ;
      .

      # Text index description
      <#textIndexLucene> a text:TextIndexLucene ;
      text:directory <file:Lucene> ;
      ##text:directory "mem" ;
      text:entityMap <#entMap> ;
      .

      <#entMap> a text:EntityMap ;
      text:entityField "uri" ;
      text:defaultField "text" ;
      text:map (
      [ text:field "text" ; text:predicate rdfs:label ]
      ) .

      The server is started with
      "./fuseki-server --config=config-memory-text.ttl"
      and the console shows that it starts properly:
      [2015-06-03 12:13:09] Server INFO Fuseki 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000
      [2015-06-03 12:13:09] Config INFO FUSEKI_HOME=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT
      [2015-06-03 12:13:09] Config INFO FUSEKI_BASE=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run
      [2015-06-03 12:13:09] Servlet INFO Initializing Shiro environment
      [2015-06-03 12:13:09] Config INFO Shiro file: file:///home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run/shiro.ini
      [2015-06-03 12:13:09] Config INFO Configuration file: config-memory-text.ttl
      [2015-06-03 12:13:10] Builder INFO Service: :memory
      [2015-06-03 12:13:11] Config INFO Register: /memory
      [2015-06-03 12:13:11] Server INFO Started 2015/06/03 12:13:11 CEST on port 3030
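
As a reading aid, the endpoint URLs implied by the configuration and the startup log above can be sketched as follows. The service name and endpoint names are copied from the <#memory> service description and the port from the log; the host "localhost" and the helper name endpoint_urls are assumptions made only for illustration.

```python
# Endpoint names copied from the <#memory> fuseki:Service description.
SERVICE_NAME = "memory"
ENDPOINTS = {
    "query": ["sparql", "query"],   # fuseki:serviceQuery
    "update": ["update"],           # fuseki:serviceUpdate
    "upload": ["upload"],           # fuseki:serviceUpload
    "graph_store_rw": ["data"],     # fuseki:serviceReadWriteGraphStore
    "graph_store_r": ["get"],       # fuseki:serviceReadGraphStore
}


def endpoint_urls(base="http://localhost:3030"):
    """Return {kind: [url, ...]} for the service.

    The port 3030 comes from the startup log; the host is an assumption.
    """
    return {
        kind: [f"{base}/{SERVICE_NAME}/{name}" for name in names]
        for kind, names in ENDPOINTS.items()
    }


if __name__ == "__main__":
    for kind, urls in endpoint_urls().items():
        print(kind, urls)
```

This only enumerates URLs; it does not contact the server.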

      I tested two versions: the official 2.0.0 release and the latest snapshot, 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000. The observed behaviour is as follows:

      In 2.0.0:
      If I load triples that do not contain "rdfs:label", everything works properly, although in that case the text index is never exercised. As soon as I add a single "rdfs:label" triple to the file I am loading into Fuseki, an error occurs:
      [2015-06-03 12:10:47] Fuseki INFO [7] Filename: licenties.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=40 Triples=40 Quads=0
      [2015-06-03 12:10:47] HttpAction WARN Exception during abort (operation attempts to continue): Can't abort a write lock-transaction
      [2015-06-03 12:10:47] Fuseki INFO [7] 500 Server Error (523 ms)
      I remember that a few months ago, when 2.0.0 was first released, I discovered this issue and reported it to you. At that time I did not realize that the root cause was indexing. You fixed it in a later snapshot, but my test was not set up properly, so I thought the problem was solved and gave you incorrect feedback. My sincere apologies.

      In 2.0.1 SNAPSHOT:
      The latest snapshot contains the patch mentioned above, so the triples load successfully. However, they are not indexed at all: queries with keyword search return no results.

      Following your advice, I tested loading and querying from both the Web UI and the s-post/s-query tools; unfortunately (or fortunately?) the results are the same.
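
For reference, a keyword-search query of the kind described here can be sketched like this. It is a minimal sketch, not the exact query used in the tests: the keyword "licentie", the host "localhost", and the helper names build_text_query and build_request_url are hypothetical. The text:query property function and the rdfs:label predicate match the entity map in the configuration, and the /memory/query endpoint matches the service description.

```python
from urllib.parse import urlencode

PREFIXES = (
    "PREFIX text: <http://jena.apache.org/text#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)


def build_text_query(keyword, limit=10):
    """Build a SPARQL query that searches the text index on rdfs:label."""
    # Escape backslashes and single quotes so the keyword stays inside
    # the single-quoted SPARQL string literal.
    escaped = keyword.replace("\\", "\\\\").replace("'", "\\'")
    return (
        PREFIXES
        + "SELECT ?s ?label WHERE {\n"
        + f"  ?s text:query (rdfs:label '{escaped}') ;\n"
        + "     rdfs:label ?label .\n"
        + "}\n"
        + f"LIMIT {int(limit)}"
    )


def build_request_url(keyword, endpoint="http://localhost:3030/memory/query"):
    """Build a SPARQL-protocol GET URL; the host is an assumption."""
    return endpoint + "?" + urlencode({"query": build_text_query(keyword)})


if __name__ == "__main__":
    # "licentie" is a hypothetical keyword chosen only for illustration.
    print(build_text_query("licentie"))
    print(build_request_url("licentie"))
```

With the in-memory configuration on 2.0.1-SNAPSHOT, a query of this shape is what returns no results; against the TDB configuration the same shape of query would be sent to the "tdb" service instead.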

      TDB:
      I also ran a similar experiment on Fuseki with TDB in both 2.0.0 and 2.0.1-SNAPSHOT, and both work properly: loading succeeds and queries return search results. The only difference is that, in the configuration file, the in-memory dataset is replaced with TDB:
      @prefix : <#> .
      @prefix fuseki: <http://jena.apache.org/fuseki#> .
      @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
      @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
      @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
      @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
      @prefix text: <http://jena.apache.org/text#> .

      [] rdf:type fuseki:Server ;
      fuseki:services (
      <#service_text_tdb>
      ) .

      # TDB
      [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
      tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
      tdb:GraphTDB rdfs:subClassOf ja:Model .

      # Text
      [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
      text:TextDataset rdfs:subClassOf ja:RDFDataset .
      text:TextIndexLucene rdfs:subClassOf text:TextIndex .

      <#service_text_tdb> a fuseki:Service ;
      rdfs:label "TDB/text service" ;
      fuseki:name "tdb" ;
      fuseki:serviceQuery "query" ;
      fuseki:serviceQuery "sparql" ;
      fuseki:serviceUpdate "update" ;
      fuseki:serviceUpload "upload" ;
      fuseki:serviceReadGraphStore "get" ;
      fuseki:serviceReadWriteGraphStore "data" ;
      fuseki:dataset <#text_dataset> ;
      .

      <#text_dataset> a text:TextDataset ;
      text:dataset <#dataset> ;
      text:index <#indexLucene> ;
      .

      <#dataset> a tdb:DatasetTDB ;
      tdb:location "DB" ;
      ##tdb:unionDefaultGraph true ;
      .

      <#indexLucene> a text:TextIndexLucene ;
      text:directory <file:Lucene> ;
      ##text:directory "mem" ;
      text:entityMap <#entMap> ;
      .

      <#entMap> a text:EntityMap ;
      text:entityField "uri" ;
      text:defaultField "text" ;
      text:map (
      [ text:field "text" ; text:predicate rdfs:label ]
      ) .

      Do you have any advice? Thank you very much in advance for your efforts.

      Regards,
      Yang

      PS: I discovered that there is a SNAPSHOT for 2.3.0. I planned to test it as well, but I was not able to run it.

            People

              Assignee: Unassigned
              Reporter: Yang Yuanzhe (yyz1989)