  1. Apache Jena
  2. JENA-1652

jena-text analyzer regression

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Jena 3.10.0
    • Fix Version/s: Jena 3.10.0
    • Component/s: Text
    • Labels: None
    • Environment: Ubuntu 16.04
      java version "1.8.0_191"
      Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
      Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

    Description

      I noticed that Skosmos unit tests are failing when run with Fuseki 3.10 snapshots:
      https://github.com/NatLibFi/Skosmos/issues/828

      Digging a bit deeper, it seems that jena-text no longer applies the analyzer to query strings as it did in 3.9.0. The most likely cause is the Lucene upgrade (JENA-1621), which may have changed how analyzers are applied.

      Here is the text analyzer configuration I'm using:

      <#indexLucene> a text:TextIndexLucene ;
          ##text:directory <file:/tmp/lucene> ;
          text:directory "mem" ;
          text:entityMap <#entMap> ;
          text:storeValues true ;
          .
      
      <#entMap> a text:EntityMap ;
          text:entityField      "uri" ;
          text:graphField       "graph" ; ## enable graph-specific indexing
          text:defaultField     "pref" ; ## Must be defined in the text:map
          text:uidField         "uid" ;
          text:langField        "lang" ;
          text:map (
               # skos:prefLabel
               [ text:field "pref" ;
                 text:predicate skos:prefLabel ;
                 text:analyzer [ a text:LowerCaseKeywordAnalyzer ] ]
               # skos:altLabel
               [ text:field "alt" ;
                 text:predicate skos:altLabel ;
                 text:analyzer [ a text:LowerCaseKeywordAnalyzer ] ]
               # skos:hiddenLabel
               [ text:field "hidden" ;
                 text:predicate skos:hiddenLabel ;
                 text:analyzer [ a text:LowerCaseKeywordAnalyzer ] ]
               ) .
      

      Here is a minimal test file that I load into the default graph:

      <http://example.org/guppy> <http://www.w3.org/2004/02/skos/core#prefLabel> "Guppy"@en-gb .
      

      This is the query I'm using:

      PREFIX text: <http://jena.apache.org/text#>
      SELECT * {
        ?s text:query 'G*' .
      }
      

      It returns one row (?s=<http://example.org/guppy>) on Fuseki 3.9.0 but nothing with today's 3.10 snapshot.
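The comparison can be scripted so that it is easy to rerun against different Fuseki versions. The following is only a sketch: the endpoint URL and the dataset name `ds` are assumptions (they are not given in this report), and the helper names `build_request` and `count_rows` are made up for illustration. It sends the query above via a standard SPARQL protocol POST and counts the result rows.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Fuseki server exposing a dataset named "ds".
ENDPOINT = "http://localhost:3030/ds/sparql"

# The query from this report, with the search pattern left as a placeholder.
QUERY_TEMPLATE = """PREFIX text: <http://jena.apache.org/text#>
SELECT * {{
  ?s text:query '{pattern}' .
}}"""


def build_request(pattern, endpoint=ENDPOINT):
    """Build a form-encoded SPARQL protocol POST for the given text pattern."""
    query = QUERY_TEMPLATE.format(pattern=pattern)
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,  # supplying a body makes urllib issue a POST
        headers={"Accept": "application/sparql-results+json"},
    )


def count_rows(pattern, endpoint=ENDPOINT):
    """Run the query and return the number of result rows."""
    with urllib.request.urlopen(build_request(pattern, endpoint)) as response:
        results = json.load(response)
    return len(results["results"]["bindings"])
```

With the test data loaded, comparing `count_rows("G*")` and `count_rows("g*")` on each server version should show whether the case of the query string matters.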

      If I change 'G*' to lowercase 'g*', I get the expected match with the 3.10 snapshot as well. The analyzer should lowercase everything, which makes the case of the query string irrelevant, so it seems that the analyzer is not being applied to the query string.
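The suspected behaviour can be illustrated with a toy model. This is not Jena or Lucene code; `analyze`, `build_index` and `prefix_query` are hypothetical stand-ins. It assumes an analyzer that lowercases the whole label into a single term at index time, and shows what happens when the same normalization is or is not applied to a wildcard query.

```python
def analyze(text):
    """Stand-in for a lowercasing keyword analyzer: one term, lowercased."""
    return text.lower()


def build_index(labels):
    """Map each analyzed label to the set of subjects that carry it."""
    index = {}
    for subject, label in labels:
        index.setdefault(analyze(label), set()).add(subject)
    return index


def prefix_query(index, query, analyze_query):
    """Match a trailing-wildcard query such as 'G*' against the indexed terms."""
    prefix = query.rstrip("*")
    if analyze_query:
        # Behaviour seen with 3.9.0: the query goes through the analyzer too.
        prefix = analyze(prefix)
    hits = set()
    for term, subjects in index.items():
        if term.startswith(prefix):
            hits |= subjects
    return hits


index = build_index([("http://example.org/guppy", "Guppy")])

with_analysis = prefix_query(index, "G*", analyze_query=True)
without_analysis = prefix_query(index, "G*", analyze_query=False)
lowercase_query = prefix_query(index, "g*", analyze_query=False)
```

In this model `with_analysis` and `lowercase_query` both contain the guppy resource, while `without_analysis` is empty, which matches the symptoms observed with the 3.10 snapshot.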

          People

            Assignee: code-ferret Code Ferret
            Reporter: osma Osma Suominen
            Votes: 0
            Watchers: 3