Apache Jena / JENA-1723

jena:text create OR's of Lucene fields


Details

    Description

      Motivation:

      With the current jena:text we often find that we have query patterns such as:

      select ?s ?sc ?lit where {
        {
          (?s ?sc ?lit) text:query ( rdfs:label "some query" "highlight:" ).
        }
        union
        {
          (?s ?sc ?lit) text:query ( skos:altLabel "some query" "highlight:" ).
        }
        union
        {
          (?s ?sc ?lit) text:query ( skos:prefLabel "some query" "highlight:" ).
        }
      }
      

      Such patterns recur for various sets of RDF properties, with each property corresponding to a Lucene field.

      It can be more performant to push the unions into the Lucene query by rewriting as:

      (altLabel:"some query" OR prefLabel:"some query" OR label:"some query")
      

      Then it's a single query with Lucene performing the unions.
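
      The rewrite can be sketched as a small helper that emits one clause per Lucene field and joins the clauses with OR. This is only an illustration in Python, not the jena-text implementation: the function name build_or_query is hypothetical, and treating the query string as a quoted phrase (with quote escaping) is an assumption.

```python
def build_or_query(fields, query_string):
    """Combine one phrase clause per Lucene field into a single OR expression.

    Hypothetical helper for illustration; not part of jena-text.
    """
    if not fields:
        raise ValueError("at least one field is required")
    # Assumption: the query string is used as a phrase, so escape backslashes
    # and double quotes to keep the phrase well-formed.
    phrase = query_string.replace("\\", "\\\\").replace('"', '\\"')
    clauses = [f'{field}:"{phrase}"' for field in fields]
    return "(" + " OR ".join(clauses) + ")"


print(build_or_query(["altLabel", "prefLabel", "label"], "some query"))
# (altLabel:"some query" OR prefLabel:"some query" OR label:"some query")
```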

      Approach:

      We've implemented this by

      1. adding a new assembler feature in text:TextIndexLucene:

      [] text:props (
          text:propList [ text:propListProp  ex:labels ;
               text:props ( skos:prefLabel skos:altLabel rdfs:label ) ]
      ) ;
      

      This assigns a single property ID, e.g., ex:labels, to a list of properties.

      and

      2. adding some syntax to the TextQueryPF:

      (?s ?sc ?lit ?graph ?prop) text:query ( text:props ex:labels "some query" "highlight:" )
      

      The fifth output argument, ?prop, returns the specific property that matched. If the input arguments include text:props as the first argument, then a list of at least one property must precede the query string. These properties are either the usual Lucene-indexed properties that occur in text:query or a property list property such as ex:labels above.

      When a list property is encountered it is expanded to the underlying list of indexed properties from the configuration.

      There may be any mix of indexed and property list properties following text:props in the input arg list:

      (?s ?sc ?lit ?graph ?prop) text:query ( text:props ex:labels rdfs:comment "some query" "highlight:" )
      

      which searches over the three properties listed in ex:labels and the property rdfs:comment.
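
      The expansion step described above might look like the following sketch. It is written in Python for brevity and is not the actual implementation: the PROP_LISTS dictionary stands in for whatever the assembler configuration produces, and expand_props and the de-duplication behaviour are assumptions.

```python
# Hypothetical in-memory view of the property lists from the configuration.
PROP_LISTS = {
    "ex:labels": ["skos:prefLabel", "skos:altLabel", "rdfs:label"],
}


def expand_props(props, prop_lists):
    """Replace each list property by its members; keep other properties as-is."""
    expanded = []
    for prop in props:
        # A list property contributes its members; anything else stands for itself.
        for member in prop_lists.get(prop, [prop]):
            if member not in expanded:  # assumption: drop repeated properties
                expanded.append(member)
    return expanded


print(expand_props(["ex:labels", "rdfs:comment"], PROP_LISTS))
# ['skos:prefLabel', 'skos:altLabel', 'rdfs:label', 'rdfs:comment']
```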

      This functionality is implemented, including copious tests, and a PR can be issued after a bit of code cleanup.

      Discussion:

      The use of text:props in the query form isn't strictly necessary, and was introduced as a way of indicating the intent to have a list of properties to be searched over.

      If the text:props flag is removed from the implementation, the feature will simply check whether each property is a list property or an ordinary indexed property.

      With this modification the above queries would be written simply as:

      (?s ?sc ?lit ?graph ?prop) text:query ( ex:labels "some query" "highlight:" )
      

      or

      (?s ?sc ?lit ?graph ?prop) text:query ( ex:labels rdfs:comment "some query" "highlight:" )
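
      One way to picture the argument handling in both forms is the sketch below, which splits the input list into leading properties, the query string, and any remaining options. It is a Python illustration under stated assumptions, not the TextQueryPF code: the Prop class, TEXT_PROPS, and split_query_args are hypothetical names, and properties are modelled as Prop objects while the query string and options are plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prop:
    """Stand-in for an RDF property node such as rdfs:label (hypothetical type)."""
    name: str


TEXT_PROPS = Prop("text:props")


def split_query_args(args):
    """Split text:query input args into (properties, query string, options)."""
    items = list(args)
    # The text:props marker is optional in this sketch; skip it if present.
    has_flag = bool(items) and items[0] == TEXT_PROPS
    if has_flag:
        items = items[1:]
    props = []
    while items and isinstance(items[0], Prop):
        props.append(items.pop(0).name)
    if has_flag and not props:
        raise ValueError("text:props must be followed by at least one property")
    if not items or not isinstance(items[0], str):
        raise ValueError("expected a query string after the properties")
    return props, items[0], items[1:]


args = [Prop("ex:labels"), Prop("rdfs:comment"), "some query", "highlight:"]
print(split_query_args(args))
# (['ex:labels', 'rdfs:comment'], 'some query', ['highlight:'])
```

      With or without the leading text:props marker, the same properties and query string come out, which is why the flag is not strictly necessary.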
      

          People

            code-ferret Code Ferret