SOLR-7341: xjoin - join data from external sources

Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.10.3, 5.3.2, 6.0
    • Component/s: search
    • Labels: None

    Description

      XJoin

      The "xjoin" SOLR contrib allows external results to be joined with SOLR results in a query and the SOLR result set to be filtered by the results of an external query. Values from the external results are made available in the SOLR results and may also be used to boost the scores of corresponding documents during the search. The contrib consists of the Java classes XJoinSearchComponent, XJoinValueSourceParser and XJoinQParserPlugin (and associated classes), which must be configured in solrconfig.xml, and the interfaces XJoinResultsFactory and XJoinResults, which are implemented by the user to provide the link between SOLR and the external results source (but see below for details of how to use the in-built SimpleXJoinResultsFactory implementation). External results and SOLR documents are matched via a single configurable attribute (the "join field").

      To include the XJoin contrib classes, add the following config to solrconfig.xml:

      <config>
        ..
         <!-- XJoin contrib -->
        <lib dir="${solr.install.dir:../../../..}/contrib/xjoin/lib" regex=".*\.jar" />
        <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-xjoin-\d.*\.jar" />
        ..
      </config>
      

      Note that any JARs containing implementations of the XJoinResultsFactory must also be included.

      Java classes and interfaces

      XJoinResultsFactory

      The user implementation of this interface is responsible for connecting to an external source to perform a query (or otherwise collect results). Parameters with the prefix "<component name>.external." are passed from the SOLR query URL to parameterise the search. The interface has the following methods:

      • void init(NamedList args) - this is called during SOLR initialisation, and passed parameters from the search component configuration (see below)
      • XJoinResults getResults(SolrParams params) - this is called during a SOLR search to generate external results, and is passed parameters from the SOLR query URL (as above)

      For example, the implementation might perform queries of an external source based on the 'q' SOLR query URL parameter (in full, <component name>.external.q).

      XJoinResults

      A user implementation of this interface is returned by the getResults() method of the XJoinResultsFactory implementation. It has methods:

      • Object getResult(String joinId) - this should return a particular result given the value of the join attribute
      • Iterable<String> getJoinIds() - this should return an ordered (ascending) list of the join attribute values for all results of the external search
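
      As a rough illustration of how the two interfaces fit together, the following sketch implements both against local stand-in interfaces. The stand-ins (SketchXJoinResultsFactory, SketchXJoinResults) only mirror the method names listed above and use plain Maps where the real methods take NamedList and SolrParams, so that the code compiles without SOLR on the classpath. The "values" init parameter, the "weight" query parameter and the SketchResult class are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-ins for the contrib interfaces (assumption: the real ones take
// NamedList and SolrParams; Maps are substituted to stay self-contained).
interface SketchXJoinResults {
    Object getResult(String joinId);
    Iterable<String> getJoinIds();
}

interface SketchXJoinResultsFactory {
    void init(Map<String, String> args);
    SketchXJoinResults getResults(Map<String, String> params);
}

// Hypothetical result object; its getters would surface as "id" and "value".
class SketchResult {
    private final String id;
    private final double value;
    SketchResult(String id, double value) { this.id = id; this.value = value; }
    public String getId() { return id; }
    public double getValue() { return value; }
}

// Hypothetical factory: init() reads a comma-separated "values" parameter,
// getResults() returns one result per value, weighted by a query parameter.
class SketchValuesFactory implements SketchXJoinResultsFactory {
    private final List<String> ids = new ArrayList<>();

    @Override
    public void init(Map<String, String> args) {
        for (String v : args.getOrDefault("values", "").split(",")) {
            if (!v.trim().isEmpty()) ids.add(v.trim());
        }
    }

    @Override
    public SketchXJoinResults getResults(Map<String, String> params) {
        double weight = Double.parseDouble(params.getOrDefault("weight", "1.0"));
        // A TreeMap keeps the join ids in ascending order for getJoinIds().
        final TreeMap<String, SketchResult> byId = new TreeMap<>();
        for (String id : ids) byId.put(id, new SketchResult(id, weight));
        return new SketchXJoinResults() {
            @Override public Object getResult(String joinId) { return byId.get(joinId); }
            @Override public Iterable<String> getJoinIds() { return byId.keySet(); }
        };
    }
}
```

      A real implementation would typically query a remote service in getResults() rather than echo its configuration; the point here is only the division of work between init(), getResults(), getJoinIds() and getResult().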

      XJoinSearchComponent

      This is the central Java class of the contrib. It is a SOLR search component, configured in solrconfig.xml and included in one or more SOLR request handlers. There is one XJoin search component per external source, and each has two main responsibilities:

      • Before the SOLR search, it connects to the external source and retrieves results, storing them in the SOLR request context
      • After the SOLR search, it matches SOLR documents in the result set with external results via the join field, adding attributes from the external results to documents in the SOLR result set

      It takes the following initialisation parameters:

      • factoryClass - this specifies the user-supplied class implementing XJoinResultsFactory, used to generate external results
      • joinField - this specifies the attribute on which to join between SOLR documents and external results
      • external - this parameter set is passed to configure the XJoinResultsFactory implementation

      For example, in solrconfig.xml:

      <searchComponent name="xjoin_test" class="org.apache.solr.search.xjoin.XJoinSearchComponent">
        <str name="factoryClass">test.TestXJoinResultsFactory</str>
        <str name="joinField">id</str>
        <lst name="external">
          <str name="values">1,2,3</str>
        </lst>
      </searchComponent>
      

      Here, the search component instantiates a new TestXJoinResultsFactory during initialisation, and passes it the "values" parameter (1, 2, 3) to configure it. To properly use the XJoinSearchComponent in a request handler, it must be included at the start and end of the component list, and may be configured with the following query parameters:

      • results - a comma-separated list of attributes from the XJoinResults implementation (created by the factory at search time) to be included in the SOLR results
      • fl - a comma-separated list of attributes from results objects (contained in an XJoinResults implementation) to be included in the SOLR results

      For example:

      <requestHandler name="/xjoin" class="solr.SearchHandler" startup="lazy">
        <lst name="defaults">
          ..
          <bool name="xjoin_test">true</bool>
          <str name="xjoin_test.listParameter">xx</str>
          <str name="xjoin_test.results">test_count</str>
          <str name="xjoin_test.fl">id,value</str>
        </lst>
        <arr name="first-components">
          <str>xjoin_test</str>
        </arr>
        <arr name="last-components">
          <str>xjoin_test</str>
        </arr>
      </requestHandler>
      

      Note that, to include the list of join ids returned by the external source in the SOLR results (likely for debug purposes), the value 'join_ids' may be specified in the "results" parameter.

      XJoinQParserPlugin

      This query parser plugin constructs a query from the results of the external searches, and is based on the TermsQParserPlugin. It takes the following local parameters:

      • method - as with the TermsQParserPlugin, this specifies how to build the Lucene query from the join ids contained in external results; one of termsFilter, booleanQuery, automaton, or docValuesTermsFilter (defaults to termsFilter)
      • v (or, as usual with query parsers, specified via the query) - a Boolean combination of XJoin search component names. Supported operators are OR, AND, XOR, and AND NOT

      The query is a Boolean expression whose terms are XJoin search component names. The resulting set of join ids (obtained from the respective XJoin search components) is formed into a Lucene query. Note that the join field of all the referenced XJoin search components must be identical. In the simplest case, the expression is a single XJoin search component name. For example:

      q={!xjoin}xjoin_test
      q={!xjoin v=xjoin_test}
      fq={!xjoin method=automaton}xjoin_test1 AND NOT xjoin_test2
      

      XJoinValueSourceParser

      This class provides a SOLR function that may be used, for example, in a boost function to weight the result score from external values. The function returns an attribute value from the external result with matching join attribute. There are two ways of using the function. Either the XJoin component name is specified in the configuration parameters and the external result attribute is the argument of the function in the query, or vice versa, the attribute is specified in the configuration parameters and the component name is the function argument.

      The parameters for configuration in solrconfig.xml are:

      • xJoinSearchComponent - the name of an XJoin search component containing external results
      • attribute - the attribute to use from external results
      • defaultValue - if the external result has no such attribute, then this value is returned

      At least one of xJoinSearchComponent and attribute must be configured. Normally only one is set, with the other supplied as the function argument in the query, but it is possible to configure both.

      For example:

      <valueSourceParser name="test_fn" class="org.apache.solr.search.xjoin.XJoinValueSourceParser">
        <str name="xJoinSearchComponent">xjoin_test</str>
        <double name="defaultValue">1.0</double>
      </valueSourceParser>
      

      with corresponding query string parameter (for example) bf=test_fn(value)

      Alternatively:

      <valueSourceParser name="test_fn" class="org.apache.solr.search.xjoin.XJoinValueSourceParser">
        <str name="attribute">value</str>
        <double name="defaultValue">1.0</double>
      </valueSourceParser>
      

      with corresponding query string parameter (for example) bf=test_fn(xjoin_test)

      Mapping between attributes and Java methods

      Java method names are converted into attribute (field) names by stripping the initial "get" or "is" and converting the remainder from CamelCase to lowercase-with-underscores; attribute names are converted back to method names by the reverse process. For example, getScore() corresponds to "score" and getFooBar() corresponds to "foo_bar".

      The field list parameter of XJoinSearchComponent (fl) can be given as *, in which case all methods beginning 'get' or 'is' are converted into fields in the SOLR result for the document.
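
      The naming rule can be sketched in a few lines of Java. This is not the contrib's own code: the class and method names (SketchNameMapper, toFieldName, toGetterName, allFields) and the SketchHit example class are hypothetical, and corner cases such as acronyms in method names (getURL) may well be handled differently by the real implementation.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class SketchNameMapper {
    // getFooBar -> foo_bar, isFooBar -> foo_bar; null if not a get/is method.
    static String toFieldName(String methodName) {
        String rest;
        if (methodName.startsWith("get") && methodName.length() > 3) {
            rest = methodName.substring(3);
        } else if (methodName.startsWith("is") && methodName.length() > 2) {
            rest = methodName.substring(2);
        } else {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rest.length(); i++) {
            char c = rest.charAt(i);
            // Each upper-case letter after the first starts a new word.
            if (Character.isUpperCase(c) && i > 0) sb.append('_');
            sb.append(Character.toLowerCase(c));
        }
        return sb.toString();
    }

    // foo_bar -> getFooBar (the reverse direction, for "get" methods only).
    static String toGetterName(String fieldName) {
        StringBuilder sb = new StringBuilder("get");
        for (String part : fieldName.split("_")) {
            if (part.isEmpty()) continue;
            sb.append(Character.toUpperCase(part.charAt(0))).append(part.substring(1));
        }
        return sb.toString();
    }

    // Roughly what fl=* implies: every public no-argument get/is method
    // becomes a field. getClass() is skipped because it is not a data attribute.
    static Map<String, Object> allFields(Object result) {
        Map<String, Object> fields = new TreeMap<>();
        try {
            for (Method m : result.getClass().getMethods()) {
                if (m.getParameterCount() != 0 || m.getName().equals("getClass")) continue;
                String name = toFieldName(m.getName());
                if (name != null) fields.put(name, m.invoke(result));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return fields;
    }
}

// Hypothetical external result used to exercise the mapper.
class SketchHit {
    public double getScore() { return 1.5; }
    public boolean isFooBar() { return true; }
}
```

      With these definitions, allFields(new SketchHit()) yields the two fields "foo_bar" and "score".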

      Putting it together - the SOLR query URL

      Here is an example SOLR query URL to perform an xjoin:

      http://solrserver:8983/solr/collection1/xjoin?defType=edismax&q=*:*&xjoin_test.external.q=foobar&fl=id,score&fq={!xjoin}xjoin_test&bf=test_fn(value)
      

      This might result in the following SOLR response:

      <?xml version="1.0" encoding="UTF-8"?>
      <response>
        <lst name="responseHeader">
          <int name="status">0</int>
          <int name="QTime">346</int>
          <lst name="params">
            ..
          </lst>
        </lst>
        <result name="response" numFound="2" start="0" maxScore="58.60105">
          <doc>
            <str name="id">document1</str>
            <float name="score">58.60105</float>
          </doc>
          <doc>
            <str name="id">document2</str>
            <float name="score">14.260552</float>
          </doc>
        </result>
        <lst name="xjoin_test">
          <int name="test_count">145</int>
          <arr name="external">
            <lst name="external">
              <str name="joinId">document1</str>
              <lst name="doc">
                <double name="value">7.4</double>
              </lst>
            </lst>
            <lst name="external">
              <str name="joinId">document2</str>
              <lst name="doc">
                <double name="value">2.3</double>
              </lst>
            </lst>
          </arr>
        </lst>
      </response>
      

      Notes:

      • The actual 'join' is specified by the fq parameter. See XJoinQParserPlugin above.
      • The function test_fn is used in the bf score-boost function. Since the argument is value, that attribute of the external results is used as the score boost.

      Many-to-many joins

      XJoin supports many-to-many joins in the following two ways.

      Joining against a multi-valued field

      The SOLR field used as the join field may be multi-valued. External join values will match every SOLR document with at least one matching value in the join field. As usual, for every SOLR document in the results set, matching external results are appended. In this case, this includes matching external results with join id values for every value from the multi-valued field. Therefore, there may be many more external results included than the number of SOLR results.

      Many external results with the same join id

      The case of many external results having the same join id is supported by returning a Java Iterable from the implementation of XJoinResults.getResult(joinIdStr). In this case, one <lst name="doc"> is added to the corresponding <lst name="external"> per element in the iterable. For the XJoinValueSourceParser, the maximum value is taken from the set of possible values.
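
      The "maximum value" rule for the value source can be pictured as follows. This is a minimal sketch, not the XJoinValueSourceParser code: the class SketchMaxValue and its valueFor method are hypothetical, and the attribute lookup is passed in as a function so that the example does not depend on any particular result class. Returning the default for an empty iterable is also an assumption.

```java
import java.util.function.ToDoubleFunction;

class SketchMaxValue {
    // result is whatever getResult(joinId) returned: null, a single result
    // object, or an Iterable of result objects sharing the same join id.
    static double valueFor(Object result, ToDoubleFunction<Object> attribute, double defaultValue) {
        if (result == null) return defaultValue;
        if (!(result instanceof Iterable)) return attribute.applyAsDouble(result);
        boolean seen = false;
        double max = defaultValue;
        for (Object r : (Iterable<?>) result) {
            double v = attribute.applyAsDouble(r);
            // Keep the largest attribute value seen so far.
            if (!seen || v > max) {
                max = v;
                seen = true;
            }
        }
        return max;
    }
}
```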

      Joining results from multiple external sources

      There are (at least) 3 different ways XJoin can be used in conjunction with other SOLR features to combine results from more than one external source.

      Multiple filter queries

      Multiple filter queries are ANDed together by SOLR, so if this is the desired combination for external result join ids, this is a simple approach. (Note the implications for filter caching.) In this case, the external join fields do not have to be the same.

      For example (assuming two configured XJoin components, xjoin_test and xjoin_other):

      http://solrserver:8983/solr/collection1/xjoin?q=*:*&xjoin_test.external.q=foobar&xjoin_other.external.q=barfoo&fq={!xjoin}xjoin_test&fq={!xjoin}xjoin_other
      

      Nested queries in the standard SOLR Query Parser

      The nested query syntax of the standard SOLR query parser (see https://wiki.apache.org/solr/SolrQuerySyntax) can be used for more complicated combinations, allowing for "should", "must" etc. Lucene queries to be built from external join id sets. The external join fields do not have to be the same.

      For example (again, assuming two configured XJoin components, xjoin_test and xjoin_other):

      http://solrserver:8983/solr/collection1/xjoin?q=*:*&xjoin_test.external.q=foobar&xjoin_other.external.q=barfoo&fq=_query_:"{!xjoin}xjoin_test" -_query_:"{!xjoin}xjoin_other"
      

      Boolean expressions with the XJoin Query Parser

      To combine external join id sets directly using a Boolean expression, one can use the XJoinQParserPlugin as detailed above. This allows arbitrary Boolean expressions using the operators AND, OR, XOR and AND NOT.

      For example (again, assuming two configured XJoin components, xjoin_test and xjoin_other):

      http://solrserver:8983/solr/collection1/xjoin?q=*:*&xjoin_test.external.q=foobar&xjoin_other.external.q=barfoo&fq={!xjoin}xjoin_test XOR xjoin_other
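
      The effect of the four operators on the join id sets can be illustrated with ordinary Java sets. This only shows the intended semantics under the assumption that each component contributes a set of join id strings; the class SketchJoinIdSets is hypothetical and says nothing about how the query parser evaluates the expression internally.

```java
import java.util.Set;
import java.util.TreeSet;

class SketchJoinIdSets {
    // a AND b: ids returned by both external sources.
    static Set<String> and(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a);
        r.retainAll(b);
        return r;
    }

    // a OR b: ids returned by either source.
    static Set<String> or(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a);
        r.addAll(b);
        return r;
    }

    // a AND NOT b: ids returned by a but not by b.
    static Set<String> andNot(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a);
        r.removeAll(b);
        return r;
    }

    // a XOR b: ids returned by exactly one of the two sources.
    static Set<String> xor(Set<String> a, Set<String> b) {
        return andNot(or(a, b), and(a, b));
    }
}
```

      For instance, if xjoin_test yields the ids {1, 2, 3} and xjoin_other yields {2, 3, 4}, then "xjoin_test XOR xjoin_other" corresponds to {1, 4}, and those ids are then turned into the Lucene query.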
      

      The SimpleXJoinResultsFactory implementation

      The XJoin plugins accept a java.util.Map returned from the results factory, both in place of an XJoinResults implementation and for the individual result objects themselves. The in-built SimpleXJoinResultsFactory makes use of this: it is an implementation of XJoinResultsFactory that connects to a URL to collect results in XML or JSON format, and uses XPaths or JsonPaths to extract field values. It can often be used instead of writing custom Java code.

      The SimpleXJoinResultsFactory takes the following initialisation parameters:

      • type - either XML or JSON
      • rootUrl - the URL to connect to for external results (can be file:// for testing)
      • globalFieldPaths - a list of XPaths or JsonPaths which are used to extract 'global' values (not individual result values)
      • joinIdPath - an XPath or JsonPath that should return an array of join ids extracted from the results
      • joinIdToken - a token used in resultFieldPaths that will be substituted with each join id, usually the default 'JOINID' will suffice
      • resultFieldPaths - a list of XPaths or JsonPaths which are used to extract result values

      Example solrconfig.xml snippet:

        <searchComponent name="xjoin" class="org.apache.solr.search.xjoin.XJoinSearchComponent">
          <str name="factoryClass">org.apache.solr.search.xjoin.simple.SimpleXJoinResultsFactory</str>
          <str name="joinField">id</str>
          <lst name="external">
            <str name="type">JSON</str>
            <str name="rootUrl">http://myserver/endpoint</str>
            <lst name="globalFieldPaths">
              <str name="count">$.length()</str>
            </lst>
            <str name="joinIdPath">$[*].id</str>
            <lst name="resultFieldPaths">
              <str name="field">$[?(@.id == 'JOINID')].field</str>
              <str name="value">$[?(@.id == 'JOINID')].value</str>
            </lst>
          </lst>
        </searchComponent>
      

      Any external SolrParams are turned into URL query string parameters, so for example, including "xjoin.external.q=foo" in the SOLR URL results in the XJoin component making a request to "http://myserver/endpoint?q=foo".
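
      The two string manipulations described for the SimpleXJoinResultsFactory, building the request URL from the external parameters and substituting the join id token into a result field path, can be sketched as below. The class and method names (SketchSimpleFactory, buildUrl, substituteJoinId) are hypothetical, a plain Map stands in for SolrParams, and the real factory may differ in details such as parameter ordering, encoding, or quoting of join ids inside paths.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;
import java.util.TreeMap;

class SketchSimpleFactory {
    // Turns "<component>.external.<name>=<value>" parameters into a query
    // string on rootUrl; parameters without the prefix are ignored.
    static String buildUrl(String rootUrl, String componentName, Map<String, String> solrParams) {
        String prefix = componentName + ".external.";
        StringBuilder url = new StringBuilder(rootUrl);
        char separator = rootUrl.contains("?") ? '&' : '?';
        // Sorted only to make the output deterministic in this sketch.
        for (Map.Entry<String, String> e : new TreeMap<>(solrParams).entrySet()) {
            if (!e.getKey().startsWith(prefix)) continue;
            url.append(separator)
               .append(encode(e.getKey().substring(prefix.length())))
               .append('=')
               .append(encode(e.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    // Replaces the join id token (default 'JOINID') in a result field path.
    // No escaping of quotes in the join id is attempted here.
    static String substituteJoinId(String path, String token, String joinId) {
        return path.replace(token, joinId);
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

      With the configuration shown earlier, a request containing xjoin.external.q=foo would give http://myserver/endpoint?q=foo, and the "value" path for join id doc1 would become $[?(@.id == 'doc1')].value.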

      Attachments

        1. SOLR-7341.patch-4_10
          150 kB
          Tom Winch
        2. SOLR-7341.patch-5_3
          147 kB
          Tom Winch
        3. SOLR-7341.patch-trunk
          135 kB
          Tom Winch
        4. SOLR-7341.patch-trunk
          132 kB
          Tom Winch
        5. SOLR-7341.patch-4.10.3
          217 kB
          Tom Winch
        6. SOLR-7341.patch-5.3.2
          217 kB
          Tom Winch
        7. SOLR-7341.patch-master
          217 kB
          Tom Winch
        8. SOLR-7341.patch-7.2.1
          186 kB
          Tom Mortimer

            People

              Assignee: Unassigned
              Reporter: Tom Winch (Tomjon)
              Votes: 14
              Watchers: 24
