SOLR-9193: Add scoreNodes Streaming Expression

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: None
    • Fix Version/s: 6.2
    • Component/s: SolrJ
    • Labels: None

      Description

      The scoreNodes Streaming Expression is another GraphExpression. It will decorate a gatherNodes expression and use a tf-idf scoring algorithm to score the nodes.

      The gatherNodes expression only gathers nodes and aggregations. This is similar in nature to tf in search ranking, where the number of times a node appears in the traversal represents the tf. But this skews recommendations towards nodes that appear frequently in the index.

      Using the idf for each node we can score each node as a function of tf-idf. This will provide a boost to nodes that appear less frequently in the index.

      The scoreNodes expression will gather the idf values from the shards for each node emitted by the underlying gatherNodes expression. It will then assign a score to each node.

      The computed score will be added to each node in the nodeScore field. The docFreq of the node across the entire collection will be added to each node in the docFreq field. Other streaming expressions can then rank nodes by nodeScore or compute their own score from docFreq.
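As a rough illustration of the scoring described above, the sketch below combines a node's traversal count (the tf) with an idf derived from its docFreq. The exact formula used by scoreNodes is not spelled out in this ticket, so the smoothed log idf, the function name score_nodes, the num_docs argument, the "count" field name, and the sample values are all assumptions made for illustration.

```python
import math

def score_nodes(nodes, doc_freqs, num_docs):
    """Return copies of `nodes` with docFreq and nodeScore fields added.

    nodes     -- list of dicts with "node" (id) and "count" (traversal count, used as tf)
    doc_freqs -- dict mapping node id to its document frequency in the collection
    num_docs  -- total number of documents in the collection
    The idf below is an assumed smoothed log form, not taken from the ticket.
    """
    scored = []
    for n in nodes:
        doc_freq = doc_freqs.get(n["node"], 0)
        idf = math.log((num_docs + 1) / (doc_freq + 1))
        scored.append({**n, "docFreq": doc_freq, "nodeScore": n["count"] * idf})
    return scored

# Hypothetical data: "popular" is seen more often in the traversal,
# but it is also far more common in the index than "rare".
nodes = [{"node": "popular", "count": 10}, {"node": "rare", "count": 8}]
doc_freqs = {"popular": 5000, "rare": 50}

# Equivalent in spirit to sort="nodeScore desc" in the top() expression.
ranked = sorted(score_nodes(nodes, doc_freqs, num_docs=10000),
                key=lambda n: n["nodeScore"], reverse=True)
```

With these made-up numbers the less frequent node ranks first even though its raw count is lower, which is the boost the description is after.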

      Proposed syntax:

      top(n="10",
          sort="nodeScore desc",
          scoreNodes(gatherNodes(...)))
      
      Attachments

      1. SOLR-9193.patch (22 kB, Joel Bernstein)


          Activity

          joel.bernstein Joel Bernstein added a comment - edited

          First patch with a working scoreNodes expression. A simple test case is included.

          This builds on the work on the TermsComponent in SOLR-9243.

          joel.bernstein Joel Bernstein added a comment - edited

          I'm also planning on making the /terms handler an implicit handler in this ticket.

          joel.bernstein Joel Bernstein added a comment -

          I've pushed my latest work on this to my lucene-solr clone.
          Here is the link to the combined changes for this ticket and SOLR-9243, which this ticket relies on:

          https://github.com/apache/lucene-solr/compare/master...joel-bernstein:master

          joel.bernstein Joel Bernstein added a comment - edited

          Added a new test using the termFreq param and added some error handling. The link above incorporates these changes.

          This ticket is pretty close to being ready. I'll do some testing at scale and see if this turns up any issues.

          joel.bernstein Joel Bernstein added a comment - edited

          Pushed out what I think is the last set of changes for this ticket. The link above includes all the changes.

          The manual testing looked very good. I tested scoreNodes with 250 node IDs and it took less than 10 milliseconds to complete.

          I'll probably give this one last review and then push the commits to apache/lucene-solr.

          jira-bot ASF subversion and git services added a comment -

          Commit e1f51a20d74daec2521ad8945a9f642f568147aa in lucene-solr's branch refs/heads/master from jbernste
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e1f51a2 ]

          SOLR-9193: Add scoreNodes Streaming Expression

          jira-bot ASF subversion and git services added a comment -

          Commit 360c4da90b8a416b369f49bc948bfd20338ff39d in lucene-solr's branch refs/heads/master from jbernste
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=360c4da ]

          SOLR-9193: fixing failing tests due to changes in TermsComponent

          jira-bot ASF subversion and git services added a comment -

          Commit ad8b22d0b2a05425fbd51bd01ddb621a1ebe98b4 in lucene-solr's branch refs/heads/master from jbernste
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ad8b22d ]

          SOLR-9193: Fix conflict between parameters of TermsComponent and json facet API

          jira-bot ASF subversion and git services added a comment -

          Commit 12741cc933b57bbddc20d10ebca3dd776703498b in lucene-solr's branch refs/heads/master from jbernste
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=12741cc ]

          SOLR-9193: Added test using the termFreq param and basic error handling

          jira-bot ASF subversion and git services added a comment -

          Commit c47344195860750cb5758c1cf1f43b8c26cd3260 in lucene-solr's branch refs/heads/master from jbernste
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c473441 ]

          SOLR-9193: Added terms.limit and distrib=true params to /terms request

          jira-bot ASF subversion and git services added a comment -

          Commit 879a245e4e0b63edaa240e1e138223dd9e86b301 in lucene-solr's branch refs/heads/branch_6x from jbernste
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=879a245 ]

          SOLR-9193: Add scoreNodes Streaming Expression

          Conflicts:
          solr/core/src/java/org/apache/solr/handler/StreamHandler.java

          jira-bot ASF subversion and git services added a comment -

          Commit ed86e014f61474843a8dc064c912d91d51ff5cba in lucene-solr's branch refs/heads/branch_6x from jbernste
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ed86e01 ]

          SOLR-9193: fixing failing tests due to changes in TermsComponent

          jira-bot ASF subversion and git services added a comment -

          Commit bc0eac8b6b95bfc4d6cfa612b494fc184cee1a8c in lucene-solr's branch refs/heads/branch_6x from jbernste
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bc0eac8 ]

          SOLR-9193: Fix conflict between parameters of TermsComponent and json facet API

          jira-bot ASF subversion and git services added a comment -

          Commit 7a5e6a5f7e479b0950cf0d26484f8789c5aa5fcf in lucene-solr's branch refs/heads/branch_6x from jbernste
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7a5e6a5 ]

          SOLR-9193: Added test using the termFreq param and basic error handling

          jira-bot ASF subversion and git services added a comment -

          Commit e27849052ebd7d2314560eb5a1704ca33d442565 in lucene-solr's branch refs/heads/branch_6x from jbernste
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e278490 ]

          SOLR-9193: Added terms.limit and distrib=true params to /terms request

          jira-bot ASF subversion and git services added a comment -

          Commit d9a0eba1a3551b722a700d0fe973ce657b1ce6d8 in lucene-solr's branch refs/heads/master from jbernste
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d9a0eba ]

          SOLR-9193: Fix-up javdoc

          jira-bot ASF subversion and git services added a comment -

          Commit 2bd6c4ecd774a818168b37e6f09208f8ee4ec45f in lucene-solr's branch refs/heads/master from jbernste
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2bd6c4e ]

          SOLR-9193,SOLR-9243: update CHANGES.txt

          jira-bot ASF subversion and git services added a comment -

          Commit a86f25ea0c3cb7e1f628d93cfbc4c7b73dbb92a8 in lucene-solr's branch refs/heads/branch_6x from jbernste
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a86f25e ]

          SOLR-9193: Fix-up javdoc

          jira-bot ASF subversion and git services added a comment -

          Commit de7a3f6f6842af8b211baa4a0291c967932297c1 in lucene-solr's branch refs/heads/branch_6x from jbernste
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=de7a3f6 ]

          SOLR-9193,SOLR-9243: update CHANGES.txt

          mikemccand Michael McCandless added a comment -

          Bulk close resolved issues after 6.2.0 release.

          arafalov Alexandre Rafalovitch added a comment -

          I know this issue is closed, but I wanted to check before I open a new one.

          The implicit definition of "/terms" is now:

          "/terms": {
            "class": "solr.SearchHandler",
            "useParams": "_TERMS",
            "components": [
              "terms"
            ]
          },
          

          This conflicts with all the explicit definitions we currently have in solrconfig.xml files:

          <requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
            <lst name="defaults">
              <bool name="terms">true</bool>
              <bool name="distrib">false</bool>
            </lst>
            <arr name="components">
              <str>terms</str>
            </arr>
          </requestHandler>
          

          Specifically, the existing definition sets terms=true and distrib=false. As it stands, we cannot remove those definitions from solrconfig.xml. Was there a specific reason these defaults (especially distrib) were left out when this ticket added the implicit definition, or was that just an oversight?
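The gap can be made concrete with a small comparison script. This is only a sketch: it inlines the two definitions quoted above, treats the implicit handler as having no defaults of its own (the contents of the _TERMS param set are not shown here, so that is an assumption), and the helper names explicit_defaults and missing_defaults are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Implicit definition, as quoted in this comment.
IMPLICIT = json.loads("""
{"/terms": {"class": "solr.SearchHandler",
            "useParams": "_TERMS",
            "components": ["terms"]}}
""")

# Explicit definition, as found in existing solrconfig.xml files.
EXPLICIT = """
<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="terms">true</bool>
    <bool name="distrib">false</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>
"""

def explicit_defaults(xml_text):
    """Collect the name/value pairs under <lst name="defaults">."""
    root = ET.fromstring(xml_text)
    defaults = root.find("lst[@name='defaults']")
    if defaults is None:
        return {}
    return {child.get("name"): child.text for child in defaults}

def missing_defaults(implicit_handler, xml_text):
    """Defaults set in the XML handler but absent from the implicit one."""
    implicit = implicit_handler.get("defaults", {})
    return {name: value
            for name, value in explicit_defaults(xml_text).items()
            if name not in implicit}

missing = missing_defaults(IMPLICIT["/terms"], EXPLICIT)
print(missing)
```

Under these assumptions the script reports both terms and distrib as defaults that exist only in the explicit definition.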

          joel.bernstein Joel Bernstein added a comment -

          I think this is more of an oversight. Let's create a new ticket to add the default params. We may have to update ScoreNodesStream to override the distrib=false default, but it may already be sending the distrib param itself.

          joel.bernstein Joel Bernstein added a comment -

          I just checked and the ScoreNodesStream should not need any adjustment when defaults are added.

          arafalov Alexandre Rafalovitch added a comment -

          Great. I created SOLR-9607 to add the parameters and clean up the config files.


            People

            • Assignee: joel.bernstein Joel Bernstein
            • Reporter: joel.bernstein Joel Bernstein
            • Votes: 0
            • Watchers: 8
