  1. Solr
  2. SOLR-9193

Add scoreNodes Streaming Expression

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: None
    • Fix Version/s: 6.2
    • Component/s: SolrJ
    • Labels: None

    Description

      The scoreNodes Streaming Expression is another GraphExpression. It will decorate a gatherNodes expression and use a tf-idf scoring algorithm to score the nodes.

      The gatherNodes expression only gathers nodes and aggregations. The number of times a node appears in the traversal is similar in nature to tf in search ranking. Ranking on that count alone, however, skews recommendations towards nodes that appear frequently in the index.

      Using the idf for each node we can score each node as a function of tf-idf. This will provide a boost to nodes that appear less frequently in the index.
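
      The effect of the idf boost can be illustrated with a minimal Python sketch. The smoothed idf formula and the names score_node, tf, doc_freq and num_docs are assumptions made for illustration; the formula used by the patch may differ.

```python
import math

def score_node(tf, doc_freq, num_docs):
    """Score one node as tf * idf.

    The smoothed idf below is an assumed, commonly used formulation,
    not necessarily the one implemented in the patch.
    """
    idf = math.log((num_docs + 1) / (doc_freq + 1)) + 1.0
    return tf * idf

# Two nodes with the same traversal count (tf) in a 1000-document index.
common = score_node(tf=5, doc_freq=900, num_docs=1000)
rare = score_node(tf=5, doc_freq=10, num_docs=1000)
```

      With equal tf, the node that appears in fewer documents (rare) receives the higher score.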

      The scoreNodes expression will gather the idf values from the shards for each node emitted by the underlying gatherNodes expression. It will then assign a score to each node.
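
      The gather-then-assign step can be sketched in Python as follows. This is only an illustration under stated assumptions: the per-shard response shape (doc_freqs, num_docs), the input node fields node and count, and the smoothed idf formula are all hypothetical and are not taken from the patch.

```python
import math
from collections import Counter

def merge_shard_stats(shard_responses):
    """Sum per-shard document frequencies and document counts.

    Each response is assumed (hypothetically) to look like
    {"doc_freqs": {node_id: df, ...}, "num_docs": int}.
    """
    doc_freq = Counter()
    num_docs = 0
    for resp in shard_responses:
        doc_freq.update(resp["doc_freqs"])  # adds counts per node id
        num_docs += resp["num_docs"]
    return doc_freq, num_docs

def score_nodes(nodes, shard_responses):
    """Return copies of the nodes decorated with docFreq and nodeScore."""
    doc_freq, num_docs = merge_shard_stats(shard_responses)
    scored = []
    for node in nodes:
        df = doc_freq[node["node"]]  # 0 if no shard reported the node
        idf = math.log((num_docs + 1) / (df + 1)) + 1.0  # assumed formula
        scored.append({**node, "docFreq": df, "nodeScore": node["count"] * idf})
    return scored
```

      Summing document frequencies across shards before computing the idf keeps the score based on collection-wide statistics rather than on any single shard.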

      The computed score will be added to each node in the nodeScore field. The docFreq of the node across the entire collection will be added to each node in the docFreq field. Other streaming expressions can then rank nodes by nodeScore or compute their own score from docFreq.
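
      How a downstream consumer might use these two fields can be sketched in Python. The helper names top_n and rescore, the count and customScore fields, and the sample values are hypothetical; only the nodeScore and docFreq field names come from the description above.

```python
import heapq

def top_n(nodes, n, key="nodeScore"):
    """Return the n nodes with the largest value for `key`, best first."""
    return heapq.nlargest(n, nodes, key=lambda node: node[key])

def rescore(nodes, score_fn):
    """Attach a caller-defined score in a hypothetical customScore field."""
    return [{**node, "customScore": score_fn(node)} for node in nodes]

# Illustrative, made-up nodes as they might look after scoring.
nodes = [
    {"node": "a", "count": 4, "docFreq": 400, "nodeScore": 7.6},
    {"node": "b", "count": 3, "docFreq": 5, "nodeScore": 18.4},
    {"node": "c", "count": 1, "docFreq": 2, "nodeScore": 6.8},
]

# Rank on the precomputed score, as a top expression sorted on nodeScore would.
ranked = top_n(nodes, 2)

# Or derive a different score from docFreq and rank on that instead.
custom = top_n(
    rescore(nodes, lambda node: node["count"] / (1 + node["docFreq"])),
    1,
    key="customScore",
)
```

      Exposing docFreq alongside nodeScore lets a consumer swap in its own weighting without another round trip to the shards.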

      Proposed syntax:

      top(n="10",
          sort="nodeScore desc",
          scoreNodes(gatherNodes(...)))


      Attachments

        1. SOLR-9193.patch
          22 kB
          Joel Bernstein

            People

              Assignee: Joel Bernstein
              Reporter: Joel Bernstein
              Votes: 0
              Watchers: 7
