The scoreNodes Streaming Expression is another GraphExpression. It will decorate a gatherNodes expression and use a tf-idf scoring algorithm to score the nodes.
The gatherNodes expression only gathers nodes and aggregations. Ranking on those counts alone is similar in nature to using tf in search ranking, where the number of times a node appears in the traversal represents the tf. On its own, this skews recommendations towards nodes that appear frequently in the index.
Using the idf for each node we can score each node as a function of tf-idf. This will provide a boost to nodes that appear less frequently in the index.
The scoreNodes expression will gather the idf values from the shards for each node emitted by the underlying gatherNodes expression. It will then compute a tf-idf score and assign it to each node.
The computed score will be added to each node in the nodeScore field. The docFreq of the node across the entire collection will be added to each node in the docFreq field. Other streaming expressions can then perform a ranking based on the nodeScore or compute their own score using the docFreq.
top(n="10", sort="nodeScore desc", scoreNodes(gatherNodes(...)))
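The scoring described above can be sketched in a few lines of Python. This is only an illustration, not the Solr implementation: the function name score_nodes, the use of a "count(*)" aggregation as the tf, the num_docs parameter, and the smoothed idf formula log(numDocs / (docFreq + 1)) + 1 are all assumptions made for the example. Only the nodeScore and docFreq field names come from the description.

```python
import math

def score_nodes(nodes, doc_freqs, num_docs):
    """Attach docFreq and a tf-idf style nodeScore to each gathered node.

    nodes     -- dicts with a "node" id and a "count(*)" aggregation (assumed tf)
    doc_freqs -- hypothetical map of node id -> document frequency in the collection
    num_docs  -- assumed total number of documents in the collection
    """
    scored = []
    for node in nodes:
        tf = node["count(*)"]
        df = doc_freqs[node["node"]]
        # Assumed idf formula; the actual expression may compute this differently.
        idf = math.log(num_docs / (df + 1)) + 1.0
        scored.append({**node, "docFreq": df, "nodeScore": tf * idf})
    return scored

# productA appears often in the traversal but is also very common in the index.
nodes = [
    {"node": "productA", "count(*)": 10},
    {"node": "productB", "count(*)": 4},
]
doc_freqs = {"productA": 9999, "productB": 9}

scored = score_nodes(nodes, doc_freqs, num_docs=10000)
# Equivalent in spirit to top(n="10", sort="nodeScore desc", ...)
top = sorted(scored, key=lambda n: n["nodeScore"], reverse=True)[:10]
```

With these made-up numbers, productB ranks above productA even though productA has the higher raw count, which is the boost for rarer nodes that the idf is meant to provide.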
- depends upon
SOLR-9243 Add terms.list parameter to the TermsComponent to fetch the docFreq for a list of terms
- is related to
SOLR-14036 TermsComponent distributed search (shards) doesn't work with SolrCloud