[SOLR-13838] igain query parser generating invalid output - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 8.2
Fix Version/s: None
Component/s: query parsers
Labels: None
Environment:

The issue is a generic Java defect and therefore will be independent of the operating system or software platform.

Description

The output of the "features()" stream source contains terms whose score_f field is NaN (and whose idf_d field is -Infinity):

    "docs": [
      {
        "featureSet_s": "business",
        "score_f": "NaN",
        "term_s": "1,011.15",
        "idf_d": "-Infinity",
        "index_i": 1,
        "id": "business_1"
      },
      {
        "featureSet_s": "business",
        "score_f": "NaN",
        "term_s": "10.3m",
        "idf_d": "-Infinity",
        "index_i": 2,
        "id": "business_2"
      },
      {
        "featureSet_s": "business",
        "score_f": "NaN",
        "term_s": "01",
        "idf_d": "-Infinity",
        "index_i": 3,
        "id": "business_3"
      },...

Looking into org/apache/solr/search/IGainTermsQParserPlugin.java, it seems that when a term does not occur in any of the positive or negative documents, docFreq (computed as docFreq = xc + nc) is 0, so the subsequent calculations divide by zero and produce NaN.

Attached is a patch that skips terms whose docFreq is 0 in the finish() method of IGainTermsQParserPlugin. This resolves the NaN scores in the features() output.
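
The failure mode and the guard described above can be illustrated with a minimal, self-contained sketch. It is not the code from IGainTermsQParserPlugin or from the attached patch: the class name DocFreqGuardSketch, the helper positiveFraction, and the sample terms and counts are all hypothetical, and the statistic is a simplified stand-in for any calculation that divides by docFreq. It relies only on the fact that Java evaluates 0.0 / 0.0 on doubles to NaN without throwing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: NOT the Solr implementation or the attached patch.
class DocFreqGuardSketch {

    // Hypothetical stand-in for a per-term statistic that divides by docFreq.
    // xc = count in positive documents, nc = count in negative documents.
    static double positiveFraction(double xc, double nc) {
        double docFreq = xc + nc;
        // For doubles, 0.0 / 0.0 yields NaN rather than an exception.
        return xc / docFreq;
    }

    // counts maps each term to {xc, nc}. Terms whose docFreq is 0 are skipped,
    // mirroring the idea of the guard described in the issue.
    static Map<String, Double> scoreTerms(Map<String, double[]> counts) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : counts.entrySet()) {
            double xc = e.getValue()[0];
            double nc = e.getValue()[1];
            if (xc + nc == 0) {
                continue; // term seen in neither document set: no meaningful score
            }
            scores.put(e.getKey(), positiveFraction(xc, nc));
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, double[]> counts = new LinkedHashMap<>();
        counts.put("profit", new double[] {3, 1}); // made-up sample counts
        counts.put("10.3m", new double[] {0, 0});  // docFreq == 0

        System.out.println(positiveFraction(0, 0)); // prints NaN
        System.out.println(scoreTerms(counts));     // prints {profit=0.75}
    }
}
```

Skipping the term, rather than substituting a default score, keeps terms with no supporting documents out of the ranked output entirely, which matches the behaviour the issue describes for the patch.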

Attachments

IGainTermsQParserPlugin.java.patch
12/Oct/19 04:30
0.7 kB
Peter Davie

People

Assignee: Unassigned

Reporter: Peter Davie

Votes: 0

Watchers: 1

Dates

Created: 12/Oct/19 04:38

Updated: 21/Jan/20 16:57