Solr / SOLR-1785

Handle +/-Inf, NaN when scoring

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Collecting scores of -Inf or NaN can cause exceptions.


          Activity

          Yonik Seeley added a comment (edited)

          In Solr 1.3 and earlier, +/-Inf scores were handled normally, and NaN scores caused the relative ordering of the other documents to be mixed up.
          The new Lucene collectors used in 1.4 cannot all handle -Inf and NaN (they can cause Integer.MAX_VALUE to be returned as a docid), which leads to exceptions such as those seen in SOLR-1778. Because of this, Solr's function queries normalize their output to exclude -Inf and NaN. Unfortunately, this is not sufficient: finite scores can be combined by a boolean query into an infinite score, and once a score is infinite, a simple multiplication by zero yields NaN.
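One way the "Integer.MAX_VALUE as a docid" symptom can arise is sketched below. This is a hypothetical, simplified model, not Lucene's collector code: it assumes a top-N queue that is pre-filled with sentinel entries whose score is -Infinity and whose doc id is Integer.MAX_VALUE. A -Inf score can never beat such a sentinel, so the sentinel stays in the results, while a NaN score slips past the "is it better?" check and evicts a real hit. `SentinelTopN` and its methods are illustrative names only.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical, simplified top-N collector (NOT Lucene's code). The queue is
// pre-filled with sentinel hits (score = -Infinity, doc = Integer.MAX_VALUE);
// this design is an assumption used only to illustrate the symptom.
final class SentinelTopN {
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Min-heap on score: peek() returns the weakest hit currently kept.
    // Primitive '<' and '>' are used, so NaN compares "equal" to everything.
    private final PriorityQueue<Hit> pq = new PriorityQueue<>(
            (a, b) -> a.score < b.score ? -1 : (a.score > b.score ? 1 : 0));

    SentinelTopN(int n) {
        for (int i = 0; i < n; i++) {
            pq.add(new Hit(Integer.MAX_VALUE, Float.NEGATIVE_INFINITY));
        }
    }

    void collect(int doc, float score) {
        // Reject anything that does not beat the weakest kept hit.
        // -Infinity never beats a -Infinity sentinel, so the sentinel stays.
        // A NaN score makes this test false, so NaN always evicts the weakest hit.
        if (score <= pq.peek().score) {
            return;
        }
        pq.poll();
        pq.add(new Hit(doc, score));
    }

    // Doc ids left in the queue (heap order, not sorted).
    int[] docs() {
        return pq.stream().mapToInt(h -> h.doc).toArray();
    }

    public static void main(String[] args) {
        SentinelTopN top = new SentinelTopN(3);
        top.collect(0, 1.0f);
        top.collect(1, Float.NEGATIVE_INFINITY); // dropped: only ties the sentinel
        // Two documents were scored, but only one was kept; the other slots
        // still hold sentinel doc ids of Integer.MAX_VALUE.
        System.out.println(Arrays.toString(top.docs()));
    }
}
```

In this model, any caller that turns the queue contents into documents would try to load doc Integer.MAX_VALUE, which is the kind of failure the comment above describes.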

          Example: http://localhost:8983/solr/select?fl=id,score&q=_val_:"-3e38"+_val_:"-3e38"
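The float arithmetic behind this example can be reproduced in a few lines of Java. The sketch assumes the boolean query simply adds its clause scores (a coord factor of 1); that is a simplification rather than Solr's scoring code, and `ScoreOverflow` and `sumClauses` are hypothetical names. Each clause score of -3e38 is finite (the largest finite float is about 3.4e38), but their sum is not.

```java
// Hypothetical sketch of the float arithmetic in the example query.
// Assumes clause scores are simply summed in float (coord factor of 1).
final class ScoreOverflow {
    static float sumClauses(float... clauseScores) {
        float sum = 0f;
        for (float s : clauseScores) {
            sum += s; // float addition overflows to -Infinity below -Float.MAX_VALUE
        }
        return sum;
    }

    public static void main(String[] args) {
        float combined = sumClauses(-3e38f, -3e38f); // each clause is finite
        System.out.println(combined);      // -Infinity
        System.out.println(combined * 0f); // NaN: -Infinity * 0 is undefined
    }
}
```

Clamping each function-query value separately cannot prevent this, because the overflow happens only after the clauses are combined.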

          Properly handling -Inf is an easy fix. The bigger question is how to handle NaN.
          We could:
          1) Punt, and accept that any NaN will mess up the ordering of all other documents for that request.
          2) Move the FunctionQuery normalization that changes -Inf and NaN into -Float.MAX_VALUE to just before collection (probably with a wrapper collector). This would preserve the ordering of all the other documents, at the cost of a little performance and some information loss (the fact that there was a NaN or -Inf).
          3) Fix -Inf handling, and normalize NaN to -Inf.
          4) Completely order NaNs (probably after -Inf). This keeps the most information, but would require implementing a custom comparator for score sorting (for anything other than a simple score desc).
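Options 2 to 4 can be sketched in Java as follows. All names here (`NanScorePolicies`, `ScoreSink`, `clampToFinite`, `normalizing`, `nanToNegInf`, `SCORE_DESC_NAN_LAST`) are hypothetical and are not Solr or Lucene API; `ScoreSink` merely stands in for a real collector. Option 1 needs no code.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helpers sketching options 2-4; none of these names are
// Solr or Lucene API.
final class NanScorePolicies {
    // Hypothetical stand-in for a collector: receives (doc, score) pairs.
    interface ScoreSink {
        void collect(int doc, float score);
    }

    // Option 2: map -Infinity and NaN to the most negative finite float.
    static float clampToFinite(float score) {
        if (Float.isNaN(score) || score == Float.NEGATIVE_INFINITY) {
            return -Float.MAX_VALUE;
        }
        return score;
    }

    // Option 2 as a wrapper: normalize just before the real collector sees the score.
    static ScoreSink normalizing(ScoreSink delegate) {
        return (doc, score) -> delegate.collect(doc, clampToFinite(score));
    }

    // Option 3: keep -Infinity, but fold NaN into it.
    static float nanToNegInf(float score) {
        return Float.isNaN(score) ? Float.NEGATIVE_INFINITY : score;
    }

    // Option 4: a total order for "score desc" that puts NaN after -Infinity.
    // Float.compare ranks NaN above +Infinity, so NaN is handled first.
    static final Comparator<Float> SCORE_DESC_NAN_LAST = (a, b) -> {
        boolean aNaN = Float.isNaN(a);
        boolean bNaN = Float.isNaN(b);
        if (aNaN || bNaN) {
            return Boolean.compare(aNaN, bNaN); // NaN sorts after everything
        }
        return Float.compare(b, a); // descending for all other values
    };

    public static void main(String[] args) {
        Float[] scores = {1.5f, Float.NaN, Float.NEGATIVE_INFINITY, 3f};
        Arrays.sort(scores, SCORE_DESC_NAN_LAST);
        System.out.println(Arrays.toString(scores)); // [3.0, 1.5, -Infinity, NaN]
    }
}
```

Because `Float.compare` already places NaN above +Infinity, a plain reversed comparison would put NaN first in a descending sort; that is why the option 4 comparator special-cases NaN before reversing.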

          Hoss Man added a comment

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria were "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. Email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Robert Muir added a comment

          Bulk move 3.2 -> 3.3

          Robert Muir added a comment

          3.4 -> 3.5

          Hoss Man added a comment

          Bulk changing fixVersion 3.6 to 4.0 for any open issues that are unassigned and have not been updated since March 19.

          Email spam suppressed for this bulk edit; search for hoss20120323nofix36 to identify all issues edited

          Hoss Man added a comment

          Removing fix version since this issue hasn't gotten much attention lately and doesn't appear to be a priority for anyone for 4.0.

          As always: if someone wants to take on this work, they are welcome to do so at any time, and the target release can be revisited.


            People

            • Assignee: Unassigned
            • Reporter: Yonik Seeley
            • Votes: 0
            • Watchers: 0

              Dates

              • Created:
                Updated:
