Solr / SOLR-6540

strdist() causes NPE if doc is missing field

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      If you try to use the strdist function on a field that is missing in some docs, you'll get a NullPointerException.

      In some contexts, a workaround is to wrap the strdist function in an "if" that checks exists(fieldname) and returns a suitable default when the field is not found.

      THIS:           if(exists(field_name_s),strdist("literal",field_name_s,edit),0)
      INSTEAD OF:     strdist("literal",field_name_s,edit)
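As a sketch of applying this workaround from client code, assuming a Solr instance at http://localhost:8983 with the collection1 core and the foo_s field used in the reproduction in the comments, the helper below builds a /select URL whose fl parameter carries the wrapped expression. WorkaroundUrl and its methods are hypothetical names for illustration; URLEncoder.encode(String, Charset) requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a /select URL whose fl parameter wraps strdist()
// in if(exists(...)) so documents missing the field get a default instead of an NPE.
class WorkaroundUrl {
    // Produces: if(exists(field),strdist("literal",field,measure),dflt)
    static String strdistWithDefault(String literal, String field, String measure, String dflt) {
        return "if(exists(" + field + "),strdist(\"" + literal + "\"," + field + ","
            + measure + ")," + dflt + ")";
    }

    // Percent-encodes q and fl; quotes, parentheses and commas are not safe in a raw URL.
    static String selectUrl(String baseUrl, String fl) {
        return baseUrl + "/select?q=" + URLEncoder.encode("*:*", StandardCharsets.UTF_8)
            + "&wt=json&fl=" + URLEncoder.encode("id," + fl, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fl = strdistWithDefault("ack", "foo_s", "edit", "0");
        System.out.println(fl);
        System.out.println(selectUrl("http://localhost:8983/solr/collection1", fl));
    }
}
```

The default of 0 mirrors the example above; any value that sorts or displays sensibly for "missing" in the calling context would do.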
      


          Activity

          Hoss Man added a comment -

          Steps to reproduce...

          hossman@frisbee:~$ curl -X POST -H 'Content-Type: application/json' 'http://localhost:8983/solr/collection1/update?commit=true' --data-binary '[{"id":"1","foo_s":"yak"},{"id":"2","foo_s":"zak"}]'
          {"responseHeader":{"status":0,"QTime":369}}
          hossman@frisbee:~$ curl 'http://localhost:8983/solr/collection1/select?q=*:*&indent=true&wt=json&fl=id,strdist("ack",foo_s,edit)'
          {
            "responseHeader":{
              "status":0,
              "QTime":15,
              "params":{
                "fl":"id,strdist(\"ack\",foo_s,edit)",
                "indent":"true",
                "q":"*:*",
                "wt":"json"}},
            "response":{"numFound":2,"start":0,"docs":[
                {
                  "id":"1",
                  "strdist(\"ack\",foo_s,edit)":0.3333333},
                {
                  "id":"2",
                  "strdist(\"ack\",foo_s,edit)":0.3333333}]
            }}
          hossman@frisbee:~$ curl -X POST -H 'Content-Type: application/json' 'http://localhost:8983/solr/collection1/update?commit=true' --data-binary '[{"id":"3"}]'
          {"responseHeader":{"status":0,"QTime":329}}
          hossman@frisbee:~$ curl 'http://localhost:8983/solr/collection1/select?q=*:*&indent=true&wt=json&fl=id,strdist("ack",foo_s,edit)'
          
          ... ERROR!
          
          java.lang.NullPointerException
          	at org.apache.lucene.search.spell.LevensteinDistance.getDistance(LevensteinDistance.java:66)
          	at org.apache.solr.search.function.distance.StringDistanceFunction$1.floatVal(StringDistanceFunction.java:54)
          	at org.apache.lucene.queries.function.docvalues.FloatDocValues.objectVal(FloatDocValues.java:71)
          	at org.apache.solr.response.transform.ValueSourceAugmenter.transform(ValueSourceAugmenter.java:99)
          	at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:254)
          	at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:172)
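The 0.3333333 values in the first response are consistent with a normalized edit distance of 1 - d/max(len): "ack" and "yak" (or "zak") differ by two edits (insert the leading letter, drop the "c") over three characters, so 1 - 2/3. The sketch below is a plain Levenshtein implementation written for illustration, not Lucene's LevensteinDistance source; it computes that similarity and, like the trace above, throws a NullPointerException as soon as either input is null, which is what a doc with no foo_s value supplies.

```java
// Illustrative normalized Levenshtein similarity: 1 - editDistance / max(length).
// Written for this note; not copied from Lucene's LevensteinDistance.
class NormalizedLevenshtein {
    static float similarity(String a, String b) {
        // A null argument fails right here with a NullPointerException, as in the trace.
        int n = a.length();
        int m = b.length();
        if (n == 0 && m == 0) {
            return 1.0f; // two empty strings are identical
        }
        int[] prev = new int[m + 1];
        int[] cur = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= n; i++) {
            cur[0] = i;
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return 1.0f - (float) prev[m] / Math.max(n, m);
    }

    public static void main(String[] args) {
        System.out.println(similarity("ack", "yak")); // 1 - 2/3
        System.out.println(similarity("ack", "zak")); // 1 - 2/3
        String missing = null; // doc 3 has no foo_s value
        try {
            similarity("ack", missing);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in the trace");
        }
    }
}
```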
          

          A quick glance suggests that the root cause is StringDistanceFunction.getValues's anonymous inner subclass of FloatDocValues (aka: StringDistanceFunction$1). Either str1DV.strVal or str2DV.strVal can return null (in which case their exists() methods should have returned false, but I haven't verified that), and they do in fact return null when dealing with string fields that may not always have a value.

          For the particular example shown above, adding an exists() implementation to StringDistanceFunction$1 should prevent ValueSourceAugmenter from ever calling floatVal().

          But the question remains as to what floatVal() should return if/when it is called in this situation? Infinity? NaN? 0.0F?
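To illustrate how an exists() implementation keeps the caller from ever reaching floatVal(), the sketch below models per-document values with a small hypothetical interface, DocStrings, loosely patterned on the exists()/strVal() pair of FunctionValues but not that class. GuardedAugmenter is likewise a hypothetical stand-in for ValueSourceAugmenter, and its similarity() is a placeholder (1 if equal, else 0) for the real edit-distance measure; like LevensteinDistance in the trace, it dereferences both inputs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for per-document string values (not Lucene's FunctionValues).
interface DocStrings {
    String strVal(int doc); // null when the doc has no value for the field

    default boolean exists(int doc) {
        return strVal(doc) != null;
    }
}

// Hypothetical caller that consults exists() before asking for a number.
class GuardedAugmenter {
    // Placeholder measure: 1 if equal, else 0. Dereferences both inputs, so null would throw.
    static float similarity(String a, String b) {
        return a.length() == b.length() && a.equals(b) ? 1.0f : 0.0f;
    }

    // One entry per doc: the computed value, or null when the doc was skipped.
    static List<Float> transform(DocStrings field, String literal, int maxDoc) {
        DocStrings lit = doc -> literal;
        List<Float> out = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            // Combined exists(): false if either side is missing, so similarity() is never reached.
            if (lit.exists(doc) && field.exists(doc)) {
                out.add(similarity(lit.strVal(doc), field.strVal(doc)));
            } else {
                out.add(null); // omit the pseudo-field instead of throwing
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] fooS = {"yak", "zak", null}; // docs 1-3 from the reproduction
        System.out.println(transform(doc -> fooS[doc], "ack", fooS.length));
    }
}
```

The guard only protects callers that check exists(); it does not answer what floatVal() itself should return when called regardless.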

          Hoss Man added a comment -

          Discovered while working on SOLR-6354: that issue adds some workarounds to StatsComponentTest (grep for SOLR-6540) that should be cleaned up when resolving this.

          Hoss Man added a comment -

          But the question remains as to what floatVal() should return if/when it is called in this situation? Infinity? NaN? 0.0F?

          Based on the existing StringDistance contract, the choice I went with...

              // if a ValueSource is missing, it is maximally distant from every other
              // value source *except* for another missing value source 
              // ie: strdist(null,null)==1 but strdist(null,anything)==0
          

          So the exists() method on the strdist() value source returns false if the exists() method on either of the sub-value sources returns false. In cases where a number is asked for anyway (i.e. a query score context), floatVal() returns 0, unless both are missing, in which case they are identical and the number is 1.

          New tests pass, and the previously commented-out bits of StatsComponentTest pass... still running the full test suite, but I think this is good to go.
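The contract described above can be sketched as follows. DocStringSource and StrDistValues are hypothetical names, not the classes from the actual commit, and similarity() is an illustrative normalized Levenshtein (1 - d/max(len)) rather than Lucene's code. exists() is false whenever either side is missing; floatVal() still returns a number if called anyway: 1 when both sides are missing, 0 when exactly one is.

```java
// Hypothetical per-document string source; strVal returns null when the doc has no value.
interface DocStringSource {
    String strVal(int doc);

    default boolean exists(int doc) {
        return strVal(doc) != null;
    }
}

// Sketch of the missing-value contract for a strdist-style value source.
class StrDistValues {
    private final DocStringSource str1;
    private final DocStringSource str2;

    StrDistValues(DocStringSource str1, DocStringSource str2) {
        this.str1 = str1;
        this.str2 = str2;
    }

    // False if either sub-source is missing, so a doc transformer can skip the doc.
    boolean exists(int doc) {
        return str1.exists(doc) && str2.exists(doc);
    }

    // Still defined when called anyway (e.g. in a scoring context).
    float floatVal(int doc) {
        String s1 = str1.strVal(doc);
        String s2 = str2.strVal(doc);
        if (s1 == null || s2 == null) {
            // Missing is maximally distant from everything except another missing value.
            return (s1 == null && s2 == null) ? 1.0f : 0.0f;
        }
        return similarity(s1, s2);
    }

    // Illustrative normalized Levenshtein similarity: 1 - editDistance / max(length).
    static float similarity(String a, String b) {
        int n = a.length();
        int m = b.length();
        if (n == 0 && m == 0) {
            return 1.0f;
        }
        int[] prev = new int[m + 1];
        int[] cur = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            cur[0] = i;
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return 1.0f - (float) prev[m] / Math.max(n, m);
    }
}
```

Under this sketch, the reproduction's doc 3 compared against the literal "ack" reports exists() == false, so a transformer can omit the value, while any caller that asks for floatVal() regardless gets 0 instead of an NPE.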

          ASF subversion and git services added a comment -

          Commit 1631555 from hossman@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1631555 ]

          SOLR-6540 Fix NPE from strdist() func when doc value source does not exist in a doc

          ASF subversion and git services added a comment -

          Commit 1631592 from hossman@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1631592 ]

          SOLR-6540 Fix NPE from strdist() func when doc value source does not exist in a doc (merge r1631555)

          Anshum Gupta added a comment -

          Bulk close after 5.0 release.


            People

            • Assignee:
              Unassigned
              Reporter:
              Hoss Man
            • Votes:
              0
              Watchers:
              3
