Solr / SOLR-5231

When a boolean field is missing from a doc it is sometimes treated as "true" by the "if" function (based on other docs in segment?)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.2, 4.3, 4.3.1, 4.4
    • Fix Version/s: 4.5, 5.0
    • Component/s: None
    • Labels: None

      Description

      This issue is hard to explain without a long example.

      The crux of the problem is that the behavior of the if function wrapped around a boolean field (i.e. "if(fieldName,x,y)") is not consistent for documents that do not have any value for that field: the behavior seems to depend on whether or not other documents in the same segment have a value for that field.

      For brevity, details will follow in a comment, but I've been able to reproduce on trunk and 4.3 (didn't look back further than that).

      The workaround is to explicitly use the exists() function in the if condition (i.e. "if(exists(fieldName),x,y)").
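As a rough illustration of the workaround, the sketch below builds a select URL whose fl parameter uses the guarded form. The class and method names are hypothetical, and the base URL, the "inStock" field, and the 42 / -99 values are simply borrowed from the reproduction steps in the comments. Note that exists() only tests for presence, so a document whose value is explicitly false also gets the first branch, as the sample responses in the comments show.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper for illustration only; not part of Solr or SolrJ. */
public class ExistsWorkaroundSketch {

    /** Builds the guarded function string, e.g. if(exists(inStock),42,-99). */
    static String guardedIf(String field, String whenPresent, String whenMissing) {
        return "if(exists(" + field + ")," + whenPresent + "," + whenMissing + ")";
    }

    /** URL-encodes a single query parameter value as UTF-8. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Assembles a /select URL that returns id, the field, and the guarded if(). */
    static String selectUrl(String baseUrl, String field) {
        String fl = "id," + field + "," + guardedIf(field, "42", "-99");
        return baseUrl + "/select?q=" + encode("*:*") + "&fl=" + encode(fl);
    }

    public static void main(String[] args) {
        // Base URL assumed from the example setup used in the comments.
        System.out.println(selectUrl("http://localhost:8983/solr/collection1", "inStock"));
    }
}
```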

      (Thanks to Elodie Sannier for reporting the initial symptoms of this on the mailing list)

          Activity

          Adrien Grand added a comment -

          4.5 release -> bulk close

          Hoss Man added a comment -

          http://svn.apache.org/r1521948
          http://svn.apache.org/r1521969

          ASF subversion and git services added a comment -

          Commit 1521948 from hossman@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1521948 ]

          SOLR-5231: Fixed a bug with the behavior of BoolField that caused documents w/o a value for the field to act as if the value were true in functions if no other documents in the same index segment had a value of true.

          Robert Muir added a comment -

          So it looks like this was caused by LUCENE-4547

          Caused by complete lack of unit tests for BoolField.

          Yonik Seeley added a comment -

          Note: the code would be simpler if you just changed the initial "tord" value to -2 (that's what was missing in LUCENE-4547 when the change from 1 based ords to 0 based ords was done.)

          Yonik Seeley added a comment -

          Nice tracking that down Hoss... that was definitely tricky.
          So it looks like this was caused by LUCENE-4547, and hence a bug since 4.2

          Hoss Man added a comment -

          Thanks rob!

          here's a patch with rob's fix, and a test demonstrating that it fixes the problem (and verifying that no other field types seem to be similarly affected)

          i'll commit once i've finished a full test run.

          Robert Muir added a comment -

          The bug is in BoolField.BoolDocValues.

          look at its implementation of boolVal.

          If there are no values in the segment at all, "trueOrd" is -1 (which is bogus, because -1 is also the ord for a missing value).

          Index: solr/core/src/java/org/apache/solr/schema/BoolField.java
          ===================================================================
          --- solr/core/src/java/org/apache/solr/schema/BoolField.java	(revision 1521538)
          +++ solr/core/src/java/org/apache/solr/schema/BoolField.java	(working copy)
          @@ -182,7 +182,8 @@
                 }
               }
           
          -    final int trueOrd = tord;
          +    // if there were no values in the segment, dont let trueOrd be -1 (missing)
          +    final int trueOrd = tord >= 0 ? tord : -2;
           
               return new BoolDocValues(this) {
                 @Override
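
A minimal sketch of why the -1 is a problem, assuming boolVal simply compares a document's ord against trueOrd and that a document with no value reports ord -1 (as the comment in the patch suggests). This is a toy model, not the actual BoolField source: the class, the method names, and the "T" / "F" term strings are illustrative assumptions.

```java
/** Toy model of the per-segment ord comparison; not the actual Solr code. */
public class BoolOrdSketch {

    /** Ord assumed to be reported for a document with no value in the field. */
    static final int MISSING_ORD = -1;

    /**
     * Scans the segment's sorted terms for the "true" term and returns its ord,
     * or initialValue if the segment contains no such term.
     */
    static int findTrueOrd(String[] segmentTerms, int initialValue) {
        int tord = initialValue;
        for (int i = 0; i < segmentTerms.length; i++) {
            if ("T".equals(segmentTerms[i])) {
                tord = i;
            }
        }
        return tord;
    }

    /** The guard from the patch above: never let trueOrd be -1. */
    static int patchedTrueOrd(int tord) {
        return tord >= 0 ? tord : -2;
    }

    /** A document counts as true when its ord matches the ord of the "true" term. */
    static boolean boolVal(int docOrd, int trueOrd) {
        return docOrd == trueOrd;
    }

    public static void main(String[] args) {
        String[] noTerms = {};           // segment holding only a doc without a value
        String[] bothTerms = {"F", "T"}; // segment that also holds false and true docs

        int buggy = findTrueOrd(noTerms, -1);
        System.out.println("before patch, missing doc: " + boolVal(MISSING_ORD, buggy));
        System.out.println("after patch, missing doc:  " + boolVal(MISSING_ORD, patchedTrueOrd(buggy)));

        int merged = findTrueOrd(bothTerms, -1);
        System.out.println("merged segment, missing doc: " + boolVal(MISSING_ORD, merged));
        System.out.println("merged segment, true doc:    " + boolVal(1, merged));
    }
}
```

In this model, starting tord at -2 instead of -1 gives the same result as the guard in the patch, since -2 can never match a real ord or the missing ord.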
          
          Hoss Man added a comment -

          Start with a completely empty index, using the example configs – the only thing that probably really matters is that you have a boolean field named "inStock" defined.

          Add a single document w/o the inStock field and commit...

          $ java -Ddata=args -Durl=http://localhost:8983/solr/collection1/update -jar post.jar '<add><doc><field name="id">NOVAL</field></doc></add>'
          SimplePostTool version 1.5
          POSTing args to http://localhost:8983/solr/collection1/update..
          COMMITting Solr index changes to http://localhost:8983/solr/collection1/update..
          Time spent: 0:00:00.451
          $ curl 'http://localhost:8983/solr/collection1/select?indent=true&q=*:*&fl=id,inStock,exists(inStock),if(inStock,42,-99),if(exists(inStock),42,-99)'
          <?xml version="1.0" encoding="UTF-8"?>
          <response>
          
          <lst name="responseHeader">
            <int name="status">0</int>
            <int name="QTime">20</int>
            <lst name="params">
              <str name="fl">id,inStock,exists(inStock),if(inStock,42,-99),if(exists(inStock),42,-99)</str>
              <str name="indent">true</str>
              <str name="q">*:*</str>
            </lst>
          </lst>
          <result name="response" numFound="1" start="0">
            <doc>
              <str name="id">NOVAL</str>
              <bool name="exists(inStock)">false</bool>
              <long name="if(inStock,42,-99)">42</long>
              <long name="if(exists(inStock),42,-99)">-99</long></doc>
          </result>
          </response>
          

          note that the exists() function behaves correctly, but the if() function does not (unless it is used to wrap exists())

          next, add some more docs that do in fact have values...

          $ java -Ddata=args -Durl=http://localhost:8983/solr/collection1/update -jar post.jar '<add><doc><field name="id">FALSE</field><field name="inStock">false</field></doc></add>'
          SimplePostTool version 1.5
          POSTing args to http://localhost:8983/solr/collection1/update..
          COMMITting Solr index changes to http://localhost:8983/solr/collection1/update..
          Time spent: 0:00:00.356
          $ curl 'http://localhost:8983/solr/collection1/select?indent=true&q=*:*&fl=id,inStock,exists(inStock),if(inStock,42,-99),if(exists(inStock),42,-99)'
          <?xml version="1.0" encoding="UTF-8"?>
          <response>
          
          <lst name="responseHeader">
            <int name="status">0</int>
            <int name="QTime">1</int>
            <lst name="params">
              <str name="fl">id,inStock,exists(inStock),if(inStock,42,-99),if(exists(inStock),42,-99)</str>
              <str name="indent">true</str>
              <str name="q">*:*</str>
            </lst>
          </lst>
          <result name="response" numFound="2" start="0">
            <doc>
              <str name="id">NOVAL</str>
              <bool name="exists(inStock)">false</bool>
              <long name="if(inStock,42,-99)">42</long>
              <long name="if(exists(inStock),42,-99)">-99</long></doc>
            <doc>
              <str name="id">FALSE</str>
              <bool name="inStock">false</bool>
              <bool name="exists(inStock)">true</bool>
              <long name="if(inStock,42,-99)">-99</long>
              <long name="if(exists(inStock),42,-99)">42</long></doc>
          </result>
          </response>
          $ java -Ddata=args -Durl=http://localhost:8983/solr/collection1/update -jar post.jar '<add><doc><field name="id">TRUE</field><field name="inStock">true</field></doc></add>'
          SimplePostTool version 1.5
          POSTing args to http://localhost:8983/solr/collection1/update..
          COMMITting Solr index changes to http://localhost:8983/solr/collection1/update..
          Time spent: 0:00:00.356
          $ curl 'http://localhost:8983/solr/collection1/select?indent=true&q=*:*&fl=id,inStock,exists(inStock),if(inStock,42,-99),if(exists(inStock),42,-99)'
          <?xml version="1.0" encoding="UTF-8"?>
          <response>
          
          <lst name="responseHeader">
            <int name="status">0</int>
            <int name="QTime">2</int>
            <lst name="params">
              <str name="fl">id,inStock,exists(inStock),if(inStock,42,-99),if(exists(inStock),42,-99)</str>
              <str name="indent">true</str>
              <str name="q">*:*</str>
            </lst>
          </lst>
          <result name="response" numFound="3" start="0">
            <doc>
              <str name="id">NOVAL</str>
              <bool name="exists(inStock)">false</bool>
              <long name="if(inStock,42,-99)">42</long>
              <long name="if(exists(inStock),42,-99)">-99</long></doc>
            <doc>
              <str name="id">FALSE</str>
              <bool name="inStock">false</bool>
              <bool name="exists(inStock)">true</bool>
              <long name="if(inStock,42,-99)">-99</long>
              <long name="if(exists(inStock),42,-99)">42</long></doc>
            <doc>
              <str name="id">TRUE</str>
              <bool name="inStock">true</bool>
              <bool name="exists(inStock)">true</bool>
              <long name="if(inStock,42,-99)">42</long>
              <long name="if(exists(inStock),42,-99)">42</long></doc>
          </result>
          </response>
          

          note that each time we add a doc, the functions behave as expected for the new docs, but the if(inStock,...) on our NOVAL doc is still not behaving.

          next, let's optimize to put all the docs in a single segment...

          $ java -Ddata=args -Durl=http://localhost:8983/solr/collection1/update -jar post.jar '<optimize/>'
          SimplePostTool version 1.5
          POSTing args to http://localhost:8983/solr/collection1/update..
          COMMITting Solr index changes to http://localhost:8983/solr/collection1/update..
          Time spent: 0:00:00.382
          $ curl 'http://localhost:8983/solr/collection1/select?indent=true&q=*:*&fl=id,inStock,exists(inStock),if(inStock,42,-99),if(exists(inStock),42,-99)'
          <?xml version="1.0" encoding="UTF-8"?>
          <response>
          
          <lst name="responseHeader">
            <int name="status">0</int>
            <int name="QTime">2</int>
            <lst name="params">
              <str name="fl">id,inStock,exists(inStock),if(inStock,42,-99),if(exists(inStock),42,-99)</str>
              <str name="indent">true</str>
              <str name="q">*:*</str>
            </lst>
          </lst>
          <result name="response" numFound="3" start="0">
            <doc>
              <str name="id">FALSE</str>
              <bool name="inStock">false</bool>
              <bool name="exists(inStock)">true</bool>
              <long name="if(inStock,42,-99)">-99</long>
              <long name="if(exists(inStock),42,-99)">42</long></doc>
            <doc>
              <str name="id">TRUE</str>
              <bool name="inStock">true</bool>
              <bool name="exists(inStock)">true</bool>
              <long name="if(inStock,42,-99)">42</long>
              <long name="if(exists(inStock),42,-99)">42</long></doc>
            <doc>
              <str name="id">NOVAL</str>
              <bool name="exists(inStock)">false</bool>
              <long name="if(inStock,42,-99)">-99</long>
              <long name="if(exists(inStock),42,-99)">-99</long></doc>
          </result>
          </response>
          

          Now suddenly our NOVAL doc behaves as expected, and the if function returns the false clause.


            People

            • Assignee: Hoss Man
            • Reporter: Hoss Man
            • Votes: 0
            • Watchers: 4
