Solr
  1. Solr
  2. SOLR-2829

Filter queries have false-positive matches. Exposed by user's list titled "Regarding geodist and multiple location fields"

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.4, 4.0-ALPHA
    • Fix Version/s: 3.5, 4.0-ALPHA
    • Component/s: search
    • Labels:
      None
    • Environment:

      N/A

      Description

      I don't know how generic this is, whether it's just a
      problem with fqs when combined with spatial or whether
      it has wider applicability , but here's what I know so far.

      Marc Tinnemeyer in a post titled:

      "Regarding geodist and multiple location fields"
      outlines this. I checked this on 3.4 and trunk and it's
      weird in both cases.

      HOLD THE PRESSES:
      After looking at this a bit more, it looks like a caching
      issue, NOT a geodist issue. When I bounce Solr
      between changing the sfield from "home" to "work",
      it seems to work as expected.

      Hmmmm, very strange. If I comment out BOTH
      the filterCache and queryResultCache then it works
      fine. Switching from "home" to "work" in the query
      finds/fails to find the document.
      But commenting out only one of those caches
      doesn't fix the problem.

      on trunk I used this query; just flipping "home" to "work" and back:
      http://localhost:8983/solr/select?q=id:1&fq=

      {!geofilt sfield=home pt=52.67,7.30 d=5}

      The info below is what I used to test.

      From Marc's posts:

      <field name="home" type="location" indexed="true" stored="true"/>
      <field name="work" type="location" indexed="true" stored="true"/>
      <field name="elsewhere" type="location" indexed="true" stored="true"/>

      At first I thought so too. Here is a simple document.

      <add>
      <doc>
      <field name="id">1</field>
      <field name="name">first</field>
      <field name="work">48.60,11.61</field>
      <field name="home">52.67,7.30</field>
      </doc>
      </add>

      and here is the result that shouldn't be:

      <response>
      ...
      <str name="q">:</str>
      <str name="fq">

      {!geofilt sfield=work pt=52.67,7.30 d=5}

      </str>
      ...
      </lst>
      </lst>
      <result name="response" numFound="1" start="0">
      <doc>
      <str name="home">52.67,7.30</str>
      <str name="id">1</str>
      <str name="name">first</str>
      <str name="work">48.60,11.61</str>
      </doc>
      </result>
      </response>

      ***Yonik's comment*****

      It's going to be a bug in an equals() implementation somewhere in the query.
      The top level equals will be SpatialDistanceQuery.equals() (from
      LatLonField.java)

      On trunk, I already see a bug introduced when the new lucene field
      cache stuff was done.
      DoubleValueSource now just inherits it's equals method from
      NumericFieldCacheSource... and that equals() method only tests if the
      CachedArrayCreator.getClass() is the same! That's definitely wrong.

      I don't know why 3x would also have this behavior (unless there's more
      than one bug!)
      Anyway, first step is to modify the spatial tests to catch the bug...
      from there it should be pretty easy to debug.

      1. SOLR-2829.patch
        2 kB
        Emmanuel Espina
      2. SOLR-2829.patch
        3 kB
        Hoss Man
      3. SOLR-2829.patch
        4 kB
        Hoss Man
      4. SOLR-2829.patch
        15 kB
        Erick Erickson
      5. SOLR-2829.patch
        11 kB
        Erick Erickson
      6. SOLR-2829.patch
        11 kB
        Erick Erickson
      7. SOLR-2829-3x.patch
        16 kB
        Erick Erickson

        Activity

        Hide
        Erick Erickson added a comment -

        Right, just adding this:

        && this.origField.equals(other.origField)

        to LatLonType.equals fixes the problem, but I really can't pursue it further just now; I don't have time to really look at whether this is the right place to put this, nor whether this should be here, nor whether it matches FQs it shouldn't. A quick glance at the test code doesn't show anything happening for testing equals on a LatLon type, but I'm not looking very carefully this morning....

        Show
        Erick Erickson added a comment - Right, just adding this: && this.origField.equals(other.origField) to LatLonType.equals fixes the problem, but I really can't pursue it further just now; I don't have time to really look at whether this is the right place to put this, nor whether this should be here, nor whether it matches FQs it shouldn't. A quick glance at the test code doesn't show anything happening for testing equals on a LatLon type, but I'm not looking very carefully this morning....
        Hide
        Emmanuel Espina added a comment -

        I modified the tests to reproduce the issue in the mailing list.
        The suggestion Erick made about adding this.origField.equals(other.origField) solves the problem. That line is included in the patch.

        Show
        Emmanuel Espina added a comment - I modified the tests to reproduce the issue in the mailing list. The suggestion Erick made about adding this.origField.equals(other.origField) solves the problem. That line is included in the patch.
        Hide
        Yonik Seeley added a comment -

        Although adding the field will solve this specific problem, there is still the underlying bug this uncovered. The value sources should not have compared as equal.

        && this.lonSource.equals(other.lonSource)
        && this.latSource.equals(other.latSource)
        
        Show
        Yonik Seeley added a comment - Although adding the field will solve this specific problem, there is still the underlying bug this uncovered. The value sources should not have compared as equal. && this .lonSource.equals(other.lonSource) && this .latSource.equals(other.latSource)
        Hide
        Hoss Man added a comment -

        I don't really understand yonik's comment at all, but the patch looks good to me (especially since it includes a test!)

        I did tweak it a bit to ensure that SpatialDistanceQuery calls super.equals(o) for safety.

        Show
        Hoss Man added a comment - I don't really understand yonik's comment at all, but the patch looks good to me (especially since it includes a test!) I did tweak it a bit to ensure that SpatialDistanceQuery calls super.equals(o) for safety.
        Hide
        Yonik Seeley added a comment -

        I don't really understand yonik's comment at all

        the equals() method in question includes the following:

        && this.lonSource.equals(other.lonSource)
        && this.latSource.equals(other.latSource)
        

        latSource and lonSource are ValueSources derived from the field. They should not compare equal when derived from a different field. Hence there is a value source / function query issue here somewhere (the real bug).

        Show
        Yonik Seeley added a comment - I don't really understand yonik's comment at all the equals() method in question includes the following: && this .lonSource.equals(other.lonSource) && this .latSource.equals(other.latSource) latSource and lonSource are ValueSources derived from the field. They should not compare equal when derived from a different field. Hence there is a value source / function query issue here somewhere (the real bug).
        Hide
        Yonik Seeley added a comment -

        I don't really understand yonik's comment at all

        the equals() method in question includes the following:

        && this.lonSource.equals(other.lonSource)
        && this.latSource.equals(other.latSource)
        

        latSource and lonSource are ValueSources derived from the field. They should not compare equal when derived from a different field. Hence there is a value source / function query issue here somewhere (the real bug).

        Show
        Yonik Seeley added a comment - I don't really understand yonik's comment at all the equals() method in question includes the following: && this .lonSource.equals(other.lonSource) && this .latSource.equals(other.latSource) latSource and lonSource are ValueSources derived from the field. They should not compare equal when derived from a different field. Hence there is a value source / function query issue here somewhere (the real bug).
        Hide
        Hoss Man added a comment -

        ah .. ok. i see what you mean now.

        pretty sure the bug is that NumericFieldCacheSource.equals is explicitly checking that the class of the (CachedArrayCreator)creator's are equal, w/o every actually checking that this.creator.equals(that.creator).

        (CachedArrayCreator extends EntityCreator implemsnt equals() based on getCacheKey, and CachedArrayCreator uses it's class, the array type, and the field name in it's EntryKey)

        that NumericFieldCacheSource.equals is so sketchy looking it almost seems like someone was deliberately choosing to only do a class equality check instead of deep equality checking - but i can't fathom why.

        Show
        Hoss Man added a comment - ah .. ok. i see what you mean now. pretty sure the bug is that NumericFieldCacheSource.equals is explicitly checking that the class of the (CachedArrayCreator)creator's are equal, w/o every actually checking that this.creator.equals(that.creator). (CachedArrayCreator extends EntityCreator implemsnt equals() based on getCacheKey, and CachedArrayCreator uses it's class, the array type, and the field name in it's EntryKey) that NumericFieldCacheSource.equals is so sketchy looking it almost seems like someone was deliberately choosing to only do a class equality check instead of deep equality checking - but i can't fathom why.
        Hide
        Yonik Seeley added a comment -

        pretty sure the bug is that NumericFieldCacheSource.equals is explicitly checking that the class of the (CachedArrayCreator)creator's are equal, w/o every actually checking that this.creator.equals(that.creator).

        That was my first thought (for trunk at least) - but I missed the "super.equals(other)" part... and the parent class does have "this.field.equals(other.field)". But given that we have a testcase that reproduces the bug, a simple debugging session should quickly show what's up.

        Show
        Yonik Seeley added a comment - pretty sure the bug is that NumericFieldCacheSource.equals is explicitly checking that the class of the (CachedArrayCreator)creator's are equal, w/o every actually checking that this.creator.equals(that.creator). That was my first thought (for trunk at least) - but I missed the "super.equals(other)" part... and the parent class does have "this.field.equals(other.field)". But given that we have a testcase that reproduces the bug, a simple debugging session should quickly show what's up.
        Hide
        Hoss Man added a comment -

        Oh man ... it's a fucking precendence problem.

        "&&" binds more tightly then "?:" ... so NumericFieldCacheSource.equals() method has been returning false anytime the objects are actually equal, and true anytime the creators classes are equal.

        wonder how badly that's been fucking up cache hit rates.

        updated patch makes the test work by fixing NumericFieldCacheSource w/o an explicit field check in LatLonType, but i still think we need to fix all these equals – right now they read about as cleaning as smeared shit:

        • every class should call super.equals first
        • base classes should be checking 'this.getClass == o.getClass()' (not just 'o instanceof FieldCacheSource') so that subclasses don't have to duplicate class checking
        • classes with complex members (like NumericFieldCacheSource's CachedArrayCreator<T> creator) should delegate to the member's .equals method, not check the member's properties directly.

        (unless you know of some reason why NumericFieldCacheSource should only care about equality on this.creator.getClass() instead of this.creator ?)

        Show
        Hoss Man added a comment - Oh man ... it's a fucking precendence problem. "&&" binds more tightly then "?:" ... so NumericFieldCacheSource.equals() method has been returning false anytime the objects are actually equal, and true anytime the creators classes are equal. wonder how badly that's been fucking up cache hit rates. updated patch makes the test work by fixing NumericFieldCacheSource w/o an explicit field check in LatLonType, but i still think we need to fix all these equals – right now they read about as cleaning as smeared shit: every class should call super.equals first base classes should be checking ' this.getClass == o.getClass() ' (not just ' o instanceof FieldCacheSource ') so that subclasses don't have to duplicate class checking classes with complex members (like NumericFieldCacheSource's CachedArrayCreator<T> creator) should delegate to the member's .equals method, not check the member's properties directly. (unless you know of some reason why NumericFieldCacheSource should only care about equality on this.creator.getClass() instead of this.creator ?)
        Hide
        Yonik Seeley added a comment -

        "&&" binds more tightly then "?:"

            return super.equals(other)
                    && this.creator == null ? other.creator == null :
                    this.creator.getClass() == other.creator.getClass();
        

        Ouch! Given that "creator" is never null (for trie fields at least) this always boils down to just comparing the creator class.
        What normally saves us is that the hash code will normally slot to a different bucket, and the fact that we start off with a relatively large number of buckets (size=512, which means 1024 buckets when accounting for load factor and rounding up to the next power of two).

        This is a bad bug since it can stay hidden and strike randomly.

        (unless you know of some reason why NumericFieldCacheSource should only care about equality on this.creator.getClass() instead of this.creator ?)

        I never fully grokked the creator stuff... and I understand the trunk fieldcache code is slated to be replaced by the 3x fieldcache code, so I wouldn't worry about cleaning anything up (other than making it work).

        Show
        Yonik Seeley added a comment - "&&" binds more tightly then "?:" return super .equals(other) && this .creator == null ? other.creator == null : this .creator.getClass() == other.creator.getClass(); Ouch! Given that "creator" is never null (for trie fields at least) this always boils down to just comparing the creator class. What normally saves us is that the hash code will normally slot to a different bucket, and the fact that we start off with a relatively large number of buckets (size=512, which means 1024 buckets when accounting for load factor and rounding up to the next power of two). This is a bad bug since it can stay hidden and strike randomly. (unless you know of some reason why NumericFieldCacheSource should only care about equality on this.creator.getClass() instead of this.creator ?) I never fully grokked the creator stuff... and I understand the trunk fieldcache code is slated to be replaced by the 3x fieldcache code, so I wouldn't worry about cleaning anything up (other than making it work).
        Hide
        Erick Erickson added a comment -

        Well, now that others have done the hard part, any objections if I commit it? This seems serious enough it shouldn't get lost in the shuffle. And I'm assuming it should be both a trunk and 3x change, right?

        Should I create another JIRA reflecting Hoss's comments today (2-Nov) on cleaning up generally that refers back to this issue while I'm at it?

        Show
        Erick Erickson added a comment - Well, now that others have done the hard part, any objections if I commit it? This seems serious enough it shouldn't get lost in the shuffle. And I'm assuming it should be both a trunk and 3x change, right? Should I create another JIRA reflecting Hoss's comments today (2-Nov) on cleaning up generally that refers back to this issue while I'm at it?
        Hide
        Hoss Man added a comment -

        Erick: your call. personally i would just fix the equals of any class we're touching (the only reason i didn't in the patch was because it almost seemed intentional – if it was let the change break something and someone can justify it then)

        the one other thing i would suggest is adding some explicit tests of FunctionQuery.equals() so we have a more direct test of the underlying problem then just expecting to not get a cache hit (as the patch stands now: someone could change the schema to eliminate the cache, then someone else could re-introduce the equals bug, and we'd never know)

        there's a QueryUtils class in the lucene test-framework that has helper method for checking that Query.equals and Query.hashCode make sense, i would add some explicit tests that look something like...

          QParser func = ...;
          check(func.parse("geodist(work,...)"));
          checkEqual(func.parse("geodist(work,...)"), 
                     func.parse("geodist(work...)");
          checkUnEqual(func.parse("geodist(home,...)"), 
                       func.parse("geodist(work...)");
          ...
          check(func.parse("product(pow(a_d,b_l),c_i,d_f)"));
          checkEqual(func.parse("product(pow(a_d,b_l),c_i,d_f)"),
                     func.parse("product(pow(a_d,b_l),c_i,d_f)"));
          checkUnEqual(func.parse("product(pow(a_d,b_l),c_i,d_f)"),
                       func.parse("product(pow(a_i,b_l),c_i,d_f)"));
          ...
        

        ...for a few different types of function queries (using different field cache source types: double, float, int, tint, etc...)

        Show
        Hoss Man added a comment - Erick: your call. personally i would just fix the equals of any class we're touching (the only reason i didn't in the patch was because it almost seemed intentional – if it was let the change break something and someone can justify it then) the one other thing i would suggest is adding some explicit tests of FunctionQuery.equals() so we have a more direct test of the underlying problem then just expecting to not get a cache hit (as the patch stands now: someone could change the schema to eliminate the cache, then someone else could re-introduce the equals bug, and we'd never know) there's a QueryUtils class in the lucene test-framework that has helper method for checking that Query.equals and Query.hashCode make sense, i would add some explicit tests that look something like... QParser func = ...; check(func.parse( "geodist(work,...)" )); checkEqual(func.parse( "geodist(work,...)" ), func.parse( "geodist(work...)" ); checkUnEqual(func.parse( "geodist(home,...)" ), func.parse( "geodist(work...)" ); ... check(func.parse( "product(pow(a_d,b_l),c_i,d_f)" )); checkEqual(func.parse( "product(pow(a_d,b_l),c_i,d_f)" ), func.parse( "product(pow(a_d,b_l),c_i,d_f)" )); checkUnEqual(func.parse( "product(pow(a_d,b_l),c_i,d_f)" ), func.parse( "product(pow(a_i,b_l),c_i,d_f)" )); ... ...for a few different types of function queries (using different field cache source types: double, float, int, tint, etc...)
        Hide
        Erick Erickson added a comment -

        Patch for the 3x code line, if I don't get any objections, I'll merge it with trunk and commit over the weekend.

        All tests pass.

        The code changes aren't as interesting as the tests, anyone want to recommend improvements? I verified that the tests catch short, float, long, byte and double if the parens aren't added. Had to add a few types to the default schema.xml.

        I realize that the tests specific to LatLon are redundant, they're caught by the double test. But I don't see any harm leaving them in.

        Show
        Erick Erickson added a comment - Patch for the 3x code line, if I don't get any objections, I'll merge it with trunk and commit over the weekend. All tests pass. The code changes aren't as interesting as the tests, anyone want to recommend improvements? I verified that the tests catch short, float, long, byte and double if the parens aren't added. Had to add a few types to the default schema.xml. I realize that the tests specific to LatLon are redundant, they're caught by the double test. But I don't see any harm leaving them in.
        Hide
        Erick Erickson added a comment -

        Patch against trunk rather than 3x. I think I have all the pieces together, I can merge trunk/3x and reconcile differences this weekend I hope.

        Show
        Erick Erickson added a comment - Patch against trunk rather than 3x. I think I have all the pieces together, I can merge trunk/3x and reconcile differences this weekend I hope.
        Hide
        Hoss Man added a comment -

        Erick: +1 to the patch (although "temp" is probably not the best name for an iterator of of a "templates" array)

        Show
        Hoss Man added a comment - Erick: +1 to the patch (although "temp" is probably not the best name for an iterator of of a "templates" array)
        Hide
        Erick Erickson added a comment -

        Final patch. Renamed variable as per Hoss. I hate it when he's right.

        Show
        Erick Erickson added a comment - Final patch. Renamed variable as per Hoss. I hate it when he's right.
        Hide
        Erick Erickson added a comment -

        Added fix version of 4.0

        Show
        Erick Erickson added a comment - Added fix version of 4.0
        Hide
        Erick Erickson added a comment -

        Attached the 3x patch, reconciling these is kinda unpleasant.

        Show
        Erick Erickson added a comment - Attached the 3x patch, reconciling these is kinda unpleasant.
        Hide
        Uwe Schindler added a comment -

        Bulk close after 3.5 is released

        Show
        Uwe Schindler added a comment - Bulk close after 3.5 is released

          People

          • Assignee:
            Erick Erickson
            Reporter:
            Erick Erickson
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development