Solr
  1. Solr
  2. SOLR-5057

queryResultCache should not related with the order of fq's list

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 4.0, 4.1, 4.2, 4.3
    • Fix Version/s: 4.5, 5.0
    • Component/s: search
    • Labels:
      None

      Description

      There are two case query with the same meaning below. But the case2 can't use the queryResultCache when case1 is executed.

      case1: q=:&fq=field1:value1&fq=field2:value2
      case2: q=:&fq=field2:value2&fq=field1:value1

      I think queryResultCache should not be related with the order of fq's list.

      1. SOLR-5057.patch
        4 kB
        Feihong Huang
      2. SOLR-5057.patch
        3 kB
        Erick Erickson
      3. SOLR-5057.patch
        3 kB
        Feihong Huang
      4. SOLR-5057.patch
        3 kB
        Erick Erickson

        Issue Links

          Activity

          Hide
          Adrien Grand added a comment -

          4.5 release -> bulk close

          Show
          Adrien Grand added a comment - 4.5 release -> bulk close
          Hide
          Erick Erickson added a comment -

          Thanks to Feihong Huang!

          Show
          Erick Erickson added a comment - Thanks to Feihong Huang!
          Hide
          ASF subversion and git services added a comment -

          Commit 1516307 from Erick Erickson in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1516307 ]

          SOLR-5057 - queryResultCache should match out-of-order fq clauses

          Show
          ASF subversion and git services added a comment - Commit 1516307 from Erick Erickson in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1516307 ] SOLR-5057 - queryResultCache should match out-of-order fq clauses
          Hide
          Erick Erickson added a comment -

          Applies cleanly, I suspect there is a charset problem.

          Show
          Erick Erickson added a comment - Applies cleanly, I suspect there is a charset problem.
          Hide
          ASF subversion and git services added a comment -

          Commit 1516299 from Erick Erickson in branch 'dev/trunk'
          [ https://svn.apache.org/r1516299 ]

          SOLR-5057 - queryResultCache should match out-of-order fq clauses

          Show
          ASF subversion and git services added a comment - Commit 1516299 from Erick Erickson in branch 'dev/trunk' [ https://svn.apache.org/r1516299 ] SOLR-5057 - queryResultCache should match out-of-order fq clauses
          Hide
          Feihong Huang added a comment -

          hi, erick, patch attached again. I upload by mistake previous time.
          I'm so sorry.

          Show
          Feihong Huang added a comment - hi, erick, patch attached again. I upload by mistake previous time. I'm so sorry.
          Hide
          Erick Erickson added a comment -

          Hmmm, sorry it took so long. But it appears that your latest patch doesn't have any of Yonik's suggestions in it and still has the test in a completely new file rather than just another test case in QueryResultKeyTest.java as my patch had. Did you upload your first patch again by mistake?

          Please upload a new version and I promise I'll get it committed soon.

          Show
          Erick Erickson added a comment - Hmmm, sorry it took so long. But it appears that your latest patch doesn't have any of Yonik's suggestions in it and still has the test in a completely new file rather than just another test case in QueryResultKeyTest.java as my patch had. Did you upload your first patch again by mistake? Please upload a new version and I promise I'll get it committed soon.
          Hide
          Erick Erickson added a comment -

          I'll give this another go-over in the next day or two.

          Show
          Erick Erickson added a comment - I'll give this another go-over in the next day or two.
          Hide
          Feihong Huang added a comment -

          Patch attached. Just rename several variable'name using Yonik's code.

          Show
          Feihong Huang added a comment - Patch attached. Just rename several variable'name using Yonik's code.
          Hide
          Erick Erickson added a comment -

          You can certainly submit the patch. So take Yonik's version,
          put it in your code and try your test.

          Then, please run the entire test suite (i.e. execute
          'ant clean test' from the root).

          But sure, then you can submit the patch. I think Yonik's
          version address Hoss's comments, it seems to me that this
          patch preserves efficiency without having to make it
          a several step operation to handle this case.

          Show
          Erick Erickson added a comment - You can certainly submit the patch. So take Yonik's version, put it in your code and try your test. Then, please run the entire test suite (i.e. execute 'ant clean test' from the root). But sure, then you can submit the patch. I think Yonik's version address Hoss's comments, it seems to me that this patch preserves efficiency without having to make it a several step operation to handle this case.
          Hide
          Feihong Huang added a comment -

          So, Can anyone make a final decision for this featrue ?

          hi, Erickson, if we decide to fix the feature, who is responsible for submit the patch?
          Can i do it?

          Show
          Feihong Huang added a comment - So, Can anyone make a final decision for this featrue ? hi, Erickson, if we decide to fix the feature, who is responsible for submit the patch? Can i do it?
          Hide
          Hoss Man added a comment -

          Similar to yoniks point initial point: In my experience, the situations where folks are going to be most concerned about having good cache usage are the situations where queries are generated programatically and the order of the filter queries is already deterministic (or can be made deterministic easy enough by the client)

          My straw man suggestion would be to not modify QueryResultKey at all, and instead write a new (optional) SearchComponent that did nothing by sort the getFilters() array in it's prepare() method. Users who can't ensure that requests with equivalent "fq" params queries come in the same order can register it to run just after the "query" component and get good cache hit ratios, but it wouldn't affect the performance in any way for users who send queries with fqs i na determinstic manner

          Show
          Hoss Man added a comment - Similar to yoniks point initial point: In my experience, the situations where folks are going to be most concerned about having good cache usage are the situations where queries are generated programatically and the order of the filter queries is already deterministic (or can be made deterministic easy enough by the client) My straw man suggestion would be to not modify QueryResultKey at all, and instead write a new (optional) SearchComponent that did nothing by sort the getFilters() array in it's prepare() method. Users who can't ensure that requests with equivalent "fq" params queries come in the same order can register it to run just after the "query" component and get good cache hit ratios, but it wouldn't affect the performance in any way for users who send queries with fqs i na determinstic manner
          Hide
          Feihong Huang added a comment -

          Wonderful! The code above is much more better.

          Show
          Feihong Huang added a comment - Wonderful! The code above is much more better.
          Hide
          Yonik Seeley added a comment -

          I think the common case will be fqs being ordered (most fqs will be generated from a system and hence normally appear in the same order.)
          A few issues with this patch:

          • the sort is expensive - it generates hashcodes for elements on every single compare.
          • it's still not 100% - hashcodes for two different queries could match so sorting doesn't guarantee to get in the right order
          • it slows down the common case of ordered filters (the sort needs to be done even when just checking the cache too)

          How about this (untested) code instead:

            // Do fast version, expecting that filters are ordered and only
            // fall back to unordered compare on the first non-equal elements.
            // This will only be called if the hash code of the entire key already
            // matched, so the slower unorderedCompare should pretty much never
            // be called if filter lists are generally ordered.
            private static boolean isEqual(List<Query> a, List<Query> b) {
              if (a==b) return true;  // takes care of identity and null cases
              if (a==null || b==null) return false;
              int sz = a.size();
              if (sz != b.size()) return false;
              for (int i=0; i<sz; i++) {
                if (!a.get(i).equals(b.get(i))) {
                  return unorderedCompare(a, b, i);
                }
              }
              return true;
            }
          
            private static boolean unorderedCompare(List<Query> a, List<Query> b, int start) {
              int sz = a.size();
              outer: for (int aIdx = start; aIdx<sz; aIdx++) {
                Query aQ = a.get(aIdx);
                for (int bIdx = start; bIdx<sz; bIdx++) {
                  Query bQ =b.get(bIdx);
                  if (aQ.equals(bQ)) continue outer;
                }
                return false;
              }
              return true;
            }
          
          Show
          Yonik Seeley added a comment - I think the common case will be fqs being ordered (most fqs will be generated from a system and hence normally appear in the same order.) A few issues with this patch: the sort is expensive - it generates hashcodes for elements on every single compare. it's still not 100% - hashcodes for two different queries could match so sorting doesn't guarantee to get in the right order it slows down the common case of ordered filters (the sort needs to be done even when just checking the cache too) How about this (untested) code instead: // Do fast version, expecting that filters are ordered and only // fall back to unordered compare on the first non-equal elements. // This will only be called if the hash code of the entire key already // matched, so the slower unorderedCompare should pretty much never // be called if filter lists are generally ordered. private static boolean isEqual(List<Query> a, List<Query> b) { if (a==b) return true ; // takes care of identity and null cases if (a== null || b== null ) return false ; int sz = a.size(); if (sz != b.size()) return false ; for ( int i=0; i<sz; i++) { if (!a.get(i).equals(b.get(i))) { return unorderedCompare(a, b, i); } } return true ; } private static boolean unorderedCompare(List<Query> a, List<Query> b, int start) { int sz = a.size(); outer : for ( int aIdx = start; aIdx<sz; aIdx++) { Query aQ = a.get(aIdx); for ( int bIdx = start; bIdx<sz; bIdx++) { Query bQ =b.get(bIdx); if (aQ.equals(bQ)) continue outer ; } return false ; } return true ; }
          Hide
          Erick Erickson added a comment -

          Feihong:

          Never mind, I somehow got off on a tangent. I was thinking about fq clauses and somehow got it in my mind (even though I know better) that the filter cache re-use was sensitive to the order of fq clauses when it isn't, they are all individually cached.

          Sorry for the confusion!

          So I'll give people a chance to make comments and probably commit if not this weekend. All tests pass.

          Show
          Erick Erickson added a comment - Feihong: Never mind, I somehow got off on a tangent. I was thinking about fq clauses and somehow got it in my mind (even though I know better) that the filter cache re-use was sensitive to the order of fq clauses when it isn't, they are all individually cached. Sorry for the confusion! So I'll give people a chance to make comments and probably commit if not this weekend. All tests pass.
          Hide
          Feihong Huang added a comment -

          hi, erickson. I think that way can re-use the filterCache when the fq clauses were ordered differently. I am a new comer in learning solr. If there is some points that i am not considering sufficiently, i apologize for this.

          Show
          Feihong Huang added a comment - hi, erickson. I think that way can re-use the filterCache when the fq clauses were ordered differently. I am a new comer in learning solr. If there is some points that i am not considering sufficiently, i apologize for this.
          Hide
          Erick Erickson added a comment -

          Moved test to pre-existing file.

          Show
          Erick Erickson added a comment - Moved test to pre-existing file.
          Hide
          Erick Erickson added a comment -

          Didn't mean to sound like it wouldn't be done, just making you aware that you only have read-only access to the repository and one of the committers has to pick it up and commit it.

          That said, I took a quick look at it and it looks reasonable, I've assigned it to myself. I rearranged things a bit (I think the test you wrote fits better in a pre-existing file), I'll attach the change momentarily.

          Do you think you could extend this for the filterCache to? That way we'd be able to re-use the filterCache when the fq clauses were ordered differently.

          Yonik Seeley Chris Hostetter I've gotten myself in trouble by not understanding the nuances of query semantics, do you see a problem with this approach? Seems like an easy win, which makes me nervous that it hasn't been done before <G>...

          Erick

          Show
          Erick Erickson added a comment - Didn't mean to sound like it wouldn't be done, just making you aware that you only have read-only access to the repository and one of the committers has to pick it up and commit it. That said, I took a quick look at it and it looks reasonable, I've assigned it to myself. I rearranged things a bit (I think the test you wrote fits better in a pre-existing file), I'll attach the change momentarily. Do you think you could extend this for the filterCache to? That way we'd be able to re-use the filterCache when the fq clauses were ordered differently. Yonik Seeley Chris Hostetter I've gotten myself in trouble by not understanding the nuances of query semantics, do you see a problem with this approach? Seems like an easy win, which makes me nervous that it hasn't been done before <G>... Erick
          Hide
          Feihong Huang added a comment -

          Well, Thank you for your reply. I am interested in contributing my work to solr.

          Show
          Feihong Huang added a comment - Well, Thank you for your reply. I am interested in contributing my work to solr.
          Hide
          Erick Erickson added a comment -

          I don't think you have commit rights <G>..... one of the committers will have to pick it up. And everyone is swamped so it may take some gentle nudging....

          Show
          Erick Erickson added a comment - I don't think you have commit rights <G>..... one of the committers will have to pick it up. And everyone is swamped so it may take some gentle nudging....
          Hide
          Feihong Huang added a comment -

          hi, erickson. Thank you for your comments. Patch attached, with new test. If it is ok, I'll commit shortly.

          Show
          Feihong Huang added a comment - hi, erickson. Thank you for your comments. Patch attached, with new test. If it is ok, I'll commit shortly.
          Hide
          Erick Erickson added a comment -

          Hmmm, my initial reaction was that it'd be hard to figure out what clauses were "equivalent". But I suppose a simple lexical ordering of all the "relevant" clauses would work for this. Of course the work-around is to generate your fq clauses in the same order every time.....

          Show
          Erick Erickson added a comment - Hmmm, my initial reaction was that it'd be hard to figure out what clauses were "equivalent". But I suppose a simple lexical ordering of all the "relevant" clauses would work for this. Of course the work-around is to generate your fq clauses in the same order every time.....

            People

            • Assignee:
              Erick Erickson
              Reporter:
              Feihong Huang
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - 48h
                48h
                Remaining:
                Remaining Estimate - 48h
                48h
                Logged:
                Time Spent - Not Specified
                Not Specified

                  Development