Lucene - Core
  1. Lucene - Core
  2. LUCENE-6720

new FunctionRangeQuery, plus ValueSourceScorer improvements

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This issue provides a new FunctionRangeQuery, which is basically a wrapper around ValueSourceScorer (retrieved from FunctionValues.getRangeScorer). It replaces ValueSourceFilter in the spatial module. Solr has a class by the same name which is similar but isn't suitable to being ported.

      Also, it includes refactorings to the ValueSourceScorer, to include performance enhancements by making it work with the TwoPhaseIterator API.

      note: I posted this to LUCENE-4251 initially but then felt it's really its own issue.

        Issue Links

          Activity

          Hide
          David Smiley added a comment -

          The attached patch implements the aforementioned functionality, including tests.

          Notes:

          ValueSourceScorer:

          • marked as lucene.experimental; although might be considered internal. FunctionValues explicitly returns this scorer, and Solr calls a method on it, so I figured experimental over internal.
          • removed deleted doc handling since it’s redundant at this layer; the bulk doc scorer handles that.
          • made abstract and made matches() abstract (renamed from matchesValue) to clarify it’s typical use. matches() vs. matchesValue() is more consistent (e.g. TwoPhaseIterator.matches), I feel. This rename caused some changes in a bunch of files.

          FunctionRangeQuery:

          • Added basic test, including with deleting a doc.
          • FYI discovered this Query is similar to DocValuesRangeQuery (now in Sandbox). DVRQ is constant scoring, is specific to docValues (not generic ValueSource), and won’t match docs with no value
          • Future: don’t match if FunctionValues.exists() returns false? How to prevent extra value fetching? Oddly, IntFieldSource & friends have an exist() method that fetches the value to see if it’s 0. I don’t know why it should care.

          Question: Should a FunctionRangeQuery/ValueSourceScorer actually be based on the MutableValue API? I haven’t looked at that API much before now; it’s an odd one. Presumably it’s existence is made to facilitate scenarios like FunctionRangeQuery to avoid a bunch of type-specific code, since the type-specific code is already in the MutableValue API? This would effectively mean that IntDocValues (a subclass of ValueSource that has a FunctionValues) & it’s type friends would have simpler implementations of getRangeScorer since once the lower & upper Strings are parsed and loaded into a MutableValue of the right type, a generic range scorer could handle it from there.

          Show
          David Smiley added a comment - The attached patch implements the aforementioned functionality, including tests. Notes: ValueSourceScorer: marked as lucene.experimental; although might be considered internal. FunctionValues explicitly returns this scorer, and Solr calls a method on it, so I figured experimental over internal. removed deleted doc handling since it’s redundant at this layer; the bulk doc scorer handles that. made abstract and made matches() abstract (renamed from matchesValue) to clarify it’s typical use. matches() vs. matchesValue() is more consistent (e.g. TwoPhaseIterator.matches), I feel. This rename caused some changes in a bunch of files. FunctionRangeQuery: Added basic test, including with deleting a doc. FYI discovered this Query is similar to DocValuesRangeQuery (now in Sandbox). DVRQ is constant scoring, is specific to docValues (not generic ValueSource), and won’t match docs with no value Future: don’t match if FunctionValues.exists() returns false? How to prevent extra value fetching? Oddly, IntFieldSource & friends have an exist() method that fetches the value to see if it’s 0. I don’t know why it should care. Question: Should a FunctionRangeQuery/ValueSourceScorer actually be based on the MutableValue API? I haven’t looked at that API much before now; it’s an odd one. Presumably it’s existence is made to facilitate scenarios like FunctionRangeQuery to avoid a bunch of type-specific code, since the type-specific code is already in the MutableValue API? This would effectively mean that IntDocValues (a subclass of ValueSource that has a FunctionValues) & it’s type friends would have simpler implementations of getRangeScorer since once the lower & upper Strings are parsed and loaded into a MutableValue of the right type, a generic range scorer could handle it from there.
          Hide
          Adrien Grand added a comment -
          +    if (lowerVal != null ? !lowerVal.equals(that.lowerVal) : that.lowerVal != null) return false;
          +    return !(upperVal != null ? !upperVal.equals(that.upperVal) : that.upperVal != null);
          

          This is a bit hard to read for me (many negations), maybe we could do something like

          return Objects.equals(lowerVal, that.lowerVal) && Objects.equals(upperVal, that.upperVal)?
          

          explain calls scorer.advance(doc), which could be very slow if the index is large and has few docs in the range, could we just check if the given doc is in range?

          > Future: don’t match if FunctionValues.exists() returns false? How to prevent extra value fetching? I don’t know why it should care.

          Indeed I don't think it should match docs that don't have a value. This behaviour you are seeing with IntFieldSource is consistent with what we are doing with doc values: doc values guarantee that if a doc misses a value, then the NumericDocValues will return 0 as a value. This way most of the time you can know both that a document has a value and its value with a single random access. We only need to check the bit set for missing values (which is disk-based) when a doc value is exactly 0.

          I can't help with the MutableValue/ValueFiller API, it is odd to me to.

          Show
          Adrien Grand added a comment - + if (lowerVal != null ? !lowerVal.equals(that.lowerVal) : that.lowerVal != null ) return false ; + return !(upperVal != null ? !upperVal.equals(that.upperVal) : that.upperVal != null ); This is a bit hard to read for me (many negations), maybe we could do something like return Objects.equals(lowerVal, that.lowerVal) && Objects.equals(upperVal, that.upperVal)? explain calls scorer.advance(doc), which could be very slow if the index is large and has few docs in the range, could we just check if the given doc is in range? > Future: don’t match if FunctionValues.exists() returns false? How to prevent extra value fetching? I don’t know why it should care. Indeed I don't think it should match docs that don't have a value. This behaviour you are seeing with IntFieldSource is consistent with what we are doing with doc values: doc values guarantee that if a doc misses a value, then the NumericDocValues will return 0 as a value. This way most of the time you can know both that a document has a value and its value with a single random access. We only need to check the bit set for missing values (which is disk-based) when a doc value is exactly 0. I can't help with the MutableValue/ValueFiller API, it is odd to me to.
          Hide
          David Smiley added a comment -

          Thanks for the review Adrien! I was hoping you'd chime in.

          This addresses your two main concerns, I hope.

          • Both equals() & hashCode() implementations were re-generated using Objects.equals & Objects.hash. FYI I use IntelliJ which has multiple templates, so I switched it to the template that uses these methods.
          • explain() no longer calls advance to detect if the doc matches; instead it uses the ValueSourceScorer.match method. I added a comment to clarify why this is done.

          I'd like to file a separate issue for ValueSourceScorer implementations to honor exists() when matching. I don't want that to delay or crowd out the point of this issue.

          Show
          David Smiley added a comment - Thanks for the review Adrien! I was hoping you'd chime in. This addresses your two main concerns, I hope. Both equals() & hashCode() implementations were re-generated using Objects.equals & Objects.hash. FYI I use IntelliJ which has multiple templates, so I switched it to the template that uses these methods. explain() no longer calls advance to detect if the doc matches; instead it uses the ValueSourceScorer.match method. I added a comment to clarify why this is done. I'd like to file a separate issue for ValueSourceScorer implementations to honor exists() when matching. I don't want that to delay or crowd out the point of this issue.
          Hide
          Adrien Grand added a comment -

          +1

          Show
          Adrien Grand added a comment - +1
          Hide
          ASF subversion and git services added a comment -

          Commit 1694543 from David Smiley in branch 'dev/trunk'
          [ https://svn.apache.org/r1694543 ]

          LUCENE-6720: new FunctionRangeQuery wrapper around ValueSourceScorer.
          And ValueSourceScorer improvements.

          Show
          ASF subversion and git services added a comment - Commit 1694543 from David Smiley in branch 'dev/trunk' [ https://svn.apache.org/r1694543 ] LUCENE-6720 : new FunctionRangeQuery wrapper around ValueSourceScorer. And ValueSourceScorer improvements.
          Hide
          ASF subversion and git services added a comment -

          Commit 1694546 from David Smiley in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1694546 ]

          LUCENE-6720: new FunctionRangeQuery wrapper around ValueSourceScorer.
          And ValueSourceScorer improvements.

          Show
          ASF subversion and git services added a comment - Commit 1694546 from David Smiley in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1694546 ] LUCENE-6720 : new FunctionRangeQuery wrapper around ValueSourceScorer. And ValueSourceScorer improvements.

            People

            • Assignee:
              David Smiley
              Reporter:
              David Smiley
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development