Lucene - Core / LUCENE-6975

Add dimensional "equals" query to match docs containing precisely a given value

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      Today, you can make a dimensional range query using e.g. DimensionalRangeQuery.new1DIntRange, plus a direct constructor for "expert" (2D, 3D, etc.) usages, but matching a single value is awkward and users ask about it from time to time.

      We could maybe rename DimensionalRangeQuery to DimensionalQuery and add new "factories" like newIntEqualsQuery or something.

      Or, we could make new classes, DimensionalIntEqualsQuery etc., and you get to use ordinary constructors?

      Or something else?
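
To make the first two options concrete, here is a minimal sketch. It is not Lucene code: `ToyIntRangeQuery` and `ToyIntEqualsQuery` are hypothetical stand-ins, and `newIntEqualsQuery` only borrows the factory name floated above. The sketch assumes that an "equals" match can be expressed as a range whose lower and upper bounds are the same inclusive value.

```java
// Toy model only: these classes are hypothetical and are NOT Lucene's API.
final class ToyIntRangeQuery {
    private final int lower; // inclusive lower bound
    private final int upper; // inclusive upper bound

    ToyIntRangeQuery(int lower, int upper) {
        if (lower > upper) {
            throw new IllegalArgumentException("lower > upper");
        }
        this.lower = lower;
        this.upper = upper;
    }

    // Factory option: "equals" is the degenerate range [value, value].
    static ToyIntRangeQuery newIntEqualsQuery(int value) {
        return new ToyIntRangeQuery(value, value);
    }

    boolean matches(int indexedValue) {
        return indexedValue >= lower && indexedValue <= upper;
    }
}

// Separate-class option: an ordinary constructor that delegates to the range.
final class ToyIntEqualsQuery {
    private final ToyIntRangeQuery delegate;

    ToyIntEqualsQuery(int value) {
        this.delegate = ToyIntRangeQuery.newIntEqualsQuery(value);
    }

    boolean matches(int indexedValue) {
        return delegate.matches(indexedValue);
    }
}
```

Both options behave identically in this sketch; the difference is only in what the user has to type and discover.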

      Attachments

      1. LUCENE-6975.patch (579 kB, Michael McCandless)
      2. LUCENE-6975.patch (642 kB, Michael McCandless)

        Activity

        jira-bot ASF subversion and git services added a comment -

        Commit 1725998 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1725998 ]

        LUCENE-6975: rename dimensional values to points; add ExactPointQuery to match documents containing exactly an indexed point

        mikemccand Michael McCandless added a comment -

        Good idea Robert Muir, here's a new patch with LatLonPoint.

        rcmuir Robert Muir added a comment -

        Well, I agree we need to make the 1D case simpler. I'm not sure if separate classes really do that, or if we just have to improve the APIs of what we have. I think we have a couple of choices.

        For sure though, Point is better than Dimensional. I do think we should make the additional tweak here, of "escape from Field", meaning PointLatLonField -> LatLonPoint and so on. It can be a followup though.

        romseygeek Alan Woodward added a comment -

        +1, nice!

        Regarding Robert's point about having to think of simple numeric queries as being points, maybe it's worth adding some sugar classes? So SingleIntField extends IntPoint and takes a single value, similarly ExactNumericQuery is an extension of ExactPointQuery.
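
As a rough illustration of the "sugar class" suggestion, the sketch below uses hypothetical toy classes, not Lucene's API. The `Toy*` names, constructors, and the `matches` method are all assumptions made for this example. It shows a single-value subclass whose signature only admits one value, and a numeric query that is just an exact point query over one dimension.

```java
// Toy model only: hypothetical classes, NOT Lucene's API.
class ToyIntPoint {
    final String field;
    final int[] values; // one entry per dimension

    ToyIntPoint(String field, int... values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("need at least one dimension");
        }
        this.field = field;
        this.values = values.clone();
    }

    int numDims() {
        return values.length;
    }
}

// Sugar: the same data as a 1D point, but only a single value is accepted.
final class ToySingleIntField extends ToyIntPoint {
    ToySingleIntField(String field, int value) {
        super(field, value);
    }

    int value() {
        return values[0];
    }
}

// Matches a point only if the field name and every dimension are equal.
class ToyExactPointQuery {
    final String field;
    final int[] target;

    ToyExactPointQuery(String field, int... target) {
        this.field = field;
        this.target = target.clone();
    }

    boolean matches(ToyIntPoint point) {
        return field.equals(point.field)
            && java.util.Arrays.equals(target, point.values);
    }
}

// Sugar: an exact query restricted to the single-value case.
final class ToyExactNumericQuery extends ToyExactPointQuery {
    ToyExactNumericQuery(String field, int value) {
        super(field, value);
    }
}
```

In this sketch the sugar classes add no behavior; they only let a user with a plain numeric field avoid thinking in terms of dimensions.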

        rcmuir Robert Muir added a comment -

        +1!

        mikemccand Michael McCandless added a comment -

        Here's a patch, renaming dimensional -> point across the board, and adding a new ExactPointQuery.

        mikemccand Michael McCandless added a comment -

        I like `IntPoint` and `PointRangeQuery` (and maybe `ExactPointQuery`?) ... I'll try this out.

        rcmuir Robert Muir added a comment -

        I do like your idea; those are just some devil's advocate ideas. At least "point" seems way more intuitive than what we have in trunk, and I don't have anything better. It definitely seems better than "dimensional" and doesn't make me think about mad scientists or time travel.

        Just the general idea is something to think about, simple names like fields, columns, and points could help people use the right datastructure for the job.

        rcmuir Robert Muir added a comment -

        Well, I do think we should avoid "Value" unless it adds some additional meaning. That's one problem with doc values: it's basically a name that says nothing!

        Do you mean something like IntPointValue etc? Why not just IntPoint? Point isn't bad at all, since it does represent what is happening.

        On the other hand, if I just have 'int age' in my document and I want to do range queries on it, it's hard to know if that would be the intuitive thing to look into; you'd have to think "oh, I need to treat it like a 1-dimensional point".
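
The "treat it like a 1-dimensional point" idea can be sketched as follows. This is a toy model under stated assumptions, not Lucene's API: `ToyPointRange`, `oneDim`, and `contains` are hypothetical names. It assumes a range query over N dimensions is a box test with inclusive bounds per dimension, so a range over `int age` is the same test with N = 1.

```java
// Toy model only: hypothetical helper, NOT Lucene's API.
final class ToyPointRange {
    private final int[] lower; // inclusive lower bound per dimension
    private final int[] upper; // inclusive upper bound per dimension

    ToyPointRange(int[] lower, int[] upper) {
        if (lower.length != upper.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        this.lower = lower.clone();
        this.upper = upper.clone();
    }

    // Convenience for the common 1D case, e.g. a range over "age".
    static ToyPointRange oneDim(int lowerInclusive, int upperInclusive) {
        return new ToyPointRange(new int[] {lowerInclusive}, new int[] {upperInclusive});
    }

    // A point matches when every dimension lies within its bounds.
    boolean contains(int... point) {
        if (point.length != lower.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        for (int d = 0; d < point.length; d++) {
            if (point[d] < lower[d] || point[d] > upper[d]) {
                return false;
            }
        }
        return true;
    }
}
```

For example, `ToyPointRange.oneDim(18, 65).contains(30)` asks whether an age of 30 falls in the range 18 to 65, with no mention of dimensions at the call site.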

        romseygeek Alan Woodward added a comment -

        > But what to use for Dim values? FloatValue wouldn't be totally horrible, but I feel like "value" is overloaded too much, is there a better term we can use?

        How about PointValue? And then we have PointRangeQuery.

        rcmuir Robert Muir added a comment -

        To me it's the word "Dimensional" in these names that is the scary one.

        But we need to break here: we can't name things e.g. FloatField, or there is a big risk of users mixing the two things. We've also got FloatDocValuesField that we don't want to cause confusion with. And somehow it needs to be clear that e.g. you need DimensionalQuery to use it; TermQuery won't work. To me that's a difference vs. a regular Field.

        One idea is that the "Field" part could be something different. For example, instead of FloatDocValuesField we could have "FloatColumn": you add that to your document, and then we have ColumnRangeQuery that goes with it, instead of DocValuesRangeQuery. In that case, I feel like the different term really helps you think about what is going on vs. Field.

        But what to use for Dim values? FloatValue wouldn't be totally horrible, but I feel like "value" is overloaded too much, is there a better term we can use?

        dsmiley David Smiley added a comment -

        I like your idea of renaming DimensionalRangeQuery to DimensionalQuery and adding factory methods.

        mikemccand Michael McCandless added a comment -

        I have no patch here for now ... need to hash out the design first ...


          People

          • Assignee: Michael McCandless
          • Reporter: Michael McCandless
          • Votes: 0
          • Watchers: 4
