Lucene - Core
  1. Lucene - Core
  2. LUCENE-6650

Remove dependency of lucene/spatial on oal.search.Filter

    Details

    • Type: Task Task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We should try to remove usage of oal.search.Filter in lucene/spatial. I gave it a try but this module makes non-trivial use of filters so I wouldn't mind some help here.

      1. LUCENE-6650.patch
        86 kB
        David Smiley
      2. LUCENE-6650.patch
        33 kB
        David Smiley

        Issue Links

          Activity

          Hide
          David Smiley added a comment -

          I'll help. One of the nights in the next few days I'll take a stab at it. I presume this is just trunk; on 5x SpatialStrategy.makeFilter can be deprecated (for back-compat).

          Show
          David Smiley added a comment - I'll help. One of the nights in the next few days I'll take a stab at it. I presume this is just trunk; on 5x SpatialStrategy.makeFilter can be deprecated (for back-compat).
          Hide
          Adrien Grand added a comment -

          Thanks David! This issue is mostly important for trunk indeed, but backporting to 5.x and deprecating the old stuff would be nice too.

          on 5x SpatialStrategy.makeFilter can be deprecated (for back-compat)

          +1 Maybe we could even make this method final and return a QueryWrapperFilter around the result of makeQuery? Since QueryWrapperFilter rewrites to the wrapped query and FilteredQuery rewrites to a BooleanQuery, this means that running a FilteredQuery with the result of makeFilter or a BooleanQuery with a FILTER clause with the result of makeQuery would do exactly the same thing.

          I know that some spatial filters throw an exception when someone tries to pull an iterator. I don't think a similar behaviour would be necessary anymore given that we have tests that BooleanQuery takes advantage of approximations when they are available, while FilteredQuery could still use an inefficient iterator if you did not configure the appropriate FilterStrategy?

          Show
          Adrien Grand added a comment - Thanks David! This issue is mostly important for trunk indeed, but backporting to 5.x and deprecating the old stuff would be nice too. on 5x SpatialStrategy.makeFilter can be deprecated (for back-compat) +1 Maybe we could even make this method final and return a QueryWrapperFilter around the result of makeQuery? Since QueryWrapperFilter rewrites to the wrapped query and FilteredQuery rewrites to a BooleanQuery, this means that running a FilteredQuery with the result of makeFilter or a BooleanQuery with a FILTER clause with the result of makeQuery would do exactly the same thing. I know that some spatial filters throw an exception when someone tries to pull an iterator. I don't think a similar behaviour would be necessary anymore given that we have tests that BooleanQuery takes advantage of approximations when they are available, while FilteredQuery could still use an inefficient iterator if you did not configure the appropriate FilterStrategy?
          Hide
          David Smiley added a comment -

          Maybe we could even make this method final and return a QueryWrapperFilter around the result of makeQuery?

          +1

          I know that some spatial filters throw an exception when someone tries to pull an iterator. I don't think a similar behaviour would be necessary anymore given that we have tests that BooleanQuery takes advantage of approximations when they are available, while FilteredQuery could still use an inefficient iterator if you did not configure the appropriate FilterStrategy?

          Right; not necessary. The only spatial filter that applies here is via SerializedDVStrategy. In the patch I post I may even enhance SerializedDVStrategy to use TwoPhaseIterator.

          Show
          David Smiley added a comment - Maybe we could even make this method final and return a QueryWrapperFilter around the result of makeQuery? +1 I know that some spatial filters throw an exception when someone tries to pull an iterator. I don't think a similar behaviour would be necessary anymore given that we have tests that BooleanQuery takes advantage of approximations when they are available, while FilteredQuery could still use an inefficient iterator if you did not configure the appropriate FilterStrategy? Right; not necessary. The only spatial filter that applies here is via SerializedDVStrategy. In the patch I post I may even enhance SerializedDVStrategy to use TwoPhaseIterator.
          Hide
          David Smiley added a comment - - edited

          Removing SpatialStrategy.makeFilter was pretty easy, but 3-4 SpatialStrategy impls use or have Filters that will take a bit to triage. I worked on LUCENE-6720 today – a FunctionRangeQuery.

          Show
          David Smiley added a comment - - edited Removing SpatialStrategy.makeFilter was pretty easy, but 3-4 SpatialStrategy impls use or have Filters that will take a bit to triage. I worked on LUCENE-6720 today – a FunctionRangeQuery.
          Hide
          Adrien Grand added a comment -

          David, do you think you will have time to work on this again soon? I would like to deprecate Filter, but it is a bit weird if Filter is still part of the public API of some of our modules?

          Show
          Adrien Grand added a comment - David, do you think you will have time to work on this again soon? I would like to deprecate Filter, but it is a bit weird if Filter is still part of the public API of some of our modules?
          Hide
          David Smiley added a comment -

          Hi Adrien.
          I'm attaching my patch in progress from when I last touched it a month ago. Probably the main thing left to do is to change the Filters in the org.apache.lucene.spatial.prefix package to be Queries. I'll try and resume working on it in a week.

          Show
          David Smiley added a comment - Hi Adrien. I'm attaching my patch in progress from when I last touched it a month ago. Probably the main thing left to do is to change the Filters in the org.apache.lucene.spatial.prefix package to be Queries. I'll try and resume working on it in a week.
          Hide
          David Smiley added a comment -

          Finally I finished. Filters were used a lot here, particularly in the "prefix" package. A summary of changes:

          • The main high-level API change to put in CHANGES.txt is that SpatialStrategy.makeFilter doesn't exist any more; people should call makeQuery which already existed. FYI this abstraction is a facade to details most users won't see. It's still marked lucene.experimental but arguably it isn't.
          • The PrefixTreeStrategy.calcFacets() method (and the internal collaborators that take the same parameters) doesn't take a Filter now, it has a Bits topAcceptDocs instead.
          • Any subclasses of Filter had their names changed to reflect Query. These queries are all lucene.experimental or internal.
          • The removal of acceptDocs in LUCENE-6553 meant I could follow-suit in most places. But the PrefixTreeFacetCounter (used in e.g. heatmap) needs to be aware of acceptDocs since it's not a Query so just there did I need to keep that. It worked out fine.
          • SerializedDVStrategy's makeQuery no longer throws an exception if you try to iterate on it versus access it via a matches() on TwoPhaseIterator because it would be hard/impossible and basically not worthwhile. There's less code there to implement now any way, thanks to RandomAccessWeight.
          • (Not strictly related?) In BBoxStrategy's boolean queries, I substituted BooleanClause.Occur.FILTER instead of MUST through I'm not sure if it matters? (score doesn't matter; the whole thing is wrapped in a ConstantScoreQuery).
          Show
          David Smiley added a comment - Finally I finished. Filters were used a lot here, particularly in the "prefix" package. A summary of changes: The main high-level API change to put in CHANGES.txt is that SpatialStrategy.makeFilter doesn't exist any more; people should call makeQuery which already existed. FYI this abstraction is a facade to details most users won't see. It's still marked lucene.experimental but arguably it isn't. The PrefixTreeStrategy.calcFacets() method (and the internal collaborators that take the same parameters) doesn't take a Filter now, it has a Bits topAcceptDocs instead. Any subclasses of Filter had their names changed to reflect Query. These queries are all lucene.experimental or internal. The removal of acceptDocs in LUCENE-6553 meant I could follow-suit in most places. But the PrefixTreeFacetCounter (used in e.g. heatmap) needs to be aware of acceptDocs since it's not a Query so just there did I need to keep that. It worked out fine. SerializedDVStrategy's makeQuery no longer throws an exception if you try to iterate on it versus access it via a matches() on TwoPhaseIterator because it would be hard/impossible and basically not worthwhile. There's less code there to implement now any way, thanks to RandomAccessWeight. (Not strictly related?) In BBoxStrategy's boolean queries, I substituted BooleanClause.Occur.FILTER instead of MUST through I'm not sure if it matters? (score doesn't matter; the whole thing is wrapped in a ConstantScoreQuery).
          Hide
          Adrien Grand added a comment -

          I substituted BooleanClause.Occur.FILTER instead of MUST through I'm not sure if it matters?

          It should be equivalent.

          The patch looks good to me overall, it feels cleaner to me that makeQuery and makeFilter were merged to a single method that has a two-phase iterator. The one thing that feels awkward to me is this topAcceptDocs which is top-level to the reader, while everywhere else we work with per-segment data-structures?

          Show
          Adrien Grand added a comment - I substituted BooleanClause.Occur.FILTER instead of MUST through I'm not sure if it matters? It should be equivalent. The patch looks good to me overall, it feels cleaner to me that makeQuery and makeFilter were merged to a single method that has a two-phase iterator. The one thing that feels awkward to me is this topAcceptDocs which is top-level to the reader, while everywhere else we work with per-segment data-structures?
          Hide
          David Smiley added a comment -

          Thanks for the review Adrien.

          I'll change the BooleanClause.Occur.FILTER back to MUST only because it feels more orthogonal to intermixed use with SHOULD in some of those boolean queries.

          The one thing that feels awkward to me is this topAcceptDocs which is top-level to the reader, while everywhere else we work with per-segment data-structures?

          I felt the same, but we don't have a class/abstraction that yields a Bits (or similar) when given a LeafReaderContext as a parameter. Filter was fairly close, but it wasn't even perfect since bits() could return null so I had to have code to build a Bits from the iterator. In the end – the code here really wants a Bits – which is a very simple 2-method interface. So my feeling is that this is okay, and it's not worthwhile coming up with some new abstraction to return the Bits per-segment.

          Show
          David Smiley added a comment - Thanks for the review Adrien. I'll change the BooleanClause.Occur.FILTER back to MUST only because it feels more orthogonal to intermixed use with SHOULD in some of those boolean queries. The one thing that feels awkward to me is this topAcceptDocs which is top-level to the reader, while everywhere else we work with per-segment data-structures? I felt the same, but we don't have a class/abstraction that yields a Bits (or similar) when given a LeafReaderContext as a parameter. Filter was fairly close, but it wasn't even perfect since bits() could return null so I had to have code to build a Bits from the iterator. In the end – the code here really wants a Bits – which is a very simple 2-method interface. So my feeling is that this is okay, and it's not worthwhile coming up with some new abstraction to return the Bits per-segment.
          Hide
          Adrien Grand added a comment -

          Fair enough.

          Show
          Adrien Grand added a comment - Fair enough.
          Hide
          David Smiley added a comment -

          I was just thinking – maybe a Java 8 Function<LeafReaderContext,Bits> ? On the 5x side I'd probably have a little interface for it.

          Show
          David Smiley added a comment - I was just thinking – maybe a Java 8 Function<LeafReaderContext,Bits> ? On the 5x side I'd probably have a little interface for it.
          Hide
          ASF subversion and git services added a comment -

          Commit 1706181 from David Smiley in branch 'dev/trunk'
          [ https://svn.apache.org/r1706181 ]

          LUCENE-6650: Spatial module no longer uses Filter.

          Show
          ASF subversion and git services added a comment - Commit 1706181 from David Smiley in branch 'dev/trunk' [ https://svn.apache.org/r1706181 ] LUCENE-6650 : Spatial module no longer uses Filter.
          Hide
          ASF subversion and git services added a comment -

          Commit 1706184 from David Smiley in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1706184 ]

          LUCENE-6650: Spatial module no longer uses Filter.

          Show
          ASF subversion and git services added a comment - Commit 1706184 from David Smiley in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1706184 ] LUCENE-6650 : Spatial module no longer uses Filter.
          Hide
          David Smiley added a comment -

          (I didn't bother going with the Function)

          Show
          David Smiley added a comment - (I didn't bother going with the Function)

            People

            • Assignee:
              David Smiley
              Reporter:
              Adrien Grand
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development