Lucene - Core
  1. Lucene - Core
  2. LUCENE-4580

Facet DrillDown should return a ConstantScoreQuery

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1, Trunk
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      DrillDown is a helper class which the user can use to convert a facet value that a user selected into a Query for performing drill-down or narrowing the results. The API has several static methods that create e.g. a Term or Query.

      Rather than creating a Query, it would make more sense to create a Filter I think. In most cases, the clicked facets should not affect the scoring of documents. Anyway, even if it turns out that it must return a Query (which I doubt), we should at least modify the impl to return a ConstantScoreQuery.

      1. LUCENE-4580.patch
        11 kB
        Shai Erera
      2. LUCENE-4580.patch
        13 kB
        Shai Erera

        Activity

        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1416864

        LUCENE-4580: Facet DrillDown should return a ConstantScoreQuery

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1416864 LUCENE-4580 : Facet DrillDown should return a ConstantScoreQuery
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1416864

        LUCENE-4580: Facet DrillDown should return a ConstantScoreQuery

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1416864 LUCENE-4580 : Facet DrillDown should return a ConstantScoreQuery
        Hide
        Shai Erera added a comment -

        Committed to trunk and 4x.

        Show
        Shai Erera added a comment - Committed to trunk and 4x.
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1416860

        LUCENE-4580: Facet DrillDown should return a ConstantScoreQuery

        Show
        Commit Tag Bot added a comment - [trunk commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1416860 LUCENE-4580 : Facet DrillDown should return a ConstantScoreQuery
        Hide
        Gilad Barkai added a comment -

        +1

        Show
        Gilad Barkai added a comment - +1
        Hide
        Shai Erera added a comment -
        • Removed DrillDown.query() which didn't take baseQuery – DrillDown will usually be used for drill-down queries. However, the query() methods handle the case where baseQuery is null.
        • Fixed SimpleSearcher.searchWithDrillDown to initialize FacetIndexingParams and pass it to the other methods.

        I think this is ready to go in.

        Show
        Shai Erera added a comment - Removed DrillDown.query() which didn't take baseQuery – DrillDown will usually be used for drill-down queries. However, the query() methods handle the case where baseQuery is null. Fixed SimpleSearcher.searchWithDrillDown to initialize FacetIndexingParams and pass it to the other methods. I think this is ready to go in.
        Hide
        Shai Erera added a comment -

        I understand what you mean by 0f, sorry. I will fix it.

        Also, I think that I'll simplify SimpleSearcher to create the FacetSearchParams up front, so that it's clear where they're created. In addition, I think that we can cut down the number of query() methods in DrillDown, to always take a baseQuery and handle the case that it's null. Will post an updated patch soon.

        Show
        Shai Erera added a comment - I understand what you mean by 0f, sorry. I will fix it. Also, I think that I'll simplify SimpleSearcher to create the FacetSearchParams up front, so that it's clear where they're created. In addition, I think that we can cut down the number of query() methods in DrillDown, to always take a baseQuery and handle the case that it's null. Will post an updated patch soon.
        Hide
        Shai Erera added a comment -

        In SimpleSearcher the code is changed as such the DrillDown.query() always takes a new FacetSearchParams()

        I think that the example code was just misleading. The FacetSearchParams are initialized in another method and used there, over the default indexing params. Notice how searchWithDrillDown calls searchWithRequest(null), which initializes default FacetIndexingParams. Since you cannot really execute faceted search without FacetSearchParams (you need to at least add FacetRequests), I think that having at least FSP on DrillDown API is ok. The API already has too many duplicates IMO.

        In the tests, perhaps use Float.MIN_VALUE or 0f when asserting equality of floats instead against 0.00000.

        I don't assert against 0.00000, but against 0.0f. The 0.00000 part is for the precision requires by assertEquals when you use doubles or floats.

        Show
        Shai Erera added a comment - In SimpleSearcher the code is changed as such the DrillDown.query() always takes a new FacetSearchParams() I think that the example code was just misleading. The FacetSearchParams are initialized in another method and used there, over the default indexing params. Notice how searchWithDrillDown calls searchWithRequest(null), which initializes default FacetIndexingParams. Since you cannot really execute faceted search without FacetSearchParams (you need to at least add FacetRequests), I think that having at least FSP on DrillDown API is ok. The API already has too many duplicates IMO. In the tests, perhaps use Float.MIN_VALUE or 0f when asserting equality of floats instead against 0.00000. I don't assert against 0.00000, but against 0.0f. The 0.00000 part is for the precision requires by assertEquals when you use doubles or floats.
        Hide
        Gilad Barkai added a comment - - edited

        Patch looks good.

        Two comments:

        • In SimpleSearcher the code is changed as such the DrillDown.query() always takes a new FacetSearchParams() (no default) - but that's not obvious it is the same one used for the search, or that it contains the FacetIndexingParams used for indexing. The defaults made sure that if none was specified along the way - the user should not bother himself with it. It is a small piece of code, but in an example might confuse the reader. I'm somewhat leaning toward allowing a default query() call without specifying a FSP.
        • In the tests, perhaps use Float.MIN_VALUE or 0f when asserting equality of floats instead against 0.00000.
        Show
        Gilad Barkai added a comment - - edited Patch looks good. Two comments: In SimpleSearcher the code is changed as such the DrillDown.query() always takes a new FacetSearchParams() (no default) - but that's not obvious it is the same one used for the search, or that it contains the FacetIndexingParams used for indexing. The defaults made sure that if none was specified along the way - the user should not bother himself with it. It is a small piece of code, but in an example might confuse the reader. I'm somewhat leaning toward allowing a default query() call without specifying a FSP. In the tests, perhaps use Float.MIN_VALUE or 0f when asserting equality of floats instead against 0.00000.
        Hide
        Shai Erera added a comment -

        I also added a CHANGES entry. All tests pass, so I think it's ready to go in.

        Show
        Shai Erera added a comment - I also added a CHANGES entry. All tests pass, so I think it's ready to go in.
        Hide
        Shai Erera added a comment -

        Patch addresses the following:

        • Add tests to DrillDownTest that verify that scores are not affected when running drill-down queries.
        • Fix DrillDown.query to return CSQ with boost set to 0.0f, as well as creating the BQ with disable coord (thanks Uwe for the recommendations !)
        • Improved DrillDown javadocs, as well as removed DrillDown.query() which took no FacetSearchParams – faceted search applications must work with FacetSearchParams, even if the default one.
        Show
        Shai Erera added a comment - Patch addresses the following: Add tests to DrillDownTest that verify that scores are not affected when running drill-down queries. Fix DrillDown.query to return CSQ with boost set to 0.0f, as well as creating the BQ with disable coord (thanks Uwe for the recommendations !) Improved DrillDown javadocs, as well as removed DrillDown.query() which took no FacetSearchParams – faceted search applications must work with FacetSearchParams, even if the default one.
        Hide
        Uwe Schindler added a comment -

        OK. I would add a test that verifies that the scores dont change...

        Show
        Uwe Schindler added a comment - OK. I would add a test that verifies that the scores dont change...
        Hide
        Shai Erera added a comment -

        It seems then that the only thing that needs to be done here is fix the query() code to return CSQ (and set the coord and boost properly). The API today doesn't support disjunction between categories, but it is doable with a combination of term() and query() calls, so rather than adding more API, I say that we leave it simple.

        If you agree, I'll rename this issue and fix DrillDown.

        Show
        Shai Erera added a comment - It seems then that the only thing that needs to be done here is fix the query() code to return CSQ (and set the coord and boost properly). The API today doesn't support disjunction between categories, but it is doable with a combination of term() and query() calls, so rather than adding more API, I say that we leave it simple. If you agree, I'll rename this issue and fix DrillDown.
        Hide
        Uwe Schindler added a comment -

        If a Filter is not cached, how efficient is using TermsFilter(oneTerm) vs. CSQ(TermQuery)? Are we talking huge gains here? If not, let's keep the API simple. DrillDown offers the terms() API too, so one can construct BooleanFilter, TermsFilter and whatever he wants out of them.

        CSQ(TermQuery)) is way faster, as it can leap-frog. TermsFilter with one term will allocate a Bitset and then mark all positings in it; also those postings which are not needed (this depends on the FilteredQuery mode, which is used to apply filters). CSQ(TermQuery) will leap-frog so the original query and the single TermQuery will advance each other and lead to fastest execution, while the TermsFilter prepares a bitset before the query execution, so the latency will be bigger (2 steps).

        Show
        Uwe Schindler added a comment - If a Filter is not cached, how efficient is using TermsFilter(oneTerm) vs. CSQ(TermQuery)? Are we talking huge gains here? If not, let's keep the API simple. DrillDown offers the terms() API too, so one can construct BooleanFilter, TermsFilter and whatever he wants out of them. CSQ(TermQuery)) is way faster, as it can leap-frog. TermsFilter with one term will allocate a Bitset and then mark all positings in it; also those postings which are not needed (this depends on the FilteredQuery mode, which is used to apply filters). CSQ(TermQuery) will leap-frog so the original query and the single TermQuery will advance each other and lead to fastest execution, while the TermsFilter prepares a bitset before the query execution, so the latency will be bigger (2 steps).
        Hide
        Uwe Schindler added a comment -

        Hi,

        In general I would prefer another approach for the whole thing. We should not make the users decide if then need to use a Filter or Query or whatever drill down approach. The user API should only use Query: Query in, Query out:

        Query drilldown(Query originalQuery, CategoryPath... categories)
        

        This would get the user query to drill down as input and return a new Query with the same scoring, but somehow filtered. Internally this method can use a Filter or Query or whatever to do the drill down, the user does not need to think about. It should just add 2 options: conjunction or disjunction.

        The following possibilities are available:

        • one or more category path, conjunction: returns new BooleanQuery(true) [no coord], consisting of the original Query as clause and multiple CSQ(TermQuery(category)) with boost=0.0 (boost=0 means the BQ does not get any value from the filter clause and with disableCoord=true nothing changes)
        • more than one category path, disjunction between categories: return FilteredQuery(originalQuery, new TermsFilter(terms))
        Show
        Uwe Schindler added a comment - Hi, In general I would prefer another approach for the whole thing. We should not make the users decide if then need to use a Filter or Query or whatever drill down approach. The user API should only use Query: Query in, Query out: Query drilldown(Query originalQuery, CategoryPath... categories) This would get the user query to drill down as input and return a new Query with the same scoring, but somehow filtered. Internally this method can use a Filter or Query or whatever to do the drill down, the user does not need to think about. It should just add 2 options: conjunction or disjunction. The following possibilities are available: one or more category path, conjunction: returns new BooleanQuery(true) [no coord] , consisting of the original Query as clause and multiple CSQ(TermQuery(category)) with boost=0.0 (boost=0 means the BQ does not get any value from the filter clause and with disableCoord=true nothing changes) more than one category path, disjunction between categories: return FilteredQuery(originalQuery, new TermsFilter(terms))
        Hide
        Shai Erera added a comment -

        Uwe, the thinking that I had about Filter is that if you e.g. wrap it w/ CWF, then you pay that cost once, and that's it. Therefore BooleanFilter is just used as a means to create a more complicated Filter.

        But, I'm not sure that I want to over-complicate DrillDown API. So perhaps this is what we do:

        • Fix DrillDown to always return CSQ, irregardless of the case.
        • Document that for caching purposes, one can wrap the returned Query with CachingWrapperFilter(QueryWrapperFilter(Query))

        If a Filter is not cached, how efficient is using TermsFilter(oneTerm) vs. CSQ(TermQuery)? Are we talking huge gains here? If not, let's keep the API simple. DrillDown offers the terms() API too, so one can construct BooleanFilter, TermsFilter and whatever he wants out of them.

        Show
        Shai Erera added a comment - Uwe, the thinking that I had about Filter is that if you e.g. wrap it w/ CWF, then you pay that cost once, and that's it. Therefore BooleanFilter is just used as a means to create a more complicated Filter. But, I'm not sure that I want to over-complicate DrillDown API. So perhaps this is what we do: Fix DrillDown to always return CSQ, irregardless of the case. Document that for caching purposes, one can wrap the returned Query with CachingWrapperFilter(QueryWrapperFilter(Query)) If a Filter is not cached, how efficient is using TermsFilter(oneTerm) vs. CSQ(TermQuery)? Are we talking huge gains here? If not, let's keep the API simple. DrillDown offers the terms() API too, so one can construct BooleanFilter, TermsFilter and whatever he wants out of them.
        Hide
        Uwe Schindler added a comment -

        This is exactly what I proposed. I'm +1 for it (and BooleanFilter).

        -1, BooleanFilter is horrible and slow for this use-case.

        Show
        Uwe Schindler added a comment - This is exactly what I proposed. I'm +1 for it (and BooleanFilter). -1, BooleanFilter is horrible and slow for this use-case.
        Hide
        Uwe Schindler added a comment -

        Or better, move TermsFilter and BooleanFilter to core – why are they treated differently than TermQuery and BooleanQuery? Especially now that Filter is applied more efficiently, I suspect more people will want to use it?

        TermsFilter - yes. See my comment above. We already have a very good Automaton-based one in test-framework that also needs to be moved to core (as a MTQ rewrite method).

        BUT, about BooleanFilter: This class is horrible ineffective, inconsistent, and not good for drill downs (you should use it only when you want to do caching of filters with bitsets). If you use it for those type of queries you pay the price of allocating bitsets, iterate the wrapped queries/filters completely instead of advanceing the underlying scorers (leap-frogging). So for drilldowns BooleanFilter is the worst you can do!

        The way to go from my opinion is to use constant score queries (like Solr does).

        In addition we recently reopened / discussed again the very old issue to nuke Filters at all and just provide queries and nothing more. Filters are nothing more than constant score queries

        Show
        Uwe Schindler added a comment - Or better, move TermsFilter and BooleanFilter to core – why are they treated differently than TermQuery and BooleanQuery? Especially now that Filter is applied more efficiently, I suspect more people will want to use it? TermsFilter - yes. See my comment above. We already have a very good Automaton-based one in test-framework that also needs to be moved to core (as a MTQ rewrite method). BUT, about BooleanFilter: This class is horrible ineffective, inconsistent, and not good for drill downs (you should use it only when you want to do caching of filters with bitsets). If you use it for those type of queries you pay the price of allocating bitsets, iterate the wrapped queries/filters completely instead of advanceing the underlying scorers (leap-frogging). So for drilldowns BooleanFilter is the worst you can do! The way to go from my opinion is to use constant score queries (like Solr does). In addition we recently reopened / discussed again the very old issue to nuke Filters at all and just provide queries and nothing more. Filters are nothing more than constant score queries
        Hide
        Gilad Barkai added a comment -

        it should be a combination of TermsFilter and BooleanFilter. So in fact if we want to keep DrillDown behave like today, we should use BooleanFilter and TermsFilter.

        +1

        Show
        Gilad Barkai added a comment - it should be a combination of TermsFilter and BooleanFilter. So in fact if we want to keep DrillDown behave like today, we should use BooleanFilter and TermsFilter. +1
        Hide
        Shai Erera added a comment -

        In my opinion, for Lucene 4.x we should move the TermsFilter to core.

        This is exactly what I proposed. I'm +1 for it (and BooleanFilter).

        TermsFilter is a Disjunction, but for drill downs you generally need Conjunctions

        You're right, it should be a combination of TermsFilter and BooleanFilter. So in fact if we want to keep DrillDown behave like today, we should use BooleanFilter and TermsFilter.

        Show
        Shai Erera added a comment - In my opinion, for Lucene 4.x we should move the TermsFilter to core. This is exactly what I proposed. I'm +1 for it (and BooleanFilter). TermsFilter is a Disjunction, but for drill downs you generally need Conjunctions You're right, it should be a combination of TermsFilter and BooleanFilter. So in fact if we want to keep DrillDown behave like today, we should use BooleanFilter and TermsFilter.
        Hide
        Shai Erera added a comment -

        Not that I'm against adding dependencies between modules, but just to give some data points:

        • The queries module is not a MUST for every search application (let alone faceted search). The basic query components are in core already (Filter, Query, TermQuery, BooleanQuery etc.). I found the queries module useful (so far) for the BooleanFilter and TermsFilter classes.
        • A question was recently asked on the user list how to make DrillDown create OR queries instead of AND. The scenario – you have a facet dimension for which you would like to allow people to select multiple values, and OR them (while still AND-ing with other dimensions). Since DrillDown doesn't have that option, I offered the user to use DrillDown.term() and construct his own BooleanQuery.
          • My point is that DrillDown is a helper class that doesn't cover all cases already. Even if we make it return a Filter, that user will still need to construct BooleanFilter doing several API calls.
          • So I'm ok if it only exposes terms(), but I'm also ok if we add the queries dependency and just make the cut over to Filter instead of Query.
          • Or better, move TermsFilter and BooleanFilter to core – why are they treated differently than TermQuery and BooleanQuery? Especially now that Filter is applied more efficiently, I suspect more people will want to use it?
        • I am all for usability, but TermsFilter is not like BooleanQuery in the sense that it's very easy to create it (just one line of code). I'm not sure that if BooleanQuery had a ctor which accepts List<Term>, we wouldn't have used it in DrillDown, or if we even create the DrillDown.query API. So the 'same code over and over' is not comparable between the two cases, I think.
        Show
        Shai Erera added a comment - Not that I'm against adding dependencies between modules, but just to give some data points: The queries module is not a MUST for every search application (let alone faceted search). The basic query components are in core already (Filter, Query, TermQuery, BooleanQuery etc.). I found the queries module useful (so far) for the BooleanFilter and TermsFilter classes. A question was recently asked on the user list how to make DrillDown create OR queries instead of AND. The scenario – you have a facet dimension for which you would like to allow people to select multiple values, and OR them (while still AND-ing with other dimensions). Since DrillDown doesn't have that option, I offered the user to use DrillDown.term() and construct his own BooleanQuery. My point is that DrillDown is a helper class that doesn't cover all cases already. Even if we make it return a Filter, that user will still need to construct BooleanFilter doing several API calls. So I'm ok if it only exposes terms(), but I'm also ok if we add the queries dependency and just make the cut over to Filter instead of Query. Or better, move TermsFilter and BooleanFilter to core – why are they treated differently than TermQuery and BooleanQuery? Especially now that Filter is applied more efficiently, I suspect more people will want to use it? I am all for usability, but TermsFilter is not like BooleanQuery in the sense that it's very easy to create it (just one line of code). I'm not sure that if BooleanQuery had a ctor which accepts List<Term> , we wouldn't have used it in DrillDown , or if we even create the DrillDown.query API. So the 'same code over and over' is not comparable between the two cases, I think.
        Hide
        Uwe Schindler added a comment -

        In my opinion, for Lucene 4.x we should move the TermsFilter to core. This filter is very often used and we already have a good Automaton-based variant (DahizukMihov) filter that performs very well on lots of terms!

        On the other hand: TermsFilter is a Disjunction, but for drill downs you generally need Conjunctions?

        Show
        Uwe Schindler added a comment - In my opinion, for Lucene 4.x we should move the TermsFilter to core. This filter is very often used and we already have a good Automaton-based variant (DahizukMihov) filter that performs very well on lots of terms! On the other hand: TermsFilter is a Disjunction, but for drill downs you generally need Conjunctions?
        Hide
        Gilad Barkai added a comment -

        DrillDown is a useful class with a straight-forward API, which makes the life of basic users simpler.
        As Shai pointed out, today there is no dependency on the Queries module, but the code contains a hidden bug in which a 'drill down' operation may change the score of the results. And adding a Filter or a ConstantScoreQuery looks the right way to go.
        That sort of a fix is possible, while keeping the usefulness of the DrillDown class, only if the code becomes dependent on the queries module.
        On the other hand, removing the dependency would force most faceted users to write that exact extra code as mentioned. Preventing such cases was the reason that utility class was created.

        'Drilling Down' is a basic feature of a faceted search application, and the DrillDown class provides an easy way of invoking it.
        Having a faceted search application without utilizing the queries module (e.g filtering) seems remote - is there any such scenario?
        Module dependency may result with a user loading jars he does not need or care about, but the queries module jar is likely to be found on any faceted search application.

        Modules should be independent, but I see enough gain in here. It would not bother me if the faceted module would depend on the query module. I find it logical.

        -1 for forcing users to write same code over and over to keep facet module independent of the queries module
        +1 for adding DrillDown.filter(CategoryPath...) - That looks like the way to go

        Show
        Gilad Barkai added a comment - DrillDown is a useful class with a straight-forward API, which makes the life of basic users simpler. As Shai pointed out, today there is no dependency on the Queries module, but the code contains a hidden bug in which a 'drill down' operation may change the score of the results. And adding a Filter or a ConstantScoreQuery looks the right way to go. That sort of a fix is possible, while keeping the usefulness of the DrillDown class, only if the code becomes dependent on the queries module. On the other hand, removing the dependency would force most faceted users to write that exact extra code as mentioned. Preventing such cases was the reason that utility class was created. 'Drilling Down' is a basic feature of a faceted search application, and the DrillDown class provides an easy way of invoking it. Having a faceted search application without utilizing the queries module (e.g filtering) seems remote - is there any such scenario? Module dependency may result with a user loading jars he does not need or care about, but the queries module jar is likely to be found on any faceted search application. Modules should be independent, but I see enough gain in here. It would not bother me if the faceted module would depend on the query module. I find it logical. -1 for forcing users to write same code over and over to keep facet module independent of the queries module +1 for adding DrillDown.filter(CategoryPath...) - That looks like the way to go
        Hide
        Shai Erera added a comment -

        I took a look at TermsFilter (under the queries module), and I think that it would be a waste to write another version for facets. However, DrillDown is only a helper class and I'm not sure that it's worth adding lucene-queries as a dependency, especially that we have LUCENE-3998 open for removing dependencies from it .

        So I wonder if instead we should have only a Term / List<Term> helper methods and document that the caller can use TermsFilter? E.g.:

        Filter facets = new TermsFilter(DrillDown.terms(CategoryPath...));
        

        Really, this is a very simple code. What do you think?

        Show
        Shai Erera added a comment - I took a look at TermsFilter (under the queries module), and I think that it would be a waste to write another version for facets. However, DrillDown is only a helper class and I'm not sure that it's worth adding lucene-queries as a dependency, especially that we have LUCENE-3998 open for removing dependencies from it . So I wonder if instead we should have only a Term / List<Term> helper methods and document that the caller can use TermsFilter? E.g.: Filter facets = new TermsFilter(DrillDown.terms(CategoryPath...)); Really, this is a very simple code. What do you think?
        Hide
        Gilad Barkai added a comment -

        +1

        Show
        Gilad Barkai added a comment - +1
        Hide
        Michael McCandless added a comment -

        +1

        Show
        Michael McCandless added a comment - +1

          People

          • Assignee:
            Shai Erera
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development