Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2, 6.0
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      DrillDown is a utility class for creating drill-down queries over a base query and a bunch of categories. We've been asked to support AND, OR and AND of ORs. The latter is not so simple as a static utility method though, so instead we have some sample code ...

      Rather, I think that we can just create a DrillDownQuery (extends Query) which takes a baseQuery in its ctor and exposes add(CategoryPath...), such that every such group of categories is AND'ed with other groups, and internally they are OR'ed. It's very similar to how you would construct a BooleanQuery, only simpler and specific to facets.

      Internally, it would build a BooleanQuery and delegate rewrite, createWeight etc to it.

      That will remove the need for the static utility methods .. or we can keep static term() for convenience.

      1. LUCENE-4750.patch
        42 kB
        Shai Erera
      2. LUCENE-4750.patch
        42 kB
        Shai Erera
      3. LUCENE-4750.patch
        39 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        +1, this would make the API simpler.

        I think we should be able to do all work in rewrite? And then throw UOE from createWeight?

        Show
        Michael McCandless added a comment - +1, this would make the API simpler. I think we should be able to do all work in rewrite? And then throw UOE from createWeight?
        Hide
        Shai Erera added a comment -

        I think we should be able to do all work in rewrite? And then throw UOE from createWeight?

        Maybe. I was just thinking of a 'complete' Query impl, just in case someone doesn't call rewrite before any other methods.

        Show
        Shai Erera added a comment - I think we should be able to do all work in rewrite? And then throw UOE from createWeight? Maybe. I was just thinking of a 'complete' Query impl, just in case someone doesn't call rewrite before any other methods.
        Hide
        Michael McCandless added a comment -

        Well it's guaranteed that IndexSearcher will call Query.rewrite ... so it should be fine to only implement rewrite (well, and .toString, etc.)

        Show
        Michael McCandless added a comment - Well it's guaranteed that IndexSearcher will call Query.rewrite ... so it should be fine to only implement rewrite (well, and .toString, etc.)
        Hide
        Uwe Schindler added a comment -

        Mike: Yes, you dont need to implement createWeiggt. That was always like this, there are lots of queries shipped with Lucene that have no weight (e.g. MTQ). The Query base class handles all of this.

        Show
        Uwe Schindler added a comment - Mike: Yes, you dont need to implement createWeiggt. That was always like this, there are lots of queries shipped with Lucene that have no weight (e.g. MTQ). The Query base class handles all of this.
        Hide
        Michael McCandless added a comment -

        Initial patch, still w/ nocommits, but the basic idea seems to work.

        But: I had to disable 2 tests, because the new DDQ requires that OR'd paths share the same dim (CP.components[0]) and these 2 tests didn't ... but that makes me wonder: is there ever a "real" use case when we shouldn't enforce this?

        Show
        Michael McCandless added a comment - Initial patch, still w/ nocommits, but the basic idea seems to work. But: I had to disable 2 tests, because the new DDQ requires that OR'd paths share the same dim (CP.components [0] ) and these 2 tests didn't ... but that makes me wonder: is there ever a "real" use case when we shouldn't enforce this?
        Hide
        Michael McCandless added a comment -

        I think we should get this done before drill sideways (LUCENE-4748).

        Show
        Michael McCandless added a comment - I think we should get this done before drill sideways ( LUCENE-4748 ).
        Hide
        Shai Erera added a comment -

        About the two tests. This:

        Query q3 = DrillDown.query(defaultParams, null, Occur.MUST, new CategoryPath("a"), new CategoryPath("b"));

        Should be converted to two add()}?

        The second test, instead of nuking it, make it testAndOfOrs? Not sure if we have any such test, so it would be a good idea anyway.

        And yes, I think that we should just prevent across dimension ORs. It is still a helper class, someone can build his own such query if he wishes to. Unlike "OR between CPs of the same dim", I think that "OR between different dims" is not so common...

        About the patch:

        • I've wanted to do it for a long time – can we have just one ctor with FSP? DDQ should be called from a search context, so passing FIP is odd. Can you check if there's any real code that needs to pass FIP?
          • Same goes for the static term() method.
        • This check in add() if (paths[0].length == 0) is odd, since you check it again for every CP later. Maybe move it to inside the paths.length==1?
        • Regarding baseQuery, while reviewing LUCENE-4748 I wrote that DDQ should probably not take a baseQuery, but rather let the user wrap from the outside (while DDQ can have a static helper class). Do you not need it there?
          • Or ... such users (like DrillSideways) can pass baseQuery=null in that case and do the final wrapping themselves? If that would work for sideways, then I think taking a baseQuery is good as it simplifies usage for apps.

        That looks good, thanks for taking care of this!

        Show
        Shai Erera added a comment - About the two tests. This: Query q3 = DrillDown.query(defaultParams, null, Occur.MUST, new CategoryPath("a"), new CategoryPath("b")); Should be converted to two add() }? The second test, instead of nuking it, make it testAndOfOrs? Not sure if we have any such test, so it would be a good idea anyway. And yes, I think that we should just prevent across dimension ORs. It is still a helper class, someone can build his own such query if he wishes to. Unlike "OR between CPs of the same dim", I think that "OR between different dims" is not so common... About the patch: I've wanted to do it for a long time – can we have just one ctor with FSP? DDQ should be called from a search context, so passing FIP is odd. Can you check if there's any real code that needs to pass FIP? Same goes for the static term() method. This check in add() if (paths [0] .length == 0) is odd, since you check it again for every CP later. Maybe move it to inside the paths.length==1 ? Regarding baseQuery, while reviewing LUCENE-4748 I wrote that DDQ should probably not take a baseQuery, but rather let the user wrap from the outside (while DDQ can have a static helper class). Do you not need it there? Or ... such users (like DrillSideways) can pass baseQuery=null in that case and do the final wrapping themselves? If that would work for sideways, then I think taking a baseQuery is good as it simplifies usage for apps. That looks good, thanks for taking care of this!
        Hide
        Shai Erera added a comment -

        Mike, I'm migrating the patch to after the packages reorg, and I want to handle the nocommits. In the process I thought about two things:

        1. Shouldn't rewrite() call res.rewrite()? We don't know what baseQuery will rewrite too...
        2. Perhaps instead of keeping a List<Query> drillDownQueries, we can build the BQ on-the-fly in .add()? We can then use it in toString and rewrite directly, vs now that rewrite always creates a new BQ.
          • BTW, is that an error to not create a new BQ on every rewrite? I don't think so, but want to verify...

        If we do the 2nd, then we can check in the ctor already if baseQuery is a BQ and just use it to add the drill-down clauses to? If we need to know what baseQuery was, we can clone() it? Is it perhaps too much?

        Show
        Shai Erera added a comment - Mike, I'm migrating the patch to after the packages reorg, and I want to handle the nocommits. In the process I thought about two things: Shouldn't rewrite() call res.rewrite()? We don't know what baseQuery will rewrite too... Perhaps instead of keeping a List<Query> drillDownQueries , we can build the BQ on-the-fly in .add() ? We can then use it in toString and rewrite directly, vs now that rewrite always creates a new BQ. BTW, is that an error to not create a new BQ on every rewrite? I don't think so, but want to verify... If we do the 2nd, then we can check in the ctor already if baseQuery is a BQ and just use it to add the drill-down clauses to? If we need to know what baseQuery was, we can clone() it? Is it perhaps too much?
        Hide
        Shai Erera added a comment -

        Oh, and if we do build the result BQ on the fly, it will make implementing equals() and hashCode() very simple ...

        Show
        Shai Erera added a comment - Oh, and if we do build the result BQ on the fly, it will make implementing equals() and hashCode() very simple ...
        Hide
        Michael McCandless added a comment -

        Mike, I'm migrating the patch to after the packages reorg, and I want to handle the nocommits.

        Thanks Shai!

        Shouldn't rewrite() call res.rewrite()? We don't know what baseQuery will rewrite too...

        This isn't actually necessary: IndexSearcher will call rewrite on whatever we return.

        Perhaps instead of keeping a List<Query> drillDownQueries, we can build the BQ on-the-fly in .add()? We can then use it in toString and rewrite directly, vs now that rewrite always creates a new BQ.

        I think that's good?

        BTW, is that an error to not create a new BQ on every rewrite? I don't think so, but want to verify...

        I don't think it's an error to not make a new Query on every rewrite. I suppose there is some risk that an app might run the query while continuing to add drill downs in another thread ... but apps just shouldn't do that ...

        That said ... I don't really like how toString shows impl details (eg, the $facet field name), vs eg just the CP/s of each drill-down, but I think that's pretty minor ...

        Oh, and if we do build the result BQ on the fly, it will make implementing equals() and hashCode() very simple ...

        Good!

        Show
        Michael McCandless added a comment - Mike, I'm migrating the patch to after the packages reorg, and I want to handle the nocommits. Thanks Shai! Shouldn't rewrite() call res.rewrite()? We don't know what baseQuery will rewrite too... This isn't actually necessary: IndexSearcher will call rewrite on whatever we return. Perhaps instead of keeping a List<Query> drillDownQueries, we can build the BQ on-the-fly in .add()? We can then use it in toString and rewrite directly, vs now that rewrite always creates a new BQ. I think that's good? BTW, is that an error to not create a new BQ on every rewrite? I don't think so, but want to verify... I don't think it's an error to not make a new Query on every rewrite. I suppose there is some risk that an app might run the query while continuing to add drill downs in another thread ... but apps just shouldn't do that ... That said ... I don't really like how toString shows impl details (eg, the $facet field name), vs eg just the CP/s of each drill-down, but I think that's pretty minor ... Oh, and if we do build the result BQ on the fly, it will make implementing equals() and hashCode() very simple ... Good!
        Hide
        Michael McCandless added a comment -

        If we do the 2nd, then we can check in the ctor already if baseQuery is a BQ and just use it to add the drill-down clauses to? If we need to know what baseQuery was, we can clone() it? Is it perhaps too much?

        I thought more about this ... while this is a tempting optimization ... I think it will mess up scoring in general, because if the original query was using coord, then adding those clauses will mess this up? Maybe we could do this if the original query isn't using coord? But maybe leave this as future TODO optimization ... it seems dangerous

        Show
        Michael McCandless added a comment - If we do the 2nd, then we can check in the ctor already if baseQuery is a BQ and just use it to add the drill-down clauses to? If we need to know what baseQuery was, we can clone() it? Is it perhaps too much? I thought more about this ... while this is a tempting optimization ... I think it will mess up scoring in general, because if the original query was using coord, then adding those clauses will mess this up? Maybe we could do this if the original query isn't using coord? But maybe leave this as future TODO optimization ... it seems dangerous
        Hide
        Shai Erera added a comment -

        I don't really like how toString shows impl details (eg, the $facet field name), vs eg just the CP/s of each drill-down, but I think that's pretty minor ...

        That's the current state of the patch anyway. Creating a BQ on-the-fly won't change it. And I'm not sure if it's bad .. that way, toString shows you the query that is executed, so e.g. if you indexed into multiple category lists, you see that right-away. Although, I agree it may be redundant information for those who index into one category list...

        Maybe we could do this if the original query isn't using coord? But maybe leave this as future TODO optimization

        Ok.

        Show
        Shai Erera added a comment - I don't really like how toString shows impl details (eg, the $facet field name), vs eg just the CP/s of each drill-down, but I think that's pretty minor ... That's the current state of the patch anyway. Creating a BQ on-the-fly won't change it. And I'm not sure if it's bad .. that way, toString shows you the query that is executed, so e.g. if you indexed into multiple category lists, you see that right-away. Although, I agree it may be redundant information for those who index into one category list... Maybe we could do this if the original query isn't using coord? But maybe leave this as future TODO optimization Ok.
        Hide
        Shai Erera added a comment -

        Patch updated to trunk + handles all nocommit. I moved to create a BQ on-the-fly, and implemented hashCode() / equals(). Also implemented clone() (there's a test showing what would happen if we don't).

        .rewrite() throws exception if no baseQuery is given AND no drill-down categories too. However, if BQ has at least one clause (either baseQuery was given, or at least one drill-down category added), it permits it. If only baseQuery was given, it will rewrite to it, so I see no reason to throw an exception.

        Show
        Shai Erera added a comment - Patch updated to trunk + handles all nocommit. I moved to create a BQ on-the-fly, and implemented hashCode() / equals(). Also implemented clone() (there's a test showing what would happen if we don't). .rewrite() throws exception if no baseQuery is given AND no drill-down categories too. However, if BQ has at least one clause (either baseQuery was given, or at least one drill-down category added), it permits it. If only baseQuery was given, it will rewrite to it, so I see no reason to throw an exception.
        Hide
        Michael McCandless added a comment -

        +1, patch looks great, thanks Shai!

        Can we strengthen the javadocs to call out that baseQuery can be null,
        and this means "pure browse"? Or maybe a ctor that doesn't take a
        baseQuery? I feel like it's non-obvious / not advertised now ...

        Show
        Michael McCandless added a comment - +1, patch looks great, thanks Shai! Can we strengthen the javadocs to call out that baseQuery can be null, and this means "pure browse"? Or maybe a ctor that doesn't take a baseQuery? I feel like it's non-obvious / not advertised now ...
        Hide
        Shai Erera added a comment -

        Thanks Mike, I added another ctor which doesn't take a base query and noted the "pure browsing" behavior if you pass null to the 2nd ctor.

        I think that's ready, I'll commit shortly.

        Show
        Shai Erera added a comment - Thanks Mike, I added another ctor which doesn't take a base query and noted the "pure browsing" behavior if you pass null to the 2nd ctor. I think that's ready, I'll commit shortly.
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1444482

        LUCENE-4750: convert DrillDown to DrillDownQuery

        Show
        Commit Tag Bot added a comment - [trunk commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1444482 LUCENE-4750 : convert DrillDown to DrillDownQuery
        Hide
        Shai Erera added a comment -

        Committed to trunk and 4x. Thanks Mike!

        Show
        Shai Erera added a comment - Committed to trunk and 4x. Thanks Mike!
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1444484

        LUCENE-4750: convert DrillDown to DrillDownQuery

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1444484 LUCENE-4750 : convert DrillDown to DrillDownQuery
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.

          People

          • Assignee:
            Shai Erera
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development