Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7617

Improve GroupingSearch API and extensibility

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.4
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      While looking at how to make grouping work with the new XValuesSource API in core, I thought I'd try and clean up GroupingSearch a bit. We have three different ways of grouping at the moment: by doc block, using a single-pass collector; by field; and by ValueSource. The latter two both use essentially the same two-pass mechanism, with different Collector implementations.

      I can see a number of possible improvements here:

      • abstract the two-pass collector creation into a factory API, which should allow us to add the XValuesSource implementations as well
      • clean up the generics on the two-pass collectors - maybe look into removing them entirely? I'm not sure they add anything really, and we don't have them on the equivalent plan search APIs
      • think about moving the document block method into the join module instead, alongside all the other block-indexing code
      • rename the various Collector base classes so that they don't have 'Abstract' in them anymore
      1. LUCENE-7617.patch
        116 kB
        Alan Woodward
      2. LUCENE-7617.patch
        112 kB
        Alan Woodward
      3. LUCENE-7617.patch
        21 kB
        Alan Woodward

        Activity

        Hide
        romseygeek Alan Woodward added a comment -

        Here's a patch for the first point above, creating a new Grouper class that you can pass to GroupingSearch. Needs javadocs, but all tests pass.

        Show
        romseygeek Alan Woodward added a comment - Here's a patch for the first point above, creating a new Grouper class that you can pass to GroupingSearch. Needs javadocs, but all tests pass.
        Hide
        martijn.v.groningen Martijn van Groningen added a comment -

        +1 to this change. This should make using these collectors easier.

        There are a couple of places where I saw if statements without curly brackets. Maybe add these curly brackets. I find it easier to read.

        clean up the generics on the two-pass collectors - maybe look into removing them entirely?

        As far as I can see the bases classes use these generics, so that subclasses don't have to do manual casts. Which parts you like to cleanup?

        think about moving the document block method into the join module instead, alongside all the other block-indexing code

        I would prefer if the `BlockGroupingCollector` stayed in the grouping module. The block indexing is a feature provided by core and the way I see it modules can have features that use that. Also the the join and grouping modules provide each a different functionality. Although from a higher level the functionality is a bit overlapping, in a sense that some use cases could be implemented with both the join or the grouping module.

        rename the various Collector base classes so that they don't have 'Abstract' in them anymore

        agreed, a lot 'Abstract' in the names

        Show
        martijn.v.groningen Martijn van Groningen added a comment - +1 to this change. This should make using these collectors easier. There are a couple of places where I saw if statements without curly brackets. Maybe add these curly brackets. I find it easier to read. clean up the generics on the two-pass collectors - maybe look into removing them entirely? As far as I can see the bases classes use these generics, so that subclasses don't have to do manual casts. Which parts you like to cleanup? think about moving the document block method into the join module instead, alongside all the other block-indexing code I would prefer if the `BlockGroupingCollector` stayed in the grouping module. The block indexing is a feature provided by core and the way I see it modules can have features that use that. Also the the join and grouping modules provide each a different functionality. Although from a higher level the functionality is a bit overlapping, in a sense that some use cases could be implemented with both the join or the grouping module. rename the various Collector base classes so that they don't have 'Abstract' in them anymore agreed, a lot 'Abstract' in the names
        Hide
        romseygeek Alan Woodward added a comment -

        Thanks for the review Martijn! Here's an updated patch:

        • no more 'Abstract' in abstract class names
        • generics are changed so that instead of Class<ComplicatedInnerClassName<T>>, we just use Class<T>. Also, type parameters are all set to T rather than GROUP_VALUE_TYPE. You can shrink your windows when looking at grouping code now
        • Block grouping is left where it is
        • Added javadocs, and extra {} around if statements.

        Given that everything here is marked as experimental, I think we're OK to just backwards-break? Most people will be using GroupingSearch, I think, which stays the same.

        Show
        romseygeek Alan Woodward added a comment - Thanks for the review Martijn! Here's an updated patch: no more 'Abstract' in abstract class names generics are changed so that instead of Class<ComplicatedInnerClassName<T>>, we just use Class<T>. Also, type parameters are all set to T rather than GROUP_VALUE_TYPE. You can shrink your windows when looking at grouping code now Block grouping is left where it is Added javadocs, and extra {} around if statements. Given that everything here is marked as experimental, I think we're OK to just backwards-break? Most people will be using GroupingSearch, I think, which stays the same.
        Hide
        martijn.v.groningen Martijn van Groningen added a comment -

        +1 Thanks for cleaning this up!

        I found a few places still using GROUP_VALUE_TYPE, in SecondPassGroupingCollector.SearchGroupDocs, GroupDocs, TopGroups, AllGroupHeadsCollector.GroupHead and Grouping.Command (in Solr).

        Given that everything here is marked as experimental, I think we're OK to just backwards-break?

        Yes, that is OK.

        Show
        martijn.v.groningen Martijn van Groningen added a comment - +1 Thanks for cleaning this up! I found a few places still using GROUP_VALUE_TYPE, in SecondPassGroupingCollector.SearchGroupDocs, GroupDocs, TopGroups, AllGroupHeadsCollector.GroupHead and Grouping.Command (in Solr). Given that everything here is marked as experimental, I think we're OK to just backwards-break? Yes, that is OK.
        Hide
        romseygeek Alan Woodward added a comment -

        Final patch. I ended up removing the no-op group head collectors, as Solr was relying on the AllGroupHeadCollector returning a FixedBitSet - this should probably be just a Bits instance instead, but that can be dealt with in a later issue. Will commit later on today.

        Show
        romseygeek Alan Woodward added a comment - Final patch. I ended up removing the no-op group head collectors, as Solr was relying on the AllGroupHeadCollector returning a FixedBitSet - this should probably be just a Bits instance instead, but that can be dealt with in a later issue. Will commit later on today.
        Hide
        romseygeek Alan Woodward added a comment -

        The ASF bot didn't pick up the commits, for some reason:

        branch_6x: d4d3ede51cc114ad98fb05e19fd6c6e15e724f34
        master: da30f21f5d2c90a4e3d4fae87a297adfd4bbb3cb

        Thanks for the reviews Martijn!

        Show
        romseygeek Alan Woodward added a comment - The ASF bot didn't pick up the commits, for some reason: branch_6x: d4d3ede51cc114ad98fb05e19fd6c6e15e724f34 master: da30f21f5d2c90a4e3d4fae87a297adfd4bbb3cb Thanks for the reviews Martijn!

          People

          • Assignee:
            romseygeek Alan Woodward
            Reporter:
            romseygeek Alan Woodward
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development