Lucene - Core
  1. Lucene - Core
  2. LUCENE-3997

join module should not depend on grouping module

    Details

    • Type: Task Task
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.9, 5.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I think TopGroups/GroupDocs should simply be in core?

      Both grouping and join modules use these trivial classes, but join depends on grouping just for them.

      I think its better that we try to minimize these inter-module dependencies.
      Of course, another option is to combine grouping and join into one module, but
      last time i brought that up nobody could agree on a name.

      Anyway I think the change is pretty clean: its similar to having basic stuff like Analyzer.java in core,
      so other things can work with Analyzer without depending on any specific implementing modules.

      1. LUCENE-3997.patch
        13 kB
        Robert Muir
      2. LUCENE-3997.patch
        13 kB
        Robert Muir

        Activity

        Hide
        Robert Muir added a comment -

        Patch, after:

        svn mv lucene/grouping/src/java/org/apache/lucene/search/grouping/TopGroups.java lucene/core/src/java/org/apache/lucene/search
        
        svn mv lucene/grouping/src/java/org/apache/lucene/search/grouping/GroupDocs.java lucene/core/src/java/org/apache/lucene/search
        
        Show
        Robert Muir added a comment - Patch, after: svn mv lucene/grouping/src/java/org/apache/lucene/search/grouping/TopGroups.java lucene/core/src/java/org/apache/lucene/search svn mv lucene/grouping/src/java/org/apache/lucene/search/grouping/GroupDocs.java lucene/core/src/java/org/apache/lucene/search
        Hide
        Robert Muir added a comment -

        Patch, after:

        svn mv lucene/grouping/src/java/org/apache/lucene/search/grouping/TopGroups.java lucene/core/src/java/org/apache/lucene/search
        
        svn mv lucene/grouping/src/java/org/apache/lucene/search/grouping/GroupDocs.java lucene/core/src/java/org/apache/lucene/search
        
        Show
        Robert Muir added a comment - Patch, after: svn mv lucene/grouping/src/java/org/apache/lucene/search/grouping/TopGroups.java lucene/core/src/java/org/apache/lucene/search svn mv lucene/grouping/src/java/org/apache/lucene/search/grouping/GroupDocs.java lucene/core/src/java/org/apache/lucene/search
        Hide
        Robert Muir added a comment -

        Sorry for the duplicate upload... jira was going nutso on me

        Show
        Robert Muir added a comment - Sorry for the duplicate upload... jira was going nutso on me
        Hide
        Martijn van Groningen added a comment -

        +1! Good idea. Maybe we can move FunctionValues and ValueSource from the queries modules to core? Then grouping module doesn't have to depend on the queries module.

        Show
        Martijn van Groningen added a comment - +1! Good idea. Maybe we can move FunctionValues and ValueSource from the queries modules to core? Then grouping module doesn't have to depend on the queries module.
        Hide
        Robert Muir added a comment -

        Maybe we can move FunctionValues and ValueSource from the queries modules to core? Then grouping module doesn't have to depend on the queries module.

        +1 (I didn't even think of that or investigate it yet though, but at a glance it looks like the right thing to do).

        Show
        Robert Muir added a comment - Maybe we can move FunctionValues and ValueSource from the queries modules to core? Then grouping module doesn't have to depend on the queries module. +1 (I didn't even think of that or investigate it yet though, but at a glance it looks like the right thing to do).
        Hide
        Yonik Seeley added a comment -

        I think that if you try to make no modules depend on other modules, you'll end up just pulling pretty much everything into core.

        Also, the function query stuff is supposed to be marked as experimental - the notice only got added to FunctionQuery (I think?), so it should be applied to FunctionValues and ValueSource if they are moved to core.

        Show
        Yonik Seeley added a comment - I think that if you try to make no modules depend on other modules, you'll end up just pulling pretty much everything into core. Also, the function query stuff is supposed to be marked as experimental - the notice only got added to FunctionQuery (I think?), so it should be applied to FunctionValues and ValueSource if they are moved to core.
        Hide
        Robert Muir added a comment -

        I think that if you try to make no modules depend on other modules, you'll end up just pulling pretty much everything into core.

        I don't think we should pull everything into core, but if we pull in the simple abstract APIs we can
        have a more pluggable API: just like the abstract Analyzer api is in core, which Highlighter uses,
        but you can highlight UIMA or Japanese or ICU or whatever analyzers this way...

        Show
        Robert Muir added a comment - I think that if you try to make no modules depend on other modules, you'll end up just pulling pretty much everything into core. I don't think we should pull everything into core, but if we pull in the simple abstract APIs we can have a more pluggable API: just like the abstract Analyzer api is in core, which Highlighter uses, but you can highlight UIMA or Japanese or ICU or whatever analyzers this way...
        Hide
        Martijn van Groningen added a comment -

        I also think we can move these classes to core. These are small classes and we can mark these classes as experimental.

        Maybe we can even make this classes 'lighter' by only moving the public methods to core (maybe as interface?). E.g. ValueSource would have all the public methods in core and a BaseValueSource (Or AbstractValueSource) in the queries module that contains ValueSourceComparatorSource and ValueSourceComparator. Just an idea.

        I'll create a new issue to not make grouping module depend on the queries module.

        Show
        Martijn van Groningen added a comment - I also think we can move these classes to core. These are small classes and we can mark these classes as experimental. Maybe we can even make this classes 'lighter' by only moving the public methods to core (maybe as interface?). E.g. ValueSource would have all the public methods in core and a BaseValueSource (Or AbstractValueSource) in the queries module that contains ValueSourceComparatorSource and ValueSourceComparator. Just an idea. I'll create a new issue to not make grouping module depend on the queries module.
        Hide
        Michael McCandless added a comment -

        +1

        Show
        Michael McCandless added a comment - +1
        Hide
        Chris Male added a comment -

        I do echo Yonik's concern here, I don't think we should prevent inter-module dependencies. I think if we move something to lucene/core, it should be because we think its a core API/concept, not just because its used by multiple modules. Analyzer fits into that category, it belongs in core because it is a core concept.

        Do we feel the same about TopGroups and GroupDocs? I kind of think we do. But we should move them for that reason, not just to remove the dependency.

        Show
        Chris Male added a comment - I do echo Yonik's concern here, I don't think we should prevent inter-module dependencies. I think if we move something to lucene/core, it should be because we think its a core API/concept, not just because its used by multiple modules. Analyzer fits into that category, it belongs in core because it is a core concept. Do we feel the same about TopGroups and GroupDocs? I kind of think we do. But we should move them for that reason, not just to remove the dependency.
        Hide
        Chris Male added a comment -

        Of course, another option is to combine grouping and join into one module, but
        last time i brought that up nobody could agree on a name.

        If that is the better option, lets do that. The name seems less important at this stage, we can call it grouping-join if needs be.

        Show
        Chris Male added a comment - Of course, another option is to combine grouping and join into one module, but last time i brought that up nobody could agree on a name. If that is the better option, lets do that. The name seems less important at this stage, we can call it grouping-join if needs be.
        Hide
        Martijn van Groningen added a comment -

        The reason that joining and grouping are different modules is that these are different functionalities. However these functionalities do overlap a bit with each other. Both joining and grouping can be used for a parent child like search. I'm not sure what would be a good option. Joining does use grouping's TopGroups and GroupDocs... If we are going to have a combined module maybe we should name it relational module or parent child module?

        Show
        Martijn van Groningen added a comment - The reason that joining and grouping are different modules is that these are different functionalities. However these functionalities do overlap a bit with each other. Both joining and grouping can be used for a parent child like search. I'm not sure what would be a good option. Joining does use grouping's TopGroups and GroupDocs... If we are going to have a combined module maybe we should name it relational module or parent child module?
        Hide
        Michael McCandless added a comment -

        I don't think we should combine the two modules.

        While they do share a couple classes (to represent a 'grouped' result), the two functions (joining and grouping) are really orthogonal: you can join w/o doing grouping, and you can group w/o doing joining.

        I think we should move TopGroups/GroupDocs to core.

        Show
        Michael McCandless added a comment - I don't think we should combine the two modules. While they do share a couple classes (to represent a 'grouped' result), the two functions (joining and grouping) are really orthogonal: you can join w/o doing grouping, and you can group w/o doing joining. I think we should move TopGroups/GroupDocs to core.
        Hide
        Chris Male added a comment -

        To me they seem to share a lot of similarities and the fact they both use the 'grouped' result notion is an illustration of that.

        While a group could consist of Documents with any kind of relationship, that kind of a relationship could be parent-child. The nature of the relationship and what the result should consist of (if its a parent-child relationship, should the 'grouped' result be parent and children, just children or just the parent) seem to be what dictates the implementations used.

        I feel that having them as a single module would allow us to build some APIs which focus on user land concepts and perhaps hide some of the implementation details and differences in the joining and grouping algorithms.

        Show
        Chris Male added a comment - To me they seem to share a lot of similarities and the fact they both use the 'grouped' result notion is an illustration of that. While a group could consist of Documents with any kind of relationship, that kind of a relationship could be parent-child. The nature of the relationship and what the result should consist of (if its a parent-child relationship, should the 'grouped' result be parent and children, just children or just the parent) seem to be what dictates the implementations used. I feel that having them as a single module would allow us to build some APIs which focus on user land concepts and perhaps hide some of the implementation details and differences in the joining and grouping algorithms.
        Hide
        Steve Rowe added a comment -

        I propose, instead of using lucene-core as the location for code used by multiple modules, that we create a (single) new module that serves this purpose, something like lucene-shared or lucene-common (though common analyzers already use this name...).

        That way the number of inter-module dependencies is limited, and lucene-core doesn't get roped into the act.

        Show
        Steve Rowe added a comment - I propose, instead of using lucene-core as the location for code used by multiple modules, that we create a (single) new module that serves this purpose, something like lucene-shared or lucene-common (though common analyzers already use this name...). That way the number of inter-module dependencies is limited, and lucene-core doesn't get roped into the act.
        Hide
        Martijn van Groningen added a comment -

        Steven, I think this is a good idea for the reasons you mentioned. I think the new shared module should be named 'parent-child'. This name describes the overlapping functionality both existing modules have.

        Directory layout:

        -- lucene
        |
        |--- parent-child
        |         |
        |         |--- grouping
        |         |
        |         |--- join
        |
        |--- ...
        
        Show
        Martijn van Groningen added a comment - Steven, I think this is a good idea for the reasons you mentioned. I think the new shared module should be named 'parent-child'. This name describes the overlapping functionality both existing modules have. Directory layout: -- lucene | |--- parent-child | | | |--- grouping | | | |--- join | |--- ...
        Hide
        Chris Male added a comment -

        I propose, instead of using lucene-core as the location for code used by multiple modules, that we create a (single) new module that serves this purpose, something like lucene-shared or lucene-common (though common analyzers already use this name...)

        I actually created lucene-common that when I first refactored out the FunctionQuery codebase. After some time it was decided (in an issue I can't remember) that the code would go into lucene-core. I agree with your assessment that we shouldn't use lucene-core as a dumping ground, but we should get a discussion about this going.

        Show
        Chris Male added a comment - I propose, instead of using lucene-core as the location for code used by multiple modules, that we create a (single) new module that serves this purpose, something like lucene-shared or lucene-common (though common analyzers already use this name...) I actually created lucene-common that when I first refactored out the FunctionQuery codebase. After some time it was decided (in an issue I can't remember) that the code would go into lucene-core. I agree with your assessment that we shouldn't use lucene-core as a dumping ground, but we should get a discussion about this going.
        Hide
        Robert Muir added a comment -

        Can we please not add more modules here. I'm against that, this is crazy: its only a few classes in question already. it doesnt need 3 modules...

        the purpose of this issue was to help simplify modules and dependencies, not make it worse.

        Show
        Robert Muir added a comment - Can we please not add more modules here. I'm against that, this is crazy: its only a few classes in question already. it doesnt need 3 modules... the purpose of this issue was to help simplify modules and dependencies, not make it worse.
        Hide
        Michael McCandless added a comment -

        I think moving TopGroups/GroupDocs to core is fine. Pragmatism over purity.

        Show
        Michael McCandless added a comment - I think moving TopGroups/GroupDocs to core is fine. Pragmatism over purity.
        Hide
        Steve Rowe added a comment -

        Bulk move 4.4 issues to 4.5 and 5.0

        Show
        Steve Rowe added a comment - Bulk move 4.4 issues to 4.5 and 5.0
        Hide
        Uwe Schindler added a comment -

        Move issue to Lucene 4.9.

        Show
        Uwe Schindler added a comment - Move issue to Lucene 4.9.

          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:

              Development