Lucene - Core
  1. Lucene - Core
  2. LUCENE-3271

Move 'good' contrib/queries classes to Queries module

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      With the Queries module now filled with the FunctionQuery stuff, we should look at closing down contrib/queries. While not a huge contrib, it contains a number of pretty useful classes and some that should go elsewhere.

      Heres my proposed plan:

      • similar.* -> suggest module
      • regex.* -> queries module
      • BooleanFilter -> queries module under .filters package
      • BoostingQuery -> queries module
      • ChainedFilter -> queries module under .filters package
      • DuplicateFilter -> queries module under .filters package
      • FieldCacheRewriteMethod -> This doesn't belong in this contrib or the queries module. I think we should push it to contrib/misc for the time being. It seems to have quite a few constraints on when its useful. If indeed CONSTANT_SCORE_AUTO rewrite is better, then I dont see a purpose for it.
      • FilterClause -> class inside BooleanFilter
      • FuzzyLikeThisQuery -> suggest module. This class seems a mess with its Similarity hardcoded. With all that said, it does seem to do what it claims and with some cleanup, it could be good.
      • TermsFilter -> queries module under .filters package
      • SlowCollated* -> They can stay in the module till we have a better place to nuke them.

      One of the implications of the above moves, is that the xml-query-parser, which supports many of the queries, will need to have a dependency on the queries module. But that seems unavoidable at this stage.

      1. LUCENE-3271-MLT.patch
        10 kB
        Chris Male
      2. LUCENE-3271-MLT.patch
        10 kB
        Chris Male
      3. LUCENE-3271-regex.patch
        11 kB
        Chris Male
      4. LUCENE-3271-good-queries.patch
        16 kB
        Chris Male
      5. LUCENE-3271-good-queries.patch
        16 kB
        Robert Muir

        Activity

        Hide
        Robert Muir added a comment -

        Thanks for working to clean this up! a few of my comments:

        similar.* -> suggest module

        Seems a little funky? I guess if we had a query-expansion module, I would think it belonged there.

        FieldCacheRewriteMethod -> This doesn't belong in this contrib or the queries module. I think we should push it to contrib/misc for the time being. It seems to have quite a few constraints on when its useful. If indeed CONSTANT_SCORE_AUTO rewrite is better, then I dont see a purpose for it.

        My vote would actually be to move this to src/test! Yeah there are some scenarios where this thing could be faster, but really I thought it was just a good way to add seek to the doctermsindex termsenum. I do think it and its test would be a nice addition to src/test, if someone wants to use it the can always snag it from there... its that expert.

        FuzzyLikeThisQuery -> suggest module. This class seems a mess with its Similarity hardcoded. With all that said, it does seem to do what it claims and with some cleanup, it could be good.

        I actually made some comments about this query in the flexscoring branch: http://svn.apache.org/repos/asf/lucene/dev/branches/flexscoring/lucene/contrib/queries/src/java/org/apache/lucene/search/FuzzyLikeThisQuery.java

        In my opinion, as a rewrite method (i think it would require 2, one for the variant that ignores TF), we could get better performance out of this with cleaner code... in other words you would just use ordinary FuzzyQuery and set this rewrite method for its scoring heuristic, or a BQ of FuzzyQueries if you are doing the expansion thing

        SlowCollated* -> They can stay in the module till we have a better place to nuke them.

        This is all totally deprecated code that seems nice in contrib to reflect the fact that its not scalable... I don't think its a huge issue where it goes due to the fact its deprecate and the name

        Finally, I wanted to say that its my opinion that we shouldn't put garbage in modules. Modules should be treated like core I think.... yet at the same time I totally support efforts to cleanup contrib, either removing sandy stuff or refactoring it where it belongs in a module.

        One option could to create a sandbox directory either under lucene (it contains src/java and src/test but is totally an unorganized sandbox), or itself as a contrib temporarily (contrib/sandbox) and take a look at contrib and move stuff thats good into modules, but toss all the 'odd things' into this sandbox.

        Show
        Robert Muir added a comment - Thanks for working to clean this up! a few of my comments: similar.* -> suggest module Seems a little funky? I guess if we had a query-expansion module, I would think it belonged there. FieldCacheRewriteMethod -> This doesn't belong in this contrib or the queries module. I think we should push it to contrib/misc for the time being. It seems to have quite a few constraints on when its useful. If indeed CONSTANT_SCORE_AUTO rewrite is better, then I dont see a purpose for it. My vote would actually be to move this to src/test! Yeah there are some scenarios where this thing could be faster, but really I thought it was just a good way to add seek to the doctermsindex termsenum. I do think it and its test would be a nice addition to src/test, if someone wants to use it the can always snag it from there... its that expert. FuzzyLikeThisQuery -> suggest module. This class seems a mess with its Similarity hardcoded. With all that said, it does seem to do what it claims and with some cleanup, it could be good. I actually made some comments about this query in the flexscoring branch: http://svn.apache.org/repos/asf/lucene/dev/branches/flexscoring/lucene/contrib/queries/src/java/org/apache/lucene/search/FuzzyLikeThisQuery.java In my opinion, as a rewrite method (i think it would require 2, one for the variant that ignores TF), we could get better performance out of this with cleaner code... in other words you would just use ordinary FuzzyQuery and set this rewrite method for its scoring heuristic, or a BQ of FuzzyQueries if you are doing the expansion thing SlowCollated* -> They can stay in the module till we have a better place to nuke them. This is all totally deprecated code that seems nice in contrib to reflect the fact that its not scalable... I don't think its a huge issue where it goes due to the fact its deprecate and the name Finally, I wanted to say that its my opinion that we shouldn't put garbage in modules. Modules should be treated like core I think.... yet at the same time I totally support efforts to cleanup contrib, either removing sandy stuff or refactoring it where it belongs in a module. One option could to create a sandbox directory either under lucene (it contains src/java and src/test but is totally an unorganized sandbox), or itself as a contrib temporarily (contrib/sandbox) and take a look at contrib and move stuff thats good into modules, but toss all the 'odd things' into this sandbox.
        Hide
        Chris Male added a comment - - edited

        similar.* -> suggest module

        Seems a little funky? I guess if we had a query-expansion module, I would think it belonged there.

        MoreLikeThis makes suggestions But okay. Do you have other thoughts for what could go into a query-expansion module? If so, then I'll go with it. I just know that MLT doesn't belong with the queries anymore.

        FieldCacheRewriteMethod -> This doesn't belong in this contrib or the queries module. I think we should push it to contrib/misc for the time being. It seems to have quite a few constraints on when its useful. If indeed CONSTANT_SCORE_AUTO rewrite is better, then I dont see a purpose for it.

        My vote would actually be to move this to src/test! Yeah there are some scenarios where this thing could be faster, but really I thought it was just a good way to add seek to the doctermsindex termsenum. I do think it and its test would be a nice addition to src/test, if someone wants to use it the can always snag it from there... its that expert.

        src/test it is.

        In my opinion, as a rewrite method (i think it would require 2, one for the variant that ignores TF), we could get better performance out of this with cleaner code... in other words you would just use ordinary FuzzyQuery and set this rewrite method for its scoring heuristic, or a BQ of FuzzyQueries if you are doing the expansion thing

        So what are you suggesting? We could sandbox it for the time being (see my comments about sandbox below).

        Finally, I wanted to say that its my opinion that we shouldn't put garbage in modules. Modules should be treated like core I think.... yet at the same time I totally support efforts to cleanup contrib, either removing sandy stuff or refactoring it where it belongs in a module.

        +1 to all this. I'm going to do a code cleanup on each of the classes to goes into the module. Test coverage will be looked into as well. At this stage I don't think any of the classes I've suggested moving to would be deemed garbage.

        One option could to create a sandbox directory either under lucene (it contains src/java and src/test but is totally an unorganized sandbox), or itself as a contrib temporarily (contrib/sandbox) and take a look at contrib and move stuff thats good into modules, but toss all the 'odd things' into this sandbox.

        I actually really like the idea of a sandbox. I think for simplicity, its best to make it a contrib. That way we can easily get it up and running. It also won't 'stain' anything that isn't already stained.

        As part of this work, I'll push the SlowCollated* stuff to the sandbox, along with FuzzyLikeThis.

        Show
        Chris Male added a comment - - edited similar.* -> suggest module Seems a little funky? I guess if we had a query-expansion module, I would think it belonged there. MoreLikeThis makes suggestions But okay. Do you have other thoughts for what could go into a query-expansion module? If so, then I'll go with it. I just know that MLT doesn't belong with the queries anymore. FieldCacheRewriteMethod -> This doesn't belong in this contrib or the queries module. I think we should push it to contrib/misc for the time being. It seems to have quite a few constraints on when its useful. If indeed CONSTANT_SCORE_AUTO rewrite is better, then I dont see a purpose for it. My vote would actually be to move this to src/test! Yeah there are some scenarios where this thing could be faster, but really I thought it was just a good way to add seek to the doctermsindex termsenum. I do think it and its test would be a nice addition to src/test, if someone wants to use it the can always snag it from there... its that expert. src/test it is. In my opinion, as a rewrite method (i think it would require 2, one for the variant that ignores TF), we could get better performance out of this with cleaner code... in other words you would just use ordinary FuzzyQuery and set this rewrite method for its scoring heuristic, or a BQ of FuzzyQueries if you are doing the expansion thing So what are you suggesting? We could sandbox it for the time being (see my comments about sandbox below). Finally, I wanted to say that its my opinion that we shouldn't put garbage in modules. Modules should be treated like core I think.... yet at the same time I totally support efforts to cleanup contrib, either removing sandy stuff or refactoring it where it belongs in a module. +1 to all this. I'm going to do a code cleanup on each of the classes to goes into the module. Test coverage will be looked into as well. At this stage I don't think any of the classes I've suggested moving to would be deemed garbage. One option could to create a sandbox directory either under lucene (it contains src/java and src/test but is totally an unorganized sandbox), or itself as a contrib temporarily (contrib/sandbox) and take a look at contrib and move stuff thats good into modules, but toss all the 'odd things' into this sandbox. I actually really like the idea of a sandbox. I think for simplicity, its best to make it a contrib. That way we can easily get it up and running. It also won't 'stain' anything that isn't already stained. As part of this work, I'll push the SlowCollated* stuff to the sandbox, along with FuzzyLikeThis.
        Hide
        Chris Male added a comment -

        Patch that moves the contents of similar.* (MoreLikeThis) to .mlt in the queries module.

        Updates dependencies, everything passes.

        Show
        Chris Male added a comment - Patch that moves the contents of similar.* (MoreLikeThis) to .mlt in the queries module. Updates dependencies, everything passes.
        Hide
        Chris Male added a comment -

        Command for MLT patch:

        svn --parents move lucene/contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThis*.java modules/queries/src/java/org/apache/lucene/queries/mlt/
        svn --parents move lucene/contrib/queries/src/java/org/apache/lucene/search/similar/package.html modules/queries/src/java/org/apache/lucene/queries/mlt/
        svn --parents move lucene/contrib/queries/src/test/org/apache/lucene/search/similar/TestMoreLikeThis.java modules/queries/src/test/org/apache/lucene/queries/mlt/TestMoreLikeThis.java
        svn delete lucene/contrib/queries/src/java/org/apache/lucene/search/similar/
        
        Show
        Chris Male added a comment - Command for MLT patch: svn --parents move lucene/contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThis*.java modules/queries/src/java/org/apache/lucene/queries/mlt/ svn --parents move lucene/contrib/queries/src/java/org/apache/lucene/search/similar/ package .html modules/queries/src/java/org/apache/lucene/queries/mlt/ svn --parents move lucene/contrib/queries/src/test/org/apache/lucene/search/similar/TestMoreLikeThis.java modules/queries/src/test/org/apache/lucene/queries/mlt/TestMoreLikeThis.java svn delete lucene/contrib/queries/src/java/org/apache/lucene/search/similar/
        Hide
        Chris Male added a comment -

        Better patch which fixes frustrating xml-query-parser javadoc issue.

        Committing now.

        Show
        Chris Male added a comment - Better patch which fixes frustrating xml-query-parser javadoc issue. Committing now.
        Hide
        Chris Male added a comment -

        MLT moved Committed revision 1146542.

        Show
        Chris Male added a comment - MLT moved Committed revision 1146542.
        Hide
        Chris Male added a comment -

        Back on the wagon again.

        Patch moves the regex.* to the queries module.

        Command coming up.

        Everything compiles, passes.

        Show
        Chris Male added a comment - Back on the wagon again. Patch moves the regex.* to the queries module. Command coming up. Everything compiles, passes.
        Hide
        Chris Male added a comment -

        Command for regex patch:

        svn move --parents lucene/contrib/queries/src/java/org/apache/lucene/search/regex/* modules/queries/src/java/org/apache/lucene/queries/regex/
        svn move --parents lucene/contrib/queries/src/test/org/apache/lucene/search/regex/* modules/queries/src/test/org/apache/lucene/queries/regex/
        svn move --parents lucene/contrib/queries/lib/* modules/queries/lib/
        svn delete lucene/contrib/queries/src/test/org/apache/lucene/search/regex
        svn delete lucene/contrib/queries/src/java/org/apache/lucene/search/regex
        
        Show
        Chris Male added a comment - Command for regex patch: svn move --parents lucene/contrib/queries/src/java/org/apache/lucene/search/regex/* modules/queries/src/java/org/apache/lucene/queries/regex/ svn move --parents lucene/contrib/queries/src/test/org/apache/lucene/search/regex/* modules/queries/src/test/org/apache/lucene/queries/regex/ svn move --parents lucene/contrib/queries/lib/* modules/queries/lib/ svn delete lucene/contrib/queries/src/test/org/apache/lucene/search/regex svn delete lucene/contrib/queries/src/java/org/apache/lucene/search/regex
        Hide
        Robert Muir added a comment -

        is regex a 'good' query? maybe we should put Slow in front of its name?

        Show
        Robert Muir added a comment - is regex a 'good' query? maybe we should put Slow in front of its name?
        Hide
        Chris Male added a comment -

        Funny you say that, I was just thinking the same thing over night. I might actually change my direction and sandbox it as it really is incredibly inefficient.

        Show
        Chris Male added a comment - Funny you say that, I was just thinking the same thing over night. I might actually change my direction and sandbox it as it really is incredibly inefficient.
        Hide
        Chris Male added a comment -

        I'll leave the regex stuff aside for a bit.

        New patch which moves remainder of good queries to their new homes. Most are moved to the queries module.

        This just leaves the bad eggs in the queries contrib. I'll move them out next.

        Show
        Chris Male added a comment - I'll leave the regex stuff aside for a bit. New patch which moves remainder of good queries to their new homes. Most are moved to the queries module. This just leaves the bad eggs in the queries contrib. I'll move them out next.
        Hide
        Chris Male added a comment -

        Command for patch:

        svn move lucene/contrib/queries/src/java/org/apache/lucene/search/FieldCacheRewriteMethod.java lucene/src/test/org/apache/lucene/search/
        svn move lucene/contrib/queries/src/test/org/apache/lucene/search/TestFieldCacheRewriteMethod.java lucene/src/test/org/apache/lucene/search/
        svn move lucene/contrib/queries/src/java/org/apache/lucene/search/BooleanFilter.java modules/queries/src/java/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/test/org/apache/lucene/search/BooleanFilterTest.java modules/queries/src/test/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/java/org/apache/lucene/search/FilterClause.java modules/queries/src/java/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/java/org/apache/lucene/search/TermsFilter.java modules/queries/src/java/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/test/org/apache/lucene/search/TermsFilterTest.java modules/queries/src/test/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/java/org/apache/lucene/search/BoostingQuery.java modules/queries/src/java/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/java/org/apache/lucene/search/ChainedFilter.java modules/queries/src/java/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/java/org/apache/lucene/search/DuplicateFilter.java modules/queries/src/java/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/test/org/apache/lucene/search/BoostingQueryTest.java modules/queries/src/test/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/test/org/apache/lucene/search/ChainedFilterTest.java modules/queries/src/test/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/test/org/apache/lucene/search/DuplicateFilterTest.java modules/queries/src/test/org/apache/lucene/queries/
        
        Show
        Chris Male added a comment - Command for patch: svn move lucene/contrib/queries/src/java/org/apache/lucene/search/FieldCacheRewriteMethod.java lucene/src/test/org/apache/lucene/search/ svn move lucene/contrib/queries/src/test/org/apache/lucene/search/TestFieldCacheRewriteMethod.java lucene/src/test/org/apache/lucene/search/ svn move lucene/contrib/queries/src/java/org/apache/lucene/search/BooleanFilter.java modules/queries/src/java/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/test/org/apache/lucene/search/BooleanFilterTest.java modules/queries/src/test/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/java/org/apache/lucene/search/FilterClause.java modules/queries/src/java/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/java/org/apache/lucene/search/TermsFilter.java modules/queries/src/java/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/test/org/apache/lucene/search/TermsFilterTest.java modules/queries/src/test/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/java/org/apache/lucene/search/BoostingQuery.java modules/queries/src/java/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/java/org/apache/lucene/search/ChainedFilter.java modules/queries/src/java/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/java/org/apache/lucene/search/DuplicateFilter.java modules/queries/src/java/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/test/org/apache/lucene/search/BoostingQueryTest.java modules/queries/src/test/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/test/org/apache/lucene/search/ChainedFilterTest.java modules/queries/src/test/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/test/org/apache/lucene/search/DuplicateFilterTest.java modules/queries/src/test/org/apache/lucene/queries/
        Hide
        Robert Muir added a comment -

        In order to fix LUCENE-3378, I need to do part of this patch (move the FieldCacheRewriteMethod tests).

        I'll apply this locally to a separate checkout and then update the patch and instructions (as I figure you are close to committing)

        Show
        Robert Muir added a comment - In order to fix LUCENE-3378 , I need to do part of this patch (move the FieldCacheRewriteMethod tests). I'll apply this locally to a separate checkout and then update the patch and instructions (as I figure you are close to committing)
        Hide
        Chris Male added a comment -

        Okay! I'll be committing this in about 12hrs or so.

        Show
        Chris Male added a comment - Okay! I'll be committing this in about 12hrs or so.
        Hide
        Robert Muir added a comment -

        updated patch... i think it might be identical to the old one actually!

        but here are the revised instructions:

        svn move lucene/contrib/queries/src/java/org/apache/lucene/search/BooleanFilter.java modules/queries/src/java/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/test/org/apache/lucene/search/BooleanFilterTest.java modules/queries/src/test/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/java/org/apache/lucene/search/FilterClause.java modules/queries/src/java/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/java/org/apache/lucene/search/TermsFilter.java modules/queries/src/java/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/test/org/apache/lucene/search/TermsFilterTest.java modules/queries/src/test/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/java/org/apache/lucene/search/BoostingQuery.java modules/queries/src/java/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/java/org/apache/lucene/search/ChainedFilter.java modules/queries/src/java/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/java/org/apache/lucene/search/DuplicateFilter.java modules/queries/src/java/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/test/org/apache/lucene/search/BoostingQueryTest.java modules/queries/src/test/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/test/org/apache/lucene/search/ChainedFilterTest.java modules/queries/src/test/org/apache/lucene/queries/
        svn move lucene/contrib/queries/src/test/org/apache/lucene/search/DuplicateFilterTest.java modules/queries/src/test/org/apache/lucene/queries/
        
        Show
        Robert Muir added a comment - updated patch... i think it might be identical to the old one actually! but here are the revised instructions: svn move lucene/contrib/queries/src/java/org/apache/lucene/search/BooleanFilter.java modules/queries/src/java/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/test/org/apache/lucene/search/BooleanFilterTest.java modules/queries/src/test/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/java/org/apache/lucene/search/FilterClause.java modules/queries/src/java/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/java/org/apache/lucene/search/TermsFilter.java modules/queries/src/java/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/test/org/apache/lucene/search/TermsFilterTest.java modules/queries/src/test/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/java/org/apache/lucene/search/BoostingQuery.java modules/queries/src/java/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/java/org/apache/lucene/search/ChainedFilter.java modules/queries/src/java/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/java/org/apache/lucene/search/DuplicateFilter.java modules/queries/src/java/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/test/org/apache/lucene/search/BoostingQueryTest.java modules/queries/src/test/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/test/org/apache/lucene/search/ChainedFilterTest.java modules/queries/src/test/org/apache/lucene/queries/ svn move lucene/contrib/queries/src/test/org/apache/lucene/search/DuplicateFilterTest.java modules/queries/src/test/org/apache/lucene/queries/
        Hide
        Chris Male added a comment -

        Committed revision 1158997.

        Moving on to deal with the remaining contents.

        Show
        Chris Male added a comment - Committed revision 1158997. Moving on to deal with the remaining contents.
        Hide
        Chris Male added a comment -

        We've move the good queries, I'll deal with the remainder in a separate issue.

        Show
        Chris Male added a comment - We've move the good queries, I'll deal with the remainder in a separate issue.
        Hide
        Robert Muir added a comment -

        I think the DuplicateFilter should go back to sandbox for now, this one does not work correctly for a multi-segment index?

        Show
        Robert Muir added a comment - I think the DuplicateFilter should go back to sandbox for now, this one does not work correctly for a multi-segment index?
        Hide
        Chris Male added a comment -

        Okay, will make a patch.

        Show
        Chris Male added a comment - Okay, will make a patch.
        Hide
        Chris Male added a comment -

        I've added moving DuplicateFilter to LUCENE-3381.

        Show
        Chris Male added a comment - I've added moving DuplicateFilter to LUCENE-3381 .

          People

          • Assignee:
            Chris Male
            Reporter:
            Chris Male
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development