Lucene - Core
  1. Lucene - Core
  2. LUCENE-3410

Make WordDelimiterFilter's instantiation more readable

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.5, 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Currently WordDelimiterFilter's constructor is:

      public WordDelimiterFilter(TokenStream in,
      	                             byte[] charTypeTable,
      	                             int generateWordParts,
      	                             int generateNumberParts,
      	                             int catenateWords,
      	                             int catenateNumbers,
      	                             int catenateAll,
      	                             int splitOnCaseChange,
      	                             int preserveOriginal,
      	                             int splitOnNumerics,
      	                             int stemEnglishPossessive,
      	                             CharArraySet protWords) {
      

      which means its instantiation is an unreadable combination of 1s and 0s.

      We should improve this by either using a Builder, 'int flags' or an EnumSet.

      1. LUCENE-3410.patch
        25 kB
        Chris Male
      2. LUCENE-3410.patch
        21 kB
        Chris Male
      3. LUCENE-3410.patch
        21 kB
        Chris Male

        Activity

        Chris Male created issue -
        Hide
        Robert Muir added a comment -

        I think flags is a good solution here, its very simple and will improve readability: the backwards compat is obvious too.

        I think its a bit scary to use enumset, it will involve complicated generics and the jdk itself does not seem to use enumset anywhere!
        e.g. Pattern.compile(String regex, int flags)

        I think a builder is overkill here, if someone wants a builder they can always make a builder on top of flags for their own use.

        Show
        Robert Muir added a comment - I think flags is a good solution here, its very simple and will improve readability: the backwards compat is obvious too. I think its a bit scary to use enumset, it will involve complicated generics and the jdk itself does not seem to use enumset anywhere! e.g. Pattern.compile(String regex, int flags) I think a builder is overkill here, if someone wants a builder they can always make a builder on top of flags for their own use.
        Hide
        Yonik Seeley added a comment -

        For historical context, the reason I used an int for stuff like generateWordParts was that I had the idea of using it as a minimum (i.e. only generate word parts that are over a certain size, etc). This obviously never happened though

        Show
        Yonik Seeley added a comment - For historical context, the reason I used an int for stuff like generateWordParts was that I had the idea of using it as a minimum (i.e. only generate word parts that are over a certain size, etc). This obviously never happened though
        Hide
        Uwe Schindler added a comment -

        OK, if those integers are always used only as boolean flags, I would prefer a single (int flags) parameter. No builder pattern needed. I would maybe prefer a long to make it extensibler (but 31 flags should be enough, too).

        Show
        Uwe Schindler added a comment - OK, if those integers are always used only as boolean flags, I would prefer a single (int flags) parameter. No builder pattern needed. I would maybe prefer a long to make it extensibler (but 31 flags should be enough, too).
        Hide
        Chris Male added a comment -

        Patch for WDF & co, converting them over to int flags.

        Old behavior is deprecated but we could do that in 3x and nuke in trunk.

        Show
        Chris Male added a comment - Patch for WDF & co, converting them over to int flags. Old behavior is deprecated but we could do that in 3x and nuke in trunk.
        Chris Male made changes -
        Field Original Value New Value
        Attachment LUCENE-3410.patch [ 12492572 ]
        Hide
        Chris Male added a comment -

        Plan to commit soon if theres no objections.

        Show
        Chris Male added a comment - Plan to commit soon if theres no objections.
        Hide
        Uwe Schindler added a comment -

        +1

        Show
        Uwe Schindler added a comment - +1
        Hide
        Robert Muir added a comment -

        looks good overall, a couple tiny nitpicks:

        • looks like there is some dead code in WordDelimiterIterator (the booleans)
        • should the iterator maybe keep the booleans and not use flags? just an idea, because the iterator doesn't "make use" of all the flags. its also not a public class and just a helper class to simplify the filter, so i think its ok for it to take 3 booleans?
        Show
        Robert Muir added a comment - looks good overall, a couple tiny nitpicks: looks like there is some dead code in WordDelimiterIterator (the booleans) should the iterator maybe keep the booleans and not use flags? just an idea, because the iterator doesn't "make use" of all the flags. its also not a public class and just a helper class to simplify the filter, so i think its ok for it to take 3 booleans?
        Hide
        Chris Male added a comment -

        should the iterator maybe keep the booleans and not use flags? just an idea, because the iterator doesn't "make use" of all the flags. its also not a public class and just a helper class to simplify the filter, so i think its ok for it to take 3 booleans?

        Yeah I thought about this as well. It would make the iterator clearer since it wouldn't rely on people looking at the Filter's flags. I will make the change.

        Show
        Chris Male added a comment - should the iterator maybe keep the booleans and not use flags? just an idea, because the iterator doesn't "make use" of all the flags. its also not a public class and just a helper class to simplify the filter, so i think its ok for it to take 3 booleans? Yeah I thought about this as well. It would make the iterator clearer since it wouldn't rely on people looking at the Filter's flags. I will make the change.
        Hide
        Chris Male added a comment -

        Patch with the Iterator back to using booleans. Going to commit.

        Show
        Chris Male added a comment - Patch with the Iterator back to using booleans. Going to commit.
        Chris Male made changes -
        Attachment LUCENE-3410.patch [ 12493270 ]
        Hide
        Chris Male added a comment -

        Better patch.

        Show
        Chris Male added a comment - Better patch.
        Chris Male made changes -
        Attachment LUCENE-3410.patch [ 12493271 ]
        Hide
        Chris Male added a comment -

        Committed revision 1165995.

        Show
        Chris Male added a comment - Committed revision 1165995.
        Chris Male made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Assignee Chris Male [ cmale ]
        Fix Version/s 4.0 [ 12314025 ]
        Resolution Fixed [ 1 ]
        Hide
        Robert Muir added a comment -

        any objections to backporting? we could then remove the deprecation in trunk.

        Show
        Robert Muir added a comment - any objections to backporting? we could then remove the deprecation in trunk.
        Hide
        Chris Male added a comment -

        None whatsoever.

        Show
        Chris Male added a comment - None whatsoever.
        Hide
        Robert Muir added a comment -

        I can take it if you want Chris, unless you want to do the honors?

        Show
        Robert Muir added a comment - I can take it if you want Chris, unless you want to do the honors?
        Hide
        Chris Male added a comment -

        Go for it!

        Show
        Chris Male added a comment - Go for it!
        Hide
        Robert Muir added a comment -

        backported to 3.5

        Show
        Robert Muir added a comment - backported to 3.5
        Robert Muir made changes -
        Fix Version/s 3.5 [ 12317877 ]
        Hide
        Uwe Schindler added a comment -

        Bulk close after release of 3.5

        Show
        Uwe Schindler added a comment - Bulk close after release of 3.5
        Uwe Schindler made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Chris Male
            Reporter:
            Chris Male
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development