Lucene - Core
LUCENE-1469

isValid should be invoked after analyze rather than before it so it can validate the output of analyze

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: modules/other
    • Labels: None
    • Lucene Fields: New

      Description

      The Synonym map has a protected method String analyze(String word) designed for custom stemming.

      However, before analyze is invoked on a word, boolean isValid(String str) is used to validate the word, which causes the program to discard words that may be usable by the custom analyze method.

      I think that isValid should be invoked after analyze rather than before it, so that it can validate the output of analyze and allow implementers to decide what is valid for the overridden analyze method. (In fact, if you look at the code snippet below, isValid should really go after the empty string check.)

      This is a two line change in org.apache.lucene.index.memory.SynonymMap

      /*
       * Part B: ignore phrases (with spaces and hyphens) and
       * non-alphabetic words, and let user customize word (e.g. do some
       * stemming)
       */
      if (!isValid(word)) continue; // ignore
      word = analyze(word);
      if (word == null || word.length() == 0) continue; // ignore
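
      The difference between the current and the proposed order can be sketched outside Lucene. This is a minimal, self-contained illustration, not the SynonymMap code: the class OrderingSketch, its method names, and the letters-only isValid are assumptions made for the example (the letters-only rule is inferred from the "non-alphabetic words" comment above).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the loop in SynonymMap; not the Lucene class.
class OrderingSketch {

    // Assumed validity rule: accept only purely alphabetic words.
    static boolean isValid(String str) {
        for (int i = 0; i < str.length(); i++) {
            if (!Character.isLetter(str.charAt(i))) return false;
        }
        return true;
    }

    // Current order: validate the raw word, then analyze it.
    static List<String> validateThenAnalyze(List<String> words, UnaryOperator<String> analyze) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            if (!isValid(word)) continue; // raw word rejected before analyze sees it
            word = analyze.apply(word);
            if (word == null || word.length() == 0) continue;
            kept.add(word);
        }
        return kept;
    }

    // Proposed order: analyze first, then check for empty output, then validate.
    static List<String> analyzeThenValidate(List<String> words, UnaryOperator<String> analyze) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            word = analyze.apply(word);
            if (word == null || word.length() == 0) continue;
            if (!isValid(word)) continue; // validates the output of analyze
            kept.add(word);
        }
        return kept;
    }
}
```

      With a custom analyze that strips hyphens, the input "well-known" is discarded by the current order but kept (as "wellknown") by the proposed one; a word such as "42" is rejected either way.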

        Activity

        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Shai Erera added a comment -

        Assuming that all that was requested here is to change isValid to protected, that's what I did: I made it protected and non-static, so it can be overridden.

        Committed revision 1064069 (3x).
        Committed revision 1064072 (trunk).

        Thanks, Vincent!
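
        A minimal sketch of what a protected, non-static isValid makes possible. The classes WordFilter and HyphenFriendlyFilter are hypothetical stand-ins written for this illustration, not Lucene code, and the default method bodies (letters-only isValid, pass-through analyze) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the word filtering in SynonymMap; not the Lucene class.
class WordFilter {

    // Assumed default: accept only purely alphabetic words.
    protected boolean isValid(String str) {
        for (int i = 0; i < str.length(); i++) {
            if (!Character.isLetter(str.charAt(i))) return false;
        }
        return true;
    }

    // Assumed default: return the word unchanged.
    protected String analyze(String word) {
        return word;
    }

    // Same order as the snippet in the description: isValid, analyze, empty check.
    List<String> filter(List<String> words) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            if (!isValid(word)) continue;
            word = analyze(word);
            if (word == null || word.length() == 0) continue;
            kept.add(word);
        }
        return kept;
    }
}

// A subclass can now relax validation so that its analyze gets to see more words.
class HyphenFriendlyFilter extends WordFilter {

    @Override
    protected boolean isValid(String str) {
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c != '-' && !Character.isLetter(c)) return false;
        }
        return true;
    }

    @Override
    protected String analyze(String word) {
        return word.replace("-", ""); // join hyphenated parts
    }
}
```

        Because the loop calls isValid through the instance, the subclass's version is used without reordering the calls. A static method could not be overridden this way, which is presumably why the change also dropped static.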

        Vincent Li added a comment -

        Hi Mark, sorry for the late response, I've been away for a while. I would gladly submit one. Can you point me to some info on how to submit a patch?

        Mark Miller added a comment -

        This makes sense to me. Care to submit a patch?

        Vincent Li added a comment -

        On second thought - it might be a better idea to change isValid to a protected method so that it can be overridden as needed.


          People

          • Assignee: Unassigned
          • Reporter: Vincent Li
          • Votes: 0
          • Watchers: 1

              Time Tracking

              Estimated: 5m
              Remaining: 5m
              Logged: Not Specified