Lucene - Core
LUCENE-2117

Fix SnowballAnalyzer casing behavior for Turkish Language

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 4.0-ALPHA
    • Component/s: modules/other
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      LUCENE-2102 added a new TokenFilter that handles Turkish's unique casing behavior correctly. We should fix the casing behavior in SnowballAnalyzer too, since it supports a TurkishStemmer.

      1. LUCENE-2117.patch
        7 kB
        Robert Muir
      2. LUCENE-2117.patch
        8 kB
        Robert Muir

          Activity

          Robert Muir added a comment -

          patch for the bug that:

          • for the Turkish language, when Version >= 3.1, uses TurkishLowerCaseFilter instead in SnowballAnalyzer
          • adds a javadoc note to SnowballFilter stating that it expects lowercased text to work (and that, in the Turkish case, you must use the special filter)
          • adds a contrib/analyzers dependency to contrib/snowball (perhaps not the best, but what is the other option?)
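
To illustrate why the Turkish case needs its own lowercasing step, here is a minimal JDK-only sketch. It does not use Lucene: the class `SnowballCasingSketch`, the method `lowercaseFor`, and the boolean `useTurkishRules` (standing in for the Version >= 3.1 check) are hypothetical names for illustration, and `String.toLowerCase` only approximates what the real filters do.

```java
import java.util.Locale;

// Hypothetical illustration, not Lucene code: chooses lowercasing rules by stemmer name.
final class SnowballCasingSketch {
    private static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // useTurkishRules is an assumed stand-in for the "Version >= 3.1" check in the patch.
    static String lowercaseFor(String stemmerName, boolean useTurkishRules, String text) {
        if (useTurkishRules && "Turkish".equals(stemmerName)) {
            // Turkish rules: 'I' maps to dotless U+0131, and U+0130 maps to plain 'i'.
            return text.toLowerCase(TURKISH);
        }
        // Locale-neutral rules: 'I' maps to 'i'.
        return text.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Compare the code points of the first character under each rule set.
        String turkish = lowercaseFor("Turkish", true, "ISPARTA");
        String neutral = lowercaseFor("Turkish", false, "ISPARTA");
        System.out.println(Integer.toHexString(turkish.charAt(0))); // 131
        System.out.println(Integer.toHexString(neutral.charAt(0))); // 69
    }
}
```

With locale-neutral lowercasing, a Turkish word starting with "I" ends up with a different first letter than the one a Turkish stemmer would expect, which is the mismatch the patch addresses.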
          Simon Willnauer added a comment -

          Robert, the patch looks almost good. You should also change the pom.xml.template to reflect the new dependency. I'm still thinking about moving snowball into analyzers as analyzers/snowball. Would that make sense?

          Somewhat unrelated but still ugly:

                Class<?> stemClass = Class.forName("org.tartarus.snowball.ext." + name + "Stemmer");
          

          When I look through the patch I see this "name" parameter, which is used to load a stemmer via reflection. We should really define a factory interface that creates the stemmer and get rid of the reflection code.
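
One possible shape for such a factory, as a rough sketch only: `Stemmer`, `StemmerFactory`, and `StemmerRegistry` below are hypothetical names invented for illustration and are not Lucene or Snowball types. The idea is that each stemmer is registered with an explicit factory, so no class name is assembled from a string at runtime.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a stemmer; the real Snowball classes are not used here.
interface Stemmer {
    String stem(String word);
}

// Hypothetical factory interface: creates a fresh stemmer without reflection.
interface StemmerFactory {
    Stemmer create();
}

// Hypothetical registry mapping a language name to its factory.
final class StemmerRegistry {
    private final Map<String, StemmerFactory> factories = new HashMap<>();

    void register(String language, StemmerFactory factory) {
        factories.put(language, factory);
    }

    Stemmer create(String language) {
        StemmerFactory factory = factories.get(language);
        if (factory == null) {
            // Unknown names fail here with a clear message instead of a ClassNotFoundException.
            throw new IllegalArgumentException("No stemmer registered for: " + language);
        }
        return factory.create();
    }

    public static void main(String[] args) {
        StemmerRegistry registry = new StemmerRegistry();
        // Toy stemmer that strips a trailing "s"; purely for demonstration.
        registry.register("Toy", () -> w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w);
        System.out.println(registry.create("Toy").stem("cats"));
    }
}
```

Compared with `Class.forName`, a typo in the language name is reported by the registry lookup, and the set of supported stemmers is visible where they are registered.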

          Robert Muir added a comment -

          This patch includes the update to pom.xml.template.

          Robert Muir added a comment -

          "I'm still thinking about moving snowball into analyzers as analyzers/snowball. Would that make sense?"

          We have to do something about the duplication (LUCENE-2055). There I have suggested we upload the snowball stoplists (which are nice) so that we can get rid of some hand-coded Java functionality. It is silly to have the exact same Russian stemmer in two different places in contrib, etc.

          Then we have open issues like LUCENE-559...

          Simon Willnauer added a comment -

          Robert, the patch looks good and all tests pass.
          I plan to commit this later tomorrow if nobody objects.

          Simon Willnauer added a comment -

          I will commit shortly if nobody objects

          Simon Willnauer added a comment -

          Committed in revision 888787.

          Thanks, Robert.


            People

            • Assignee: Simon Willnauer
            • Reporter: Simon Willnauer
            • Votes: 0
            • Watchers: 1
