Solr / SOLR-1410

Remove deprecated custom encoding support in Russian/Greek analysis

    Details

    • Type: Task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: Schema and Analysis
    • Labels:
      None

      Description

      The Russian and Greek analyzers have unusual custom encoding support, which has been deprecated in Lucene.

      For example, someone using CP1251 with the Russian analyzer is simply storing Ж as 0xC6, so it ends up represented as Æ (U+00C6).
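A minimal Java sketch of the mapping described above, assuming the old pseudo-charset simply kept each CP1251 byte value as the char value. The class and method names are illustrative and not part of Lucene or Solr:

```java
/**
 * Sketch of the pseudo-charset problem (assumed behaviour): a CP1251 byte
 * stored directly as a Java char keeps its byte value, so it lands on an
 * unrelated Unicode character.
 */
public class PseudoCharsetDemo {

    /** Mimics the old behaviour: each byte becomes a char with the same numeric value. */
    static String bytesAsChars(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] cp1251 = { (byte) 0xC6 }; // Cyrillic ZHE encoded in CP1251
        char stored = bytesAsChars(cp1251).charAt(0);
        System.out.printf("stored:   U+%04X%n", (int) stored);   // U+00C6 (AE ligature)
        System.out.printf("expected: U+%04X%n", (int) '\u0416'); // U+0416 (Cyrillic ZHE)
    }
}
```

Running main prints U+00C6 for the stored char and U+0416 for the intended letter, so the indexed character differs from the Unicode one that other components would expect.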

      LUCENE-1793: Deprecate the custom encoding support in the Greek and Russian
      Analyzers. If you need to index text in these encodings, please use Java's
      character set conversion facilities (InputStreamReader, etc) during I/O,
      so that Lucene can analyze this text as Unicode instead.
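The recommended alternative can be sketched as follows, assuming the input bytes are known to be CP1251 (Java's charset name for it is "windows-1251"). The helper names are hypothetical; the point is that decoding happens once, in the InputStreamReader, so whatever consumes the Reader sees Unicode:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class Cp1251Input {

    /** Wraps a CP1251 byte stream in a Reader that yields Unicode chars. */
    static Reader openCp1251(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, Charset.forName("windows-1251")));
    }

    /** Decodes a CP1251 byte array to a String by reading it through openCp1251. */
    static String readCp1251(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try (Reader reader = openCp1251(new ByteArrayInputStream(bytes))) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The word for "beetle" (ZHE, U, KA) encoded in CP1251.
        byte[] cp1251 = { (byte) 0xC6, (byte) 0xF3, (byte) 0xEA };
        String text = readCp1251(cp1251);
        // Prints true: the chars are real Cyrillic code points, not byte values.
        System.out.println(text.equals("\u0416\u0443\u043A"));
    }
}
```

In Lucene 2.9 such a Reader could be passed to Analyzer.tokenStream(String, Reader), so the Russian analyzer itself never needs to know about CP1251.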

      I noticed that in Solr, the factories for these TokenStreams still accept these configuration options, which are deprecated in Lucene 2.9 and will be removed in 3.0.

      Let me know the policy (how exactly do you deprecate a config option in Solr: log a warning, etc.?) and I'd be happy to create a patch.

        Activity

        Hoss Man added a comment -

        I don't think we've ever really had a situation like this... Logging a warning seems like the right course of action for now. Then, once the functionality is removed, we can change the factory to fail on init if it sees the option is still set in schema.xml.
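The two-phase policy described above might look roughly like this. This is a hypothetical, self-contained sketch rather than actual Solr factory code: the method name, the `removed` switch, and the messages are illustrative, and java.util.logging stands in for whatever logger Solr uses:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

/**
 * Hypothetical sketch of the deprecation policy: warn while the option is
 * deprecated, fail on init once the underlying support has been removed.
 * Not actual Solr code.
 */
public class CharsetOptionPolicy {

    private static final Logger LOG = Logger.getLogger(CharsetOptionPolicy.class.getName());

    /**
     * Checks factory init args for the deprecated "charset" option.
     *
     * @param args    the factory's init arguments (as read from schema.xml)
     * @param removed true once the functionality no longer exists
     * @return true if a deprecation warning was logged
     * @throws IllegalArgumentException if the option is set after removal
     */
    static boolean checkCharsetOption(Map<String, String> args, boolean removed) {
        String charset = args.get("charset");
        if (charset == null) {
            return false; // option not set: nothing to report
        }
        if (removed) {
            // Later phase: refuse to start with a stale schema.xml setting.
            throw new IllegalArgumentException("charset=" + charset
                + " is no longer supported; convert input to Unicode during I/O");
        }
        // Current phase: keep working, but tell the user.
        LOG.warning("charset=" + charset
            + " is deprecated and will be removed; convert input to Unicode during I/O");
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> schemaArgs = new HashMap<>();
        schemaArgs.put("charset", "CP1251");
        System.out.println(checkCharsetOption(schemaArgs, false)); // logs a warning, prints true
        try {
            checkCharsetOption(schemaArgs, true);
        } catch (IllegalArgumentException e) {
            System.out.println("init failed: " + e.getMessage());
        }
    }
}
```

While the option is merely deprecated, a schema.xml that sets charset still loads with a warning; flipping removed to true turns the same configuration into an init failure.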

        Robert Muir added a comment -

        Thanks, I will work on a patch later that logs a warning if you try to use a configuration option other than Unicode for these two analyzers!

        Robert Muir added a comment -

        For the Russian and Greek analysis factories, warn users if they try to use the deprecated charset parameter.

        Shalin Shekhar Mangar added a comment -

        I don't think we've ever really had a situation like this... Logging a warning seems like the right course of action for now...

        We actually have done this in DataImportHandler in relation to the syntax for evaluators. Logging a warning is the right way to go.

        Robert Muir added a comment -

        Are there any issues with this? Do you mind if I set the fix version to 1.4?

        I'm really hoping to remove these pseudo-charsets after the Lucene 2.9 release.

        Hoss Man added a comment -

        Committed revision 812760.

        Thanks, Robert.

        Robert Muir added a comment -

        Hi, I just removed the deprecated charset support in Lucene 3.0 (which does not affect Solr 1.4).

        However, in doing so I noticed that with the custom charset support removed, RussianLowerCaseFilter is exactly the same as LowerCaseFilter.
        I've marked RussianLowerCaseFilter as deprecated, to be removed in Lucene 3.1.
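To see why the two filters coincide once input is Unicode, here is a sketch, assuming the generic filter lowercases each char of the term buffer with Character.toLowerCase (essentially what Lucene 2.9's LowerCaseFilter does). The class is illustrative, not Lucene code:

```java
/**
 * Illustrative sketch: lowercasing a term buffer char by char with
 * Character.toLowerCase, which already handles Cyrillic.
 */
public class LowerCaseSketch {

    /** Lowercases the first {@code length} chars of buffer in place. */
    static void lowerCaseInPlace(char[] buffer, int length) {
        for (int i = 0; i < length; i++) {
            buffer[i] = Character.toLowerCase(buffer[i]);
        }
    }

    public static void main(String[] args) {
        // Upper-case Cyrillic IO, ZHE, I, KA
        char[] term = "\u0401\u0416\u0418\u041A".toCharArray();
        lowerCaseInPlace(term, term.length);
        // Prints true: each letter maps to its lower-case Cyrillic form.
        System.out.println(new String(term).equals("\u0451\u0436\u0438\u043A"));
    }
}
```

Character.toLowerCase already covers the Cyrillic range, including Ё, so a Russian-specific lowercasing table adds nothing for Unicode text.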

        Will there be a Solr release based on Lucene 3.0, or will 1.5 be based on 3.1?

        Shalin Shekhar Mangar added a comment -

        Will there be a Solr release based on Lucene 3.0, or will 1.5 be based on 3.1?

        I guess it is too early to say. But Solr releases do take time, so if I had to guess, 1.5 will likely go out with Lucene 3.1.

        Robert Muir added a comment -

        OK, I guess this isn't an issue either way.
        If 1.5 goes out with 3.1, RussianLowerCaseFilterFactory can be implemented with LowerCaseFilter but marked deprecated, to be removed in 1.6.

        Grant Ingersoll added a comment -

        Bulk close for Solr 1.4


          People

          • Assignee: Hoss Man
          • Reporter: Robert Muir
          • Votes: 0
          • Watchers: 0
