SOLR-1410: Remove deprecated custom encoding support in Russian/Greek analysis


Details

    • Type: Task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: Schema and Analysis
    • Labels: None

    Description

      The Russian and Greek analyzers have odd custom encoding support, and it has been deprecated in Lucene.

      For example, someone using CP1251 in the Russian analyzer is simply storing Ж as the byte 0xC6, so it ends up represented as Æ (U+00C6) instead of as the Unicode character Ж (U+0416).

      LUCENE-1793: Deprecate the custom encoding support in the Greek and Russian
      Analyzers. If you need to index text in these encodings, please use Java's
      character set conversion facilities (InputStreamReader, etc) during I/O,
      so that Lucene can analyze this text as Unicode instead.
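
      As a rough illustration of the advice above, here is a minimal sketch of decoding CP1251 bytes with InputStreamReader so that the analyzer would see real Unicode. The class name Cp1251Decode and the helper decode are made up for this example and are not part of Lucene or Solr; it also assumes the JRE ships the windows-1251 charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only (not Lucene/Solr code).
class Cp1251Decode {

    // Decodes the bytes into a String by reading them through an
    // InputStreamReader configured with the given charset.
    static String decode(byte[] bytes, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = { (byte) 0xC6 }; // the CP1251 encoding of Ж

        // Decoded with the right charset, the byte becomes Ж (U+0416).
        String asUnicode = decode(raw, Charset.forName("windows-1251"));

        // Treated as Latin-1 (one byte per char), the same byte shows up as Æ (U+00C6).
        String misread = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.printf("U+%04X U+%04X%n", (int) asUnicode.charAt(0), (int) misread.charAt(0));
    }
}
```

      In a real indexing pipeline the ByteArrayInputStream would be replaced by whatever stream the CP1251 text comes from, and the resulting Reader or String would be handed to the analyzer with no encoding option set.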

      I noticed that in Solr, the factories for these token streams still allow these configuration options, which are deprecated in Lucene 2.9 and due to be removed in 3.0.

      Let me know the policy (how exactly do you deprecate a config option in Solr: log a warning, etc.?) and I'd be happy to create a patch.


          People

            Assignee: Chris M. Hostetter (hossman)
            Reporter: Robert Muir (rcmuir)