[SOLR-1410] remove deprecated custom encoding support in russian/greek analysis - ASF JIRA

Details

Type: Task
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.4
Component/s: Schema and Analysis
Labels: None

Description

The Russian and Greek analyzers have custom encoding support, which has been deprecated in Lucene.

For example, someone using CP1251 in the Russian analyzer is simply storing Ж as the byte 0xC6, so it is represented as Æ.
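The mismatch described above can be reproduced with plain JDK calls. This is a minimal sketch, not code from the patch: the class and method names are hypothetical, and it assumes the JDK in use ships the `windows-1251` charset (most full JDK builds do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Lucene or Solr.
public class Cp1251Demo {

    // Encode text to CP1251 (windows-1251) bytes.
    static byte[] encodeCp1251(String text) {
        return text.getBytes(Charset.forName("windows-1251"));
    }

    // Treat each byte as a code point, as happens when CP1251 bytes
    // are stored in chars without a proper charset conversion.
    static String misreadAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = encodeCp1251("\u0416"); // Ж, CYRILLIC CAPITAL LETTER ZHE
        System.out.printf("0x%02X%n", raw[0] & 0xFF);
        System.out.println(misreadAsLatin1(raw)); // U+00C6, i.e. Æ
    }
}
```

The index then contains U+00C6 rather than U+0416, so the stored terms are no longer real Unicode text.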

~~LUCENE-1793~~: Deprecate the custom encoding support in the Greek and Russian
Analyzers. If you need to index text in these encodings, please use Java's
character set conversion facilities (InputStreamReader, etc) during I/O,
so that Lucene can analyze this text as Unicode instead.
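As an illustration of the recommended approach, the sketch below decodes CP1251 bytes into Unicode with `InputStreamReader` before any analysis happens. The class and helper names are hypothetical, and it assumes the `windows-1251` charset is available in the JDK; the resulting `Reader` is what would then be handed to an analyzer.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper class; not part of Lucene or Solr.
public class DecodeBeforeAnalysis {

    // Wrap a byte stream in a Reader that decodes it to Unicode chars.
    static Reader openAsUnicode(InputStream in, String charsetName) {
        return new BufferedReader(
                new InputStreamReader(in, Charset.forName(charsetName)));
    }

    // Drain a Reader into a String (for demonstration only).
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0xC6 is Ж in CP1251.
        InputStream in = new ByteArrayInputStream(new byte[] {(byte) 0xC6});
        String text = readAll(openAsUnicode(in, "windows-1251"));
        System.out.println(text.equals("\u0416")); // the decoded char is U+0416
    }
}
```

With the conversion done at I/O time, the analyzer only ever sees Unicode and needs no per-encoding configuration.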

I noticed that in Solr, the factories for these token streams still expose these configuration options, which are deprecated in 2.9 and will be removed in 3.0.

Let me know the policy (how exactly do you deprecate a config option in Solr: log a warning, etc.?) and I'd be happy to create a patch.

Attachments

SOLR-1410.patch
04/Sep/09 02:44
5 kB
Robert Muir

People

Assignee: Chris M. Hostetter

Reporter: Robert Muir

Votes: 0

Watchers: 0

Dates

Created: 03/Sep/09 21:28

Updated: 10/Nov/09 15:52

Resolved: 09/Sep/09 04:15
