[SOLR-89] new TokenFilters for whitespace trimming and pattern replacing - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.2
Component/s: None
Labels: None

Description

(Note: lumping these into a single issue since I did them both at the same time.)

More than one person has asked me recently how they can configure strings which:
a) sort case insensitively
b) ignore leading (and trailing, although it's not as big of an issue) whitespace
c) ignore certain characters anywhere in the string (i.e. strip punctuation)

The first can already be solved using the KeywordTokenizer in conjunction with the LowerCaseFilter. I've written a TrimFilter and a PatternReplaceFilter to address the latter two. (Strictly speaking, TrimFilter isn't needed, since you can write a pattern that matches leading or trailing whitespace, but for people who are only interested in the whitespace issue, I'm sure String.trim() is more efficient than a regex.)
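
To illustrate the parenthetical above, here is a minimal plain-JDK sketch comparing String.trim() with a regex that strips leading or trailing whitespace. It is not the code from the patch: the class name, method names and the pattern are assumptions made for illustration only. Note that the two are not strictly identical, since String.trim() removes every character up to U+0020 while \s matches only the usual whitespace characters.

```java
import java.util.regex.Pattern;

public class TrimVersusRegex {
    // Hypothetical pattern: one or more whitespace characters at the
    // very start or the very end of the input.
    private static final Pattern EDGE_WHITESPACE = Pattern.compile("^\\s+|\\s+$");

    // What a trim-only filter would conceptually do to each token.
    public static String trimWithString(String s) {
        return s.trim();
    }

    // The same effect expressed as a pattern replacement.
    public static String trimWithRegex(String s) {
        return EDGE_WHITESPACE.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String raw = "  Hello World \t";
        System.out.println("[" + trimWithString(raw) + "]");
        System.out.println("[" + trimWithRegex(raw) + "]");
    }
}
```

Both calls leave the interior space alone and only strip the edges, which is why the regex version can stand in for the trim when a pattern replacement is being configured anyway.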

An example of how they can be used...

<fieldtype name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <!-- The PatternReplaceFilter gives you the flexibility to use
         Java regular expressions to replace any sequence of characters
         matching a pattern with an arbitrary replacement string,
         which may include back references to portions of the original
         string matched by the pattern.

         See the Java Regular Expression documentation for more
         information on pattern and replacement string syntax.

         http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/package-summary.html
    -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z])" replacement="" replace="all"
    />
  </analyzer>
</fieldtype>
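
As a rough picture of what the alphaOnlySort chain does to a value, here is a plain-JDK sketch that applies the same three steps in the same order (lowercase, trim, then replace everything matching ([^a-z]) with the empty string) and sorts by the result. This only approximates the filters and does not use the Solr classes: AlphaOnlySortSketch, sortKey and sortByKey are hypothetical names, and using toLowerCase(Locale.ROOT) as a stand-in for LowerCaseFilter is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class AlphaOnlySortSketch {
    // Builds the string a value would be sorted by, mirroring the
    // analyzer order: LowerCaseFilter, TrimFilter, PatternReplaceFilter.
    public static String sortKey(String raw) {
        String lowered = raw.toLowerCase(Locale.ROOT);   // case-insensitive
        String trimmed = lowered.trim();                 // drop edge whitespace
        return trimmed.replaceAll("([^a-z])", "");       // replace="all"
    }

    // Returns a sorted copy of the values, ordered by their sort keys.
    public static List<String> sortByKey(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing(AlphaOnlySortSketch::sortKey));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> titles =
            Arrays.asList("  banana", "Cherry!", "apple pie", "A.P.P.L.E.");
        for (String title : sortByKey(titles)) {
            System.out.println(sortKey(title) + " <- [" + title + "]");
        }
    }
}
```

Because the whole value is kept as a single token by the KeywordTokenizer, the key is computed over the entire string, so "A.P.P.L.E." and "apple pie" end up next to each other despite the punctuation and the interior space.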

Attachments

pattern-and-trim-filters.patch
20/Dec/06 22:42
21 kB
Chris M. Hostetter

People

Assignee: Chris M. Hostetter

Reporter: Chris M. Hostetter

Votes: 0

Watchers: 0

Dates

Created: 20/Dec/06 22:41

Updated: 10/May/13 10:38

Resolved: 10/Jan/07 01:20