[SOLR-1196] Incorrect matches when using non alphanumeric search string !@#$%^&* - ASF JIRA

Details

Type: Bug
Status: Reopened
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.3
Fix Version/s: None
Component/s: None
Labels: None
Environment: Solr 1.3 / Java 1.6 / Win XP / Eclipse 3.3

Description

When the search string contains no alphanumeric characters, every document is returned as a match. (Nothing actually matches, so nothing should be returned.)

When I run a query like - (activity_type:NAME) AND title!@#$%^&*()), all the documents are returned even though there is not a single match. No title matches the string (which has been escaped).

My document structure is as follows:

<doc>
<str name="activity_type">NAME</str>
<str name="title">Bathing</str>
....
</doc>

The title field is of type text_title which is described below.

</analyzer>
</fieldType>

-----------------------------------------------------
Yonik's analysis is as follows:

<str name="rawquerystring">-features:foo features!@#$%^&*())</str>
<str name="querystring">-features:foo features!@#$%^&*())</str>
<str name="parsedquery">-features:foo</str>
<str name="parsedquery_toString">-features:foo</str>

The text analysis is throwing away non alphanumeric chars (probably
the WordDelimiterFilter). The Lucene (and Solr) query parser throws
away term queries when the token is zero length (after analysis).
Solr then interprets the left over "-features:foo" as "all documents
not containing foo in the features field", so you get a bunch of
matches.

As per his suggestion, I am filing this bug.
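
The behaviour described in the analysis above can be reproduced with a small toy model. This is only a sketch under stated assumptions and is not Solr or Lucene code: analyze(), parse() and search() are hypothetical helpers, the analyzer simply splits on non-alphanumeric characters (standing in for whatever the real field type does), and the three sample documents are made up. It shows how a clause that analyzes to zero tokens disappears, leaving a purely negative query that matches every document without "foo".

```python
import re

def analyze(text):
    # Toy analyzer (assumption): split on non-alphanumerics, lowercase, drop empties.
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]

def parse(clauses):
    # Each clause is (prohibited, field, raw_text).
    # A clause whose text analyzes to zero tokens is silently dropped,
    # mirroring the query-parser behaviour described in the analysis.
    parsed = []
    for prohibited, field, raw in clauses:
        tokens = analyze(raw)
        if not tokens:
            continue
        parsed.append((prohibited, field, tokens))
    return parsed

def matches(doc, field, tokens):
    doc_tokens = analyze(doc.get(field, ""))
    return all(tok in doc_tokens for tok in tokens)

def search(docs, parsed):
    if not parsed:
        return []  # an empty query matches nothing
    positive = [c for c in parsed if not c[0]]
    negative = [c for c in parsed if c[0]]
    hits = []
    for doc in docs:
        if any(matches(doc, f, t) for _, f, t in negative):
            continue  # excluded by a prohibited clause
        if positive and not any(matches(doc, f, t) for _, f, t in positive):
            continue  # no positive clause matched
        # With no positive clauses left, the query is purely negative and
        # is treated as "all documents minus the prohibited ones".
        hits.append(doc["id"])
    return hits

# Hypothetical sample data, for illustration only.
docs = [
    {"id": 1, "features": "foo bar"},
    {"id": 2, "features": "bathing"},
    {"id": 3, "features": "swimming"},
]

# Models a negative clause on "foo" plus a clause made only of punctuation.
query = [(True, "features", "foo"), (False, "features", "!@#$%^&*()")]
parsed = parse(query)
hits = search(docs, parsed)
print(parsed)
print(hits)
```

In this model the punctuation-only clause never reaches search(), so the caller cannot tell "the clause matched nothing" apart from "the clause was never there". One possible fix direction, again only an assumption, would be for the parser to keep such a clause as a match-nothing clause instead of dropping it.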

People

Assignee: Unassigned

Reporter: Sam Michael

Votes: 0

Watchers: 2

Dates

Created: 01/Jun/09 15:26

Updated: 30/Nov/13 14:10