Solr / SOLR-2108

ReversedWildcardFilter can create false positives

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels: None

      Description

      Reported on the user list:

      "For instance, the query *zemog* matches documents that contain Gomez"
      

      http://www.lucidimagination.com/search/document/35abfdabfcec99b7/false_matches_with_reversedwildcardfilterfactory

        Activity

        rcmuir Robert Muir added a comment -

        Simple fix: if we are doing a wildcard query on a reversed field but are
        not going to reverse the query, we must subtract the set of reversed terms (markerChar*) from the query DFA, as these could be false positives.

        I also added a basic test.
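
The idea behind the fix can be sketched outside of Lucene. The following is a minimal illustration, not the committed patch: it assumes reversed terms are indexed as a marker character followed by the reversed token, assumes `\u0001` as that marker, and emulates "subtracting markerChar* from the query DFA" by skipping marker-prefixed terms when the query itself was not reversed. The class and method names (`ReversedWildcardSketch`, `indexTerms`, `match`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ReversedWildcardSketch {
    // Assumption: marker character that prefixes every reversed term in the index.
    static final char MARKER = '\u0001';

    // Emulates indexing with a reversing filter: each token is stored as-is,
    // plus a second term made of the marker followed by the reversed token.
    static List<String> indexTerms(String... tokens) {
        List<String> terms = new ArrayList<>();
        for (String t : tokens) {
            terms.add(t);
            terms.add(MARKER + new StringBuilder(t).reverse().toString());
        }
        return terms;
    }

    // Translates a wildcard pattern ('*' = any run, '?' = any single char) to a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    // Matches an un-reversed wildcard query against the indexed terms.
    // With excludeReversed=true, marker-prefixed terms are never candidates,
    // which is the term-level equivalent of removing markerChar* from the query.
    static List<String> match(List<String> terms, String wildcard, boolean excludeReversed) {
        Pattern p = toRegex(wildcard);
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            boolean reversedTerm = !term.isEmpty() && term.charAt(0) == MARKER;
            if (excludeReversed && reversedTerm) {
                continue;
            }
            if (p.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = indexTerms("gomez");
        // Without the exclusion, the leading '*' absorbs the marker and the
        // reversed term is hit, so *zemog* wrongly matches a "gomez" document.
        System.out.println(match(terms, "*zemog*", false).size());
        // With the exclusion, the reversed term is no longer a candidate.
        System.out.println(match(terms, "*zemog*", true).size());
    }
}
```

The false positive arises only because a leading `*` can consume the marker character; a query that is actually reversed and prefixed with the marker is meant to hit those terms, so the exclusion applies only to the un-reversed case.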

        rcmuir Robert Muir added a comment -

        By the way, I plan to just let this one sit unless we all agree it's the right thing to do.

        People using the ReversedWildcardFilter can see "false positives"
        from other queries like FuzzyQuery too, because of the reversed terms in the index.

        I think it's unreasonable (though possible) to try to ensure that no queries (fuzzy, regex, ...)
        hit false positives from the reversed terms being there.

        On the other hand, it might seem reasonable to fix it just for the Wildcard case, since
        that's why someone used this filter to begin with.

        yseeley@gmail.com Yonik Seeley added a comment -

        It seems reasonable to me to fix the Wildcard case, regardless of the status of fuzzy & regex.

        rcmuir Robert Muir added a comment -

        OK, I'd like to commit this to 4.0 only in a few days if no one objects.

        rcmuir Robert Muir added a comment -

        Committed revision 999424.


          People

          • Assignee:
            rcmuir Robert Muir
            Reporter:
            rcmuir Robert Muir