Solr / SOLR-2108

ReversedWildcardFilter can create false positives

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels: None

      Description

      Reported from the userlist:

      "For instance, the query *zemog* matches documents that contain Gomez"
      

      http://www.lucidimagination.com/search/document/35abfdabfcec99b7/false_matches_with_reversedwildcardfilterfactory

        Activity

        Robert Muir added a comment -

        Simple fix: if we are doing a wildcard query on a reversed field, but we
        are not going to reverse the query, we must subtract the set of reversed terms
        (markerChar*) from the query DFA, as these could be false positives.

        I also added a basic test.
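
To illustrate the idea, here is a minimal sketch in Python rather than the actual patch. It assumes the filter indexes each token together with a reversed copy prefixed by a marker character (taken here to be `\u0001`), and that tokens are already lowercased. The names `index_terms`, `naive_wildcard`, and `fixed_wildcard` are hypothetical, and simple set filtering stands in for the automaton subtraction described above.

```python
from fnmatch import fnmatchcase

MARKER = "\u0001"  # assumed marker character prepended to reversed terms


def index_terms(tokens):
    """Keep each token and also add marker + reversed token (assumed filter behavior)."""
    terms = set()
    for tok in tokens:
        terms.add(tok)
        terms.add(MARKER + tok[::-1])
    return terms


def naive_wildcard(pattern, terms):
    """Unreversed wildcard query: '*' can also swallow the marker character."""
    return {t for t in terms if fnmatchcase(t, pattern)}


def fixed_wildcard(pattern, terms):
    """Same query with every term of the form marker + anything subtracted."""
    return {t for t in terms
            if fnmatchcase(t, pattern) and not t.startswith(MARKER)}


terms = index_terms(["gomez"])
naive_hits = naive_wildcard("*zemog*", terms)   # hits the reversed term
fixed_hits = fixed_wildcard("*zemog*", terms)   # reversed term excluded
legit_hits = fixed_wildcard("*gomez*", terms)   # original term still matches
```

In this sketch the naive query `*zemog*` matches only because the leading `*` absorbs the marker, which is exactly the false positive from the report. Excluding marker-prefixed terms removes it without affecting `*gomez*`.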

        Robert Muir added a comment -

        By the way, I plan to just let this one sit unless we all agree it's the right thing to do.

        People using the ReversedWildcardFilter can see "false positives"
        from other queries like FuzzyQuery too, because of the reversed terms in the index.

        I think it's unreasonable (though possible) to try to ensure that no queries (fuzzy, regex, ...)
        hit false positives from the reversed terms being there.

        On the other hand, it might seem reasonable to fix it just for the wildcard case, since
        that's why someone used this filter to begin with.
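
The fuzzy case can be illustrated the same way. This is a rough sketch, not FuzzyQuery's actual scoring: it assumes a plain Levenshtein distance with a maximum of one edit and the same assumed `\u0001` marker, and the helper names are hypothetical.

```python
MARKER = "\u0001"  # assumed marker character prepended to reversed terms


def edit_distance(a, b):
    """Plain Levenshtein distance using a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def fuzzy_hits(query, terms, max_edits=1):
    """Terms within max_edits of the query (stand-in for a fuzzy query)."""
    return {t for t in terms if edit_distance(query, t) <= max_edits}


# Index containing the original term and its marked, reversed copy.
terms = {"gomez", MARKER + "zemog"}
hits = fuzzy_hits("zemog", terms)
```

Here the query `zemog` is one insertion away from the marked reversed term, so it matches even though no document contains that word.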

        Yonik Seeley added a comment -

        It seems reasonable to me to fix the Wildcard case, regardless of the status of fuzzy & regex.

        Robert Muir added a comment -

        OK, I'd like to commit this to 4.0 only in a few days if no one objects.

        Robert Muir added a comment -

        Committed revision 999424.


          People

          • Assignee: Robert Muir
          • Reporter: Robert Muir