Lucene - Core / LUCENE-950

IllegalArgumentException parsing "foo~1"

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not a Problem
    • Affects Version/s: 2.1, 2.2
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/queryparser
    • Labels:
      None
    • Environment:

      Java 1.5

    • Lucene Fields:
      New

      Description

      If I run this:

      QueryParser parser = new QueryParser("myField", new SimpleAnalyzer());
      try {
          parser.parse("foo~1");
      } catch (ParseException e) {
          // OK
      }

      I get this:

      Exception in thread "main" java.lang.IllegalArgumentException: minimumSimilarity >= 1
      at org.apache.lucene.search.FuzzyQuery.<init>(FuzzyQuery.java:58)
      at org.apache.lucene.queryParser.QueryParser.getFuzzyQuery(QueryParser.java:711)
      at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1090)
      at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:979)
      at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:907)
      at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:896)
      at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:146)

          Activity

          Robert Muir added a comment -

          I'd like to resolve this issue.

          In LUCENE-2667 we changed the syntax for 4.0 such that foo~1 and foo~2 specify a FuzzyQuery with an edit distance of 1 or 2, respectively.
          The old syntax of ~0.5 is still accepted, but the default is now ~2 so that we never scan the entire term dictionary by default.
          The new syntax causes no compatibility problem, since these inputs previously only produced an IllegalArgumentException.

          It's also pretty handy to be able to do a search and ask for '1 or 2 characters off' without playing games with floats.
          See http://stackoverflow.com/questions/2073839/search-lucene-with-precise-edit-distances
          For example, we now have a spellchecker (LUCENE-2507) that uses this functionality in just this way.

          So I think we can resolve this issue as fixed, since it's obsolete.
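
As a rough illustration of the interpretation described in this comment, the following standalone sketch maps the number after the tilde to a maximum edit distance. It is not Lucene's code: the class name `FuzzySuffixSketch`, the method `toMaxEdits`, the cap of 2 edits, the treatment of 0 as an exact match, and the scaling of a legacy similarity by term length are all assumptions made for the example.

```java
/** Hypothetical sketch; names and conversion rule are assumptions, not Lucene's API. */
final class FuzzySuffixSketch {
    /** Assumed upper bound on the edit distance (the comment mentions ~2 as the default). */
    static final int MAX_EDITS = 2;

    /**
     * Interprets the value after '~'.
     * Values >= 1 are read as a whole number of edits (foo~1, foo~2).
     * Values in (0, 1) are read as a legacy similarity (foo~0.5) and
     * converted to edits by scaling with the term length (assumed rule).
     */
    static int toMaxEdits(float value, int termLength) {
        if (value < 0f) {
            throw new IllegalArgumentException("fuzzy value must be >= 0");
        }
        if (value >= 1f) {
            // New-style syntax: the number is an edit distance, capped at MAX_EDITS.
            return (int) Math.min(value, MAX_EDITS);
        }
        if (value == 0f) {
            // Assumption: 0 means "exact match", not "unlimited edits".
            return 0;
        }
        // Legacy similarity: lower similarity allows more edits for longer terms.
        return Math.min((int) ((1.0 - value) * termLength), MAX_EDITS);
    }
}
```

Under these assumptions, `toMaxEdits(1f, 3)` yields an edit distance of 1 for `foo~1`, while a legacy `foo~0.5` on a three-letter term also yields 1.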

          Adriano Crestani added a comment -

          This patch fixes the bug: the parser no longer throws IllegalArgumentException when the user enters a fuzzy query with a similarity greater than or equal to 1. Instead, it converts the FuzzyQuery into a simple TermQuery, ignoring the fuzzy value.
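
The fallback this comment describes could look roughly like the sketch below. The patch itself is not shown here, so this is only an approximation of the idea: `FuzzyFallbackSketch` and `describe` are hypothetical names, and the method returns a textual description of the query rather than building real Lucene query objects.

```java
/** Hypothetical sketch of the fallback; it does not use or reproduce the actual patch. */
final class FuzzyFallbackSketch {
    /**
     * Describes which kind of query would be built for a term and a
     * minimum similarity taken from the query string.
     */
    static String describe(String term, float minSimilarity) {
        if (minSimilarity < 0f) {
            throw new IllegalArgumentException("minimumSimilarity < 0");
        }
        if (minSimilarity >= 1f) {
            // Similarity of 1 or more cannot be fuzzy: fall back to an exact term match
            // instead of letting an IllegalArgumentException escape.
            return "TermQuery(" + term + ")";
        }
        return "FuzzyQuery(" + term + ", minSimilarity=" + minSimilarity + ")";
    }
}
```

With this rule, `foo~1` would be handled like a plain `foo`, while `foo~0.5` would still produce a fuzzy query.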

          Luis Alves added a comment -

          I can fix this in the new queryparser implementation.

          ... maybe foo~(>=1) should really just map to foo. I think you hit the nail on the head though, in that it seems silly to throw the illegal arg exception in either case.

          I also agree. I'm adding a comment to LUCENE-1823 to fix this in the implementation.

          Eleanor Joslin added a comment -

          Also note that if the number in the query string is higher than 1 the
          same operation throws a ParseException instead.

          org.apache.lucene.queryParser.ParseException: Cannot parse 'foo~1.01':
          Minimum similarity for a FuzzyQuery has to be between 0.0f and 1.0f !
          at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:150)


          Eleanor Joslin, Software Development DecisionSoft Ltd.
          Telephone: +44-1865-203192 http://www.decisionsoft.com

          Mark Miller added a comment -

          I agree Eleanor - someone else made a good point that maybe foo~(>=1) should really just map to foo. I think you hit the nail on the head though, in that it seems silly to throw the illegal arg exception in either case.

          Eleanor Joslin added a comment -

          Perhaps all that needs doing is to note on the javadoc of
          QueryParser.parse(String) that it can throw IllegalArgumentException as
          well as ParseException, so that consuming code can catch it. An
          application shouldn't blow up just because a user types something silly
          in a search field.


          Eleanor Joslin, Software Development DecisionSoft Ltd.
          Telephone: +44-1865-203192 http://www.decisionsoft.com
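
The defensive pattern suggested in this comment, where consuming code catches both the checked parse failure and IllegalArgumentException, might be sketched as follows. To keep the example self-contained it does not depend on Lucene: `SafeParseSketch`, `SyntaxException` (standing in for ParseException), the `Parser` interface, and `tryParse` are all hypothetical names, and the sketch assumes a Java version with `Optional` and multi-catch rather than the Java 1.5 environment of the report.

```java
import java.util.Optional;

/** Hypothetical sketch; SyntaxException plays the role of a checked ParseException. */
final class SafeParseSketch {
    /** Stand-in for the checked exception a query parser declares. */
    static final class SyntaxException extends Exception {
        SyntaxException(String message) {
            super(message);
        }
    }

    /** Minimal parser abstraction so the sketch runs without Lucene. */
    interface Parser<Q> {
        Q parse(String input) throws SyntaxException;
    }

    /**
     * Parses user input and returns an empty Optional when the parser rejects it,
     * whether through the checked exception or through an unchecked
     * IllegalArgumentException raised while the query is being constructed.
     */
    static <Q> Optional<Q> tryParse(Parser<Q> parser, String input) {
        try {
            return Optional.ofNullable(parser.parse(input));
        } catch (SyntaxException | IllegalArgumentException e) {
            // Bad user input: report "no query" instead of letting the application fail.
            return Optional.empty();
        }
    }
}
```

A caller can then show a "could not understand your search" message whenever the result is empty, regardless of which exception the parser raised.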

          Mark Miller added a comment -

          Do we want to make the wording for this clearer on the query syntax page? To me, saying 'between' two numbers does not include the end numbers. If it's between the couch and the chair, you're not going to find it on either. Maybe that example is counter to my point...<g>

          If nobody else is confused though, I say we resolve this issue. Between 0 and 1 sounds clear enough to me.

          Grant Ingersoll added a comment -

          Hmmm, this isn't really an error (see http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/FuzzyQuery.html) but I can also see that the QueryParser Syntax (http://lucene.apache.org/java/docs/queryparsersyntax.html) doesn't explicitly state that 1 is excluded, even if the Javadocs do. It says:
          "Starting with Lucene 1.9 an additional (optional) parameter can specify the required similarity. The value is between 0 and 1, with a value closer to 1 only terms with a higher similarity will be matched."

          I could see that a patch saying something like "The value is between 0 and 1 (but not including 1), with a value closer ..." would be appropriate.

          So, I will leave this open for now, even though I feel the QueryParser and FuzzyQuery are operating correctly.


            People

            • Assignee: Unassigned
            • Reporter: Eleanor Joslin
            • Votes: 0
            • Watchers: 1
