Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-10323

SpellingQueryConverter does not remove ":" char when using fielded queries

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.2.1
    • Fix Version/s: 6.6, 7.0
    • Component/s: spellchecker
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      If you pass a fielded query to SpellingQueryConverter.convert, it returns a token that is prefixed with a ":" char.

      Example: The query "foo:bar" is converted to ":bar"

      This bug seems to have been introduced by the fix for SOLR-2556. Before this patch, SpellingQueryConverter.convert returned tokens without the leading colon char.

      1. SOLR-10323.patch
        2 kB
        James Dyer
      2. SOLR-10323.patch
        3 kB
        Amrit Sarkar
      3. SOLR-10323.patch
        0.8 kB
        Amrit Sarkar

        Activity

        Hide
        sarkaramrit2@gmail.com Amrit Sarkar added a comment -

        The issue seems to be bigger than that:

        In SpellingQueryConverterTest::testNumeric()

        @Test
          public void testNumeric() throws Exception {
            SpellingQueryConverter converter = new SpellingQueryConverter();
            converter.init(new NamedList());
            converter.setAnalyzer(new WhitespaceAnalyzer());
            String[] queries = {"12345", "foo:12345", "12345 67890", "foo:(12345 67890)", "foo:(life 67890)", "12345 life",
                "+12345 +life", "-12345 life"};
            int[] tokensToExpect = {1, 1, 2, 2, 2, 2, 2, 2};
            for (int i = 0; i < queries.length; i++) {
              Collection<Token> tokens = converter.convert(queries[i]);
              System.out.println("tkns for "+queries[i]+" >> "+tokens);
              assertTrue("tokens Size: " + tokens.size() + " is not: " + tokensToExpect[i], tokens.size() == tokensToExpect[i]);
            }
          }
        
        tkns for 12345 >> [12345]
        tkns for foo:12345 >> [:12345]
        tkns for 12345 67890 >> [12345, 67890]
        tkns for foo:(12345 67890) >> [(12345, 67890]
        tkns for foo:(life 67890) >> [(life, 67890]
        tkns for 12345 life >> [12345, life]
        tkns for +12345 +life >> [+12345, +life]
        tkns for -12345 life >> [-12345, life]
        

        Observe ( coming anywhere, braces are not handled too. Test cases are not hard enough there.

        Show
        sarkaramrit2@gmail.com Amrit Sarkar added a comment - The issue seems to be bigger than that: In SpellingQueryConverterTest::testNumeric() @Test public void testNumeric() throws Exception { SpellingQueryConverter converter = new SpellingQueryConverter(); converter.init( new NamedList()); converter.setAnalyzer( new WhitespaceAnalyzer()); String [] queries = { "12345" , "foo:12345" , "12345 67890" , "foo:(12345 67890)" , "foo:(life 67890)" , "12345 life" , "+12345 +life" , "-12345 life" }; int [] tokensToExpect = {1, 1, 2, 2, 2, 2, 2, 2}; for ( int i = 0; i < queries.length; i++) { Collection<Token> tokens = converter.convert(queries[i]); System .out.println( "tkns for " +queries[i]+ " >> " +tokens); assertTrue( "tokens Size: " + tokens.size() + " is not: " + tokensToExpect[i], tokens.size() == tokensToExpect[i]); } } tkns for 12345 >> [12345] tkns for foo:12345 >> [:12345] tkns for 12345 67890 >> [12345, 67890] tkns for foo:(12345 67890) >> [(12345, 67890] tkns for foo:(life 67890) >> [(life, 67890] tkns for 12345 life >> [12345, life] tkns for +12345 +life >> [+12345, +life] tkns for -12345 life >> [-12345, life] Observe ( coming anywhere, braces are not handled too. Test cases are not hard enough there.
        Hide
        sarkaramrit2@gmail.com Amrit Sarkar added a comment - - edited

        SOLR-10323.patch uploaded which will ignore ":" and "(" throughout the original term. This fixes our problem but an overhead comes around as the SpellingQueryConverter will break if the original term is like:

        foo:bar(random) ===> [bar, random], in current code, this happens already
        foo:bar:bar ===> [bar, bar], in current code, this happens already

        and there is no way to escape it like:
        foo:bar(random)
        foo:bar\:bar

        But this will work perfectly with other special characters like underscores, highen and alphanumerics.

        I will add relevant test cases very soon if we have +1 with the stated compromise.

        Show
        sarkaramrit2@gmail.com Amrit Sarkar added a comment - - edited SOLR-10323 .patch uploaded which will ignore ":" and "(" throughout the original term. This fixes our problem but an overhead comes around as the SpellingQueryConverter will break if the original term is like: foo:bar(random) ===> [bar, random] , in current code, this happens already foo:bar:bar ===> [bar, bar] , in current code, this happens already and there is no way to escape it like: foo:bar(random) foo:bar\:bar But this will work perfectly with other special characters like underscores, highen and alphanumerics. I will add relevant test cases very soon if we have +1 with the stated compromise.
        Hide
        sarkaramrit2@gmail.com Amrit Sarkar added a comment -

        changed the QUERY_REGEX:

        (?:(?!(([A-Z_a-z\xc0-\xd6\xd8-\xf6\xf8-\u02ff\u0370-\u037d\u037f-\u1fff\u200c-\u200d\u2070-\u218f\u2c00-\u2fef\u2001-\ud7ff\uf900-\ufdcf\ufdf0-\ufffd\-.0-9\xb7\u0300-\u036f\u203f-\u2040]|\p{Cs}{2})+:|[\^.:(]\d+)))[^^.\s][\p{L}_\-0-9]+
        
        .......:|[\^.:(]\d+)))........
        
        Show
        sarkaramrit2@gmail.com Amrit Sarkar added a comment - changed the QUERY_REGEX: (?:(?!(([A-Z_a-z\xc0-\xd6\xd8-\xf6\xf8-\u02ff\u0370-\u037d\u037f-\u1fff\u200c-\u200d\u2070-\u218f\u2c00-\u2fef\u2001-\ud7ff\uf900-\ufdcf\ufdf0-\ufffd\-.0-9\xb7\u0300-\u036f\u203f-\u2040]|\p{Cs}{2})+:|[\^.:(]\d+)))[^^.\s][\p{L}_\-0-9]+ .......:|[\^.:(]\d+)))........
        Hide
        Mikep86 Michael Pellegrini added a comment -

        This compromise works for my needs

        Show
        Mikep86 Michael Pellegrini added a comment - This compromise works for my needs
        Hide
        sarkaramrit2@gmail.com Amrit Sarkar added a comment -

        Cool, I will add the test cases then to the current patch and would request a committer to have a look, suggest any improvements and ultimately commit that. This is a legit bug.

        Show
        sarkaramrit2@gmail.com Amrit Sarkar added a comment - Cool, I will add the test cases then to the current patch and would request a committer to have a look, suggest any improvements and ultimately commit that. This is a legit bug.
        Hide
        sarkaramrit2@gmail.com Amrit Sarkar added a comment -

        SOLR-10323.patch uploaded with light but precise test cases.

        	modified:   solr/core/src/java/org/apache/solr/spelling/SpellingQueryConverter.java
        	modified:   solr/core/src/test/org/apache/solr/spelling/SpellingQueryConverterTest.java
        

        Requesting someone to review the patch.

        Show
        sarkaramrit2@gmail.com Amrit Sarkar added a comment - SOLR-10323 .patch uploaded with light but precise test cases. modified: solr/core/src/java/org/apache/solr/spelling/SpellingQueryConverter.java modified: solr/core/src/test/org/apache/solr/spelling/SpellingQueryConverterTest.java Requesting someone to review the patch.
        Hide
        Mikep86 Michael Pellegrini added a comment -

        James Dyer can you review this patch?

        Show
        Mikep86 Michael Pellegrini added a comment - James Dyer can you review this patch?
        Hide
        jdyer James Dyer added a comment -

        This patch makes the test a little simpler. Straightforward fix. I'll commit this shortly.

        Show
        jdyer James Dyer added a comment - This patch makes the test a little simpler. Straightforward fix. I'll commit this shortly.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit e75a2e6b86bdac5ee86a4c3fa9d2f5e33bfc42d1 in lucene-solr's branch refs/heads/master from jdyer1
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e75a2e6 ]

        SOLR-10323: fix to SpellingQueryConverter to properly strip out colons in field-specific queries

        Show
        jira-bot ASF subversion and git services added a comment - Commit e75a2e6b86bdac5ee86a4c3fa9d2f5e33bfc42d1 in lucene-solr's branch refs/heads/master from jdyer1 [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e75a2e6 ] SOLR-10323 : fix to SpellingQueryConverter to properly strip out colons in field-specific queries
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit d5de738ed4ad561d0e7cc058ac3aa10a814be2f0 in lucene-solr's branch refs/heads/branch_6x from jdyer1
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d5de738 ]

        SOLR-10323: fix to SpellingQueryConverter to properly strip out colons in field-specific queries

        Show
        jira-bot ASF subversion and git services added a comment - Commit d5de738ed4ad561d0e7cc058ac3aa10a814be2f0 in lucene-solr's branch refs/heads/branch_6x from jdyer1 [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d5de738 ] SOLR-10323 : fix to SpellingQueryConverter to properly strip out colons in field-specific queries
        Hide
        jdyer James Dyer added a comment -

        Thank you Amrit for the patch and Michael for reporting this issue.

        Show
        jdyer James Dyer added a comment - Thank you Amrit for the patch and Michael for reporting this issue.

          People

          • Assignee:
            jdyer James Dyer
            Reporter:
            Mikep86 Michael Pellegrini
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development