  1. Lucene - Core
  2. LUCENE-133

[PATCH] QueryParser assumes getPositionIncrement() == 1

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/queryparser
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: PC

    • Bugzilla Id:
      23307

      Description

      I've written an analyzer that can emit several tokens for a single input
      token. For example, "language" is analyzed as "C", "C++", "Java".

      As stated in the docs, the first token (i.e. "C") is given a
      PositionIncrement of 1 while the others get a PositionIncrement of 0, so
      all three tokens share the same position.
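
The position bookkeeping described here can be sketched in plain Java. This is only an illustration and not Lucene code: the class and method names are hypothetical, and the starting value of -1 is an assumption chosen so that a first increment of 1 lands on position 0.

```java
import java.util.Arrays;

public class PositionSketch {
    // Hypothetical helper: turns per-token position increments into
    // absolute positions. Starts at -1 so a first increment of 1 gives 0.
    public static int[] absolutePositions(int[] increments) {
        int[] positions = new int[increments.length];
        int current = -1;
        for (int i = 0; i < increments.length; i++) {
            current += increments[i];
            positions[i] = current;
        }
        return positions;
    }

    public static void main(String[] args) {
        // "language" -> "C" (+1), "C++" (+0), "Java" (+0)
        System.out.println(Arrays.toString(absolutePositions(new int[] {1, 0, 0})));
        // prints [0, 0, 0]: all three tokens sit at the same position
    }
}
```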

      When parsed by the QueryParser, the query:
      language

      ...is interpreted as the PhraseQuery :
      C C++ Java

      ...which is obviously not what I want.

      I think the condition that triggers a PhraseQuery (vector's size > 1) is
      too simplistic. My tokens should feed a BooleanQuery with 3 clauses:
      C || C++ || Java

      However, if I input a two-token query, I surely want (at least) a PhraseQuery.
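
One way to picture the decision being asked for is to group tokens into position slots first and only then choose a query shape. The sketch below is an assumption-laden illustration, not the QueryParser's code: the class, the method names and the shape labels ("term", "boolean-or", "phrase", "multi-phrase") are all made up for this example, and increments greater than 1 are simply treated as starting a new slot.

```java
import java.util.ArrayList;
import java.util.List;

public class QueryShapeSketch {
    // Groups terms into slots: an increment of 0 joins the previous slot,
    // any positive increment opens a new one (gaps are ignored here).
    public static List<List<String>> groupByPosition(String[] terms, int[] increments) {
        List<List<String>> slots = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            if (increments[i] > 0 || slots.isEmpty()) {
                slots.add(new ArrayList<>());
            }
            slots.get(slots.size() - 1).add(terms[i]);
        }
        return slots;
    }

    // Hypothetical labels for the kind of query each grouping suggests.
    public static String shape(List<List<String>> slots) {
        if (slots.isEmpty()) {
            return "empty";
        }
        if (slots.size() == 1) {
            // One position: a single term, or alternatives to be OR-ed.
            return slots.get(0).size() == 1 ? "term" : "boolean-or";
        }
        // Several positions: a phrase, possibly with alternatives per slot.
        for (List<String> slot : slots) {
            if (slot.size() > 1) {
                return "multi-phrase";
            }
        }
        return "phrase";
    }
}
```

With this grouping, the size of the token vector no longer decides the outcome; the number of distinct positions does.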

      Say now that "OS" is analyzed as "Windows", "Unix", "MacOS" (with
      PositionIncrements set to 1-0-0 and same positions).

      The query "language OS" should be parsed as:
      "C Windows" || "C++ Windows" || "Java Windows" || "C Unix" || "C++ Unix" ||
      "Java Unix" || "C MacOS" || "C++ MacOS" || "Java MacOS".

      Well... there may be a better optimization for that, but in any case I think
      that QueryParser.getFieldQuery(String field, Analyzer analyzer, String
      queryText) cannot afford to discard Token.getPositionIncrement() as it
      currently does.
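
The nine-phrase expansion for "language OS" is a cross product over the position slots. A minimal sketch in plain Java follows, assuming the tokens have already been grouped into one list of alternatives per position; the class and method names are hypothetical and this is not how Lucene builds its queries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PhraseExpansionSketch {
    // Builds every phrase that takes exactly one term from each slot.
    public static List<String> expand(List<List<String>> slots) {
        List<String> phrases = new ArrayList<>();
        if (slots.isEmpty()) {
            return phrases;
        }
        phrases.add("");
        for (List<String> slot : slots) {
            List<String> next = new ArrayList<>();
            for (String term : slot) {
                for (String prefix : phrases) {
                    next.add(prefix.isEmpty() ? term : prefix + " " + term);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    public static void main(String[] args) {
        List<String> phrases = expand(Arrays.asList(
                Arrays.asList("C", "C++", "Java"),
                Arrays.asList("Windows", "Unix", "MacOS")));
        System.out.println(phrases.size()); // prints 9
    }
}
```

The number of phrases is the product of the slot sizes, so it grows quickly with query length, which is presumably why a more compact representation is worth looking for.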

      p.b.

        Activity

        pierrick.brihaye@free.fr Pierrick Brihaye added a comment -

        Hi,

        Following the current discussion on a closely related topic, here is my
        own dirty yet functional solution.

        Sorry, I can't give you a patch that would apply to Lucene's
        QueryParser.jj because my QueryParser has some extra functionality.

        I hope this helps; I'm very satisfied with it in the context of Arabic
        analysis.

        p.b.

        pierrick.brihaye@free.fr Pierrick Brihaye added a comment -

        Created an attachment (id=11358)
        A QueryParser that can deal with positionIncrement == 0 (+ extra stuff)

        daniel.naber@t-online.de Daniel Naber added a comment -

        Created an attachment (id=13397)
        simplified patch

        daniel.naber@t-online.de Daniel Naber added a comment -

        Created an attachment (id=13398)
        test cases for my patch

        daniel.naber@t-online.de Daniel Naber added a comment -

        I've updated the patch to use MultiPhraseQuery, which simplifies the code
        and could improve performance. I'll commit this to CVS soon unless someone
        finds a problem with it (as I'm not so familiar with QueryParser, it would
        really be great if someone could check the patch).
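
MultiPhraseQuery lets each phrase position hold several alternative terms, so the query stays one phrase instead of a disjunction of every combination. The matching idea can be sketched in plain Java as below. This is an illustration only: it is neither the committed patch nor Lucene's implementation, the names are hypothetical, and slop and position gaps are left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MultiPhraseSketch {
    // True if some run of consecutive document terms matches the slots
    // in order, where each slot accepts any one of its alternatives.
    public static boolean matches(List<Set<String>> slots, List<String> docTerms) {
        if (slots.isEmpty()) {
            return false;
        }
        for (int start = 0; start + slots.size() <= docTerms.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < slots.size(); i++) {
                if (!slots.get(i).contains(docTerms.get(start + i))) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Set<String>> slots = Arrays.asList(
                new HashSet<>(Arrays.asList("C", "C++", "Java")),
                new HashSet<>(Arrays.asList("Windows", "Unix", "MacOS")));
        System.out.println(matches(slots, Arrays.asList("I", "use", "Java", "Unix", "daily")));
        // prints true: "Java" then "Unix" fills both slots in order
    }
}
```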

        daniel.naber@t-online.de Daniel Naber added a comment -

        This patch has now been committed to CVS.


          People

          • Assignee:
            java-dev@lucene.apache.org Lucene Developers
            Reporter:
            pierrick.brihaye@free.fr Pierrick Brihaye