Solr
  1. Solr
  2. SOLR-756

Make DisjunctionMaxQueryParser generally useful by supporting all query types.

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 1.3
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      This is an enhancement to the DisjunctionMaxQueryParser to work on all the query variants such as wildcard, prefix, and fuzzy queries, and to support working in "AND" scenarios that are not processed by the min-should-match DisMax QParser. This was not in Solr already because DisMax was only used for a very limited syntax that didn't use those features. In my opinion, this makes a more suitable base parser for general use because unlike the Lucene/Solr parser, this one supports multiple default fields whereas other ones (say Yonik's

      {!prefix}

      one for example, can't do dismax). The notion of a single default field is antiquated and a technical under-the-hood detail of Lucene that I think Solr should shield the user from by on-the-fly using a DisMax when multiple fields are used.
      (patch to be attached soon)

        Issue Links

          Activity

          Hide
          David Smiley added a comment -

          The patch file is only to be applied on SolrPluginUtils.java. The patch file might need to be modified to remove the header that indicates the path to this file which is unique to my development system.
          Testing this would be a little tricky because this parser is only used by one QParser and it prohibits doing the type of queries that would exercise these changes. I have tests on my system but they involve other things that I have not contributed (yet, any way).

          Show
          David Smiley added a comment - The patch file is only to be applied on SolrPluginUtils.java. The patch file might need to be modified to remove the header that indicates the path to this file which is unique to my development system. Testing this would be a little tricky because this parser is only used by one QParser and it prohibits doing the type of queries that would exercise these changes. I have tests on my system but they involve other things that I have not contributed (yet, any way).
          Hide
          Mark Miller added a comment -

          I think the whole point of dismax is to support a very limited subset of the query syntax right? I think you want to add a new queryparser not modify dismax...

          Show
          Mark Miller added a comment - I think the whole point of dismax is to support a very limited subset of the query syntax right? I think you want to add a new queryparser not modify dismax...
          Hide
          David Smiley added a comment -

          The DisjunctionMaxQueryParser in Solr does more than limit the user's syntax. It offers the DisjunctionMax capability which is unique to this query parser, it has a boostingQuery, and a boostingFunctions capability too. Furthermore, the DisjunctionMaxQueryParser prescribes a particular way of limiting the user's syntax that I think applications should be able to specify if they so choose.

          Sure, I could "write my own query parser" but that's a pain... why should everyone who wants the DisMax capability have to write their own just because we don't want it to make assumptions about a limited syntax?

          Show
          David Smiley added a comment - The DisjunctionMaxQueryParser in Solr does more than limit the user's syntax. It offers the DisjunctionMax capability which is unique to this query parser, it has a boostingQuery, and a boostingFunctions capability too. Furthermore, the DisjunctionMaxQueryParser prescribes a particular way of limiting the user's syntax that I think applications should be able to specify if they so choose. Sure, I could "write my own query parser" but that's a pain... why should everyone who wants the DisMax capability have to write their own just because we don't want it to make assumptions about a limited syntax?
          Hide
          Mark Miller added a comment -

          I am not suggesting that you maintain your own, but perhaps that a new queryparser would be a better option. I am not positive in my answer, and am only thinking out loud. I got the impression that dismax goes out of its way to escape as much as it can to avoid parse errors and allow copy paste of text type of uses. The functionality you mention appears all to be from components below the dismax parser. In my mind, I like the idea of keeping dismax lean and mean and starting something else to become a kitchen sink - Lucene's queryparser became a bit of a kitchen sink and I think dismax looks to be coded so as to avoid some of the problems that can emerge from that...

          Show
          Mark Miller added a comment - I am not suggesting that you maintain your own, but perhaps that a new queryparser would be a better option. I am not positive in my answer, and am only thinking out loud. I got the impression that dismax goes out of its way to escape as much as it can to avoid parse errors and allow copy paste of text type of uses. The functionality you mention appears all to be from components below the dismax parser. In my mind, I like the idea of keeping dismax lean and mean and starting something else to become a kitchen sink - Lucene's queryparser became a bit of a kitchen sink and I think dismax looks to be coded so as to avoid some of the problems that can emerge from that...
          Hide
          David Smiley added a comment -

          When anyone says "dismax", there is ambiguity because it might mean one of two things:

          • SolrPluginUtils.DisjunctionMaxQueryParser which extends Solr's QueryParser which extends Lucene's QueryParser. In this JIRA issue I've offered a patch to this file. This code has to deal with the foundational capability of tapping into Lucene's DisjunctionMaxQuery so that multiple fields can be a default instead of Lucene's just one. I believe this change isn't controversial compared to others, IMO.
          • The DisMaxQParserPlugin triggered with "qt=dismax" which uses the QueryParser mentioned above and does things like escape user syntax, ,boost various things, min-should-match, etc. Issue SOLR-758 includes my patch to that. I think this plugin is named poorly because dismax is just one thing it does.

          I suspect your comments might be more applicable to SOLR-758 than they are here. Have you actually seen the source I attached? Are you apposed to the changes I make in this issue?

          Show
          David Smiley added a comment - When anyone says "dismax", there is ambiguity because it might mean one of two things: SolrPluginUtils.DisjunctionMaxQueryParser which extends Solr's QueryParser which extends Lucene's QueryParser. In this JIRA issue I've offered a patch to this file. This code has to deal with the foundational capability of tapping into Lucene's DisjunctionMaxQuery so that multiple fields can be a default instead of Lucene's just one. I believe this change isn't controversial compared to others, IMO. The DisMaxQParserPlugin triggered with "qt=dismax" which uses the QueryParser mentioned above and does things like escape user syntax, ,boost various things, min-should-match, etc. Issue SOLR-758 includes my patch to that. I think this plugin is named poorly because dismax is just one thing it does. I suspect your comments might be more applicable to SOLR-758 than they are here. Have you actually seen the source I attached? Are you apposed to the changes I make in this issue?
          Hide
          Mark Miller added a comment -

          Ah, your right, I've gotten a bit lost floating between these issues and the mailing list email that led me here.

          I am not opposed to anything you have done. Just enriching the issue with some discussion <g> My point is only what I have said, and as you say, while mildly connected to this issue, is more leveled at 758.

          I don't really have anything to say against this patch in particular, and I apologize for the confusion. Looking at the code, seems reasonable to me.

          Show
          Mark Miller added a comment - Ah, your right, I've gotten a bit lost floating between these issues and the mailing list email that led me here. I am not opposed to anything you have done. Just enriching the issue with some discussion <g> My point is only what I have said, and as you say, while mildly connected to this issue, is more leveled at 758. I don't really have anything to say against this patch in particular, and I apologize for the confusion. Looking at the code, seems reasonable to me.
          Hide
          Otis Gospodnetic added a comment -

          My colleagues and I have recently hit a similar wall – DisMax not allowing wildcards.
          I think Hoss will be the loudest (not in a negative sense) here, so here are some specific questions:

          1. are we OK with adding wildcard (and other) support to "dismax"?
          2. if yes, which part of "dismax" should that be added to? (David described the 2 meanings of "dismax")

          I've hit this wall a couple of times that I think it's worth thinking about how to get support for non-plain-keywords-only queries somethere in the DisMax area. Like David, I too want to make use of the other functionality that already exists there, such as bf, ps, etc.

          Thanks.

          Show
          Otis Gospodnetic added a comment - My colleagues and I have recently hit a similar wall – DisMax not allowing wildcards. I think Hoss will be the loudest (not in a negative sense) here, so here are some specific questions: are we OK with adding wildcard (and other) support to "dismax"? if yes, which part of "dismax" should that be added to? (David described the 2 meanings of "dismax") I've hit this wall a couple of times that I think it's worth thinking about how to get support for non-plain-keywords-only queries somethere in the DisMax area. Like David, I too want to make use of the other functionality that already exists there, such as bf, ps, etc. Thanks.
          Hide
          Peter Wolanin added a comment -

          We are regularly hitting this wall and users are very frustrated by not being able to use wildcards becuase we wanted the other advantages of the dismax parser.

          Any chance to get some of these changes in 1.4?

          Show
          Peter Wolanin added a comment - We are regularly hitting this wall and users are very frustrated by not being able to use wildcards becuase we wanted the other advantages of the dismax parser. Any chance to get some of these changes in 1.4?
          Hide
          Hoss Man added a comment -

          at this stage in the process for 1.4, i think most committers are focusing on bug fixes and avoiding adding new features ... especially new features that have been added to existing functionality, so they could trigger new bugs in existing use cases.

          FWIW:

          • the patch doesn't seem to have any test cases demonstrating the new functionality, which makes me leary of committing
          • i'm not certain just from skimming the patch, but it seems like this should break behavior for any existing users of DisjunctionMaxQueryParser who are expecting the field aliases to only affect getFieldQuery ... i could be wrong about that though.
          Show
          Hoss Man added a comment - at this stage in the process for 1.4, i think most committers are focusing on bug fixes and avoiding adding new features ... especially new features that have been added to existing functionality, so they could trigger new bugs in existing use cases. FWIW: the patch doesn't seem to have any test cases demonstrating the new functionality, which makes me leary of committing i'm not certain just from skimming the patch, but it seems like this should break behavior for any existing users of DisjunctionMaxQueryParser who are expecting the field aliases to only affect getFieldQuery ... i could be wrong about that though.
          Hide
          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Show
          Hoss Man added a comment - Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email... http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed. A unique token for finding these 240 issues in the future: hossversioncleanup20100527
          Hide
          Jan Høydahl added a comment -

          I think this issue can be closed as it duplicates SOLR-1553, not?

          Show
          Jan Høydahl added a comment - I think this issue can be closed as it duplicates SOLR-1553 , not?
          Hide
          David Smiley added a comment -

          Jan, you refer to the Extended Dismax QParser – and the answer is no. I think you intended to comment on SOLR-758. This patch here, as I said in a comment above here https://issues.apache.org/jira/browse/SOLR-756?focusedCommentId=12630223&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12630223 only has to do with a specific improvement to SolrPluginUtils.java, that is an enabler for other improvements to DismaxQParser. According to Hoss, I need to add tests for this issue.

          Show
          David Smiley added a comment - Jan, you refer to the Extended Dismax QParser – and the answer is no. I think you intended to comment on SOLR-758 . This patch here, as I said in a comment above here https://issues.apache.org/jira/browse/SOLR-756?focusedCommentId=12630223&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12630223 only has to do with a specific improvement to SolrPluginUtils.java, that is an enabler for other improvements to DismaxQParser. According to Hoss, I need to add tests for this issue.
          Hide
          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Show
          Robert Muir added a comment - Bulk move 3.2 -> 3.3
          Hide
          Robert Muir added a comment -

          3.4 -> 3.5

          Show
          Robert Muir added a comment - 3.4 -> 3.5
          Hide
          Hoss Man added a comment -

          Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

          email notification suppressed to prevent mass-spam
          psuedo-unique token identifying these issues: hoss20120321nofix36

          Show
          Hoss Man added a comment - Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently. email notification suppressed to prevent mass-spam psuedo-unique token identifying these issues: hoss20120321nofix36
          Hide
          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Show
          Steve Rowe added a comment - Bulk move 4.4 issues to 4.5 and 5.0

            People

            • Assignee:
              Unassigned
              Reporter:
              David Smiley
            • Votes:
              12 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:

                Development