Solr / SOLR-758

Enhance DisMaxQParserPlugin to support full-Solr syntax and to support alternate escaping strategies.

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.3
    • Fix Version/s: 4.9, 5.0
    • Component/s: search
    • Labels:
      None

      Description

      The DisMaxQParserPlugin has a variety of nice features; chief among them is that it uses the DisjunctionMaxQueryParser. However, it imposes limitations on the query syntax.

      I've enhanced the DisMax QParser plugin to use a pluggable query string rewriter (via subclass extension) instead of hard-coding the logic currently embedded within it (i.e. the escape-nearly-everything logic). Additionally, I've given this QParser a notion of a "simple" syntax (the default) versus a non-simple one; in the non-simple case, some of the QParser's logic is skipped because it is irrelevant (phrase boosting and min-should-match in particular). As part of this work I reorganized the code significantly to make it clearer and more extensible. I also chose to rename it to suggest its role as a parser for user queries.
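
The "pluggable rewriter via subclass extension" idea can be illustrated with a minimal sketch. The class and method names below are hypothetical and are not taken from the attached source files, and the set of escaped characters is an assumption that may differ from what DisMax actually escapes. The base class fixes the overall flow and exposes one overridable rewrite step; a subclass swaps in a different escaping strategy.

```java
/** Hypothetical base parser: the flow is fixed, the rewrite step is overridable. */
class SketchUserQueryParser {
    // Assumed set of characters to escape in "simple" mode; '+', '-' and '"'
    // are deliberately left alone so users can still require/prohibit terms
    // and write phrases.
    private static final String ESCAPED_CHARS = "\\!():^[]{}~*?";

    /** Default strategy: backslash-escape nearly everything. */
    protected String rewriteQueryString(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 8);
        for (char c : raw.toCharArray()) {
            if (ESCAPED_CHARS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Fixed entry point; subclasses customize only rewriteQueryString. */
    public final String prepare(String raw) {
        return rewriteQueryString(raw.trim());
    }
}

/** Hypothetical "non-simple" variant: passes the full syntax through untouched. */
class SketchAdvancedQueryParser extends SketchUserQueryParser {
    @Override
    protected String rewriteQueryString(String raw) {
        return raw;
    }
}
```

With this sketch, the base class turns `title:foo*` into `title\:foo\*`, while the advanced subclass returns it unchanged.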

      Attachment to follow...

      1. UserQParser.java-umlauts.patch
        0.9 kB
        Simon Lachinger
      2. AdvancedQParserPlugin.java
        0.8 kB
        David Smiley
      3. DisMaxQParserPlugin.java
        2 kB
        David Smiley
      4. UserQParser.java
        11 kB
        David Smiley
      5. AdvancedQParserPlugin.java
        0.8 kB
        David Smiley
      6. DisMaxQParserPlugin.java
        2 kB
        David Smiley
      7. UserQParser.java
        11 kB
        David Smiley

          Activity

          David Smiley added a comment -

          I am contributing source files to this issue instead of patches because the code was significantly reworked.
          Note that this patch depends strongly on SOLR-756 and mildly on SOLR-757, which I've contributed separately. They need to be applied for this to compile. Even if you don't apply those patches, you can read the source anyway to see what it does.

          David Smiley added a comment -

          Some months ago I upgraded to Solr 1.4 and I made some small changes as part of the port.

          Simon Lachinger added a comment -

          First of all, thanks for providing wildcard matching for the dismax query handler; that is exactly what I need. However, the WILDCARD_STRIP_CHARS regex in UserQParser.java does not work with umlauts, which makes the patch useless for languages such as German.

          I will attach a diff file with the changes I have made to get it working with umlauts.
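
To illustrate the kind of problem described here: the actual WILDCARD_STRIP_CHARS pattern is not shown in this issue, so the sketch below assumes it was an ASCII-only negated character class, and the class and constant names are hypothetical. It contrasts that with a Unicode-aware class built from `\p{L}` (any letter) and `\p{N}` (any number).

```java
import java.util.regex.Pattern;

/** Hypothetical helper; not the code from UserQParser.java or the attached patch. */
final class WildcardTermCleaner {
    // Assumed original style: everything outside ASCII letters/digits is stripped,
    // so umlauts are removed from the term.
    static final Pattern ASCII_STRIP = Pattern.compile("[^a-zA-Z0-9]");

    // Unicode-aware alternative: keep letters and digits from any script.
    static final Pattern UNICODE_STRIP = Pattern.compile("[^\\p{L}\\p{N}]");

    /** Removes every character matched by the given strip pattern. */
    static String clean(Pattern stripChars, String term) {
        return stripChars.matcher(term).replaceAll("");
    }

    private WildcardTermCleaner() {}
}
```

Here `clean(ASCII_STRIP, "M\u00fcller!")` yields `Mller`, while `clean(UNICODE_STRIP, "M\u00fcller!")` keeps the umlaut and yields `Müller`. Note that in `java.util.regex`, `\w` matches only `[a-zA-Z_0-9]` unless `Pattern.UNICODE_CHARACTER_CLASS` is set, so `\w` alone would not be enough without that flag.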

          Simon Lachinger added a comment -

          Making the UserQParser.java work with umlauts and other special characters.

          David Smiley added a comment -

          Thanks for the update, Simon. I had forgotten that you can use things like \w within a regex character class – [...]

          Otis Gospodnetic added a comment -

          Is this still needed with enhanced dismax now available?

          David Smiley added a comment -

          If the use-case is unrestricted Lucene syntax w/ dismax then Enhanced Dismax is the way to go. What I'm shooting for in this issue is a more extensible query parser. E-Dismax is cool but it doesn't look particularly extensible.

          For example, in an app I support, I use this patch to do several things:
          1. Check whether the query appears to use fancy Lucene syntax and, if so, treat it as such, but still with dismax on non-fielded clauses via SOLR-756.
          2. If there is one clause, rewrite the query to: clause clause*^0.5, i.e. search for the clause and also include partial matches. For the small index I have this is fine, but I can use n-grams some day if I need to.
          3. If there are multiple clauses, rewrite the query to: clauseA clauseB clauseC clauseC*^0.5 (clauseC is the last clause).
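
The three rules above can be sketched roughly as follows. This is an illustration only, not the attached UserQParser code: the class name is hypothetical, the heuristic for detecting "fancy Lucene syntax" is an assumption, and rule 2 is assumed to use the same trailing-wildcard form as rule 3 (since it is meant to include partial matches).

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of the rewrite rules; names and heuristics are assumptions. */
final class PrefixBoostRewriter {
    // Assumed heuristic for "fancy" syntax: quotes, grouping, ranges, wildcards,
    // fuzzy/boost markers, field prefixes, '+', boolean keywords, or a leading '-'.
    private static final Pattern ADVANCED_SYNTAX =
            Pattern.compile("[\"()\\[\\]{}*?~^:+]|\\b(?:AND|OR|NOT)\\b|(?:^|\\s)-");

    static boolean looksAdvanced(String query) {
        return ADVANCED_SYNTAX.matcher(query).find();
    }

    static String rewrite(String userQuery) {
        String trimmed = userQuery.trim();
        if (trimmed.isEmpty() || looksAdvanced(trimmed)) {
            // Rule 1: hand the query to the full-syntax parser unchanged.
            return trimmed;
        }
        // Rules 2 and 3: keep every clause, then repeat the last clause as a
        // prefix query with a lower boost so partial matches rank below exact ones.
        String[] clauses = trimmed.split("\\s+");
        String last = clauses[clauses.length - 1];
        return String.join(" ", clauses) + " " + last + "*^0.5";
    }

    private PrefixBoostRewriter() {}
}
```

Under these assumptions, `rewrite("solr")` gives `solr solr*^0.5`, `rewrite("apache solr search")` gives `apache solr search search*^0.5`, and `rewrite("title:solr AND lucene")` is returned unchanged.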

          What I'm hoping for is for Solr to offer better query parsing infrastructure so that I can implement my parsing needs by reusing and plugging into as much of what already exists as possible. Committing SOLR-756 is one step there, but there is also useful capability in DismaxQParser, such as boost queries, boost functions, and q.alt. Min-should-match is relatively reusable since it stands alone.

          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. Email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Robert Muir added a comment -

          3.4 -> 3.5

          Hoss Man added a comment -

          Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

          Email notification suppressed to prevent mass-spam.
          Pseudo-unique token identifying these issues: hoss20120321nofix36

          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Uwe Schindler added a comment -

          Move issue to Solr 4.9.


            People

            • Assignee: Unassigned
            • Reporter: David Smiley
            • Votes: 7
            • Watchers: 8
