Solr
  1. Solr
  2. SOLR-2368

Improve extended dismax (edismax) parser

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: query parsers
    • Labels:

      Description

      This is a "mother" issue to track further improvements for eDismax parser.
      The goal is to be able to deprecate and remove the old dismax once edismax satisfies all usecases of dismax.

        Issue Links

          Activity

          Hide
          Jan Høydahl added a comment -

          I agree with David's comments on SOLR-1553 that edismax is already good enough to replace dismax already, as it is clearly better, more useful and also backward compatible. It may still need some tuning, but not replacing dismax now in 3.1 could be an example of "perfect being the enemy of good"

          In Cominvent, we've been using edismax as the main query parser on all customer projects for several months now, and it is clearly much better than the old dismax, which is not robust enough nor does it allow the syntaxes which people have come to expect.

          We have not seen any bugs or instabilities on either of these sites where it is live: www.dn.no, www.libris.no, http://www.rechargenews.com/search?q=oil+AND+(usa+OR+eu) and many more.

          May I suggest the following for 3.1:

          • defType=dismax is changed to point to Extended DisMax
          • defType=basicdismax is pointed to the old Basic DisMax (to give people a way to revert if needed)
          • defType=edismax is dropped (or added as a temporary alias to dismax)
          • The wiki page http://wiki.apache.org/solr/DisMaxQParserPlugin is edited to reflect the changes, and specific parameters or features which are likely to be changed in future are marked as "experimental, may change" to warn people.
          Show
          Jan Høydahl added a comment - I agree with David's comments on SOLR-1553 that edismax is already good enough to replace dismax already, as it is clearly better, more useful and also backward compatible. It may still need some tuning, but not replacing dismax now in 3.1 could be an example of "perfect being the enemy of good" In Cominvent, we've been using edismax as the main query parser on all customer projects for several months now, and it is clearly much better than the old dismax, which is not robust enough nor does it allow the syntaxes which people have come to expect. We have not seen any bugs or instabilities on either of these sites where it is live: www.dn.no, www.libris.no, http://www.rechargenews.com/search?q=oil+AND+(usa+OR+eu ) and many more. May I suggest the following for 3.1: defType=dismax is changed to point to Extended DisMax defType=basicdismax is pointed to the old Basic DisMax (to give people a way to revert if needed) defType=edismax is dropped (or added as a temporary alias to dismax) The wiki page http://wiki.apache.org/solr/DisMaxQParserPlugin is edited to reflect the changes, and specific parameters or features which are likely to be changed in future are marked as "experimental, may change" to warn people.
          Hide
          Hoss Man added a comment -

          May I suggest the following for 3.1:

          • defType=dismax is changed to point to Extended DisMax

          -1

          beyond the key value of "don't break on malformed input" that using edismax would bring to existing dismax users, edismax's default behavior changes to many things for me to want to recommend it to existing dismax users (or change the default out from under them)

          the code will be there in 3.1, and savy users can use it, and we can fix the bugs and defaults as we move forward.

          Show
          Hoss Man added a comment - May I suggest the following for 3.1: defType=dismax is changed to point to Extended DisMax -1 beyond the key value of "don't break on malformed input" that using edismax would bring to existing dismax users, edismax's default behavior changes to many things for me to want to recommend it to existing dismax users (or change the default out from under them) the code will be there in 3.1, and savy users can use it, and we can fix the bugs and defaults as we move forward.
          Hide
          Jan Høydahl added a comment -

          As much as I believe the known issues will affect only a tiny percentage of existing (or new) dismax users, I have no problem with a more phased approach. Trying to see what's best for the user community.

          On a humorous note, if I was the non-savy user upgrading Solr from 1.4.1 to 3.1, I'd for sure read those release notes carefylly and test it all, given the huge version leap

          It would really help a quicker resolution of this long-running issue, if the current edismax features and params are documented on the Wiki for others to test, and that all known bugs and planned improvements are detailed here or linked to this issue so me and others may know how to contribute.

          Show
          Jan Høydahl added a comment - As much as I believe the known issues will affect only a tiny percentage of existing (or new) dismax users, I have no problem with a more phased approach. Trying to see what's best for the user community. On a humorous note, if I was the non-savy user upgrading Solr from 1.4.1 to 3.1, I'd for sure read those release notes carefylly and test it all, given the huge version leap It would really help a quicker resolution of this long-running issue, if the current edismax features and params are documented on the Wiki for others to test, and that all known bugs and planned improvements are detailed here or linked to this issue so me and others may know how to contribute.
          Hide
          Yonik Seeley added a comment -

          It would really help a quicker resolution of this long-running issue, if the current edismax features and params are documented on the Wiki for others to test

          Yeah... I really wanted to avoid having separate dismax vs edismax docs, and the additional confusion if edismax then later becomes dismax. We should think carefully about this... perhaps it would be less pain and less confusion to do this switch now with this major version number bump.

          Show
          Yonik Seeley added a comment - It would really help a quicker resolution of this long-running issue, if the current edismax features and params are documented on the Wiki for others to test Yeah... I really wanted to avoid having separate dismax vs edismax docs, and the additional confusion if edismax then later becomes dismax. We should think carefully about this... perhaps it would be less pain and less confusion to do this switch now with this major version number bump.
          Hide
          Yonik Seeley added a comment -

          More specifically, we could just think of edismax as the 3.1 version of dismax (regardless of how it was developed). Along with the major version bump, the defaults change - for example fielded queries work by default. It's the type of change in defaults we'd want to make anyway, even if it's slightly less back compatible.

          We should separate features that aren't back compatible, vs features that aren't fully baked yet (which truly should be marked as experimental).

          Show
          Yonik Seeley added a comment - More specifically, we could just think of edismax as the 3.1 version of dismax (regardless of how it was developed). Along with the major version bump, the defaults change - for example fielded queries work by default. It's the type of change in defaults we'd want to make anyway, even if it's slightly less back compatible. We should separate features that aren't back compatible, vs features that aren't fully baked yet (which truly should be marked as experimental).
          Hide
          Hoss Man added a comment -

          for example fielded queries work by default. It's the type of change in defaults we'd want to make anyway,

          on that point alone i'd pretty much completley disagree with you – the spirt of dismax is locking odwn what users can access via the "q" param by putting those controls in other params. i don't want some users who inadvertantly/maliciously guesses one of hte field names in my schema to suddenly have differnet query behavior unless i explicitly enable that feature.

          Show
          Hoss Man added a comment - for example fielded queries work by default. It's the type of change in defaults we'd want to make anyway, on that point alone i'd pretty much completley disagree with you – the spirt of dismax is locking odwn what users can access via the "q" param by putting those controls in other params. i don't want some users who inadvertantly/maliciously guesses one of hte field names in my schema to suddenly have differnet query behavior unless i explicitly enable that feature.
          Hide
          Yonik Seeley added a comment -

          on that point alone i'd pretty much completley disagree with you

          I guess so - I'd rather that complete lucene queries work out of the box (i.e. as much of a superset as we can get).
          It's up to the developer to lock down stuff if they don't want users querying it. Locking down everything by default is just frustrating for new devs.

          But I think that the documentation issue can still be solved by considering edismax to be dismax in trunk - and documenting certain features as 4.0. There can be a little note about being able to access these 4.0 features using the "edismax" parser in 3.x.

          Show
          Yonik Seeley added a comment - on that point alone i'd pretty much completley disagree with you I guess so - I'd rather that complete lucene queries work out of the box (i.e. as much of a superset as we can get). It's up to the developer to lock down stuff if they don't want users querying it. Locking down everything by default is just frustrating for new devs. But I think that the documentation issue can still be solved by considering edismax to be dismax in trunk - and documenting certain features as 4.0. There can be a little note about being able to access these 4.0 features using the "edismax" parser in 3.x.
          Hide
          Jan Høydahl added a comment -

          When I try to think practically on this, satisfying the vast majority of customers, locking down stuff has been a customer requirement in perhaps 2-3 of the almost 100 enterprise installations I've done over the last 11 years. The FAST search engine and its FQL query language by defaults exposes all features and (internal) fields to the end user, and there is no way to lock it down except through custom coding. If it's good enough for the largest enterprise customers out there, I'm sure it's good enough for Solr users. If someone need more locking down, let them contribute that - they won't sue anyone, this is collaboration

          This public news site runs Solr 1.4.2-dev with SOLR-1553:
          http://www.rechargenews.com/search?q=oil

          Can anyone show a practical examples of queries end users can do here to crash/break the system or security in any way? I can construct one: http://www.rechargenews.com/search?q=norway:s+oil in which the first clause triggers the known foo:bar bug but I cannot see any realistic worryable examples.

          Making edismax the default now (with an explicit way to switch to the old) and finalizing baking in 3.2 before summer would be a win win, at least for those of my customers who are planning to release on 3.1 in the coming months, all of them using edismax already.

          Show
          Jan Høydahl added a comment - When I try to think practically on this, satisfying the vast majority of customers, locking down stuff has been a customer requirement in perhaps 2-3 of the almost 100 enterprise installations I've done over the last 11 years. The FAST search engine and its FQL query language by defaults exposes all features and (internal) fields to the end user, and there is no way to lock it down except through custom coding. If it's good enough for the largest enterprise customers out there, I'm sure it's good enough for Solr users. If someone need more locking down, let them contribute that - they won't sue anyone, this is collaboration This public news site runs Solr 1.4.2-dev with SOLR-1553 : http://www.rechargenews.com/search?q=oil Can anyone show a practical examples of queries end users can do here to crash/break the system or security in any way? I can construct one: http://www.rechargenews.com/search?q=norway:s+oil in which the first clause triggers the known foo:bar bug but I cannot see any realistic worryable examples. Making edismax the default now (with an explicit way to switch to the old) and finalizing baking in 3.2 before summer would be a win win, at least for those of my customers who are planning to release on 3.1 in the coming months, all of them using edismax already.
          Hide
          David Smiley added a comment -

          I really think defType should default to either dismax (or edismax) in Solr 4. It's what most people should use. edismax has already improved in recent 3.x releases for better handling of accidental field references.

          Show
          David Smiley added a comment - I really think defType should default to either dismax (or edismax) in Solr 4. It's what most people should use. edismax has already improved in recent 3.x releases for better handling of accidental field references.
          Hide
          Jamie Johnson added a comment - - edited

          I've found an issue with trailing && ||, mixed && ||

          I've put together a fix to SolrPluginUtils which addresses this issue for me (took pieces of Solr 874), but don't have time to patch the trunk with it. I'm no regex guru so there is probably a better way to handle some of this, but it worked for my use case.

          SolrPluginUtils.java
          	// Pattern to detect operator(s) padded with whitespace on the right, and
          	// dangling operator(s) at end of query:
          	private final static Pattern PADDED_OP_PATTERN = Pattern.compile( "(-|\\+)\\p{Z}" );
          	private final static String PADDED_OR_DANGLING_OP_PATTERN_REPL_STR = "$1";
          	
          	// Pattern to detect dangling operator(s) at end of query
          	private final static Pattern DANGLING_OP_PATTERN =  Pattern.compile("\\s+[-+(\\&\\&)(\\|\\|)\\s]+$");
          	private final static String DANGLING_OP_REPL_STR = "";
          	
          	// Pattern to detect consecutive + and/or - operators
          	static final Pattern CONSECUTIVE_OP_PATTERN = Pattern.compile( "(-)-+|(\\+)\\++" );
          	static final  String CONSECUTIVE_OP_PATTERN_REPL_STR = "$1$2";
          	
          	// Pattern to detect consecutive && and/or || operators
          	static final Pattern CONSECUTIVE_OP_PATTERN2 = Pattern.compile( "(\\&\\&)\\&+|(\\|\\|)\\|+" );
          	
          	// Pattern to detect mixed consecutive + and - operators:
          	static final Pattern MIXED_OP_PATTERN = Pattern.compile( "[-+]*(?:-\\+|\\+-)[-+]*" );
          	static final String MIXED_OP_PATTERN_REPL_STR = " ";
          	
          	// Pattern to detect mixed consecutive AND and OR operators
          	static final Pattern MIXED_OP_PATTERN2 = Pattern.compile( "(\\|\\|\\&\\&)|(\\&\\&|\\|\\|)" );
          	
          	public static CharSequence stripIllegalOperators(CharSequence s) {
          		return MIXED_OP_PATTERN2.matcher(
                           MIXED_OP_PATTERN.matcher(
                             CONSECUTIVE_OP_PATTERN2.matcher(
                               CONSECUTIVE_OP_PATTERN.matcher(
                                 PADDED_OP_PATTERN.matcher(
                                   DANGLING_OP_PATTERN.matcher(
                                     s
                                   ).replaceAll(DANGLING_OP_REPL_STR)
          					   ).replaceAll(PADDED_OR_DANGLING_OP_PATTERN_REPL_STR)
                               ).replaceAll(CONSECUTIVE_OP_PATTERN_REPL_STR)
          				   ).replaceAll(CONSECUTIVE_OP_PATTERN_REPL_STR)
          			     ).replaceAll(MIXED_OP_PATTERN_REPL_STR)
                        ).replaceAll(MIXED_OP_PATTERN_REPL_STR);
          	}
          
          Show
          Jamie Johnson added a comment - - edited I've found an issue with trailing && ||, mixed && || I've put together a fix to SolrPluginUtils which addresses this issue for me (took pieces of Solr 874), but don't have time to patch the trunk with it. I'm no regex guru so there is probably a better way to handle some of this, but it worked for my use case. SolrPluginUtils.java // Pattern to detect operator (s) padded with whitespace on the right, and // dangling operator (s) at end of query: private final static Pattern PADDED_OP_PATTERN = Pattern.compile( "(-|\\+)\\p{Z}" ); private final static String PADDED_OR_DANGLING_OP_PATTERN_REPL_STR = "$1" ; // Pattern to detect dangling operator (s) at end of query private final static Pattern DANGLING_OP_PATTERN = Pattern.compile( "\\s+[-+(\\&\\&)(\\|\\|)\\s]+$" ); private final static String DANGLING_OP_REPL_STR = ""; // Pattern to detect consecutive + and/or - operators static final Pattern CONSECUTIVE_OP_PATTERN = Pattern.compile( "(-)-+|(\\+)\\++" ); static final String CONSECUTIVE_OP_PATTERN_REPL_STR = "$1$2" ; // Pattern to detect consecutive && and/or || operators static final Pattern CONSECUTIVE_OP_PATTERN2 = Pattern.compile( "(\\&\\&)\\&+|(\\|\\|)\\|+" ); // Pattern to detect mixed consecutive + and - operators: static final Pattern MIXED_OP_PATTERN = Pattern.compile( "[-+]*(?:-\\+|\\+-)[-+]*" ); static final String MIXED_OP_PATTERN_REPL_STR = " " ; // Pattern to detect mixed consecutive AND and OR operators static final Pattern MIXED_OP_PATTERN2 = Pattern.compile( "(\\|\\|\\&\\&)|(\\&\\&|\\|\\|)" ); public static CharSequence stripIllegalOperators(CharSequence s) { return MIXED_OP_PATTERN2.matcher( MIXED_OP_PATTERN.matcher( CONSECUTIVE_OP_PATTERN2.matcher( CONSECUTIVE_OP_PATTERN.matcher( PADDED_OP_PATTERN.matcher( DANGLING_OP_PATTERN.matcher( s ).replaceAll(DANGLING_OP_REPL_STR) ).replaceAll(PADDED_OR_DANGLING_OP_PATTERN_REPL_STR) ).replaceAll(CONSECUTIVE_OP_PATTERN_REPL_STR) ).replaceAll(CONSECUTIVE_OP_PATTERN_REPL_STR) ).replaceAll(MIXED_OP_PATTERN_REPL_STR) ).replaceAll(MIXED_OP_PATTERN_REPL_STR); }
          Hide
          Jan Høydahl added a comment -

          I've started a Wiki page to document eDismax here: http://wiki.apache.org/solr/ExtendedDisMax - feel free to contribute!

          Once we get the "userFields" feature SOLR-3026 in, are there any blockers left for retiring the old dismax parser?

          Show
          Jan Høydahl added a comment - I've started a Wiki page to document eDismax here: http://wiki.apache.org/solr/ExtendedDisMax - feel free to contribute! Once we get the "userFields" feature SOLR-3026 in, are there any blockers left for retiring the old dismax parser?
          Hide
          Okke Klein added a comment -

          Was the feature

          advanced stopword handling... stopwords are not required in the mandatory part of the query but are still used (if indexed) in the proximity boosting part. If a query consists of all stopwords (e.g. to be or not to be) then all will be required.

          from https://issues.apache.org/jira/browse/SOLR-1553 ever implemented? If not can this feature be added?

          Show
          Okke Klein added a comment - Was the feature advanced stopword handling... stopwords are not required in the mandatory part of the query but are still used (if indexed) in the proximity boosting part. If a query consists of all stopwords (e.g. to be or not to be) then all will be required. from https://issues.apache.org/jira/browse/SOLR-1553 ever implemented? If not can this feature be added?
          Hide
          Jan Høydahl added a comment -

          Attaching SOLR-2649 to this "mother" issue as well

          Show
          Jan Høydahl added a comment - Attaching SOLR-2649 to this "mother" issue as well
          Hide
          Hoss Man added a comment -

          are there any blockers left for retiring the old dismax parser?

          As i've mentioned before, I don't think DismaxQParser should ever be retired ... i'm still not convinced that the (default) parser you get when using "defType=dismax" should change to ExtendedDismaxQParser instance

          My three main reasons for (still) feeling this way are:

          • I see no advantage to changing what QParser you get (by default) when asking for "dismax" ... not when it's so easy for new users (or old users who want to switch) to just use "edismax" by name. (or explicitly declare their own instance of ExtendedDismaxQParser with the name "dismax" if that's what they always want)
          • ExtendedDismaxQParser is a significantly more complex beast then DismaxQParser, and likely to have a lot of little quirks (and bugs) that no one has really noticed yet. For people who are happy with DismaxQParser, we should leave well enough alone.
          • Even with things like SOLR-3026 allowing you to disable field specific queries, ExtendedDismaxQParser still supports more types of queries/syntax then DismaxQParser (ie: fuzzy queries, prefix queries, wildcard queries, etc...) which may have performance impacts on existing dismax users, many of whom probably don't want to start allowing from their users – particularly considering that limited syntax w/o metacharacters was a major advertised advantage of using dismax from day 1.

          Please note: i have no tangible objection to smiley's suggestion that...

          defType should default to ... [edismax] in Solr 4

          ...if folks think that the ExtendedDismaxQParser would make a better default then the LuceneQParser moving forward, i've got no objection to that – but if someone explicitly asks for "defType=dismax" by name, that should be the DismaxQParser (and it's limited syntax) ... ExtendedDismaxQParser is a completely different animal.

          saying defType=dismax should return an ExtendedDismaxQParser makes as much sense to me as saying that defType=lucene should return an ExtendedDismaxQParser – just because the legal syntax of edismax is a super set of dismax/lucene doesn't mean they are equivalent or that we should assume "it's better" for people who ask for a specific QParser by name.

          Show
          Hoss Man added a comment - are there any blockers left for retiring the old dismax parser? As i've mentioned before, I don't think DismaxQParser should ever be retired ... i'm still not convinced that the (default) parser you get when using "defType=dismax" should change to ExtendedDismaxQParser instance My three main reasons for (still) feeling this way are: I see no advantage to changing what QParser you get (by default) when asking for "dismax" ... not when it's so easy for new users (or old users who want to switch) to just use "edismax" by name. (or explicitly declare their own instance of ExtendedDismaxQParser with the name "dismax" if that's what they always want) ExtendedDismaxQParser is a significantly more complex beast then DismaxQParser, and likely to have a lot of little quirks (and bugs) that no one has really noticed yet. For people who are happy with DismaxQParser, we should leave well enough alone. Even with things like SOLR-3026 allowing you to disable field specific queries, ExtendedDismaxQParser still supports more types of queries/syntax then DismaxQParser (ie: fuzzy queries, prefix queries, wildcard queries, etc...) which may have performance impacts on existing dismax users, many of whom probably don't want to start allowing from their users – particularly considering that limited syntax w/o metacharacters was a major advertised advantage of using dismax from day 1. Please note: i have no tangible objection to smiley's suggestion that... defType should default to ... [edismax] in Solr 4 ...if folks think that the ExtendedDismaxQParser would make a better default then the LuceneQParser moving forward, i've got no objection to that – but if someone explicitly asks for "defType=dismax" by name, that should be the DismaxQParser (and it's limited syntax) ... ExtendedDismaxQParser is a completely different animal. saying defType=dismax should return an ExtendedDismaxQParser makes as much sense to me as saying that defType=lucene should return an ExtendedDismaxQParser – just because the legal syntax of edismax is a super set of dismax/lucene doesn't mean they are equivalent or that we should assume "it's better" for people who ask for a specific QParser by name.
          Hide
          Jan Høydahl added a comment -

          Regarding old DisMax, do all bugs being detected in eDismax also get fixed in DisMax if applicable? I'm not sure..
          If/when eDismax can be configured to fill the original role of DisMax, why should we maintain the old one?
          And if edismax was the one written first, it would have the name "dismax", so why could it not get that name the day it supersedes dismax in features and usage?

          It's a bit analogue to IntField/TrieIntField - naming hints on implementation details to distinguish from other implementations, but if TrieIntField was developed in one go and not committed incrementally, it could simply have replaced IntField. eDisMax is @lucene.experimental and when it is up to par with dismax all over, it should in my opinion take over its name with a fat notice in CHANGES.TXT. For a few versions edismax could be a valid alias too, and the old dismax could be kept around as "legacydismax" for the conservative/lazy. How would you like to have to relate to EnhancedCsvUpdateRequestHandler at /solr/update/ecsv just because the original is less complex?

          Created SOLR-3086 to let users lobotomize edismax so it will only accept the query syntaxes they choose.

          Show
          Jan Høydahl added a comment - Regarding old DisMax, do all bugs being detected in eDismax also get fixed in DisMax if applicable? I'm not sure.. If/when eDismax can be configured to fill the original role of DisMax, why should we maintain the old one? And if edismax was the one written first, it would have the name "dismax", so why could it not get that name the day it supersedes dismax in features and usage? It's a bit analogue to IntField/TrieIntField - naming hints on implementation details to distinguish from other implementations, but if TrieIntField was developed in one go and not committed incrementally, it could simply have replaced IntField. eDisMax is @lucene.experimental and when it is up to par with dismax all over, it should in my opinion take over its name with a fat notice in CHANGES.TXT. For a few versions edismax could be a valid alias too, and the old dismax could be kept around as "legacydismax" for the conservative/lazy. How would you like to have to relate to EnhancedCsvUpdateRequestHandler at /solr/update/ecsv just because the original is less complex? Created SOLR-3086 to let users lobotomize edismax so it will only accept the query syntaxes they choose.
          Hide
          Jan Høydahl added a comment -

          Personally I don't think we should worry about the added features after edismax becomes dismax. If people don't read the release notes when upgrading, they cannot complain later: "Noone told me that fuzzy queries was allowed by dismax". Especially if we provide a way to turn it off. In worst case I can commit to changing the config defaults for edismax to resemble dismax behaviour, i.e. uf=-*&us.all=false (see SOLR-3086).

          Anyone of you out there using "dismax" over "edismax" for other reasons than the ones already mentioned?

          Show
          Jan Høydahl added a comment - Personally I don't think we should worry about the added features after edismax becomes dismax. If people don't read the release notes when upgrading, they cannot complain later: "Noone told me that fuzzy queries was allowed by dismax". Especially if we provide a way to turn it off. In worst case I can commit to changing the config defaults for edismax to resemble dismax behaviour, i.e. uf=-*&us.all=false (see SOLR-3086 ). Anyone of you out there using "dismax" over "edismax" for other reasons than the ones already mentioned?
          Hide
          Hoss Man added a comment -

          If/when eDismax can be configured to fill the original role of DisMax, why should we maintain the old one?

          my chief concerns – as i mentioned – are that currently edismax has behavior dismax doesn't support that people may actively not want, and that edismax may have quirks dismax doesn't that we have yet to discover and don't realize because the overall test coverage is low and the EDismaxQParse is so much more significantly complex and there are so many weird edge cases.

          But sure: if SOLR-3086 makes it possible to configure EDisMaxQParser to behave the same as DisMaxQParser, and if we feel confident through testing that (when configured as such) they behave the same, i've won't have any objections what soever to retiring the DisMaxQParser class for simplifying code maintence.

          Personally I don't think we should worry about the added features after edismax becomes dismax.

          this part i don't understand ... even if all of the functionality ultimately merges and only the EDisMaxQparser remains, why should defType=dismax and defType=edismax suddenly become the smae thing? why not offer two instances by default, "edismax" which is open and everything defaults to on, and "dismax" where it's more locked down like it is today? ... what is gained by changing the default behavior when people use "defType=dismax"?

          (as i said before (in a slightly diff way above): would you suggest that defType=lucene should now be an EDisMaxQparser instance as well? with a CHANGES.txt note telling people that if they only want features LuceneQParser supported, they have to add invariant params to disable them????)

          Show
          Hoss Man added a comment - If/when eDismax can be configured to fill the original role of DisMax, why should we maintain the old one? my chief concerns – as i mentioned – are that currently edismax has behavior dismax doesn't support that people may actively not want, and that edismax may have quirks dismax doesn't that we have yet to discover and don't realize because the overall test coverage is low and the EDismaxQParse is so much more significantly complex and there are so many weird edge cases. But sure: if SOLR-3086 makes it possible to configure EDisMaxQParser to behave the same as DisMaxQParser, and if we feel confident through testing that (when configured as such) they behave the same, i've won't have any objections what soever to retiring the DisMaxQParser class for simplifying code maintence. Personally I don't think we should worry about the added features after edismax becomes dismax. this part i don't understand ... even if all of the functionality ultimately merges and only the EDisMaxQparser remains, why should defType=dismax and defType=edismax suddenly become the smae thing? why not offer two instances by default, "edismax" which is open and everything defaults to on, and "dismax" where it's more locked down like it is today? ... what is gained by changing the default behavior when people use "defType=dismax"? (as i said before (in a slightly diff way above): would you suggest that defType=lucene should now be an EDisMaxQparser instance as well? with a CHANGES.txt note telling people that if they only want features LuceneQParser supported, they have to add invariant params to disable them????)
          Hide
          Jan Høydahl added a comment -

          Are you saying it would be possible to define something like this in solrconfig.xml

          <queryParser name="dismax" class="solr.ExtendedDismaxQParserPlugin">
            <lst name="defaults">
              <str name="uf">-*</str>
            </lst>
          </queryParser>
          

          And when someone says &defType=dismax it uses those defaults?

          If so, that is simply a brilliant way to do it.

          Show
          Jan Høydahl added a comment - Are you saying it would be possible to define something like this in solrconfig.xml <queryParser name= "dismax" class= "solr.ExtendedDismaxQParserPlugin" > <lst name= "defaults" > <str name= "uf" > -* </str> </lst> </queryParser> And when someone says &defType=dismax it uses those defaults? If so, that is simply a brilliant way to do it.
          Hide
          Hoss Man added a comment -

          Are you saying it would be possible to define something like this in solrconfig.xml

          ...something like that would certainly be possible if we changed the QParsers to start doing interesting things with their init params (presumably defaults there would be the lowest possible level defaults, overridden by things like request handler defaults/invariants/appends ... and it would really make sense to allow invariants/appends in the qparser init).

          but that would only really help with the backcompat "locked down" dismax situation i'm concerned with if we made sure all of those init params were also used in the implicitly created instance of "dismax"

          what i had in mind was actually far simpler...

          • "dismax" is implicitly an instance of DismaxQParserPlugin (unless overridden in solrconfig.xml) .. just like today
          • "edismax" is implicitly an instance of ExtendedDismaxQParserPlugin (unless overridden in solrconfig.xml) ... just like today
          • ExtendedDismaxQParserPlugin works exactly as it does today but instead of all the hardcoded default param values sprinkled around ExtendedDismaxQParser we put them all in a static Map or SolrParams instance and add a constructor arg to override those defaults
          • DismaxQParserPlugin gets changed to look something like...
            final static SolrParams REALLY_LIMITED_DEFAULTS = new SolrParams("uf", "-*", ...);
            public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
              return new ExtendedDisMaxQParser(qstr, localParams, params, req, REALLY_LIMITED_DEFAULTS);
            }
            

            ...using that new ExtendedDisMaxQParser constructor arg to override the defaults * DisMaxQParser.java gets svn removed because it's no longer needed.

          ...all of which could then be enhanced with init based overrides of those defaults like you suggested.

          Show
          Hoss Man added a comment - Are you saying it would be possible to define something like this in solrconfig.xml ...something like that would certainly be possible if we changed the QParsers to start doing interesting things with their init params (presumably defaults there would be the lowest possible level defaults, overridden by things like request handler defaults/invariants/appends ... and it would really make sense to allow invariants/appends in the qparser init). but that would only really help with the backcompat "locked down" dismax situation i'm concerned with if we made sure all of those init params were also used in the implicitly created instance of "dismax" what i had in mind was actually far simpler... "dismax" is implicitly an instance of DismaxQParserPlugin (unless overridden in solrconfig.xml) .. just like today "edismax" is implicitly an instance of ExtendedDismaxQParserPlugin (unless overridden in solrconfig.xml) ... just like today ExtendedDismaxQParserPlugin works exactly as it does today but instead of all the hardcoded default param values sprinkled around ExtendedDismaxQParser we put them all in a static Map or SolrParams instance and add a constructor arg to override those defaults DismaxQParserPlugin gets changed to look something like... final static SolrParams REALLY_LIMITED_DEFAULTS = new SolrParams( "uf" , "-*" , ...); public QParser createParser( String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) { return new ExtendedDisMaxQParser(qstr, localParams, params, req, REALLY_LIMITED_DEFAULTS); } ...using that new ExtendedDisMaxQParser constructor arg to override the defaults * DisMaxQParser.java gets svn removed because it's no longer needed. ...all of which could then be enhanced with init based overrides of those defaults like you suggested.
          Hide
          Jan Høydahl added a comment -

          Today we can use pf, pf2, pf3 to boost only, as qf is required and always adds a MUST clause.
          I think it would be useful to allow "pf/pf2/pf3" to give results if "qf" is not specified, i.e. empty qf must not add any MUST clause to the query.

          Then this would generate a query with one SHOULD clause returning only docs where foo and bar exist within 100 terms.

          q=foo bar&qf=&pf=text&ps=100
          

          In fact, it is impossible to say qf= to specify an empty qf, it still falls back to schema's defaultSearchField..

          Show
          Jan Høydahl added a comment - Today we can use pf, pf2, pf3 to boost only, as qf is required and always adds a MUST clause. I think it would be useful to allow "pf/pf2/pf3" to give results if "qf" is not specified, i.e. empty qf must not add any MUST clause to the query. Then this would generate a query with one SHOULD clause returning only docs where foo and bar exist within 100 terms. q=foo bar&qf=&pf=text&ps=100 In fact, it is impossible to say qf= to specify an empty qf, it still falls back to schema's defaultSearchField..
          Hide
          Leonhard Maylein added a comment -

          Please consider to also incorporate SOLR-3377 which is marked as fixed but it is not completely solved (see my comment on SOLR-3377).

          Show
          Leonhard Maylein added a comment - Please consider to also incorporate SOLR-3377 which is marked as fixed but it is not completely solved (see my comment on SOLR-3377 ).
          Hide
          Devendra Wangikar added a comment -

          1. do edismax parser supports && || operators ?
          2. Do edismax support 'not' for NOT operator ?

          Thanks in advance
          Devendra W

          Show
          Devendra Wangikar added a comment - 1. do edismax parser supports && || operators ? 2. Do edismax support 'not' for NOT operator ? Thanks in advance Devendra W

            People

            • Assignee:
              Unassigned
              Reporter:
              Yonik Seeley
            • Votes:
              9 Vote for this issue
              Watchers:
              19 Start watching this issue

              Dates

              • Created:
                Updated:

                Development