SOLR-4381

Query-time multi-word synonym expansion

    Details

      Description

      This is an issue that seems to come up perennially.

      The Solr docs caution that index-time synonym expansion should be preferred to query-time synonym expansion, due to the way multi-word synonyms are treated and how IDF values can be boosted artificially. But query-time expansion should have huge benefits, given that changes to the synonyms don't require re-indexing, the index size stays the same, and the IDF values for the documents don't get permanently altered.

      The proposed solution is to move the synonym expansion logic from the analysis chain (either query- or index-type) and into a new QueryParser. See the attached patch for an implementation.

      The core Lucene functionality is untouched. Instead, the EDismaxQParser is extended, and synonym expansion is done on-the-fly. Queries are parsed into a lattice (i.e. all possible synonym combinations), while individual components of the query are still handled by the EDismaxQParser itself.
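To make the lattice idea concrete, here is a minimal Python sketch of enumerating synonym combinations for a query. It is not the patch's code: the synonym groups, the function names, and the greedy longest-match segmentation are all assumptions made for illustration, and a real lattice would also have to handle overlapping matches.

```python
from itertools import product

# Hypothetical synonym groups; each phrase is a tuple of tokens.
SYNONYM_GROUPS = [
    [("asf",), ("apache", "software", "foundation")],
    [("dog",), ("hound",), ("pooch",)],
]


def build_lookup(groups):
    """Map every phrase to the full group of phrases it belongs to."""
    lookup = {}
    for group in groups:
        for phrase in group:
            lookup[phrase] = group
    return lookup


def segment(tokens, lookup):
    """Split tokens into segments, preferring the longest known phrase.

    Returns one list of alternative phrases per segment, with the
    phrase the user actually typed listed first.
    """
    max_len = max((len(p) for p in lookup), default=1)
    segments = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in lookup:
                others = [p for p in lookup[phrase] if p != phrase]
                segments.append([phrase] + others)
                i += n
                break
        else:
            # No synonym group starts here: keep the token unchanged.
            segments.append([(tokens[i],)])
            i += 1
    return segments


def expand(query, groups=SYNONYM_GROUPS):
    """Return every query string reachable by swapping in synonyms."""
    lookup = build_lookup(groups)
    segments = segment(query.lower().split(), lookup)
    return [
        " ".join(tok for phrase in combo for tok in phrase)
        for combo in product(*segments)
    ]
```

In this sketch each expanded string would then be handed to the underlying parser as an ordinary query, which is roughly the division of labour the description above suggests.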

      It's not an ideal solution by any stretch. But it's nice and self-contained, so it invites experimentation and improvement. And I think it fits in well with the merry band of misfit query parsers, like func and frange.

      More details about this solution can be found in this blog post and the Github page for the code.

      At the risk of tooting my own horn, I also think this patch sufficiently fixes SOLR-3390 (highlighting problems with multi-word synonyms) and LUCENE-4499 (better support for multi-word synonyms).

      1. SOLR-4381.patch
        27 kB
        Nolan Lawson
      2. SOLR-4381-2.patch
        27 kB
        Nolan Lawson

        Issue Links

          Activity

          janhoy Jan Høydahl added a comment -

          Hi. Well written blog post! I agree that the synonym feature is better implemented above analysis, so QP fits well. Question is whether each query parser would need its own implementation or if it could be generalized?

          Also, I quite like the fact that the Analysis-synonyms allow for different dictionaries per field, so that if you have a qf=text_en text_de, to search two languages at the same time, they can expand synonyms differently. A suggestion to allow that in your approach could be for the QP to inspect the query analysis chain for each field in qf, and if it finds a SynonymFilterFactory, it will use that dictionary instead of the global one (and of course disable the analysis filter). This is a trick that eDisMax already does for conditional stopword handling. Such an approach makes it easier to migrate from what people may have now, to this solution.

          I have not tested the patch yet. But I absolutely like the concept!

          nolanlawson Nolan Lawson added a comment -

          Hi Jan. Thanks for the speedy reply! In answer to your questions:

          Question is whether each query parser would need its own implementation or if it could be generalized?

          I agree that it would be nice to abstract the code out of just EDisMax. I think this parser could subclass DisMax just as easily as EDisMax, or it could be abstracted out into its own class that takes either DisMax or EDisMax as a constructor argument and then delegates to it. But for the Lucene parser it might be a bit more complicated, because I specifically check for some DisMax parameters (e.g. QF), plus there is some code copied from EDisMax itself where it's private rather than protected (e.g. these lines). Cleverer folks than me in the Lucene project might know a better way to do this, though.

          A suggestion to allow that in your approach could be for the QP to inspect the query analysis chain for each field in qf, and if it finds a SynonymFilterFactory, it will use that dictionary instead of the global one (and of course disable the analysis filter).

          I agree that the less configuration, the better. However, I kind of like leaving the SynonymFilterFactory out of the analysis chains, because it makes it clearer that the synonym expansion logic isn't happening there at all. Plus, in most of the use cases we've seen, the only difference between the query-time analyzer and the index-time analyzer was the SynonymFilterFactory itself, so removing it gained us some code simplicity, by allowing us to define just one analyzer for both. Perhaps other folks have had different experiences, though.

          jkrupan Jack Krupansky added a comment -

          I have personally implemented multi-word synonym support within a query parser, bypassing analysis for synonym processing as you suggest, but still examining the analysis chain to discover and load the field-specific synonym table. Yes, that approach can work, but I have refrained from proposing such a solution in Solr/Lucene since it is rather messy and not really an ideal solution because it does bypass analysis. There are ongoing discussions on the Lucene/Solr lists about how best to address query-time synonym processing; there have actually been some hopeful suggestions recently, but still a long way to go. I would rather see those discussions continue and come to fruition than see edismax changed in a way that would be incompatible with a more ideal solution.

          I suppose you could simply have your patch remain a patch forever without integration into the Solr code base, for people who are desperate to have the feature in edismax, but due to its far-from-ideal nature (bypassing analysis and not supporting field-specific synonym tables), it would seem less likely to be integrated into the Solr code base since it would interfere with a broader solution. Note that I am NOT a committer, so I would have no official say in the matter. This is just my own opinion.

          I suppose you could also package it as a separate "contrib" query parser and then it could be integrated into a Solr release and be available to anybody without the need for patching. That might be the more fruitful approach for near-term integration.

          But I would definitely be -1 for direct integration into edismax since it does bypass analysis (and as an incidental objection doesn't support field-specific synonym tables.) Analysis is really important and gives the developer fine-tuning control over field-specific processing without changing any code.

          OTOH, if it could be turned on and off dynamically with a request parameter, maybe direct integration into the Solr code base would be feasible. IOW, if it is simply a user-selectable "plugin", that would be more compelling.

          Again, I am not a committer, so my opinion here can be freely ignored.

          janhoy Jan Høydahl added a comment -

          We'd benefit from a more component-based QP framework; then this could be a plugin. But that's for another century, I guess.

          I agree that the less configuration, the better. However, I kind of like leaving the SynonymFilterFactory out of the analysis chains, because it makes it clearer that the synonym expansion logic isn't happening there at all. Plus, in most of the use cases we've seen, the only difference between the query-time analyzer and the index-time analyzer was the SynonymFilterFactory itself, so removing it gained us some code simplicity, by allowing us to define just one analyzer for both. Perhaps other folks have had different experiences, though.

          Sure, it's confusing not to have a WYSIWYG Analysis. Perhaps we can include fieldType references instead of defining analysis with a new syntax, something like what SpellCheckComponent does in config param queryAnalyzerFieldType.

          And perhaps even better than tying dictionary to fieldType, would be to be able to choose dictionary per field name. Here's an imagined config based on these ideas:

          <queryParser name="synonym_edismax" class="solr.SynonymExpandingExtendedDismaxQParserPlugin">
            <str name="defaultDict">english</str>
            <lst name="dictionaries">
              <lst name="english">
                <str name="fieldType">synonym_type_en</str>
                <str name="useForFields">title *_en</str>
              </lst>
              <lst name="addresses">
                <str name="fieldType">synonym_type_addr</str>
                <str name="useForFields">street city state</str>
              </lst>
            </lst>
          </queryParser>
          

          We could even have a convention that if the queryParser config is empty, then look for a fieldType in Schema named "synonymDefaultAnalysis" and use that for synonym expansion for all fields of a TextField type.
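As a rough illustration of how the useForFields patterns in the imagined config might be resolved, the Python sketch below maps a field name to a dictionary name. The dictionary layout mirrors the XML above, but the function name, the glob-style matching, and the fallback to defaultDict for unmatched fields are assumptions, not behaviour of any existing Solr class.

```python
from fnmatch import fnmatchcase

# Mirrors the imagined XML config: dictionary name -> settings.
DICTIONARIES = {
    "english": {
        "fieldType": "synonym_type_en",
        "useForFields": "title *_en",
    },
    "addresses": {
        "fieldType": "synonym_type_addr",
        "useForFields": "street city state",
    },
}
DEFAULT_DICT = "english"


def dictionary_for_field(field, dictionaries=DICTIONARIES, default=DEFAULT_DICT):
    """Return the name of the first dictionary whose patterns match `field`.

    Patterns are whitespace-separated and matched as case-sensitive globs.
    Falling back to `default` for unmatched fields is an assumption.
    """
    for name, conf in dictionaries.items():
        patterns = conf["useForFields"].split()
        if any(fnmatchcase(field, pattern) for pattern in patterns):
            return name
    return default
```

With this reading, a request with qf=title city body_en would expand title and body_en using the english dictionary and city using the addresses dictionary.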

          nolanlawson Nolan Lawson added a comment -

          I do agree with Jack that this is a less-than-ideal solution (the XML config is pure hack). So I'd be happy to have it included in contrib/ to keep it from muddying up the central code base. In any case, I intend to maintain the Github code, which acts as a sort of drop-in JAR plugin for multiple Solr versions.

          My goal was mostly just to start a conversation about this. I think Jan's proposed configuration is a step in the right direction, for instance - very clear and concise! So this might be something worth incubating a bit more in Github before contributing it to Solr.

          Speaking of which, please disregard the patch I posted. As it turns out, I'm having problems getting it to work with Solr 4.1.0 (due to this bug), although 3.5.0 - 4.0.0 all work nicely.

          steve_rowe Steve Rowe added a comment - - edited

          As it turns out, I'm having problems getting it to work with Solr 4.1.0 (due to this bug), although 3.5.0 - 4.0.0 all work nicely.

          I commented on the bug with more details, but basically you need to call reset() before using any tokenstream.

          nolanlawson Nolan Lawson added a comment - - edited

          Thanks for the guidance, Steve. The attached SOLR-4381-2.patch should work for Solr 4.1.0.

          janhoy Jan Høydahl added a comment -

          Earlier I was considering a SynonymSearchComponent but it would be hard to make it work with all query parsers. So the more I think of it, I believe synonyms is best solved in the Query Parser. Deboosting expanded synonyms is quite important I think. Let's evolve this a bit in GitHub and dump a patch here later.

          Tip: When uploading patches, give them the same name every time. Jira will grey out the older versions.

          janhoy Jan Høydahl added a comment -

          Tested it with 4.1 and it works like a charm. A few comments:

          I dropped the jar into SOLR_HOME/lib and declared sharedLib="lib" in solr.xml, and the plugin was picked up. No need to re-package WAR. This was Tomcat7.

          Aren't multi-word synonyms supposed to be treated like a phrase? What I see is that the individual words are searched. Example synonym apache software foundation,apache,asf, and I query for "asf":

          +((manu:asf)^2.0 (((manu:apache) (((manu:software) (manu:apache) (manu:foundation))~3))^0.2))
          

          This will also match "apache foundation software". If I quote the multi-word synonym in synonyms.txt, it gets quoted, but then the phrase is not detected when entered as query.

          Perhaps an option to control the sloppiness of expanded multi-word synonyms would do.
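The difference between the two behaviours can be shown with a small Python sketch. It assumes the ~3 suffix in the parsed query means "at least three of the optional clauses must match", and it models documents as plain token lists; both helper functions are hypothetical and only imitate the matching semantics, not Lucene's scoring.

```python
def matches_min_should_match(doc_tokens, terms, minimum):
    """True if at least `minimum` distinct terms occur anywhere in the document."""
    present = sum(1 for term in set(terms) if term in doc_tokens)
    return present >= minimum


def matches_phrase(doc_tokens, terms):
    """True only if the terms occur consecutively and in the given order."""
    n = len(terms)
    return any(
        doc_tokens[i:i + n] == terms
        for i in range(len(doc_tokens) - n + 1)
    )


synonym = ["apache", "software", "foundation"]
in_order = "the apache software foundation".split()
shuffled = "apache foundation software".split()
```

Here matches_min_should_match accepts both in_order and shuffled, while matches_phrase accepts only in_order, which is the stricter behaviour the comment above asks for.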

          nolanlawson Nolan Lawson added a comment -

          Absolutely right, Jan. "software foundation apache", "foundation software apache", and any other combination all match. I've filed a bug.

          This is what I get for submitting my code to the harsh light of day! Hopefully I can push out a fix by this weekend.

          Also, thanks for the tip about sharedLib="lib". I'll test it out and add it to the "Getting Started" instructions.

          I agree that development should stay in GitHub for now. I'll re-request a merge when the code is a bit more mature.

          janhoy Jan Høydahl added a comment -

          Could you specify which private methods in eDisMax you needed to copy/paste? Perhaps we can look at how to make it more extension friendly?

          jkrupan Jack Krupansky added a comment -

          If this issue is to be seriously pursued as part of edismax, the following should be included here in JIRA:

          1. A concise summary of the overall approach, with key technical details.

          2. A few example queries, both source and the resulting "parsed query". Key test cases, if you will.

          3. A semi-detailed summary of what the user of the change needs to know, in terms of how to set it up, manage it, use it, and its precise effects.

          4. Detail any limitations.

          That said, if you were to implement this as part of a standalone, "contrib" query parser, you are much freer to do whatever you want with no regard to potential consequences and need not worry about fine details. But if you want this to be part of edismax, you'll need to be very, very careful. I would suggest the former - it would allow you to get going much more rapidly. Integration with edismax proper could be deferred until you're happy that you've done all you've intended to do - and meanwhile the contrib module would be available for others to use out of the box.

          4. Specifically what features of the Synonym Filter will be lost by using this approach.

          nolanlawson Nolan Lawson added a comment -

          Could you specify which private methods in eDisMax you needed to copy/paste? Perhaps we can look at how to make it more extension friendly?

          These lines.

          If this issue is to be seriously pursued as part of edismax, the following should be included here in JIRA:

          I don't think it should be included in EDisMax itself. Extending EDisMax was just a temporary shortcut I took, but Jan points out that the solution itself could be applied outside EDisMax, or even outside Solr.

          1. A concise summary of the overall approach, with key technical details.

          Please see this blog post for the best explanation.

          2. A few example queries, both source and the resulting "parsed query". Key test cases, if you will.

          Good idea. Added to the README.

          3. A semi-detailed summary of what the user of the change needs to know, in terms of how to set it up, manage it, use it, and its precise effects.

          In the README for now.

          4. Detail any limitations.

          Currently handling this in the Issues page. Otherwise the standard query-time expansion concerns apply: increased delay in query execution, configuration is in the request parameters instead of the schema.xml, query becomes bloated and incomprehensible. Also potential user confusion on the single "best practice" solution for synonyms in Solr, since Solr already has a well-documented way of handling synonyms through the SynonymFilterFactory. As of right now, I assume people will only use my solution if they try the standard solution and are unsatisfied.

          4. Specifically what features of the Synonym Filter will be lost by using this approach.

          As far as I know, none, because I'm still using the SynonymFilterFactory and it's configurable by the user.

          In general, I agree with you that some rapid iteration outside of the Solr core would probably be a better approach than outright integration. Please consider my "merge request" withdrawn; I'll let the code incubate for a bit, and then look into integration later.

          otis Otis Gospodnetic added a comment -

          In general, I agree with you that some rapid iteration outside of the Solr core would probably be a better approach than outright integration. Please consider my "merge request" withdrawn; I'll let the code incubate for a bit, and then look into integration later.

          Has that time come by any chance?

          otis Otis Gospodnetic added a comment -

          Nolan Lawson I see this marked for 4.3. Does this mean there is no patch for 4.1 or 4.2? (even though I see a 4.1 jar in the README on Github) TIA.

          nolanlawson Nolan Lawson added a comment - - edited

          Hey Otis Gospodnetic:

          To answer your first question, I don't believe that time has come yet. I still have a lot of open issues to fix, although I'm slowly whittling them down.

          The most important one, I believe, is to use composition instead of subclassing, so that I can support the DisMax and Lucene query parsers instead of just EDisMax. After that I need to seriously simplify the XML configuration.

          As for your second question, I'm testing out Solr 4.2 as we speak. If it works, I'll modify the README; otherwise I'll add a GitHub issue.

          otis Otis Gospodnetic added a comment -

          Nolan Lawson: would it be possible for you to (quickly?) open issues for everything that remains to be done? I'm asking/suggesting this because we (Sematext) have a client who'd really like to see this committed to Solr, we are willing to put in the work to make that happen, and to make this possible it would be really helpful to see what the remaining issues are. Thanks!

          nolanlawson Nolan Lawson added a comment -

          Otis Gospodnetic: OK, I've updated everything on the GitHub Issues page. If you're willing to put in work, then please do send me a pull request! Looking forward to it.

          otis Otis Gospodnetic added a comment -

          Thanks Nolan Lawson. In your blog post I see "...the parser does not currently expand synonyms if the user input contains complex query operators (i.e. AND, OR, +, and -). This is a TODO for a future release.", but it looks like that's not on the list of open issues at https://github.com/healthonnet/hon-lucene-synonyms/issues?state=open

          Maybe this is no longer true - are Boolean operators handled correctly now?
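
For reference, the limitation quoted above amounts to a guard that skips synonym expansion when the query contains AND, OR, + or -. The sketch below shows one way such a check could look; how the plugin actually detects operators is not stated in this thread, so the regular expression and the name `has_complex_operators` are assumptions.

```python
import re

# Operators named in the quoted blog post: AND, OR, + and -.
# Assumption: AND/OR count only as standalone upper-case words, and + / - count
# only when they prefix a term (so the hyphen in "well-known" is not an operator).
_OPERATOR = re.compile(r"(?:^|\s)(?:AND|OR)(?=\s|$)|(?:^|\s)[+-](?=\S)")


def has_complex_operators(query: str) -> bool:
    """Return True if the query appears to use Boolean operators."""
    return _OPERATOR.search(query) is not None


for q in ["dog cat", "dog AND cat", "+dog -cat", "well-known brand"]:
    print(q, "->", has_complex_operators(q))
```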

          okkeklein Okke Klein added a comment -

          The terms expanded by solr.SynonymFilterFactory are also being stemmed. This is unwanted if you want "MIA" to expand to "missing in action" rather than "miss in action". See the GitHub issue for details.
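
The effect described above can be reproduced with a toy model in which a stemmer runs after synonym expansion. This is only an illustration of the ordering problem, not Solr's analysis chain: `toy_stem` is a deliberately crude stand-in for a real stemmer, and the synonym table is hard-coded.

```python
def toy_stem(token):
    # Crude stand-in for a real stemmer: strip a trailing "ing" from longer words.
    return token[:-3] if token.endswith("ing") and len(token) > 5 else token


SYNONYMS = {"mia": ["missing in action"]}


def expand(query):
    # The original query followed by any synonyms registered for it.
    return [query] + SYNONYMS.get(query.lower(), [])


def analyze(phrase, stem):
    # Lower-case and tokenize; optionally stem every token.
    tokens = phrase.lower().split()
    return " ".join(toy_stem(t) for t in tokens) if stem else " ".join(tokens)


stemmed = [analyze(p, stem=True) for p in expand("MIA")]
unstemmed = [analyze(p, stem=False) for p in expand("MIA")]
print(stemmed)    # expansion followed by stemming
print(unstemmed)  # expansion with stemming skipped
```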

          hemantverma09 Hemant Verma added a comment - - edited

          While using this patch I found one scenario in which it does not work properly.
          My synonyms list contains the following entries:
          pepsi,pepsico,pbg
          outsourcing,rpo,offshoring

          The synonym expansion differs when any of these words is prefixed with a stopword.

          Search keyword     Expanded result
          ---------------    ----------------------
          pepsi              pepsi, pepsico, pbg
          pbg                pepsi, pepsico, pbg
          the pepsi          pepsi, pepsico
          the pbg            pepsi, pbg
          outsourcing        outsourc, offshor, rpo
          the outsourcing    outsourc, offshor

          As the results above show, when a keyword from the synonym list is prefixed with a stopword, some of its synonyms are missing from the expansion.
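
A small check like the one below makes the discrepancy explicit. It assumes the expected behavior is that a leading stopword does not change the expansion, and it compares that expectation with the observed results reported above. The stopword list and all function names are assumptions for illustration, and only the unstemmed pepsi group is checked.

```python
SYNONYM_LINES = ["pepsi,pepsico,pbg", "outsourcing,rpo,offshoring"]
STOPWORDS = {"the"}  # assumed stopword list


def build_groups(lines):
    # Map every term to the full set of terms on its synonym line.
    table = {}
    for line in lines:
        group = {term.strip() for term in line.split(",")}
        for term in group:
            table[term] = group
    return table


def expected_expansion(query, table):
    # Drop stopwords, then expand each remaining token to its synonym group.
    expanded = set()
    for token in query.lower().split():
        if token in STOPWORDS:
            continue
        expanded |= table.get(token, {token})
    return expanded


table = build_groups(SYNONYM_LINES)

# Observed expansions copied from the report above.
observed = {"the pepsi": {"pepsi", "pepsico"}, "the pbg": {"pepsi", "pbg"}}

missing = {q: sorted(expected_expansion(q, table) - got) for q, got in observed.items()}
for query, terms in missing.items():
    print(f"{query!r} is missing: {terms}")
```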

          steve_rowe Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          nolanlawson Nolan Lawson added a comment -

          Otis Gospodnetic: I was going to tip you off to this open issue, but then I realized you were the one who filed it!

          Hemant Verma: Could you try the 1.3.3 release and see if this still persists? If it does, please file a new issue on GitHub.

          BTW, the project has matured considerably since I first opened this issue. It seems pretty healthy as a separate plugin, but I'll defer to the wisdom of the Solr devs as to whether or not they want to include it in 5.0. As it stands, you can pretty much just copy the source files over; it still lives as a separate QParserPlugin class.

          thetaphi Uwe Schindler added a comment -

          Move issue to Solr 4.9.

          janhoy Jan Høydahl added a comment -

          Closing as Won't Fix, since the new graph and sow work in SOLR-9185 and SOLR-10343 (https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/) provides an official solution, so the sedismax code will never land in Solr.
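
For anyone moving to the official solution, a request using the `sow` (split on whitespace) parameter might be built as in the sketch below. The host, core name (`mycore`), and field name (`text`) are placeholders, and the sketch assumes the field's query analyzer is configured with a graph-aware synonym filter.

```python
from urllib.parse import urlencode

# Placeholder endpoint; adjust host and core name for a real installation.
BASE_URL = "http://localhost:8983/solr/mycore/select"

params = {
    "q": "missing in action",
    "defType": "edismax",
    "qf": "text",    # placeholder field name
    "sow": "false",  # do not split on whitespace before analysis
}

url = BASE_URL + "?" + urlencode(params)
print(url)
```

With `sow=false` the whole phrase reaches the field's analyzer in one piece, which is what allows a multi-word synonym to match at query time.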


            People

            • Assignee: Unassigned
            • Reporter: nolanlawson Nolan Lawson
            • Votes: 20
            • Watchers: 34
