SOLR-5379

Query-time multi-word synonym expansion

    Details

      Description

      When handling synonyms at query time, Solr fails to work with multi-word synonyms for two reasons:

      • First, the Lucene query parser tokenizes the user query on whitespace, so it splits a multi-word term into separate terms before feeding them to the synonym filter. The synonym filter therefore cannot recognize the multi-word term and expand it.
      • Second, if the synonym filter expands a term into multiple terms that include a multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to handle the synonyms. But MultiPhraseQuery does not work when the terms have different numbers of words.

      For the first problem, we can quote every multi-word synonym in the user query so that the Lucene query parser does not split it. There is a related JIRA issue: https://issues.apache.org/jira/browse/LUCENE-2605.
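
A minimal sketch of the quoting idea, written as plain Python rather than Solr code. The function name `quote_multiword_synonyms` and the sample synonym are assumptions for illustration only, and `shlex.split` merely stands in for a parser that splits on whitespace but keeps quoted text together.

```python
import re
import shlex


def quote_multiword_synonyms(query, phrases):
    """Wrap every known multi-word synonym found in `query` in double quotes."""
    if not phrases:
        return query
    # Try longer phrases first so they win over shorter overlapping ones.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: '"' + m.group(0) + '"', query)


raw = "sea biscit article"
quoted = quote_multiword_synonyms(raw, ["sea biscit"])

print(raw.split())          # plain whitespace split breaks the synonym apart
print(quoted)               # the synonym is now quoted
print(shlex.split(quoted))  # a quote-aware split keeps it as one unit
```

This sketch does not handle input that already contains quotes; a real implementation would have to skip text the user quoted themselves.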

      For the second, we can replace MultiPhraseQuery with an appropriate BooleanQuery of SHOULD clauses containing multiple PhraseQuery instances when the token stream contains a multi-word synonym.
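
The shape of such a query can be sketched in plain Python. This is a simplified model under stated assumptions, not the patch's code: `expand_to_phrases` and `as_should_clauses` are hypothetical helpers, phrases are modelled as word lists, and taking every combination of alternatives is an illustrative choice.

```python
from itertools import product


def expand_to_phrases(units, synonym_groups):
    """Return every phrase obtained by swapping each unit for its synonyms."""
    alternatives = []
    for unit in units:
        # Use the first synonym group containing the unit, else the unit itself.
        group = next((g for g in synonym_groups if unit in g), [unit])
        alternatives.append(group)
    # Re-split each combination so a multi-word synonym contributes several words.
    return [" ".join(combo).split() for combo in product(*alternatives)]


def as_should_clauses(phrases):
    """Render the phrases as a disjunction of quoted phrase queries."""
    return " OR ".join('"' + " ".join(p) + '"' for p in phrases)


groups = [["seabiscuit", "sea biscit", "biscit"]]
phrases = expand_to_phrases(["seabiscuit", "article"], groups)
print(as_should_clauses(phrases))
```

Each phrase keeps its own length, so alternatives with different word counts no longer have to share one position layout.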

      1. conf-test-files-4_8_1.patch
        6 kB
        Jeremy Anderson
      2. quoted.patch
        21 kB
        Tien Nguyen Manh
      3. quoted-4_8_1.patch
        21 kB
        Jeremy Anderson
      4. solr-5379-version-4.10.3.patch
        57 kB
        Rafał Kuć
      5. synonym-expander.patch
        16 kB
        Tien Nguyen Manh
      6. synonym-expander-4_8_1.patch
        25 kB
        Jeremy Anderson

          Activity

          tiennm Tien Nguyen Manh added a comment -

          Here are two patches for the two issues above:
          1. quoted.patch is an extended EDismaxQParser with a new option to quote multi-word synonyms in the user query.
          2. synonym-expander.patch creates a new Query structure when the user query contains a multi-word synonym.

          otis Otis Gospodnetic added a comment - - edited

          Tien Nguyen Manh How does this differ from SOLR-4381? Which cases does SOLR-4381 not handle that this patch handles?

          otis Otis Gospodnetic added a comment -

          My understanding of how this synonym expander (the synonym-expander.patch) works is:

          Assume synonyms are:

          Seabiscuit, Sea biscit, Biscit
          

          For query "Seabiscuit article", the regular edismax will construct a MultiPhraseQuery like ("Seabiscuit|Sea|biscit", biscit, article").

          Instead of that, this patch rewrites the query differently:
          PhraseQuery(Seabiscuit article) OR PhraseQuery(Sea biscit article) OR PhraseQuery(biscit article)
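
The difference in matching behaviour can be illustrated with a small plain-Python model. It is only a sketch: `multi_phrase_match` and `contains_phrase` are hypothetical helpers, not Lucene classes, and the position layout below is an assumption that follows the description above (all first words stacked at position 0, the leftover "biscit" at position 1, "article" at position 2).

```python
def contains_phrase(doc_tokens, phrase):
    """True if `phrase` occurs as consecutive tokens in the document."""
    n = len(phrase)
    return any(doc_tokens[i:i + n] == phrase for i in range(len(doc_tokens) - n + 1))


def multi_phrase_match(doc_tokens, positions):
    """True if some window of the document has an allowed term at every position."""
    n = len(positions)
    return any(
        all(doc_tokens[i + k] in positions[k] for k in range(n))
        for i in range(len(doc_tokens) - n + 1)
    )


# Flattened layout: every alternative is forced into the same three positions.
positions = [{"seabiscuit", "sea", "biscit"}, {"biscit"}, {"article"}]
# Rewritten form: one phrase per synonym, each with its own length.
phrases = [["seabiscuit", "article"], ["sea", "biscit", "article"], ["biscit", "article"]]

for text in ["seabiscuit article", "sea biscit article", "biscit article"]:
    doc = text.split()
    flat = multi_phrase_match(doc, positions)
    rewritten = any(contains_phrase(doc, p) for p in phrases)
    print(text, "| flattened:", flat, "| phrase OR:", rewritten)
```

In this model the flattened layout only matches the three-word document, while the disjunction of phrases matches all three.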

          tiennm Tien Nguyen Manh added a comment -

          Otis Gospodnetic The differences are:
          SOLR-4381 is an extension of EDismax, so it only works for that query parser. My patch is a patch to SolrQueryParserBase, so it works for any query parser.
          SOLR-4381 rewrites the query into a lattice (all synonym combinations), so it needs to parse N modified queries. My patch is applied when we read the token stream to build the Lucene Query, so it still parses the query only once.
          We can also still optimize my work to make the resulting Lucene Query more compact by combining MultiPhraseQuery and PhraseQuery, so the Lucene Query produced by my patch is smaller than the one from SOLR-4381.

          tiennm Tien Nguyen Manh added a comment -

          That's correct, Otis Gospodnetic.

          MarcoWong Marco Wong added a comment - - edited

          Excuse me, regarding synonym-expander.patch: when I have a ShingleFilter in the query-time analyzer that emits a bigram TermQuery like Term(a b), will the updated SolrQueryParserBase emit PhraseQuery(Term(a), Term(b)) instead, making my existing tokenization logic fail?

          tiennm Tien Nguyen Manh added a comment -

          Yes, it will emit PhraseQuery(Term(a), Term(b)).
          We need an additional check so that a term is only tokenized when it is a synonym.
          I will change the patch.
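
A rough sketch of such a check, in plain Python. How the patch actually detects synonyms is not shown in this thread; the token type labels "SYNONYM" and "shingle" and the helper `to_query_clause` are assumptions used only to illustrate the idea.

```python
def to_query_clause(term, token_type):
    """Split a multi-word term into a phrase only when it came from a synonym."""
    words = term.split()
    if token_type == "SYNONYM" and len(words) > 1:
        return ("phrase", words)
    # Anything else, including shingles like "a b", stays a single term.
    return ("term", term)


print(to_query_clause("a b", "shingle"))
print(to_query_clause("sea biscit", "SYNONYM"))
print(to_query_clause("biscit", "SYNONYM"))
```

With a check like this, a bigram produced by a ShingleFilter would stay one term, and only multi-word synonyms would become phrases.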

          tiennm Tien Nguyen Manh added a comment -

          Updated patch that checks for synonym terms.

          bsteele Bill Steele added a comment -

          We found this to be much more useful code for multiword synonyms. We ran some tests, and when having a synonym set such as:

          seabiscuit, sea biscuit, sea biscit

          Search on the following:

          seabiscuit article

          Returned matches with the following terms

          Sea biscit article
          Sea biscuit article
          Seabiscuit article
          Biscuit Sea article
          Sea article
          Biscit article

          With this patch, the above search query just returned the terms:

          Sea biscit article
          Sea biscuit article
          Seabiscuit article

          markus17 Markus Jelsma added a comment -

          Bill - did you also test the other multiword-synonym patches?

          bsteele Bill Steele added a comment -

          Hi Markus,

          Which other multi-word synonym patches are you referring to? We tested the Solr functionality built into Solr 4.2.1 which didn't support multiple word synonyms, and as far as I know that issue still exists in the latest. This patch fixes that issue for us. Not aware of other alternatives available.

          markus17 Markus Jelsma added a comment -

          Oh, I interpreted your comment as if you had tested it against the other patches linked to this one.

          otis Otis Gospodnetic added a comment -

          Bill Steele - maybe you had your colleagues test other patches, like SOLR-4381?

          janhoy Jan Høydahl added a comment -

          So what is the next step with this one? Has anyone tested it, and do you have comments?

          otis Otis Gospodnetic added a comment -

          Our customers have been using this in production for about half a year now without issues.

          markus17 Markus Jelsma added a comment -

          How does this patch handle boosts? Are the synonym and the original keywords boosted equally?

          iorixxx Ahmet Arslan added a comment -

          Assume synonyms are

            usa, united states of america 

          What happens if I fire the following sloppy phrase query "president usa"~5

          nolanlawson Nolan Lawson added a comment -

          Markus Jelsma: They're boosted equally. It was the subject of a bug.

          Ahmet Arslan: I just tested it out now. I got:

          (+(DisjunctionMaxQuery((text:"president usa"~5)) (((+DisjunctionMaxQuery((text:"president united states of america"~5)))/no_coord))))/no_coord // parsedQuery
          +((text:"president usa"~5) ((+(text:"president united states of america"~5)))) // parsedQuery.toString()
          
          markus17 Markus Jelsma added a comment -

          Nolan, Jan, both of you have extensive knowledge about the one you worked on hosted on github. How do you compare features? I've checked your issue list and there are no new issues coming in and a lot have been resolved already, looks like that one is much more mature and flexible/configurable.

          janhoy Jan Høydahl added a comment -

          What I like about Nolan's solution is that you control synonyms outside of analysis, for all fields, although there are pros and cons with this. Also that you can deboost the synonyms and easily turn them on/off as you like.

          What I like about Tien's patch is that it solves exactly the problem at hand without introducing the need for completely new configurations or concepts, and that it can work with other QParsers as well. In that respect it is perhaps better suited for early inclusion in Solr; then we can look at bringing in the best from SOLR-4381 later?

          otis Otis Gospodnetic added a comment -

          Jan: +1

          markus17 Markus Jelsma added a comment -

          Yes +1

          nolanlawson Nolan Lawson added a comment -

          +1 as well. Tien's patch also seems to be a better candidate seeing as they included Java tests, whereas my tests are in Python 'cuz I was lazy.

          eric.bus Eric Bus added a comment -

          Has anyone modified this patch to work on 4.6.1? I tried to do a manual merge for the second patch. But a lot has changed in the SolrQueryParserBase.java file.

          markus17 Markus Jelsma added a comment -

          Nolan Lawson is the outcome you describe desired behaviour? I don't really believe it is. For synonyms [a b,x y] and q="a b" you get PhraseQuery(content:"x y a b"). While phrase "a b" and "x y" would ordinarily match some documents, "x y a b" will never match. Or is this supposed to expand syns at index time too?

          markus17 Markus Jelsma added a comment -

          OK, it seems I had some bad jars lying around messing things up if a specific token filter was in use. Anyway, this patch works fine from single word to multi-word but not the other way around.

          I have a 4.5.0 checkout here with just this patch. Using the example schema and data and the usual [seabiscuit,sea biscit,biscit] syns:
          http://localhost:8983/solr/select?defType=edismax&qf=name&rows=0&debugQuery=true&q=

          q=biscit => (+DisjunctionMaxQuery(((name:seabiscuit name:"sea biscit" name:biscit))))/no_coord
          q=seabiscuit => (+DisjunctionMaxQuery(((name:seabiscuit name:"sea biscit" name:biscit))))/no_coord
          q=sea biscit => (+(DisjunctionMaxQuery((name:sea)) DisjunctionMaxQuery(((name:seabiscuit name:"sea biscit" name:biscit)))))/no_coord
          

          This is all very nice, but if we change the syns from [seabiscuit,sea biscit,biscit] to [seabiscuit,sea biscit], it no longer works for:

          q=sea biscit => (+(DisjunctionMaxQuery((name:sea)) DisjunctionMaxQuery((name:biscit))))/no_coord
          

          Tien Nguyen Manh So I assume this is clearly not the desired behaviour, right?
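
The outputs above are consistent with each whitespace-separated chunk being analyzed on its own. A plain-Python model of that effect follows; it is a sketch only, and `apply_synonyms` is a hypothetical greedy longest-match lookup, not the Lucene synonym filter.

```python
def apply_synonyms(tokens, groups):
    """Greedy longest-match synonym lookup over an already tokenized input."""
    out = []
    i = 0
    while i < len(tokens):
        for length in range(len(tokens) - i, 0, -1):
            candidate = " ".join(tokens[i:i + length])
            group = next((g for g in groups if candidate in g), None)
            if group:
                out.append(sorted(group))
                i += length
                break
        else:
            # No synonym entry starts here: keep the token unchanged.
            out.append([tokens[i]])
            i += 1
    return out


def analyze_per_chunk(query, groups):
    """Analyze each whitespace chunk separately, as a pre-splitting parser would."""
    return [alts for word in query.split() for alts in apply_synonyms([word], groups)]


two = [{"seabiscuit", "sea biscit"}]
three = [{"seabiscuit", "sea biscit", "biscit"}]

print(analyze_per_chunk("sea biscit", two))    # no expansion at all
print(analyze_per_chunk("sea biscit", three))  # only "biscit" expands
print(apply_synonyms("sea biscit".split(), two))  # whole input: the synonym is found
```

In this model the multi-word entry can only match when the synonym lookup sees both words together, which is what quoting the phrase is meant to achieve.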

          markus17 Markus Jelsma added a comment -

          By the way: using the SynonymQuotedDismaxQParser doesn't change anything.

          tiennm Tien Nguyen Manh added a comment - - edited

          Markus Jelsma It is not the desired behaviour!

          Your result above in the first example with syns [seabiscuit,sea biscit,biscit]
          q=sea biscit => (+(DisjunctionMaxQuery((name:sea)) DisjunctionMaxQuery(((name:seabiscuit name:"sea biscit" name:biscit)))))/no_coord

          seems to be the default behaviour (without the SynonymQuotedDismaxQParser).
          After using SynonymQuotedDismaxQParser, the result should be the same for all three queries: q=biscit, q=seabiscuit, q=sea biscit.

          thetaphi Uwe Schindler added a comment -

          Move issue to Solr 4.9.

          rpialum Jeremy Anderson added a comment - - edited

          I'm in the process of trying to get this logic ported into the 4.8.1 Released Tag. I believe I've gotten the code ported over, but am having problems getting the unit test to run to confirm the correctness of the port. The main reason is the differences in the conf/solrconfig.xml and conf/schema.xml files that exist in the root and I'm guessing those used by Tien when the 4.5.0 patch was created.

          I'm still a Solr novice, so I'm not quite sure how to properly replicate the schema and configuration settings to get the unit test to run. I'm going to attach patch files shortly for the 4.8.1 code base along with the current stubbed-out configuration files.

          Any help anyone can provide would be greatly appreciated. My end goal is to get the multi-term synonym expansion logic to work with a 4.8.1 deployment where we're using an extended version of the SolrQueryParser. (I'm not sure whether, with this patch, the multi-term synonym logic is only usable by the new SynonymQuotedDismaxQParser or also by the existing DismaxQParsers.)

          Notes on 4.8.1 port:

          • There are now two parsers usable by the FSTSynonymFilterFactory: SolrSynonymParser and WordnetSynonymParser. For the latter, I'm not sure whether any additional logic needs to be implemented for proper usage of the tokenize parameter.
          • All of the logic implemented in SolrQueryParserBase from 4.5.0 has now been moved into the utility QueryBuilder class.
          rpialum Jeremy Anderson added a comment -

          Initial files for 4.8.1 port. Unit test does not run, therefore the validity of the port is unknown.

          janhoy Jan Høydahl added a comment -

          Interest in picking this up again? Jeremy Anderson, I have not looked at your patches yet, but am interested in facilitating / reviewing

          rpialum Jeremy Anderson added a comment -

          Unfortunately, I'm swamped with other stuff on my plate. Thinking back, I think I abandoned this approach and instead took Nolan Lawson's route (see SOLR-4381). I don't recall how mature and stable I had gotten my patches before switching paths.

          otis Otis Gospodnetic added a comment -

          Is there any interest in committing this to 4.x or 5.x? We have a client at Sematext who needs query-time synonym support for their Solr 4.x setup. So we can make sure this patch works for 4.x. If any of the Solr developers wants to commit this to 5.x, please leave a comment here.

          markus17 Markus Jelsma added a comment -

          I am sure there is, but there are no working patches for 4.10 or 5.x thus far.

          otis Otis Gospodnetic added a comment -

          I am sure there is, but there are no working patches for 4.10 or 5.x thus far.

          Right. What I was trying to ask is whether any of the active Solr committers wants to commit this. If there is no will to commit, I'd rather keep things simple on our end and ignore this issue. But if there is a will to commit, I'd love to see this in Solr, as would 30+ other watchers, I imagine.

          janhoy Jan Høydahl added a comment -

          +1 to get some of this in. I have the desire but not the cycles right now. Perhaps your happy customer could help drive this?

          Just looked briefly at the patches.. disclaimer: I did not apply and test this yet

          • I would expect a ton of new unit tests for synonym-expander.patch but cannot find any.
          • Why create another subclass of ExtendedDismax for this? If going into core, fold the features into edismax? The patch will be smaller too.
          • I cannot see a test for configuring custom synonymAnalyzers. Also, it should refer to schema fieldTypes instead of adding to qparser config - in the same way e.g. Suggesters do

          Probably the work could be split up - first add more test coverage to the synonym-expander part and commit it. Then fold the quoting-stuff into standard edismax and commit (this part is less risky since it is back-compat if you don't use the new params).

          Is Tien Nguyen Manh still around? Are there other users of the patch who are willing to step in and improve it?

          gro Rafał Kuć added a comment -

          I have the code updated to Solr 4.10.3 and I'm running tests now. I see a few issues with the code right now (i.e. some static, magic string objects, because some classes were moved outside of Lucene core). I'll attach the updated patch tomorrow, but I'm not sure if there will be another release from 4.x branch. So I guess the easiest way would be to get the code polished for 5.x branch and try committing there. What do you think?

          gro Rafał Kuć added a comment -

          As promised I'm attaching an updated patch for Solr 4.10.3. I've updated the code, updated unit tests, included modified configuration files, etc. I'll start working on trunk version of the patch now starting with synonym expander and its unit tests.

          janhoy Jan Høydahl added a comment -

          Rafał Kuć Do you agree to fold this into edismax? I hate to fragment into another parser, which over time will diverge in features.

          gro Rafał Kuć added a comment -

          Sure, for now I don't see a reason why we shouldn't get that into edismax

          tgarafola Timothy Garafola added a comment -

          Is there a status on this issue? Did it get moved forward to 5.x? Is it available in 4.10?

          otis Otis Gospodnetic added a comment -

          There is a patch for 4.10.3, but it was not committed, so this is still not available in Solr AFAIK. Would be great to get this into 5.x.

          mjsminkey Mary Jo Sminkey added a comment -

          This still isn't available in Solr 5?? What can we do to get this made official??

          mjsminkey Mary Jo Sminkey added a comment -

          Also sounds like the code to incorporate it into edismax was never done?

          daitken Daniel Aitken added a comment -

          Doesn't look like it from the patch, no; it's still using the extended synonym quoted query parser.

          We have a client who requires matching on multi-word synonyms, so I've compiled Solr 4.10.3 with solr-5379-version-4.10.3.patch applied and have it up and running. I'm testing it now with a config merged from the unit test files provided in the patch and my regular Solr 4.x configuration.

          Behaviour on quoted multi-word synonyms appears to work as expected across the testing I performed; this works well and would be fantastic to have available. I'm not too concerned about drifting from edismax; unless I'm misunderstanding, wouldn't this solution maintain features from edismax by virtue of being extended from it? It would be nice, however, not to have to maintain a custom query parser.

          So, all in all, works well for us, but I am concerned about having to essentially maintain a fork of 4.10.3 just to support this one use case. Is there a possibility of this making it into a release? Is there anything else that needs to be done with it?

          atuljangra Atul Jangra added a comment -

          Any update on how soon this work will be included in Solr? This is a very common need for anyone working on search.
          I'm using Solr 5.5.0 and this is still a big problem. I even tried some external parsers, but nothing seems to work.

          janhoy Jan Høydahl added a comment -

          Daniel Aitken, Atul Jangra and Mary Jo Sminkey, I'm sorry there were no replies to your questions about updating the patch. What it probably means is that no one has had the capacity or need to spend time on this. It will probably take some effort to lift the patch from 4.x to 6.x, and then get it ready for committing either as part of edismax or as a subclass.

          "What can we do to get this made official??"

          If you (or your company) can contribute development work yourself, that would be best. Otherwise, hire someone who can help you and/or just keep nagging here until it is done.

          janhoy Jan Høydahl added a comment -

          Closing as duplicate, since SOLR-9185 (sow) and SOLR-10343 (Graph) now provide the official solution to this issue. See https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/


            People

            • Assignee:
              Unassigned
              Reporter:
              tiennm Tien Nguyen Manh
            • Votes:
              20
              Watchers:
              51
