Solr / SOLR-2632

Highlighting does not work for embedded boost query that boosts a dismax query

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.1, 3.2, 3.3
    • Fix Version/s: None
    • Component/s: highlighter
    • Environment:

      Linux.
      Reproduced on different machines with different Linux distributions and different JDKs.
      Solr 3.3 and LucidWorks for Solr 1.4.1 and 3.2.

      Description

      I need to issue a dismax query with a date boost (I'd like to use the multiplicative boost provided by boost queries) while also filtering on other fields that have too many distinct values to fit in a filter query. To achieve this, I embed the boost query as a nested query using the _query_ pseudofield. I also need highlighting for the fields used in the dismax query, but highlighting does not work. If I use the boosted dismax query on its own, without embedding it inside another query, highlighting works correctly. If I use bf instead of a boost query and embed the dismax query directly, it works too, but hl.fl must be specified.

      It's a bit complicated to explain, so I'll give examples using the example data that ships with Solr (the problem is reproducible in the example Solr distribution, not only in my own project).

      http://localhost:8983/solr/select?q=%2binStock:true%20%2b_query_:%22{!boost%20b=$dateboost%20v=$qq%20defType=dismax}%22&qq=test&qf=name&dateboost=recip%28ms%28NOW,last_modified%29,3.16e-11,1,1%29&hl=true&hl.fl=name

      For this query, highlighting does not work. Specifying hl.fl or not does not influence the result. The result is:
      <lst name="highlighting">
      <lst name="GB18030TEST"/>
      <lst name="UTF8TEST"/>
      </lst>
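
A request like the failing one above can also be assembled programmatically, which avoids hand-encoding mistakes in the nested local-params string. The following is only a minimal sketch: the class and method names are hypothetical, it assumes Java 10 or later for `URLEncoder.encode(String, Charset)`, and it encodes more characters than the hand-written URL does (for example `{`, `!` and `$`), which should decode to the same parameter values.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class HighlightRequestUrl {

    // Base URL of the example Solr distribution used in this report.
    static final String BASE = "http://localhost:8983/solr/select";

    /** Builds the failing request: a boosted dismax query nested via _query_. */
    static String build(String userQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "+inStock:true +_query_:\"{!boost b=$dateboost v=$qq defType=dismax}\"");
        params.put("qq", userQuery);
        params.put("qf", "name");
        params.put("dateboost", "recip(ms(NOW,last_modified),3.16e-11,1,1)");
        params.put("hl", "true");
        params.put("hl.fl", "name");

        StringBuilder url = new StringBuilder(BASE);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Form-encode each key and value (a space becomes '+', '+' becomes %2B).
            url.append(sep)
               .append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
               .append('=')
               .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            sep = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("test"));
    }
}
```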

      http://localhost:8983/solr/select?q=_query_:%22{!boost%20b=$dateboost%20v=$qq%20defType=dismax}%22&qq=test&qf=name&dateboost=recip%28ms%28NOW,last_modified%29,3.16e-11,1,1%29&hl=true&hl.fl=name

      This doesn't work either. Same result.

      http://localhost:8983/solr/select?q={!boost b=$dateboost v=$qq defType=dismax}&qq=test&qf=name&dateboost=recip(ms(NOW,last_modified),3.16e-11,1,1)&hl=true

      In this case, highlighting works correctly:

      <lst name="highlighting">
      <lst name="GB18030TEST">
      <arr name="name">
      <str><em>Test</em> with some GB18030 encoded characters</str>
      </arr>
      </lst>
      <lst name="UTF8TEST">
      <arr name="name">
      <str><em>Test</em> with some UTF-8 encoded characters</str>
      </arr>
      </lst>
      </lst>

      http://localhost:8983/solr/select?q=%2BinStock:true%20%2B_query_:%22{!dismax%20v=$qq}%22&qq=test&qf=name&bf=recip%28ms%28NOW,last_modified%29,3.16e-11,1,1%29&hl=true&hl.fl=name

      This also works, with the same result as before. But in this case hl.fl is needed; without it, highlighting does not work either.

      Thanks.

          Activity

          Juan Antonio Farré Basurte added a comment -

          I've done some more testing. For the following query, using edismax, highlighting also fails in the same way:

          http://localhost:8983/solr/select?q=%2binStock:true%20%2b_query_:%22{!edismax%20boost=$dateboost%20v=$qq}%22&qq=test&qf=name&q.alt=*:*&tie=0.1&dateboost=recip%28ms%28NOW,last_modified%29,3.16e-11,1,1%29&hl=true&hl.fl=name

          While it works well for the following two queries:

          http://localhost:8983/solr/select?q=%2binStock:true%20%2b_query_:%22{!edismax%20v=$qq}%22&qq=test&qf=name&q.alt=*:*&tie=0.1&dateboost=recip%28ms%28NOW,last_modified%29,3.16e-11,1,1%29&hl=true&hl.fl=name
          http://localhost:8983/solr/select?q={!edismax%20boost=$dateboost%20v=$qq}%22&qq=test&qf=name&q.alt=*:*&tie=0.1&dateboost=recip%28ms%28NOW,last_modified%29,3.16e-11,1,1%29&hl=true&hl.fl=name

          Also tested in Solr 3.3 with the same results.

          Koji Sekiguchi added a comment -

          http://localhost:8983/solr/select?q=%2binStock:true%20%2b_query_:%22{!boost%20b=$dateboost%20v=$qq%20defType=dismax}%22&qq=test&qf=name&dateboost=recip%28ms%28NOW,last_modified%29,3.16e-11,1,1%29&hl=true&hl.fl=name
          For this query, highlighting does not work. Specifying hl.fl or not, does not influence the result. The result is:
          <lst name="highlighting">
          <lst name="GB18030TEST"/>
          <lst name="UTF8TEST"/>
          </lst>

          This request creates a BooleanQuery composed of TermQuery("inStock", true) and a BoostedQuery. Lucene's Highlighter knows TermQuery but doesn't know how to deal with Solr's BoostedQuery. The BoostedQuery should contain the TermQuery("name","test") that you want to highlight, but Lucene knows nothing about BoostedQuery, so the Highlighter ignores the entire BoostedQuery.

          Juan Antonio Farré Basurte added a comment -

          Sounds logical, but... if the highlighter doesn't know how to deal with BoostedQuery, then why does it work when I issue the boosted query alone, without embedding it in the boolean query?
          Maybe I'm wrong, but it looks to me more like a problem with embedding the boosted query into the boolean query than a problem with the boosted query itself. In fact, as you can see in my examples, if I directly embed the dismax query (without the boost query) in the boolean query, it works, but it requires specifying hl.fl, when I believe it should just use the qf.
          My feeling is that the highlighter has problems dealing with embedded queries. The problems get worse if you embed boosted queries.

          Koji Sekiguchi added a comment -

          Ok, I checked the HighlightComponent and QParser impls. HighlightComponent calls QParser.getHighlightQuery() to get the Query to be highlighted. When you set +inStock:true +_query_:"{!boost b=$dateboost v=$qq defType=dismax}" as the q parameter, the QParser is LuceneQParser, and its getHighlightQuery() is:

          public Query getHighlightQuery() throws ParseException {
            return getQuery();
          }

          and it returns a BooleanQuery. If you set {!boost b=$dateboost v=$qq defType=dismax} as the q parameter, the QParser's getHighlightQuery() is:

          public Query getHighlightQuery() throws ParseException {
            return baseParser.getHighlightQuery();
          }

          where baseParser is a DisMaxQParser, so it returns a DisjunctionMaxQuery.
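
The difference between the two code paths described above can be modeled in a few lines. This is a simplified sketch, not Solr code: every class here (`Query`, `Parser`, `DismaxParser`, `BoostParser`, `OuterParser`) is a hypothetical stand-in that only mimics which object each parser would hand to the highlighter.

```java
import java.util.Arrays;
import java.util.List;

final class HighlightQueryDelegation {

    // Hypothetical stand-ins for the query types named above; NOT the real Lucene/Solr classes.
    static class Query {}
    static final class DisjunctionMaxQuery extends Query {}
    static final class BoostedQuery extends Query {
        final Query inner;
        BoostedQuery(Query inner) { this.inner = inner; }
    }
    static final class BooleanQuery extends Query {
        final List<Query> clauses;
        BooleanQuery(Query... clauses) { this.clauses = Arrays.asList(clauses); }
    }

    // Hypothetical parser interface with the two methods discussed in this comment.
    interface Parser {
        Query getQuery();
        Query getHighlightQuery();
    }

    // Models the dismax parser: the highlight query is the plain DisjunctionMaxQuery.
    static final class DismaxParser implements Parser {
        private final Query dismax = new DisjunctionMaxQuery();
        public Query getQuery() { return dismax; }
        public Query getHighlightQuery() { return dismax; }
    }

    // Models the {!boost} parser: scoring uses the wrapper, highlighting delegates to the base parser.
    static final class BoostParser implements Parser {
        private final Parser baseParser;
        BoostParser(Parser baseParser) { this.baseParser = baseParser; }
        public Query getQuery() { return new BoostedQuery(baseParser.getQuery()); }
        public Query getHighlightQuery() { return baseParser.getHighlightQuery(); }
    }

    // Models the default parser around a nested query: no delegation, so the wrapper leaks through.
    static final class OuterParser implements Parser {
        private final Parser nested;
        OuterParser(Parser nested) { this.nested = nested; }
        public Query getQuery() { return new BooleanQuery(new Query(), nested.getQuery()); }
        public Query getHighlightQuery() { return getQuery(); }
    }

    /** Simple class name of the query a top-level parser would hand to the highlighter. */
    static String highlightQueryType(Parser topLevel) {
        return topLevel.getHighlightQuery().getClass().getSimpleName();
    }

    public static void main(String[] args) {
        Parser boost = new BoostParser(new DismaxParser());
        System.out.println(highlightQueryType(boost));                  // boost parser at top level
        System.out.println(highlightQueryType(new OuterParser(boost))); // boost parser nested via _query_
    }
}
```

In this model the top-level boost parser exposes the inner dismax query for highlighting, while the nested case exposes a boolean query that still contains the boosted wrapper.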

          Juan Antonio Farré Basurte added a comment -

          Ok, now I understand much better what is going on; thanks for the explanation.
          What I'm not sure about is the conclusion. Is this a bug that should be fixed?
          It sounds like one to me, and in fact I don't see any workaround except using bf instead of boost, but then you get an additive boost instead of a multiplicative one.
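
To make the additive versus multiplicative distinction concrete, here is a small arithmetic sketch. It uses Solr's documented `recip(x,m,a,b) = a / (m*x + b)` with the constants from the requests in this issue; the raw relevance scores are made-up values, and the model deliberately ignores query normalization and other scoring details, so it only illustrates the shape of the effect.

```java
final class DateBoostArithmetic {

    // Milliseconds in a (Julian) year, roughly 3.16e10.
    static final double MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000;

    /** Solr's recip(x, m, a, b) function: a / (m * x + b). */
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    /** Date boost used in this issue, for a document of the given age in milliseconds. */
    static double dateBoost(double ageMs) {
        return recip(ageMs, 3.16e-11, 1, 1);
    }

    public static void main(String[] args) {
        double[] relevanceScores = {0.05, 5.0}; // hypothetical raw scores
        double fresh = dateBoost(0);            // document modified just now
        double old = dateBoost(MS_PER_YEAR);    // document modified one year ago
        for (double score : relevanceScores) {
            // Multiplicative (boost): old/new ratio is the same for every raw score.
            // Additive (bf, simplified): the effect depends on the scale of the raw score.
            System.out.printf("score=%.2f  multiplicative new=%.3f old=%.3f  additive new=%.3f old=%.3f%n",
                    score, score * fresh, score * old, score + fresh, score + old);
        }
    }
}
```

With these constants a one-year-old document gets a boost of about one half, so the multiplicative form roughly halves its score whatever the raw relevance is, whereas the additive form changes the ranking a lot for small raw scores and very little for large ones.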

          Koji Sekiguchi added a comment -

          What I'm not sure is about the conclusion. Is this a bug that should be corrected?

          I'm not sure. If getHighlightQuery() is meant to provide a basic query so that Lucene's highlighter can understand what kind of query it is, it looks like a bug to me.

          BTW, what do you think of the idea in SOLR-1926? If it can be used, does it solve your problem?

          Juan Antonio Farré Basurte added a comment -

          Interesting idea.
          For my concrete problem, it would probably provide a workaround, yes.
          The comment by Hoss Man also sounds quite reasonable. I can't think of a situation where having hl.q provides a clear advantage over the hl.text suggested by Hoss Man, though maybe I just haven't come up with the use case.

          Juan Antonio Farré Basurte added a comment -

          I chose minor for the priority because of the "workaround" of using bf instead of boost.
          But that's not a real workaround, as you don't get the multiplicative boost you need.
          The more I test it, the more I realize how far bf is from what I need for suitable date boosting.
          So I'm changing the priority to major.

          Mark Miller added a comment -

          This will vary depending on what highlighter impl you choose.

          For the span highlighter, it doesn't yet know about BoostedQuery - it would need to identify it and pull the query out of it with getQuery. However, this would add a dependency on modules/queries that I don't think exists yet - that is the only hitch.
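
The unwrapping step suggested here can be sketched as follows. This is an illustration only: the classes are hypothetical stand-ins rather than the real Lucene/Solr types, the inner dismax query is reduced to a single term for brevity, and `extractTerms` is an invented helper that mimics a highlighter collecting terms from query types it recognizes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class UnwrapBoostedQuerySketch {

    // Hypothetical stand-ins; NOT the real Lucene/Solr classes.
    static class Query {}
    static final class TermQuery extends Query {
        final String field;
        final String text;
        TermQuery(String field, String text) { this.field = field; this.text = text; }
    }
    static final class BooleanQuery extends Query {
        final List<Query> clauses;
        BooleanQuery(Query... clauses) { this.clauses = Arrays.asList(clauses); }
    }
    static final class BoostedQuery extends Query {
        private final Query inner;
        BoostedQuery(Query inner) { this.inner = inner; }
        Query getQuery() { return inner; }
    }

    /** Collects "field:text" terms; BoostedQuery is only looked into when unwrapBoosted is true. */
    static List<String> extractTerms(Query q, boolean unwrapBoosted) {
        List<String> out = new ArrayList<>();
        collect(q, unwrapBoosted, out);
        return out;
    }

    private static void collect(Query q, boolean unwrapBoosted, List<String> out) {
        if (q instanceof TermQuery) {
            TermQuery t = (TermQuery) q;
            out.add(t.field + ":" + t.text);
        } else if (q instanceof BooleanQuery) {
            for (Query clause : ((BooleanQuery) q).clauses) {
                collect(clause, unwrapBoosted, out);
            }
        } else if (unwrapBoosted && q instanceof BoostedQuery) {
            // The suggested step: identify the wrapper and pull the inner query out with getQuery().
            collect(((BoostedQuery) q).getQuery(), unwrapBoosted, out);
        }
        // Any other type is silently skipped, which is how highlightable terms get lost.
    }

    public static void main(String[] args) {
        Query q = new BooleanQuery(
                new TermQuery("inStock", "true"),
                new BoostedQuery(new TermQuery("name", "test")));
        System.out.println(extractTerms(q, false)); // prints [inStock:true]
        System.out.println(extractTerms(q, true));  // prints [inStock:true, name:test]
    }
}
```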

          Juan Antonio Farré Basurte added a comment -

          That is good to know. So, which currently stable highlighter implementations would work correctly in the indicated use cases?

          lukes shaw added a comment - edited

          Hi everyone, I was recently trying to use a boost in the query with highlighting enabled at the same time. With the boost, highlighting doesn't work, but the moment I remove the boost, highlighting starts working again.

          Below is the request I am sending.

          http://localhost:8983/solr/collection1/select?q=%2B_query_%3A%22{!type%3Dedismax+qf%3D%27body^1.0+title^10.0%27+pf%3D%27body^2%27+ps%3D36+pf2%3D%27body^2%27+pf3%3D%27body^2%27+v%3D%27apple%27+mm%3D100}%22&group=true&group.field=content_group_id_k&group.ngroups=true&group.limit=3&fl=id%2Clanguage_k%2Clast_modified_date_dt%2Ctitle&rows=20&hl.snippets=1&hl.fragsize=200&hl.fl=body&hl.fl=title&hl=true&hl.q=%2B_query_%3A%22{!type%3Dedismax+qf%3D%27body^1.0+title^10.0%27+pf%3D%27body^2%27+ps%3D36+pf2%3D%27body^2%27+pf3%3D%27body^2%27+v%3D%27apple%27+mm%3D100}%22&debugQuery=true&wt=json&indent=true&hl.snippets=1&hl.fragsize=200&hl.fl=bosy&hl.fl=title&hl=true&boost=boost_weight

          OR

          http://localhost:8983/solr/collection1/select?q=%2B_query_%3A%22{!type%3Dedismax+qf%3D%27body^1.0+title^10.0%27+pf%3D%27body^2%27+ps%3D36+pf2%3D%27body^2%27+pf3%3D%27body^2%27+v%3D%27apple%27+mm%3D100}%22&group=true&group.field=content_group_id_k&group.ngroups=true&group.limit=3&fl=id%2Clanguage_k%2Clast_modified_date_dt%2Ctitle&rows=20&hl.snippets=1&hl.fragsize=200&hl.fl=body&hl.fl=title&hl=true&debugQuery=true&wt=json&indent=true&hl.snippets=1&hl.fragsize=200&hl.fl=bosy&hl.fl=title&hl=true&boost=boost_weight

          But if I send either of the two requests above without the boost, or use bf (additive) instead of boost (multiplicative), things work, but I don't get the multiplicative boost.

          I am using Solr 4.1.0.

          Any help with this is really appreciated.

          Regards,
          Lokesh


            People

            • Assignee: Unassigned
            • Reporter: Juan Antonio Farré Basurte
            • Votes: 0
            • Watchers: 1
