Lucene - Core
  1. Lucene - Core
  2. LUCENE-4043

Add scoring support for query time join

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: modules/join
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Have similar scoring for query time joining just like the index time block join (with the score mode).

      1. LUCENE-4043.patch
        19 kB
        Martijn van Groningen
      2. LUCENE-4043.patch
        31 kB
        Martijn van Groningen
      3. LUCENE-4043.patch
        55 kB
        Martijn van Groningen
      4. LUCENE-4043.patch
        56 kB
        Martijn van Groningen

        Issue Links

          Activity

          Hide
          Martijn van Groningen added a comment -

          Draft patch. Added ScoreMode as parameter to JoinUtil#createJoinQuery(...).

          Maybe ScoreMode should be a public enum inside the join package.

          Show
          Martijn van Groningen added a comment - Draft patch. Added ScoreMode as parameter to JoinUtil#createJoinQuery(...). Maybe ScoreMode should be a public enum inside the join package.
          Hide
          Martijn van Groningen added a comment -

          Updated patch.

          • Started adding randomizing score mode in TestJoinUtil test class.
          • Made ScoreMode a public enum in join package.
          Show
          Martijn van Groningen added a comment - Updated patch. Started adding randomizing score mode in TestJoinUtil test class. Made ScoreMode a public enum in join package.
          Hide
          Martijn van Groningen added a comment -

          Updated patch.

          • Fixed random tests.
          • Added support for explain.
          • Added ScoreMode support for documents that relate to more than one document.

          I think it is ready to be committed.

          Show
          Martijn van Groningen added a comment - Updated patch. Fixed random tests. Added support for explain. Added ScoreMode support for documents that relate to more than one document. I think it is ready to be committed.
          Hide
          Michael McCandless added a comment -

          Patch looks great!

          You don't need to use your own growFactor ... just call ArrayUtil.grow
          directly (it already oversizes under the hood for you).

          Maybe remove @throws IAE from createJoinQuery's javadocs? (But, still
          throw it... in case we add a new ScoreMode and forget to fix this
          code, in the future). Because today all ScoreMode enum values work...

          Fix omitted to emitted in the comment on top of "class MVInnerScorer".

          Probably javadocs should somewhere explain about the "first time doc
          is emitted it gets that score"?

          Maybe explain added RAM requirements when scores are tracked in the
          javadocs?

          Maybe rename TermsWithScoreCollector.MV.Avg.ordScores -> .scoreCounts
          (and .scores -> .scoreSums?).

          Can we put back the non-wildcard imports?

          Show
          Michael McCandless added a comment - Patch looks great! You don't need to use your own growFactor ... just call ArrayUtil.grow directly (it already oversizes under the hood for you). Maybe remove @throws IAE from createJoinQuery's javadocs? (But, still throw it... in case we add a new ScoreMode and forget to fix this code, in the future). Because today all ScoreMode enum values work... Fix omitted to emitted in the comment on top of "class MVInnerScorer". Probably javadocs should somewhere explain about the "first time doc is emitted it gets that score"? Maybe explain added RAM requirements when scores are tracked in the javadocs? Maybe rename TermsWithScoreCollector.MV.Avg.ordScores -> .scoreCounts (and .scores -> .scoreSums?). Can we put back the non-wildcard imports?
          Hide
          Martijn van Groningen added a comment -

          Thanks for reviewing Mike! I've updated the patch.

          You don't need to use your own growFactor ... just call ArrayUtil.grow directly (it already oversizes under the hood for you).

          Sure. (I didn't release that the ArrayUtil#oversize() was doing this)

          Maybe remove @throws IAE from createJoinQuery's javadocs? (But, still throw it... in case we add a new ScoreMode and forget to fix this code, in the future). Because today all ScoreMode enum values work...

          Makes sense. We design it not to throw the exception. If the exception is thrown, then it is a bug.

          Fix omitted to emitted in the comment on top of "class MVInnerScorer".

          Done.

          Probably javadocs should somewhere explain about the "first time doc is emitted it gets that score"?

          Done.

          Maybe explain added RAM requirements when scores are tracked in the javadocs?

          Done

          Maybe rename TermsWithScoreCollector.MV.Avg.ordScores -> .scoreCounts (and .scores -> .scoreSums?).

          Done.

          Can we put back the non-wildcard imports?

          Done. (IDE was trying to be smart... I'll change my settings...)

          Show
          Martijn van Groningen added a comment - Thanks for reviewing Mike! I've updated the patch. You don't need to use your own growFactor ... just call ArrayUtil.grow directly (it already oversizes under the hood for you). Sure. (I didn't release that the ArrayUtil#oversize() was doing this) Maybe remove @throws IAE from createJoinQuery's javadocs? (But, still throw it... in case we add a new ScoreMode and forget to fix this code, in the future). Because today all ScoreMode enum values work... Makes sense. We design it not to throw the exception. If the exception is thrown, then it is a bug. Fix omitted to emitted in the comment on top of "class MVInnerScorer". Done. Probably javadocs should somewhere explain about the "first time doc is emitted it gets that score"? Done. Maybe explain added RAM requirements when scores are tracked in the javadocs? Done Maybe rename TermsWithScoreCollector.MV.Avg.ordScores -> .scoreCounts (and .scores -> .scoreSums?). Done. Can we put back the non-wildcard imports? Done. (IDE was trying to be smart... I'll change my settings...)
          Hide
          Michael McCandless added a comment -

          I still see one omitted Otherwise this looks great: +1 to commit!

          Show
          Michael McCandless added a comment - I still see one omitted Otherwise this looks great: +1 to commit!
          Hide
          Martijn van Groningen added a comment -

          Oops... I see. I'll commit soon!

          Show
          Martijn van Groningen added a comment - Oops... I see. I'll commit soon!
          Hide
          Martijn van Groningen added a comment -

          Committed to trunk and branch4x.

          Show
          Martijn van Groningen added a comment - Committed to trunk and branch4x.
          Hide
          David vandendriessche added a comment - - edited

          Hi, Is this possible to use in solr I tried setting

          {!scoreMode=Avg}

          , but it doesn't seem to have any effect.

          Show
          David vandendriessche added a comment - - edited Hi, Is this possible to use in solr I tried setting {!scoreMode=Avg} , but it doesn't seem to have any effect.
          Hide
          Martijn van Groningen added a comment -

          Solr uses a different joining implementation. Which doesn't support mapping the scores from the `from` side to the `to` side. If you want to use the Lucene joining implementation you could wrap this in a Solr QParserPlugin extension.

          Show
          Martijn van Groningen added a comment - Solr uses a different joining implementation. Which doesn't support mapping the scores from the `from` side to the `to` side. If you want to use the Lucene joining implementation you could wrap this in a Solr QParserPlugin extension.
          Hide
          David vandendriessche added a comment - - edited

          Could it be that you're JoinUtil doesn't clear a cache or something?(Could also be my code) My plugin always works once when I start Solr. It joins and gives good scores. The second query it always returns the same as the first one? After restarting Solr it works 1 time and then does the same.

          public class TestParserPlugin extends ExtendedDismaxQParserPlugin {

          @Override
          public QParser createParser(String string, SolrParams sp, SolrParams sp1, SolrQueryRequest sqr)

          { return new TestParserPlugin.TestParser(string, sp1, sp1, sqr); }

          @Override
          public void init(NamedList nl) {
          }

          public class TestParser extends QParser {

          public TestParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req)

          { super(qstr, localParams, params, req); }

          @Override
          public org.apache.lucene.search.Query parse() throws org.apache.lucene.queryparser.classic.ParseException {
          //IndexReader reader;
          try

          { //IndexSearcher searcher=req.getSearcher(); IndexSearcher searcher = req.getSearcher(); Query q = QParser.getParser(qstr, "edismax", req).getQuery(); return JoinUtil.createJoinQuery("pageId", true, "fileId", q, searcher, ScoreMode.Max); }

          catch (IOException ex)

          { Logger.getLogger(TestParserPlugin.class.getName()).log(Level.SEVERE, null, ex); }

          return null;
          }

          @Override
          protected void finalize() throws Throwable

          { super.finalize(); req.close(); }

          }

          }

          Show
          David vandendriessche added a comment - - edited Could it be that you're JoinUtil doesn't clear a cache or something?(Could also be my code) My plugin always works once when I start Solr. It joins and gives good scores. The second query it always returns the same as the first one? After restarting Solr it works 1 time and then does the same. public class TestParserPlugin extends ExtendedDismaxQParserPlugin { @Override public QParser createParser(String string, SolrParams sp, SolrParams sp1, SolrQueryRequest sqr) { return new TestParserPlugin.TestParser(string, sp1, sp1, sqr); } @Override public void init(NamedList nl) { } public class TestParser extends QParser { public TestParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) { super(qstr, localParams, params, req); } @Override public org.apache.lucene.search.Query parse() throws org.apache.lucene.queryparser.classic.ParseException { //IndexReader reader; try { //IndexSearcher searcher=req.getSearcher(); IndexSearcher searcher = req.getSearcher(); Query q = QParser.getParser(qstr, "edismax", req).getQuery(); return JoinUtil.createJoinQuery("pageId", true, "fileId", q, searcher, ScoreMode.Max); } catch (IOException ex) { Logger.getLogger(TestParserPlugin.class.getName()).log(Level.SEVERE, null, ex); } return null; } @Override protected void finalize() throws Throwable { super.finalize(); req.close(); } } }
          Hide
          Martijn van Groningen added a comment -

          I think this happens b/c the Query that the JoinUtil returns doesn't override the equals and hashcode method (See TermsIncludingScoreQuery). This should be fixed, otherwise this query can never be cached (this is what the SolrIndexSearcher does). Can you check if the following works:

          IndexSearcher searcher = new IndexSearcher(req.getSearcher().getIndexReader());
          return JoinUtil.createJoinQuery("pageId", true, "fileId", q, searcher, ScoreMode.Max); 
          
          Show
          Martijn van Groningen added a comment - I think this happens b/c the Query that the JoinUtil returns doesn't override the equals and hashcode method (See TermsIncludingScoreQuery). This should be fixed, otherwise this query can never be cached (this is what the SolrIndexSearcher does). Can you check if the following works: IndexSearcher searcher = new IndexSearcher(req.getSearcher().getIndexReader()); return JoinUtil.createJoinQuery( "pageId" , true , "fileId" , q, searcher, ScoreMode.Max);
          Hide
          David vandendriessche added a comment - - edited

          Same problem if I use:
          IndexSearcher searcher = new IndexSearcher(req.getSearcher().getIndexReader());

          I can't override the class since the constructor is private. This is probably to only use the static methods.

          I've used an output to see the hashcode and it stays the same. Can I change this behaviour somehow?

          Show
          David vandendriessche added a comment - - edited Same problem if I use: IndexSearcher searcher = new IndexSearcher(req.getSearcher().getIndexReader()); I can't override the class since the constructor is private. This is probably to only use the static methods. I've used an output to see the hashcode and it stays the same. Can I change this behaviour somehow?
          Hide
          David vandendriessche added a comment -

          I got suggested to extend the Query class and return a Hash myself.

          My query class contains:

          @Override
          public int hashCode()

          { return q.toString().hashCode(); }

          It seems to work now.

          Show
          David vandendriessche added a comment - I got suggested to extend the Query class and return a Hash myself. My query class contains: @Override public int hashCode() { return q.toString().hashCode(); } It seems to work now.
          Hide
          Martijn van Groningen added a comment -

          Hi David, I just committed LUCENE-4704. Query instances returned from JoinUtil will implement equals and hashcode in future versions.

          Show
          Martijn van Groningen added a comment - Hi David, I just committed LUCENE-4704 . Query instances returned from JoinUtil will implement equals and hashcode in future versions.
          Hide
          David vandendriessche added a comment -

          Thanks alot!

          I'll try the patch when 4.2 get's released.

          Show
          David vandendriessche added a comment - Thanks alot! I'll try the patch when 4.2 get's released.

            People

            • Assignee:
              Unassigned
              Reporter:
              Martijn van Groningen
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development