Solr
  1. Solr
  2. SOLR-7655

Perf bug- DefaultSolrHighlighter.getSpanQueryScorer triggers MultiFields.getMergedFieldInfos

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 5.0
    • Fix Version/s: 5.2.1
    • Component/s: highlighter
    • Labels:
      None

      Description

      It appears grabbing the FieldInfos from the SlowCompositeReaderWrapper is slow. It isn't cached. The DefaultSolrHighligher in SOLR-6196 (v5.0) uses it to ascertain if there are payloads. Instead it can grab it from the Terms instance, which is cached.

      1. SOLR-7655.patch
        2 kB
        David Smiley

        Activity

        Hide
        David Smiley added a comment -

        Suggested fix:

            try {
              scorer.setUsePayloads(request.getParams().getFieldBool(fieldName, HighlightParams.PAYLOADS,
                  request.getSearcher().getLeafReader().fields().terms(fieldName).hasPayloads()));
              // It'd be nice to know if payloads are on the tokenStream but the presence of the attribute isn't a good
              // indicator.
            } catch (IOException e) {
              throw new RuntimeException(e);
            }
        

        I'm going to try this now with Solr's tests, then post a patch.

        Show
        David Smiley added a comment - Suggested fix: try { scorer.setUsePayloads(request.getParams().getFieldBool(fieldName, HighlightParams.PAYLOADS, request.getSearcher().getLeafReader().fields().terms(fieldName).hasPayloads())); // It'd be nice to know if payloads are on the tokenStream but the presence of the attribute isn't a good // indicator. } catch (IOException e) { throw new RuntimeException(e); } I'm going to try this now with Solr's tests, then post a patch.
        Hide
        David Smiley added a comment -

        This was discovered via a commenter here: https://issues.apache.org/jira/browse/SOLR-5855?focusedCommentId=14578437&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14578437 (the purpose of that issue is unrelated to the discovery in the stack traces posted)

        Show
        David Smiley added a comment - This was discovered via a commenter here: https://issues.apache.org/jira/browse/SOLR-5855?focusedCommentId=14578437&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14578437 (the purpose of that issue is unrelated to the discovery in the stack traces posted)
        Hide
        David Smiley added a comment -

        Here's a patch; a little better than the "suggested fix": the Terms returned could be null, and if an exception somehow gets thrown then we can log it without re-throwing.

        I did a little performance testing on a project I have. It seems this perf bug is most pronounced if you attempt to highlight on a ton of fields (e.g. via hl.fl=*), and if there are a lot of Lucene segments. And furthermore if you don't have a lot of text to highlight per field then the overhead here is proportionally higher to the overall task.

        Precommit is happy and the tests pass. It'd be nice to get this into 5.2.1 but would like to see a +1 from someone. What do you think Shalin Shekhar Mangar (you're the RM I believe).

        As a side note... I'm wondering if SlowCompositeReaderWrapper ought to cache FieldInfos too; maybe lazyily.

        Show
        David Smiley added a comment - Here's a patch; a little better than the "suggested fix": the Terms returned could be null, and if an exception somehow gets thrown then we can log it without re-throwing. I did a little performance testing on a project I have. It seems this perf bug is most pronounced if you attempt to highlight on a ton of fields (e.g. via hl.fl=* ), and if there are a lot of Lucene segments. And furthermore if you don't have a lot of text to highlight per field then the overhead here is proportionally higher to the overall task. Precommit is happy and the tests pass. It'd be nice to get this into 5.2.1 but would like to see a +1 from someone. What do you think Shalin Shekhar Mangar (you're the RM I believe). As a side note... I'm wondering if SlowCompositeReaderWrapper ought to cache FieldInfos too; maybe lazyily.
        Hide
        Shalin Shekhar Mangar added a comment -

        Looks good to me. I don't know if you're online but if not, I'll commit your patch in a couple of hours and create the RC.

        Show
        Shalin Shekhar Mangar added a comment - Looks good to me. I don't know if you're online but if not, I'll commit your patch in a couple of hours and create the RC.
        Hide
        Ere Maijala added a comment -

        Took me a while to get a build environment up, but results look very promising. My very unscientific benchmark shows that the highlighter may now be even a bit faster than in Solr 4.10.2 at least in some cases. Thanks for the great work, David Smiley!

        Show
        Ere Maijala added a comment - Took me a while to get a build environment up, but results look very promising. My very unscientific benchmark shows that the highlighter may now be even a bit faster than in Solr 4.10.2 at least in some cases. Thanks for the great work, David Smiley !
        Hide
        ASF subversion and git services added a comment -

        Commit 1684665 from David Smiley in branch 'dev/trunk'
        [ https://svn.apache.org/r1684665 ]

        SOLR-7655: Speed up DefaultSolrHighlighter's check for the existence of payloads

        Show
        ASF subversion and git services added a comment - Commit 1684665 from David Smiley in branch 'dev/trunk' [ https://svn.apache.org/r1684665 ] SOLR-7655 : Speed up DefaultSolrHighlighter's check for the existence of payloads
        Hide
        ASF subversion and git services added a comment -

        Commit 1684667 from David Smiley in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1684667 ]

        SOLR-7655: Speed up DefaultSolrHighlighter's check for the existence of payloads.

        Show
        ASF subversion and git services added a comment - Commit 1684667 from David Smiley in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1684667 ] SOLR-7655 : Speed up DefaultSolrHighlighter's check for the existence of payloads.
        Hide
        ASF subversion and git services added a comment -

        Commit 1684668 from David Smiley in branch 'dev/branches/lucene_solr_5_2'
        [ https://svn.apache.org/r1684668 ]

        SOLR-7655: Speed up DefaultSolrHighlighter's check for the existence of payloads.

        Show
        ASF subversion and git services added a comment - Commit 1684668 from David Smiley in branch 'dev/branches/lucene_solr_5_2' [ https://svn.apache.org/r1684668 ] SOLR-7655 : Speed up DefaultSolrHighlighter's check for the existence of payloads.
        Hide
        David Smiley added a comment -

        Shalin Shekhar Mangar I didn't see you online and I saw it wasn't committed yet so I did so now.

        Show
        David Smiley added a comment - Shalin Shekhar Mangar I didn't see you online and I saw it wasn't committed yet so I did so now.

          People

          • Assignee:
            David Smiley
            Reporter:
            David Smiley
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development