Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-9592

decorateDocValues cause serious performance issue because of using slowCompositeReaderWrapper

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.0, 6.1, 6.2
    • Fix Version/s: 6.3, master (7.0)
    • Component/s: Response Writers, search
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:

      Description

      I have serious performance issue using AtomicUpdate (and RealtimeGet) with non stored docValues.
      Because decorateDocValues try to merge each leafLeader on the fly via slowCompositeReaderWrapper and it’s extremely slow (> 10sec).
      Simply access docValues via nonCompositeReader could resolve this issue.(patch)

      AtomicUpdate performance(or RealtimeGet performance)

      • Environment
        • solr version : 6.0.0
        • schema ~ 100 fields(90% docValues, some of those are multi valued)
        • index : 5,000,000
      • Performance
        • original : > 10sec per query
        • patched : at least 100msec per query

      This patch will also enhance search performance, because DocStreamer also fetch docValues via decorateDocValues.
      Though it depends on each environment, I could take 20% search performance gain.

      (This patch originally written for solr 6.0.0, and now rewritten for master)

      1. SOLR-9592_6x.patch
        34 kB
        Takahiro Ishikawa
      2. SOLR-9592.patch
        35 kB
        Takahiro Ishikawa
      3. SOLR-9592.patch
        3 kB
        Takahiro Ishikawa

        Activity

        Hide
        takaishi Takahiro Ishikawa added a comment -

        attached patch

        Show
        takaishi Takahiro Ishikawa added a comment - attached patch
        Hide
        varunthacker Varun Thacker added a comment -

        Thanks for the patch Takahiro!

        Additionally should we rename SolrIndexSearcher#getLeafReader to SolrIndexSearcher#getSlowAtomicReader to make it more accurate? Also we could add Javadocs encouraging people to use leafContexts ?

        Show
        varunthacker Varun Thacker added a comment - Thanks for the patch Takahiro! Additionally should we rename SolrIndexSearcher#getLeafReader to SolrIndexSearcher#getSlowAtomicReader to make it more accurate? Also we could add Javadocs encouraging people to use leafContexts ?
        Hide
        takaishi Takahiro Ishikawa added a comment -

        Thanks for comments, Varun!

        Yes! Renaming to getSlowAtomicReader is great idea. I'll add a patch soon.

        Show
        takaishi Takahiro Ishikawa added a comment - Thanks for comments, Varun! Yes! Renaming to getSlowAtomicReader is great idea. I'll add a patch soon.
        Hide
        takaishi Takahiro Ishikawa added a comment -

        updated patch for master(solr-7.0)
        And I'm working on a patch for branch_6x which needs some modifications.

        Show
        takaishi Takahiro Ishikawa added a comment - updated patch for master(solr-7.0) And I'm working on a patch for branch_6x which needs some modifications.
        Hide
        yseeley@gmail.com Yonik Seeley added a comment - - edited

        Nice catch! I knew there had to be reasons why /select was so much slower than /xport for docValues!

        The method name change might help a little, but the real issue is knowing how to use things like MultiDocValues (i.e. you generally want to use them for the ord mapping, but not the other stuff!)

        We should really cache the MultiDocValues created as well... but that can be a different JIRA.

        Show
        yseeley@gmail.com Yonik Seeley added a comment - - edited Nice catch! I knew there had to be reasons why /select was so much slower than /xport for docValues! The method name change might help a little, but the real issue is knowing how to use things like MultiDocValues (i.e. you generally want to use them for the ord mapping, but not the other stuff!) We should really cache the MultiDocValues created as well... but that can be a different JIRA.
        Hide
        takaishi Takahiro Ishikawa added a comment -

        updated patch for branch_6x

        'ant test' passed without doTestIndexAndConfigAliasReplication().
        And the test failure is reported SOLR-9426, so this is other problem.

        Show
        takaishi Takahiro Ishikawa added a comment - updated patch for branch_6x 'ant test' passed without doTestIndexAndConfigAliasReplication(). And the test failure is reported SOLR-9426 , so this is other problem.
        Hide
        takaishi Takahiro Ishikawa added a comment -

        Thanks for comments, Yonik!

        I'm glad to resolve your concern!

        The method name change might help a little

        Are you disagree with renaming(-1) or weakly agree with(+0)?
        I agree with renaming(Varun suggested). Because, at first glance, getLeafReader does not imply internally calling SlowCompositeLeafReader and this make people find performance bottlenecks difficult.

        but the real issue is knowing how to use things like MultiDocValues (i.e. you generally want to use them for the ord mapping, but not the other stuff!)
        We should really cache the MultiDocValues created as well... but that can be a different JIRA.

        I may not catch the meaning here. If my understanding is clear, we should use MultiDocValues in cases where you essencially need global view and decorateDocValues usage is not the case right?
        How to handle things like MultiDocValues in those cases(caching) is interesting problem and might be other JIRA.

        Show
        takaishi Takahiro Ishikawa added a comment - Thanks for comments, Yonik! I'm glad to resolve your concern! The method name change might help a little Are you disagree with renaming(-1) or weakly agree with(+0)? I agree with renaming(Varun suggested). Because, at first glance, getLeafReader does not imply internally calling SlowCompositeLeafReader and this make people find performance bottlenecks difficult. but the real issue is knowing how to use things like MultiDocValues (i.e. you generally want to use them for the ord mapping, but not the other stuff!) We should really cache the MultiDocValues created as well... but that can be a different JIRA. I may not catch the meaning here. If my understanding is clear, we should use MultiDocValues in cases where you essencially need global view and decorateDocValues usage is not the case right? How to handle things like MultiDocValues in those cases(caching) is interesting problem and might be other JIRA.
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        Are you disagree with renaming(-1) or weakly agree with(+0)?

        I supposed weakly agree

        Because, at first glance, getLeafReader

        Yeah, I really don't care for that name.

        If my understanding is clear, we should use MultiDocValues in cases where you essencially need global view and decorateDocValues usage is not the case right?

        Right. Sometimes you need both a global view and a segment view to do it right. See something like FacetFieldProcessorByArrayDV, where we use both top level and segment level.

        decorateDocValues would seem to only need segment level access.

        Show
        yseeley@gmail.com Yonik Seeley added a comment - Are you disagree with renaming(-1) or weakly agree with(+0)? I supposed weakly agree Because, at first glance, getLeafReader Yeah, I really don't care for that name. If my understanding is clear, we should use MultiDocValues in cases where you essencially need global view and decorateDocValues usage is not the case right? Right. Sometimes you need both a global view and a segment view to do it right. See something like FacetFieldProcessorByArrayDV, where we use both top level and segment level. decorateDocValues would seem to only need segment level access.
        Hide
        takaishi Takahiro Ishikawa added a comment -

        I supposed weakly agree

        Thank you. I'll keep it(getLeafReader) renamed

        Right. Sometimes you need both a global view and a segment view to do it right. See something like FacetFieldProcessorByArrayDV, where we use both top level and segment level.

        Yes, I got what you are saying. If I have a time, I'll go into detail for MultiDocValues problem.

        Now all my work is done. Is there any other comments?

        Show
        takaishi Takahiro Ishikawa added a comment - I supposed weakly agree Thank you. I'll keep it(getLeafReader) renamed Right. Sometimes you need both a global view and a segment view to do it right. See something like FacetFieldProcessorByArrayDV, where we use both top level and segment level. Yes, I got what you are saying. If I have a time, I'll go into detail for MultiDocValues problem. Now all my work is done. Is there any other comments?
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit cae6b49a065bfbc5789d8bdcf3706c158c0fc10d in lucene-solr's branch refs/heads/master from Yonik Seeley
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cae6b49 ]

        SOLR-9592: use correct leaf reader rather than top-level reader in SolrIndexReaderm.decorateDocValues

        Show
        jira-bot ASF subversion and git services added a comment - Commit cae6b49a065bfbc5789d8bdcf3706c158c0fc10d in lucene-solr's branch refs/heads/master from Yonik Seeley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cae6b49 ] SOLR-9592 : use correct leaf reader rather than top-level reader in SolrIndexReaderm.decorateDocValues
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 59d83f57e12f72ff8db503a0c01b77cb5df354fd in lucene-solr's branch refs/heads/branch_6x from Yonik Seeley
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=59d83f5 ]

        SOLR-9592: use correct leaf reader rather than top-level reader in SolrIndexReaderm.decorateDocValues

        Show
        jira-bot ASF subversion and git services added a comment - Commit 59d83f57e12f72ff8db503a0c01b77cb5df354fd in lucene-solr's branch refs/heads/branch_6x from Yonik Seeley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=59d83f5 ] SOLR-9592 : use correct leaf reader rather than top-level reader in SolrIndexReaderm.decorateDocValues
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        Committed. Thanks!

        Show
        yseeley@gmail.com Yonik Seeley added a comment - Committed. Thanks!
        Hide
        takaishi Takahiro Ishikawa added a comment -

        Thanks for commit! Yonik Seeley!

        Show
        takaishi Takahiro Ishikawa added a comment - Thanks for commit! Yonik Seeley !
        Hide
        shalinmangar Shalin Shekhar Mangar added a comment -

        Closing after 6.3.0 release.

        Show
        shalinmangar Shalin Shekhar Mangar added a comment - Closing after 6.3.0 release.

          People

          • Assignee:
            yseeley@gmail.com Yonik Seeley
            Reporter:
            takaishi Takahiro Ishikawa
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development