Solr
  1. Solr
  2. SOLR-7730

speed-up faceting on doc values fields

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 5.2.1
    • Fix Version/s: 5.4
    • Component/s: faceting
    • Labels:

      Description

      every time we count facets on DocValues fields in Solr on many segments index we see the unnecessary hotspot:

      ....
              at org.apache.lucene.index.MultiFields.getMergedFieldInfos(MultiFields.java:248)
              at org.apache.lucene.index.SlowCompositeReaderWrapper.getFieldInfos(SlowCompositeReaderWrapper.java:239)
              at org.apache.lucene.index.SlowCompositeReaderWrapper.getSortedSetDocValues(SlowCompositeReaderWrapper.java:176)
              at org.apache.solr.request.DocValuesFacets.getCounts(DocValuesFacets.java:72)
              at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:460) ....
      

      the reason is SlowCompositeReaderWrapper.getSortedSetDocValues() Line 136 and SlowCompositeReaderWrapper.getSortedDocValues() Line 174

      before return composite doc values, SCWR merges segment field infos, which is expensive, but after fieldinfo is merged, it checks only docvalue type in it. This dv type check can be done much easier in per segment basis.

      This patch gets some performance gain for those who count DV facets in Solr.

      1. LUCENE-7730.patch
        3 kB
        Yuriy
      2. LUCENE-7730.patch
        2 kB
        Mikhail Khludnev
      3. SOLR-7730.patch
        7 kB
        Mikhail Khludnev
      4. SOLR-7730-changes.patch
        1.0 kB
        Mikhail Khludnev

        Issue Links

          Activity

          Hide
          Yuriy added a comment -

          It is possible that first segment has no docs with field "A" but second segment has such docs. Invocation of reader.getFieldInfos().fieldInfo(field) on first segment will return null and call of getDocValuesType() will cause npe.

          Show
          Yuriy added a comment - It is possible that first segment has no docs with field "A" but second segment has such docs. Invocation of reader.getFieldInfos().fieldInfo(field) on first segment will return null and call of getDocValuesType() will cause npe.
          Hide
          Mikhail Khludnev added a comment -

          TODO I'm going to provide a test case.

          Show
          Mikhail Khludnev added a comment - TODO I'm going to provide a test case.
          Hide
          Mikhail Khludnev added a comment -

          Added test ensures no NPE on segments w/o docValues.
          I'm wondered by the silence. Isn't there anyone who need faster DV facets in Solr?

          Show
          Mikhail Khludnev added a comment - Added test ensures no NPE on segments w/o docValues. I'm wondered by the silence. Isn't there anyone who need faster DV facets in Solr?
          Hide
          Yonik Seeley added a comment -

          I'm certainly for the issue in general - I just haven't had a chance to actually evaluate the patch. I did glance at it previously and realized I would need to look at it longer in the context of the actual file.

          Show
          Yonik Seeley added a comment - I'm certainly for the issue in general - I just haven't had a chance to actually evaluate the patch. I did glance at it previously and realized I would need to look at it longer in the context of the actual file.
          Hide
          Mikhail Khludnev added a comment - - edited

          this allows to look in the context.
          It would be great if you can check the gain it with your recent benchmark on 10 or more segments index.

          Show
          Mikhail Khludnev added a comment - - edited this allows to look in the context. It would be great if you can check the gain it with your recent benchmark on 10 or more segments index.
          Hide
          Yonik Seeley added a comment -

          Finding time to look at this now...

          Show
          Yonik Seeley added a comment - Finding time to look at this now...
          Hide
          ASF subversion and git services added a comment -

          Commit 1694267 from Yonik Seeley in branch 'dev/trunk'
          [ https://svn.apache.org/r1694267 ]

          SOLR-7730: SlowCompositeReaderWrapper.getSortedSetDocValues - don't merge FieldInfos just to check DocValueType

          Show
          ASF subversion and git services added a comment - Commit 1694267 from Yonik Seeley in branch 'dev/trunk' [ https://svn.apache.org/r1694267 ] SOLR-7730 : SlowCompositeReaderWrapper.getSortedSetDocValues - don't merge FieldInfos just to check DocValueType
          Hide
          ASF subversion and git services added a comment -

          Commit 1694269 from Yonik Seeley in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1694269 ]

          SOLR-7730: SlowCompositeReaderWrapper.getSortedSetDocValues - don't merge FieldInfos just to check DocValueType

          Show
          ASF subversion and git services added a comment - Commit 1694269 from Yonik Seeley in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1694269 ] SOLR-7730 : SlowCompositeReaderWrapper.getSortedSetDocValues - don't merge FieldInfos just to check DocValueType
          Hide
          Mikhail Khludnev added a comment -

          Yonik Seeley how much gain did you evidence?

          Show
          Mikhail Khludnev added a comment - Yonik Seeley how much gain did you evidence?
          Hide
          ASF subversion and git services added a comment -

          Commit 1694563 from mkhl@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1694563 ]

          SOLR-7730: mention initial contributors

          Show
          ASF subversion and git services added a comment - Commit 1694563 from mkhl@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1694563 ] SOLR-7730 : mention initial contributors
          Hide
          Yonik Seeley added a comment -

          how much gain did you evidence?

          I actually didn't benchmark it because after I looked at the code in context, it became obvious (and I didn't have a docvalues index lying around...)

          Remember to backport your last commit to 5x! In case you're not familiar with the standard way, to merge back a commit from trunk to 5x, it's

          #execute in the root of an up-to-date 5x checkout:
          svn merge -c 1694563 https://svn.apache.org/repos/asf/lucene/dev/trunk
          

          I actually have a shell variable set up to make it shorter:
          T=https://svn.apache.org/repos/asf/lucene/dev/trunk

          Show
          Yonik Seeley added a comment - how much gain did you evidence? I actually didn't benchmark it because after I looked at the code in context, it became obvious (and I didn't have a docvalues index lying around...) Remember to backport your last commit to 5x! In case you're not familiar with the standard way, to merge back a commit from trunk to 5x, it's #execute in the root of an up-to-date 5x checkout: svn merge -c 1694563 https: //svn.apache.org/repos/asf/lucene/dev/trunk I actually have a shell variable set up to make it shorter: T= https://svn.apache.org/repos/asf/lucene/dev/trunk
          Hide
          Mikhail Khludnev added a comment -

          user's feedback:

          I added your patch to 4.10.4, recompiled, tested and pushed from testing stage
          to online system. What a difference!

          Right after restart the performance increase for faceting is times 10.
          Qtime for MatchAllDocsQuery(:) and docValues and faceting went down from
          around 35 seconds to 3.5 seconds for faceting.
          After 1 hour under load the qtime is somewhere around 15 percent performance
          increase for faceting.

          This patch is a must have!

          Show
          Mikhail Khludnev added a comment - user's feedback: I added your patch to 4.10.4, recompiled, tested and pushed from testing stage to online system. What a difference! Right after restart the performance increase for faceting is times 10. Qtime for MatchAllDocsQuery( : ) and docValues and faceting went down from around 35 seconds to 3.5 seconds for faceting. After 1 hour under load the qtime is somewhere around 15 percent performance increase for faceting. This patch is a must have!
          Hide
          ASF subversion and git services added a comment -

          Commit 1694665 from mkhl@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1694665 ]

          SOLR-7730: mention initial contributors

          Show
          ASF subversion and git services added a comment - Commit 1694665 from mkhl@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1694665 ] SOLR-7730 : mention initial contributors
          Hide
          Mikhail Khludnev added a comment -

          I omitted backporting deliberately - guys can wait till 6.0. Just backpoted it because you put exclamation in the end of the sentence. note: ^/lucene/dev/trunk is a nice shortcut too.

          Show
          Mikhail Khludnev added a comment - I omitted backporting deliberately - guys can wait till 6.0. Just backpoted it because you put exclamation in the end of the sentence. note: ^/lucene/dev/trunk is a nice shortcut too.
          Hide
          Shalin Shekhar Mangar added a comment -

          Bulk close for 5.3.0 release

          Show
          Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release
          Hide
          Mikhail Khludnev added a comment -

          SlowCompositeReaderWrapper }} in {{lucene-core-5.3.0.jar still has slow implementation

          public SortedDocValues getSortedDocValues(java.lang.String)
          ...
               105: aload_0
               106: invokevirtual #39                 // Method getFieldInfos:()Lorg/apache/lucene/index/FieldInfos;
               109: aload_1
               110: invokevirtual #40                 // Method org/apache/lucene/index/FieldInfos.fieldInfo:(Ljava/lang/String;)Lorg/apache/lucene/index/FieldInfo;
               113: invokevirtual #41                 // Method org/apache/lucene/index/FieldInfo.getDocValuesType:()Lorg/apache/lucene/index/DocValuesType;
               116: getstatic     #42                 // Field org/apache/lucene/index/DocValuesType.SORTED:Lorg/apache/lucene/index/DocValuesType;
               119: if_acmpeq     124
               122: aconst_null
               123: areturn
          ...
          

          I missed something.

          Show
          Mikhail Khludnev added a comment - SlowCompositeReaderWrapper }} in {{lucene-core-5.3.0.jar still has slow implementation public SortedDocValues getSortedDocValues(java.lang. String ) ... 105: aload_0 106: invokevirtual #39 // Method getFieldInfos:()Lorg/apache/lucene/index/FieldInfos; 109: aload_1 110: invokevirtual #40 // Method org/apache/lucene/index/FieldInfos.fieldInfo:(Ljava/lang/ String ;)Lorg/apache/lucene/index/FieldInfo; 113: invokevirtual #41 // Method org/apache/lucene/index/FieldInfo.getDocValuesType:()Lorg/apache/lucene/index/DocValuesType; 116: getstatic #42 // Field org/apache/lucene/index/DocValuesType.SORTED:Lorg/apache/lucene/index/DocValuesType; 119: if_acmpeq 124 122: aconst_null 123: areturn ... I missed something.
          Hide
          Mikhail Khludnev added a comment -

          Got it. 5.3 branch was cut before commit. see
          https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_5_3/lucene/core/src/java/org/apache/lucene/index/SlowCompositeReaderWrapper.java?view=log
          Thus, this optimization will be available only at 5.4. I'm sorry for confusion.

          Show
          Mikhail Khludnev added a comment - Got it. 5.3 branch was cut before commit. see https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_5_3/lucene/core/src/java/org/apache/lucene/index/SlowCompositeReaderWrapper.java?view=log Thus, this optimization will be available only at 5.4. I'm sorry for confusion.
          Hide
          Steve Rowe added a comment - - edited

          Mikhail Khludnev, the changelog entry for this issue went into lucene/CHANGES.txt (despite the fact that it's a SOLR issue), but didn't make the 5.4 release. However, it's still listed under the 5.3.0 section, despite the fact that it was never part of any 5.3.X release (see https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_5_3/lucene/CHANGES.txt and https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_5_4/lucene/CHANGES.txt)

          The entry should be moved to the 5.4 section.

          Thanks to randomstatistic on #solr IRC for mentioning this.

          Show
          Steve Rowe added a comment - - edited Mikhail Khludnev , the changelog entry for this issue went into lucene/CHANGES.txt (despite the fact that it's a SOLR issue), but didn't make the 5.4 release. However, it's still listed under the 5.3.0 section, despite the fact that it was never part of any 5.3.X release (see https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_5_3/lucene/CHANGES.txt and https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_5_4/lucene/CHANGES.txt ) The entry should be moved to the 5.4 section. Thanks to randomstatistic on #solr IRC for mentioning this.
          Hide
          Mikhail Khludnev added a comment -

          attaching SOLR-7730-changes.patch move it from 5.3 to 5.4 Optimizations.
          Steve Rowe Should I commit it to trunk and 5x?

          Show
          Mikhail Khludnev added a comment - attaching SOLR-7730-changes.patch move it from 5.3 to 5.4 Optimizations. Steve Rowe Should I commit it to trunk and 5x?
          Hide
          Steve Rowe added a comment -

          attaching SOLR-7730-changes.patch move it from 5.3 to 5.4 Optimizations. Steve Rowe Should I commit it to trunk and 5x?

          +1, LGTM.

          In addtion to trunk and 5x, I think you should also commit it to the lucene_solr_5_4 branch, in case there is a 5.4.1 release.

          Show
          Steve Rowe added a comment - attaching SOLR-7730 -changes.patch move it from 5.3 to 5.4 Optimizations. Steve Rowe Should I commit it to trunk and 5x? +1, LGTM. In addtion to trunk and 5x, I think you should also commit it to the lucene_solr_5_4 branch, in case there is a 5.4.1 release.
          Hide
          ASF subversion and git services added a comment -

          Commit 1720239 from mkhl@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1720239 ]

          SOLR-7730: mention in 5.4.0's Optimizations

          Show
          ASF subversion and git services added a comment - Commit 1720239 from mkhl@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1720239 ] SOLR-7730 : mention in 5.4.0's Optimizations
          Hide
          ASF subversion and git services added a comment -

          Commit 1720241 from mkhl@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1720241 ]

          SOLR-7730: mention in 5.4.0's Optimizations

          Show
          ASF subversion and git services added a comment - Commit 1720241 from mkhl@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1720241 ] SOLR-7730 : mention in 5.4.0's Optimizations
          Hide
          ASF subversion and git services added a comment -

          Commit 1720248 from mkhl@apache.org in branch 'dev/branches/lucene_solr_5_4'
          [ https://svn.apache.org/r1720248 ]

          SOLR-7730: mention in 5.4.0's Optimizations

          Show
          ASF subversion and git services added a comment - Commit 1720248 from mkhl@apache.org in branch 'dev/branches/lucene_solr_5_4' [ https://svn.apache.org/r1720248 ] SOLR-7730 : mention in 5.4.0's Optimizations

            People

            • Assignee:
              Mikhail Khludnev
              Reporter:
              Mikhail Khludnev
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development