Lucene - Core
  1. Lucene - Core
  2. LUCENE-4598

Change PayloadIterator to not use top-level reader API

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1, 6.0
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Currently the facet module uses MultiFields.* to pull the D&PEnum in PayloadIterator, to access the payloads that store the facet ords.

      It then makes heavy use of .advance and .getPayload to visit all docIDs in the result set.

      I think we should get some speedup if we go segment by segment instead ...

      1. LUCENE-4598.patch
        7 kB
        Michael McCandless
      2. LUCENE-4598.patch
        11 kB
        Shai Erera
      3. LUCENE-4598.patch
        7 kB
        Shai Erera
      4. LUCENE-4598.patch
        6 kB
        Shai Erera

        Activity

        Hide
        Shai Erera added a comment -

        Is this really an issue? I thought that pulling a top-level D&PEnum on a single posting won't gain much by iterating per segment? The per-segment iteration is more complicated (code-wise) ...

        Anyway, I think that either in this issue or another one, we should explore a Collector that aggregates the ordinals during collection, and not after collection has finished. That will save substantial RAM consumed by the bitset (not critical) and potential float[] (for scores), which is the size of the result set.

        Show
        Shai Erera added a comment - Is this really an issue? I thought that pulling a top-level D&PEnum on a single posting won't gain much by iterating per segment? The per-segment iteration is more complicated (code-wise) ... Anyway, I think that either in this issue or another one, we should explore a Collector that aggregates the ordinals during collection, and not after collection has finished. That will save substantial RAM consumed by the bitset (not critical) and potential float[] (for scores), which is the size of the result set.
        Hide
        Michael McCandless added a comment -

        Is this really an issue? I thought that pulling a top-level D&PEnum on a single posting won't gain much by iterating per segment? The per-segment iteration is more complicated (code-wise) ...

        I think it should help somewhat ... I don't expect the gains to be earth shattering though

        Anyway, I think that either in this issue or another one, we should explore a Collector that aggregates the ordinals during collection, and not after collection has finished. That will save substantial RAM consumed by the bitset (not critical) and potential float[] (for scores), which is the size of the result set.

        +1, I'll open a new issue for that.

        Show
        Michael McCandless added a comment - Is this really an issue? I thought that pulling a top-level D&PEnum on a single posting won't gain much by iterating per segment? The per-segment iteration is more complicated (code-wise) ... I think it should help somewhat ... I don't expect the gains to be earth shattering though Anyway, I think that either in this issue or another one, we should explore a Collector that aggregates the ordinals during collection, and not after collection has finished. That will save substantial RAM consumed by the bitset (not critical) and potential float[] (for scores), which is the size of the result set. +1, I'll open a new issue for that.
        Hide
        Michael McCandless added a comment -

        OK I opened LUCENE-4600 for single-pass collection + aggregation.

        Show
        Michael McCandless added a comment - OK I opened LUCENE-4600 for single-pass collection + aggregation.
        Hide
        Shai Erera added a comment -

        That will save substantial RAM consumed by the bitset (not critical)

        "not critical" can become very critical for very large indexes. I am now talking to a team who index 1B documents with Lucene (very small documents). A bitset that size will consume 128MB RAM, that's substantial !

        Thanks for opening the issue. Though, if we move to only during-collection aggregation, I don't think that this issue will still be valid. Only if we keep ScoredDocIdsCollector, and perhaps the two-pass is efficient in some cases.

        Show
        Shai Erera added a comment - That will save substantial RAM consumed by the bitset (not critical) "not critical" can become very critical for very large indexes. I am now talking to a team who index 1B documents with Lucene (very small documents). A bitset that size will consume 128MB RAM, that's substantial ! Thanks for opening the issue. Though, if we move to only during-collection aggregation, I don't think that this issue will still be valid. Only if we keep ScoredDocIdsCollector, and perhaps the two-pass is efficient in some cases.
        Hide
        Michael McCandless added a comment -

        Though, if we move to only during-collection aggregation, I don't think that this issue will still be valid.

        Ahh good point. So we should do the other issue first

        Show
        Michael McCandless added a comment - Though, if we move to only during-collection aggregation, I don't think that this issue will still be valid. Ahh good point. So we should do the other issue first
        Hide
        Shai Erera added a comment -

        not necessarily . These are not either-or IMO. We've spent a lot of time addressing different faceting scenarios in this module. There are plenty of extension points ranging from simple configuration to writing your own facets collection algorithm.

        This issue is quite simple - you propose to fix a specific class to not use MultiFields. Let's do it, even before LUCENE-4600. Since it's a low hanging fruit, and I'm not sure that we'll never need to do a post-collection aggregation, so this class will probably be needed still.

        Show
        Shai Erera added a comment - not necessarily . These are not either-or IMO. We've spent a lot of time addressing different faceting scenarios in this module. There are plenty of extension points ranging from simple configuration to writing your own facets collection algorithm. This issue is quite simple - you propose to fix a specific class to not use MultiFields. Let's do it, even before LUCENE-4600 . Since it's a low hanging fruit, and I'm not sure that we'll never need to do a post-collection aggregation, so this class will probably be needed still.
        Hide
        Michael McCandless added a comment -

        Since it's a low hanging fruit,

        Wait, before you said:

        The per-segment iteration is more complicated (code-wise) ...

        which sort of spooked me out

        If it really is easy to prototype we should test and see if perf gains are worth it or not ...

        Show
        Michael McCandless added a comment - Since it's a low hanging fruit, Wait, before you said: The per-segment iteration is more complicated (code-wise) ... which sort of spooked me out If it really is easy to prototype we should test and see if perf gains are worth it or not ...
        Hide
        Shai Erera added a comment -

        I said that because I was bitten by it (when implemented ParallelTaxonomyArrays.initParents). The scenario is that you have a FixedBitSet over r.maxDoc(). You then get an iterator from it, by global docIDs. So it's very easy to pull a top-level IR and skip by these global docIDs.

        If instead you need to pull it per-segment, you need to do some math in the code, handle the case where a segment is entirely irrelevant ... stuff like that. Basically, I would imagine that we'll duplicate the MultiFields version for D&PEnum?

        When I thought about making this class per-segment, I was thinking about LUCENE-4600 – i.e. if we do in-collection aggregation, that class only needs to iterate within a single segment, on local docIDs. No complications.

        Show
        Shai Erera added a comment - I said that because I was bitten by it (when implemented ParallelTaxonomyArrays.initParents). The scenario is that you have a FixedBitSet over r.maxDoc(). You then get an iterator from it, by global docIDs. So it's very easy to pull a top-level IR and skip by these global docIDs. If instead you need to pull it per-segment, you need to do some math in the code, handle the case where a segment is entirely irrelevant ... stuff like that. Basically, I would imagine that we'll duplicate the MultiFields version for D&PEnum? When I thought about making this class per-segment, I was thinking about LUCENE-4600 – i.e. if we do in-collection aggregation, that class only needs to iterate within a single segment, on local docIDs. No complications.
        Hide
        Shai Erera added a comment -

        Hmm ... if you pass liveDocs, then I think this becomes expensive, because of how MultiBits is implemented (doing binary search on every lookup). So two options:

        1. Pass liveDocs=null, this class is used for matching documents, and should traverse whatever the query matched. If it matched deleted docs, we should traverse them too?
          • Maybe even now it's a bug that it always passes liveDocs?
        2. Modify the class impl to work on the leaves(). But then, we'll just duplicate the code of MultiDocsAndPositionsEnum?

        Mike, if we pass liveDocs=null, how much difference would it make if the class iterated on the leaves(), vs. iterating through MultiDocsAndPositionsEnum?

        Show
        Shai Erera added a comment - Hmm ... if you pass liveDocs, then I think this becomes expensive, because of how MultiBits is implemented (doing binary search on every lookup). So two options: Pass liveDocs=null, this class is used for matching documents, and should traverse whatever the query matched. If it matched deleted docs, we should traverse them too? Maybe even now it's a bug that it always passes liveDocs? Modify the class impl to work on the leaves(). But then, we'll just duplicate the code of MultiDocsAndPositionsEnum? Mike, if we pass liveDocs=null, how much difference would it make if the class iterated on the leaves(), vs. iterating through MultiDocsAndPositionsEnum?
        Hide
        Shai Erera added a comment -

        Patch sets liveDocs=null with a comment and proper javadocs.

        While not related to this issue, I noticed that PayloadIterator copies the payload BytesRef to its own buffer. This, I think, is a remnant from the days where TP.getPayload took a byte[]. This is a redundant copy which must be eliminated.

        Mike, I wonder if that'd speed things up (only a bit, I know)?

        Anyway, I'm not at all sure that it's worth duplicating MultiDAPE's logic into this class. But at any rate, I think that this patch needs to be committed, to remove the redundant byte[] copies.

        Also, I noticed that CategoryListIterator (and PayloadIterator) define an init() method which must be called prior to using them, and there's even a jdoc comment saying that calling it twice may skip over documents ... I don't think that we need it? Can't CLI impls initialize at ctor? At least, PayloadIterator can. I'll open a separate issue for that.

        Show
        Shai Erera added a comment - Patch sets liveDocs=null with a comment and proper javadocs. While not related to this issue, I noticed that PayloadIterator copies the payload BytesRef to its own buffer. This, I think, is a remnant from the days where TP.getPayload took a byte[]. This is a redundant copy which must be eliminated. Mike, I wonder if that'd speed things up (only a bit, I know)? Anyway, I'm not at all sure that it's worth duplicating MultiDAPE's logic into this class. But at any rate, I think that this patch needs to be committed, to remove the redundant byte[] copies. Also, I noticed that CategoryListIterator (and PayloadIterator) define an init() method which must be called prior to using them, and there's even a jdoc comment saying that calling it twice may skip over documents ... I don't think that we need it? Can't CLI impls initialize at ctor? At least, PayloadIterator can. I'll open a separate issue for that.
        Hide
        Shai Erera added a comment -

        I noticed that CategoryListIterator (and PayloadIterator) define an init() method which must be called prior to using them

        Maybe it's not useless after all. That way, impls can do some checks at init(), and not repeat them in skipTo. So for now I think that I'll just leave it.

        Show
        Shai Erera added a comment - I noticed that CategoryListIterator (and PayloadIterator) define an init() method which must be called prior to using them Maybe it's not useless after all. That way, impls can do some checks at init(), and not repeat them in skipTo. So for now I think that I'll just leave it.
        Hide
        Michael McCandless added a comment -

        +1 to pass liveDocs=null.

        I'm seeing some test failures (org.apache.lucene.facet.enhancements.EnhancementsPayloadIteratorTest) with this patch ...

        I ran quick perf test on 10M doc index, with 5% deleted docs:

                            Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                         LowTerm       28.51      (1.3%)       29.07      (0.8%)    2.0% (   0% -    4%)
                        HighTerm        2.46      (0.8%)        2.52      (0.6%)    2.6% (   1% -    3%)
                         MedTerm       13.17      (1.2%)       13.62      (0.8%)    3.5% (   1% -    5%)
        

        Nice little performance improvement ...

        Show
        Michael McCandless added a comment - +1 to pass liveDocs=null. I'm seeing some test failures (org.apache.lucene.facet.enhancements.EnhancementsPayloadIteratorTest) with this patch ... I ran quick perf test on 10M doc index, with 5% deleted docs: Task QPS base StdDev QPS comp StdDev Pct diff LowTerm 28.51 (1.3%) 29.07 (0.8%) 2.0% ( 0% - 4%) HighTerm 2.46 (0.8%) 2.52 (0.6%) 2.6% ( 1% - 3%) MedTerm 13.17 (1.2%) 13.62 (0.8%) 3.5% ( 1% - 5%) Nice little performance improvement ...
        Hide
        Shai Erera added a comment -

        I'm seeing some test failures (org.apache.lucene.facet.enhancements.EnhancementsPayloadIteratorTest) with this patch ...

        Thanks ! I didn't yet run all the tests (I thought that I had), but anyway this problem wasn't always reproducible. Easy fix, need to start from BytesRef.offset, where before it was just 0, because the array was copied.

        I think that I'll commit that, and separately try to avoid MultiFields for this class.

        Tests pass, but I'll run few more times before I commit.

        Show
        Shai Erera added a comment - I'm seeing some test failures (org.apache.lucene.facet.enhancements.EnhancementsPayloadIteratorTest) with this patch ... Thanks ! I didn't yet run all the tests (I thought that I had), but anyway this problem wasn't always reproducible. Easy fix, need to start from BytesRef.offset, where before it was just 0, because the array was copied. I think that I'll commit that, and separately try to avoid MultiFields for this class. Tests pass, but I'll run few more times before I commit.
        Hide
        Shai Erera added a comment -

        I've decided to not do two commits. So attached patch covers the previous one + moves to using per-segment posting iteration.

        I'd appreciate if someone can review the changes, especially PayloadIterator.nextSegment and .setdoc().

        I ran tests w/ many iters, all seem ok so far.

        Show
        Shai Erera added a comment - I've decided to not do two commits. So attached patch covers the previous one + moves to using per-segment posting iteration. I'd appreciate if someone can review the changes, especially PayloadIterator.nextSegment and .setdoc(). I ran tests w/ many iters, all seem ok so far.
        Hide
        Adrien Grand added a comment -

        Nice that PayloadIterator now returns a direct reference to payloads instead of copying data!

        Show
        Adrien Grand added a comment - Nice that PayloadIterator now returns a direct reference to payloads instead of copying data!
        Hide
        Shai Erera added a comment -

        Yes ... well in the past TermPositions didn't maintain a byte[] for payloads, you had to give it one. Now that it does, it's stupid to copy the array again

        Show
        Shai Erera added a comment - Yes ... well in the past TermPositions didn't maintain a byte[] for payloads, you had to give it one. Now that it does, it's stupid to copy the array again
        Hide
        Michael McCandless added a comment -

        +1, looks great!

        And it looks like it's a bit faster than before:

                            Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                         LowTerm       28.35      (1.4%)       29.42      (0.8%)    3.8% (   1% -    6%)
                        HighTerm        2.46      (0.6%)        2.57      (0.5%)    4.8% (   3% -    5%)
                         MedTerm       13.09      (1.4%)       13.92      (0.5%)    6.4% (   4% -    8%)
        

        I think we could speed things up more if this code "owned" the iteration, eg with some sort of "bulk accumulate" method, rather than StandardFacetAccumulator going through CategoryListIterator down to PayloadIterator, per hit. This way it could first iterate by segment (on the outer loop), then, inside iterate on all docs in that segment, etc. But save that for another day ...

        Show
        Michael McCandless added a comment - +1, looks great! And it looks like it's a bit faster than before: Task QPS base StdDev QPS comp StdDev Pct diff LowTerm 28.35 (1.4%) 29.42 (0.8%) 3.8% ( 1% - 6%) HighTerm 2.46 (0.6%) 2.57 (0.5%) 4.8% ( 3% - 5%) MedTerm 13.09 (1.4%) 13.92 (0.5%) 6.4% ( 4% - 8%) I think we could speed things up more if this code "owned" the iteration, eg with some sort of "bulk accumulate" method, rather than StandardFacetAccumulator going through CategoryListIterator down to PayloadIterator, per hit. This way it could first iterate by segment (on the outer loop), then, inside iterate on all docs in that segment, etc. But save that for another day ...
        Hide
        Shai Erera added a comment -

        Thanks, I'll run tests again just to make sure and commit.

        I'm glad that it sped things up. I was skeptic that duplicating MultiDPE logic into PayloadIterator will improve anything, but perhaps these results show that people should consider moving to the per-segment API. Maybe we need a separate benchmark to prove that ...

        PayloadIterator is just a utility class that encapsulates a logic that is similar to TermsEnum.seekExact. I.e., if DISI had an advanceExact(doc), I'm quite sure that we wouldn't need that class. I think that DISI.advanceExact is not that complicated to implement, and possibly even as a final method (so it doesn't affect any DISI out there), by calling advance() and docID(). I'll think about it and perhaps open an issue for it.

        Then, PayloadIterator could be annihilated completely, and maybe we can make CLI a per-segment thing too, and FacetsAccumulator will iterate per-segment. It's a bigger change though, so I agree with "save that for another day" .

        Show
        Shai Erera added a comment - Thanks, I'll run tests again just to make sure and commit. I'm glad that it sped things up. I was skeptic that duplicating MultiDPE logic into PayloadIterator will improve anything, but perhaps these results show that people should consider moving to the per-segment API. Maybe we need a separate benchmark to prove that ... PayloadIterator is just a utility class that encapsulates a logic that is similar to TermsEnum.seekExact. I.e., if DISI had an advanceExact(doc), I'm quite sure that we wouldn't need that class. I think that DISI.advanceExact is not that complicated to implement, and possibly even as a final method (so it doesn't affect any DISI out there), by calling advance() and docID(). I'll think about it and perhaps open an issue for it. Then, PayloadIterator could be annihilated completely, and maybe we can make CLI a per-segment thing too, and FacetsAccumulator will iterate per-segment. It's a bigger change though, so I agree with "save that for another day" .
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1419397

        LUCENE-4598: Change PayloadIterator to not use top-level reader API

        Show
        Commit Tag Bot added a comment - [trunk commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1419397 LUCENE-4598 : Change PayloadIterator to not use top-level reader API
        Hide
        Shai Erera added a comment -

        Committed to trunk and 4x. Thanks Mike !

        Show
        Shai Erera added a comment - Committed to trunk and 4x. Thanks Mike !
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1419446

        LUCENE-4598: Change PayloadIterator to not use top-level reader API

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1419446 LUCENE-4598 : Change PayloadIterator to not use top-level reader API
        Hide
        Michael McCandless added a comment -

        Got a nice ~5-6% bump in the nightly facet perf from this fix: http://people.apache.org/~mikemccand/lucenebench/TermDateFacets.html

        I added an annotation.

        Show
        Michael McCandless added a comment - Got a nice ~5-6% bump in the nightly facet perf from this fix: http://people.apache.org/~mikemccand/lucenebench/TermDateFacets.html I added an annotation.
        Hide
        Shai Erera added a comment -

        nice !

        I expect a higher gain when indexing multiple facets per document, since the payload will be bigger. I think that not copying the payload would show a bigger gain than in this benchmark.

        Show
        Shai Erera added a comment - nice ! I expect a higher gain when indexing multiple facets per document, since the payload will be bigger. I think that not copying the payload would show a bigger gain than in this benchmark.
        Hide
        Michael McCandless added a comment -

        I made a hacked up patch to test how a specialized (payloads, dgap/vint, counting) 2nd pass aggregation would perform (attached):

                            Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                         LowTerm       29.28      (1.2%)       31.01      (1.4%)    5.9% (   3% -    8%)
                         MedTerm       14.28      (1.5%)       16.19      (1.4%)   13.4% (  10% -   16%)
                        HighTerm        4.05      (2.4%)        5.05      (1.5%)   24.6% (  20% -   29%)
        

        So ... I think we should provide specialized impls for the common cases, and make it easy to for users to use (eg FacetsCollector.create(FSP) or something).

        These results are similar to what I saw with the single-valued DocValue collector at https://issues.apache.org/jira/browse/LUCENE-4600?focusedCommentId=13527566&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13527566

        Show
        Michael McCandless added a comment - I made a hacked up patch to test how a specialized (payloads, dgap/vint, counting) 2nd pass aggregation would perform (attached): Task QPS base StdDev QPS comp StdDev Pct diff LowTerm 29.28 (1.2%) 31.01 (1.4%) 5.9% ( 3% - 8%) MedTerm 14.28 (1.5%) 16.19 (1.4%) 13.4% ( 10% - 16%) HighTerm 4.05 (2.4%) 5.05 (1.5%) 24.6% ( 20% - 29%) So ... I think we should provide specialized impls for the common cases, and make it easy to for users to use (eg FacetsCollector.create(FSP) or something). These results are similar to what I saw with the single-valued DocValue collector at https://issues.apache.org/jira/browse/LUCENE-4600?focusedCommentId=13527566&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13527566

          People

          • Assignee:
            Shai Erera
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development