Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4
    • Component/s: None
    • Labels:
      None

      Description

      We can use index statistics to figure out beforehand what type of doc set (sorted int or bitset) we should create. This should use less memory than the current approach as well as increase performance.
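
The idea can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the term's document frequency is known from index statistics before the postings are read, so the right structure can be allocated up front. The class and method names (`DocSetChoice`, `build`) are hypothetical, `java.util.BitSet` stands in for Lucene's bitset, and the cutoff is passed in as a parameter because the description does not specify one.

```java
import java.util.Arrays;
import java.util.BitSet;

final class DocSetChoice {
    /** Minimal stand-in for a doc set: only membership is modeled here. */
    interface DocIds {
        boolean contains(int doc);
    }

    /** Small sets: a sorted int[] of doc ids, searched by binary search. */
    static final class SortedInts implements DocIds {
        final int[] docs;
        SortedInts(int[] docs) { this.docs = docs; }
        public boolean contains(int doc) { return Arrays.binarySearch(docs, doc) >= 0; }
    }

    /** Large sets: one bit per document in the index. */
    static final class Bits implements DocIds {
        final BitSet bits;
        Bits(BitSet bits) { this.bits = bits; }
        public boolean contains(int doc) { return bits.get(doc); }
    }

    /**
     * Picks the representation from docFreq before touching the postings.
     * postings must hold at least docFreq doc ids in ascending order;
     * smallSetSize is an assumed, caller-supplied cutoff.
     */
    static DocIds build(int[] postings, int docFreq, int maxDoc, int smallSetSize) {
        if (docFreq <= smallSetSize) {
            // Exact-size array: no growing or copying into a larger buffer.
            return new SortedInts(Arrays.copyOf(postings, docFreq));
        }
        BitSet bits = new BitSet(maxDoc);
        for (int i = 0; i < docFreq; i++) {
            bits.set(postings[i]);
        }
        return new Bits(bits);
    }
}
```

Knowing the size in advance is what avoids the speculative allocation: the small case gets an exactly sized array, and the large case goes straight to a bitset without first filling an int buffer.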

      1. SOLR-7918.patch
        12 kB
        Yonik Seeley

        Activity

        Yonik Seeley added a comment -

        Patch attached. This also introduces a DocSetProducer interface (ported from Heliosearch) to form a basis for future optimizations.

        The actual set building was moved out to DocSetUtil from SolrIndexSearcher to avoid bloating that class more.

        Performance improvements were quite good. On the low end were large SortedInt sets (only a 20% improvement), but large sets saw a 70% improvement and very small sets saw over 120% improvement. Complete request+response was measured from the client, so the speedups in set building itself were actually even greater.

        ASF subversion and git services added a comment -

        Commit 1695623 from Yonik Seeley in branch 'dev/trunk'
        [ https://svn.apache.org/r1695623 ]

        SOLR-7918: optimize term->DocSet generation

        ASF subversion and git services added a comment -

        Commit 1695626 from Yonik Seeley in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1695626 ]

        SOLR-7918: optimize term->DocSet generation

        David Smiley added a comment -

        This is really cool Yonik! I looked over the patch. I have some feedback:

        • Was there really any benefit to initializing the FixedBitSet manually versus simply creating it and calling set()? If not, it's clearer to simply use the methods on FBS.
        • I saw the size threshold numerous times — maxDoc >> 6 + 5. Could this go into a utility method to not repeat yourself?
        • The private method createDocSetByIterator appears unused. What's the story there?
        Yonik Seeley added a comment -

        Was there really any benefit to initializing the FixedBitSet manually versus simply creating it and calling set()?

        Yep, there did seem to be a perf increase... it happens sometimes. See BooleanScorer as well:

          // This is basically an inlined FixedBitSet... seems to help with bound checks
          final long[] matching = new long[SET_SIZE];
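
For readers unfamiliar with the trick, here is a rough sketch of what "initializing manually" can look like next to the `set()` route. It is not the patch code: `InlinedBits` and its methods are hypothetical names, and `java.util.BitSet` is used as a stand-in so the example runs without Lucene on the classpath.

```java
import java.util.BitSet;

final class InlinedBits {
    /** Manual route: write bits straight into a long[] (64 doc ids per word). */
    static long[] collect(int[] docs, int maxDoc) {
        long[] words = new long[(maxDoc + 63) >>> 6];
        for (int doc : docs) {
            // Java uses only the low 6 bits of a long shift count,
            // so 1L << doc is the same as 1L << (doc & 63).
            words[doc >> 6] |= 1L << doc;
        }
        return words;
    }

    /** Method route: create the bitset and call set() once per doc id. */
    static BitSet collectWithSet(int[] docs, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : docs) {
            bits.set(doc);
        }
        return bits;
    }
}
```

Both routes produce the same bits; `BitSet.valueOf(InlinedBits.collect(docs, maxDoc))` equals `InlinedBits.collectWithSet(docs, maxDoc)`. Whether the manual loop is actually faster depends on the JIT and the surrounding code, so it is worth measuring rather than assuming.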
        

        I saw the size threshold numerous times — maxDoc >> 6 + 5.

        Yeah, that actually appears in other places (like SolrIndexSearcher) too, and sometimes just as maxDoc>>6. They should all arguably have a small constant added for better test coverage of both small and large DocSets.
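
A shared helper along the lines David suggests might look like the sketch below. The name `smallSetSize` is hypothetical, and the formula assumes the intended meaning is "maxDoc>>6 plus a small constant". One caveat when copying the shorthand literally: in Java, `+` binds tighter than `>>`, so `maxDoc >> 6 + 5` parses as `maxDoc >> 11`. The parentheses matter.

```java
final class DocSetThreshold {
    /**
     * Assumed cutoff between a sorted-int set and a bitset:
     * one 64th of the index plus a small constant, so that tiny
     * test indexes still exercise both representations.
     */
    static int smallSetSize(int maxDoc) {
        return (maxDoc >> 6) + 5;
    }
}
```

With the constant, an index of 6400 documents gets a cutoff of 105, and even an empty index gets 5 instead of 0.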

        The private method createDocSetByIterator appears unused. What's the story there?

        I started off by just porting DocSetUtil, but this code isn't used (yet). It can be removed for now.

        ASF subversion and git services added a comment -

        Commit 1702572 from Yonik Seeley in branch 'dev/trunk'
        [ https://svn.apache.org/r1702572 ]

        SOLR-7918: remove dead code in DocSetUtil

        ASF subversion and git services added a comment -

        Commit 1702573 from Yonik Seeley in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1702573 ]

        SOLR-7918: remove dead code in DocSetUtil


          People

          • Assignee:
            Yonik Seeley
            Reporter:
            Yonik Seeley
          • Votes:
            0
            Watchers:
            3
