Lucene - Core
  1. Lucene - Core
  2. LUCENE-5780

OrdinalMap's mapping from global ords to segment ords is sometimes wasteful

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10, Trunk
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Robert found a case when the ordinal map can be quite wasteful in terms of memory usage: in order to be able to resolve values given a global ordinals, it stores two things:

      • an identifier of the segment where the value is
      • the difference between the ordinal on the segment and the global ordinal

      The issue is that OrdinalMap currently picks any of the segments that contain the value but we can do better: we can pick the first segment that has the value. This will help for two reasons:

      • it will potentially require fewer bits per value to store the segment ids if NRT segments don't introduce new values
      • if all values happen to appear in the first segment, then the map from global ords to deltas only stores zeros.

      I just tested on an index where all values are in the first segment and this helped reduce memory usage of the ordinal map by 4x (from 3.5MB to 800KB).

      1. LUCENE-5780.patch
        2 kB
        Adrien Grand
      2. LUCENE-5780.patch
        2 kB
        Adrien Grand

        Activity

        Hide
        Adrien Grand added a comment -

        I opened LUCENE-5782

        Show
        Adrien Grand added a comment - I opened LUCENE-5782
        Hide
        ASF subversion and git services added a comment -

        Commit 1604158 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1604158 ]

        LUCENE-5780: Make OrdinalMap more memory-efficient.

        Show
        ASF subversion and git services added a comment - Commit 1604158 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1604158 ] LUCENE-5780 : Make OrdinalMap more memory-efficient.
        Hide
        ASF subversion and git services added a comment -

        Commit 1604157 from Adrien Grand in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1604157 ]

        LUCENE-5780: Make OrdinalMap more memory-efficient.

        Show
        ASF subversion and git services added a comment - Commit 1604157 from Adrien Grand in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1604157 ] LUCENE-5780 : Make OrdinalMap more memory-efficient.
        Hide
        Adrien Grand added a comment -

        +1 I had planned to open a follow-up issue about it. Will open it shortly.

        Show
        Adrien Grand added a comment - +1 I had planned to open a follow-up issue about it. Will open it shortly.
        Hide
        Robert Muir added a comment -

        This looks good (+1 to commit to trunk/4.10) but i think we can do better, by explicitly sorting? E.g. take a long[] sizes parameter (can be optional and zeros would give us what we have today if we use a stable sort), that the user could populate either with valueCount or number of docs in the segment (both are probably a fine heuristic).

        I know this means we will need an array to remap lookups, but this only happens once per segment with the new LongValues api so it won't impact performance.

        Show
        Robert Muir added a comment - This looks good (+1 to commit to trunk/4.10) but i think we can do better, by explicitly sorting? E.g. take a long[] sizes parameter (can be optional and zeros would give us what we have today if we use a stable sort), that the user could populate either with valueCount or number of docs in the segment (both are probably a fine heuristic). I know this means we will need an array to remap lookups, but this only happens once per segment with the new LongValues api so it won't impact performance.
        Hide
        Adrien Grand added a comment -

        Same patch but with a better assertion.

        Show
        Adrien Grand added a comment - Same patch but with a better assertion.
        Hide
        Adrien Grand added a comment -

        Here is a patch (built on 4.x since I did the tests on 4.x but it should apply well on trunk too).

        Show
        Adrien Grand added a comment - Here is a patch (built on 4.x since I did the tests on 4.x but it should apply well on trunk too).
        Hide
        Adrien Grand added a comment -

        No worries I just updated the fix version.

        Show
        Adrien Grand added a comment - No worries I just updated the fix version.
        Hide
        Robert Muir added a comment -

        I agree conceptually its a bug, but I think this should be 4.10. Its not anything introduced in 4.9, its been this way since 4.2

        I already am pushing back on other issues such as SOLR-6178, because i ran 125 rounds of tests last night and want to keep things stable.

        Show
        Robert Muir added a comment - I agree conceptually its a bug, but I think this should be 4.10. Its not anything introduced in 4.9, its been this way since 4.2 I already am pushing back on other issues such as SOLR-6178 , because i ran 125 rounds of tests last night and want to keep things stable.

          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development