Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9, 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      OrdinalMap does its best to store a mapping from segment to global ordinals with as little memory as possible using MonotonicAppendingLongBuffer. In the low-cardinality case, there are things that could be optimized though:

      • On large segments, it is quite likely that the segment ordinals will perfectly match the global ordinals. In that case there is nothing to do: we can just return the segment ordinal as-is.
      • Even if they don't, it might be that storing the global ordinals directly in a PackedInts.Mutable only takes slightly more memory while removing the overhead of the monotonic encoding.
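
      The decision described above could be sketched roughly as follows. This is a minimal illustration and not the code from the patch: the class name, the Encoding enum, the monotonicBits argument (the measured size of the monotonic encoding, in bits) and the acceptableOverhead ratio are all assumptions made for the example.

```java
public class OrdinalEncodingSketch {

    /** Hypothetical labels for the three storage strategies discussed in the description. */
    enum Encoding { IDENTITY, DIRECT_PACKED, MONOTONIC }

    /** Number of bits needed to store any value in [0, maxValue]. */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /**
     * Picks a strategy for one segment.
     *
     * @param globalOrds         globalOrds[i] is the global ordinal of segment ordinal i
     * @param monotonicBits      assumed size, in bits, of the monotonic encoding of the same data
     * @param acceptableOverhead extra memory we tolerate for faster reads, e.g. 0.2 for 20%
     */
    static Encoding choose(long[] globalOrds, long monotonicBits, double acceptableOverhead) {
        boolean identity = true;
        long max = 0;
        for (int i = 0; i < globalOrds.length; i++) {
            identity &= globalOrds[i] == i;
            max = Math.max(max, globalOrds[i]);
        }
        if (identity) {
            // Segment ordinals already are global ordinals: nothing needs to be stored.
            return Encoding.IDENTITY;
        }
        long directBits = (long) bitsRequired(max) * globalOrds.length;
        if (directBits <= monotonicBits * (1 + acceptableOverhead)) {
            // Plain packed storage costs about the same, without the decoding overhead.
            return Encoding.DIRECT_PACKED;
        }
        return Encoding.MONOTONIC;
    }
}
```

      With these assumptions, a segment whose ordinals are 0, 1, 2, 3 needs no mapping at all, while a segment mapping to 0, 2, 3, 7 needs 3 bits per value when stored directly.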
      1. LUCENE-5767.patch
        21 kB
        Adrien Grand
      2. LUCENE-5767.patch
        6 kB
        Adrien Grand

        Activity

        jpountz Adrien Grand added a comment -

        Here is a patch.

        rcmuir Robert Muir added a comment -

        This looks great, +1

        martijn.v.groningen Martijn van Groningen added a comment -

        +1 this looks good

        jpountz Adrien Grand added a comment -

        Here is a new iteration that:

        • fixes the nocommit about ramBytesUsed
        • changes the API a bit in order to expose the global ordinal map per-segment, that is: LongValues getGlobalOrds(int segmentIndex) instead of long getGlobalOrd(int segmentIndex, long segmentOrd). It makes the API a bit easier to consume per-segment and also proved to be slightly faster in the context of Elasticsearch.
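
        To illustrate the difference between the two method shapes, here is a small self-contained sketch. It does not use Lucene classes: LongLookup is a hypothetical stand-in for LongValues, and the backing long[][] is only an assumption for the example, not how OrdinalMap stores its data.

```java
public class PerSegmentOrdsSketch {

    /** Hypothetical stand-in for a long-to-long lookup such as Lucene's LongValues. */
    interface LongLookup {
        long get(long index);
    }

    // segmentToGlobal[s][i] is the global ordinal of ordinal i in segment s (assumed layout).
    private final long[][] segmentToGlobal;

    PerSegmentOrdsSketch(long[][] segmentToGlobal) {
        this.segmentToGlobal = segmentToGlobal;
    }

    /** Old shape: the per-segment structure is looked up on every call. */
    long getGlobalOrd(int segmentIndex, long segmentOrd) {
        return segmentToGlobal[segmentIndex][(int) segmentOrd];
    }

    /** New shape: resolve the segment once, then reuse the returned lookup for every hit. */
    LongLookup getGlobalOrds(int segmentIndex) {
        final long[] ords = segmentToGlobal[segmentIndex];
        return segmentOrd -> ords[(int) segmentOrd];
    }
}
```

        A consumer that processes one segment at a time would call getGlobalOrds(segmentIndex) once when it moves to the segment and then call get(segmentOrd) for each document.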
        rcmuir Robert Muir added a comment -

        I see, this avoids array lookup per-hit, nice idea.

        jira-bot ASF subversion and git services added a comment -

        Commit 1602997 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1602997 ]

        LUCENE-5767: OrdinalMap optimizations.

        jira-bot ASF subversion and git services added a comment -

        Commit 1603000 from Adrien Grand in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1603000 ]

        LUCENE-5767: OrdinalMap optimizations.

        jira-bot ASF subversion and git services added a comment -

        Commit 1604064 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1604064 ]

        LUCENE-5767: remove bogus cast (in this case can exceed Integer.MAX_VALUE, and the underlying delta reader takes long anyway)

        jira-bot ASF subversion and git services added a comment -

        Commit 1604068 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1604068 ]

        LUCENE-5767: remove bogus cast (in this case can exceed Integer.MAX_VALUE, and the underlying delta reader takes long anyway)

        jira-bot ASF subversion and git services added a comment -

        Commit 1604070 from Robert Muir in branch 'dev/branches/lucene_solr_4_9'
        [ https://svn.apache.org/r1604070 ]

        LUCENE-5767: remove bogus cast (in this case can exceed Integer.MAX_VALUE, and the underlying delta reader takes long anyway)


          People

          • Assignee:
            jpountz Adrien Grand
            Reporter:
            jpountz Adrien Grand
          • Votes:
            0
            Watchers:
            3
