Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9, Trunk
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      OrdinalMap does its best to store a mapping from segment to global ordinals with as little memory as possible using MonotonicAppendingLongBuffer. In the low-cardinality case, there are things that could be optimized though:

      • on large segments, it's quite likely that the segment ordinals will perfectly match the global ordinals. In that case there is nothing to do, we can just return the segment ordinal as-is.
      • even if they don't, it might be that storing the global ordinals directly in a PackedInts.Mutable only takes slightly more memory while removing the overhead of the monotonic encoding.
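
The two cases above can be sketched as a per-segment decision. This is a minimal illustration, not the code from the attached patch: the class name, the Encoding enum, the acceptableOverhead parameter, and the use of (global ordinal - segment ordinal) as a stand-in for the cost of the monotonic encoding are all assumptions made for the example.

```java
/** Hypothetical sketch; not the code from the LUCENE-5767 patch. */
final class SegmentOrdinalMapping {

    enum Encoding { IDENTITY, DIRECT, MONOTONIC }

    /** Bits needed to represent any value in [0, maxValue]. */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /**
     * Picks an encoding for one segment. globalOrds[i] is the global ordinal of
     * segment ordinal i, so the array is strictly increasing and globalOrds[i] >= i.
     * acceptableOverhead is an assumed tuning knob, e.g. 0.2 for "up to 20% more bits".
     */
    static Encoding chooseEncoding(long[] globalOrds, double acceptableOverhead) {
        long maxDelta = 0;
        for (int i = 0; i < globalOrds.length; i++) {
            maxDelta = Math.max(maxDelta, globalOrds[i] - i);
        }
        if (maxDelta == 0) {
            // Segment ordinals already are global ordinals: nothing to store.
            return Encoding.IDENTITY;
        }
        long maxGlobalOrd = globalOrds[globalOrds.length - 1];
        int directBits = bitsRequired(maxGlobalOrd); // plain packed ints, per value
        int deltaBits = bitsRequired(maxDelta);      // simplified cost of delta encoding
        // Prefer plain packed storage when it costs only slightly more memory.
        return directBits <= deltaBits * (1 + acceptableOverhead)
            ? Encoding.DIRECT
            : Encoding.MONOTONIC;
    }
}
```

With IDENTITY the caller can return the segment ordinal as-is; with DIRECT each lookup is a single packed read with no delta arithmetic.
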
      Attachments

      1. LUCENE-5767.patch (21 kB, Adrien Grand)
      2. LUCENE-5767.patch (6 kB, Adrien Grand)

        Activity

        Adrien Grand added a comment -

        Here is a patch.

        Robert Muir added a comment -

        This looks great, +1

        Martijn van Groningen added a comment -

        +1 this looks good

        Adrien Grand added a comment -

        Here is a new iteration that:

        • fixes the nocommit about ramBytesUsed
        • changes the API a bit in order to expose the global ordinal map per-segment, that is: LongValues getGlobalOrds(int segmentIndex) instead of long getGlobalOrd(int segmentIndex, long segmentOrd). It makes the API a bit easier to consume per-segment and also proved to be slightly faster in the context of Elasticsearch.
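
To illustrate the difference between the two method shapes, here is a rough sketch. The class, the Ords interface (a stand-in for LongValues), and the countHits helper are hypothetical, and the mappings are plain long[] arrays rather than the packed structures OrdinalMap actually uses.

```java
/** Hypothetical stand-ins for the two API shapes discussed above. */
final class GlobalOrdsApiSketch {

    /** Minimal stand-in for a read-only long-to-long mapping such as LongValues. */
    interface Ords {
        long get(long index);
    }

    private final long[][] segmentToGlobal; // one mapping per segment

    GlobalOrdsApiSketch(long[][] segmentToGlobal) {
        this.segmentToGlobal = segmentToGlobal;
    }

    /** Old shape: the segment's mapping is looked up again on every call. */
    long getGlobalOrd(int segmentIndex, long segmentOrd) {
        return segmentToGlobal[segmentIndex][(int) segmentOrd];
    }

    /** New shape: the segment's mapping is resolved once and returned as a view. */
    Ords getGlobalOrds(int segmentIndex) {
        final long[] mapping = segmentToGlobal[segmentIndex];
        // The int cast is only needed because this sketch is backed by a Java array.
        return ord -> mapping[(int) ord];
    }

    /** Counts hits per global ordinal for one segment using the new shape. */
    static long[] countHits(GlobalOrdsApiSketch map, int segmentIndex,
                            long[] hitSegmentOrds, int globalOrdCount) {
        Ords globalOrds = map.getGlobalOrds(segmentIndex); // hoisted out of the per-hit loop
        long[] counts = new long[globalOrdCount];
        for (long segmentOrd : hitSegmentOrds) {
            counts[(int) globalOrds.get(segmentOrd)]++;
        }
        return counts;
    }
}
```

A per-segment consumer obtains the view once when it moves to a new segment and then only pays for the ordinal lookup on each hit.
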
        Robert Muir added a comment -

        I see, this avoids array lookup per-hit, nice idea.

        ASF subversion and git services added a comment -

        Commit 1602997 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1602997 ]

        LUCENE-5767: OrdinalMap optimizations.

        ASF subversion and git services added a comment -

        Commit 1603000 from Adrien Grand in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1603000 ]

        LUCENE-5767: OrdinalMap optimizations.

        ASF subversion and git services added a comment -

        Commit 1604064 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1604064 ]

        LUCENE-5767: remove bogus cast (in this case can exceed Integer.MAX_VALUE, and the underlying delta reader takes long anyway)

        ASF subversion and git services added a comment -

        Commit 1604068 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1604068 ]

        LUCENE-5767: remove bogus cast (in this case can exceed Integer.MAX_VALUE, and the underlying delta reader takes long anyway)

        ASF subversion and git services added a comment -

        Commit 1604070 from Robert Muir in branch 'dev/branches/lucene_solr_4_9'
        [ https://svn.apache.org/r1604070 ]

        LUCENE-5767: remove bogus cast (in this case can exceed Integer.MAX_VALUE, and the underlying delta reader takes long anyway)


          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:
