Lucene - Core / LUCENE-5782

Improve OrdinalMap compression by sorting the supplied terms enums

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10, 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      As mentioned in LUCENE-5780, OrdinalMaps might compress much better when the terms enums are supplied sorted by descending cardinality. When this is not the case, we could sort the enums ourselves and re-map segment indices on top of the result.
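
      The idea above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patches: `SegmentSorter`, `sortByDescendingWeight`, and `invert` are hypothetical names, and the per-enum weight is assumed to be something like the enum's cardinality. The sketch sorts segment indices by descending weight and builds the inverse array that re-maps a caller's segment index to its sorted position.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Lucene's API. */
final class SegmentSorter {
    private SegmentSorter() {}

    /**
     * Returns newToOld: element i is the original index of the enum that
     * ends up in position i once enums are ordered by descending weight.
     * Arrays.sort on objects is stable, so ties keep their original order.
     */
    static int[] sortByDescendingWeight(long[] weights) {
        Integer[] order = new Integer[weights.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Long.compare(weights[b], weights[a]));
        int[] newToOld = new int[order.length];
        for (int i = 0; i < order.length; i++) {
            newToOld[i] = order[i];
        }
        return newToOld;
    }

    /** Inverts the permutation: oldToNew[original index] = sorted position. */
    static int[] invert(int[] newToOld) {
        int[] oldToNew = new int[newToOld.length];
        for (int n = 0; n < newToOld.length; n++) {
            oldToNew[newToOld[n]] = n;
        }
        return oldToNew;
    }
}
```

      With weights {10, 500, 42}, the sorted order is the enums at original indices 1, 2, 0, so a lookup for the caller's segment 0 would be redirected to sorted position 2.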

      1. LUCENE-5782.patch
        14 kB
        Adrien Grand
      2. LUCENE-5782.patch
        18 kB
        Adrien Grand

        Activity

        Adrien Grand added a comment -

        Here is a patch. As suggested by Robert, the API takes a long[] that lets callers configure the weight of each terms enum, and the code uses the number of live terms as the weight.

        Robert Muir added a comment -

        Can we remove the subclass+delegation of Remapped? That's just really awkward...

        Adrien Grand added a comment -

        I'm not sure I understand why it is awkward or what you are suggesting.

        Robert Muir added a comment -

        Well in the patch, OrdinalMap is split into a hierarchy of OrdinalMap and RemappedOrdinalMap.

        RemappedOrdinalMap extends OrdinalMap, but is a delegator taking the "raw" OrdinalMap, and just modifies the behavior of two one-line methods.

        Why have this class hierarchy? Why not just have one class like before?

        Adrien Grand added a comment -

        I did it this way to keep the raw ordinal map build decoupled from the segment number remapping, for simplicity. Let me try to see if I can keep it simple with a single class.
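
        For readers following the thread, the single-class alternative under discussion might look roughly like the toy below. It is an illustration only and not the code from either attached patch: `ToyOrdinalMap`, its fields, and its plain `long[][]` lookup table are all assumptions made for the example. The point it shows is that the segment re-mapping can live in a private array applied inside the lookup method, so no delegating subclass is needed.

```java
import java.util.Arrays;

/** Toy stand-in for illustration only; this is NOT Lucene's OrdinalMap. */
final class ToyOrdinalMap {
    // Caller's segment index -> position of that segment in the sorted order.
    private final int[] callerToInternal;
    // Table stored in sorted order: table[internalSegment][segmentOrd] = global ord.
    private final long[][] table;

    ToyOrdinalMap(long[][] globalOrdsPerSegment, long[] weights) {
        int n = globalOrdsPerSegment.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Heaviest segment first; the sort is stable, so ties keep caller order.
        Arrays.sort(order, (a, b) -> Long.compare(weights[b], weights[a]));

        callerToInternal = new int[n];
        table = new long[n][];
        for (int internal = 0; internal < n; internal++) {
            int caller = order[internal];
            callerToInternal[caller] = internal;
            table[internal] = globalOrdsPerSegment[caller].clone();
        }
    }

    /** Callers keep using their own segment numbering; the re-map is hidden here. */
    long getGlobalOrd(int callerSegment, int segmentOrd) {
        return table[callerToInternal[callerSegment]][segmentOrd];
    }
}
```

        Whatever order the segments are stored in internally, `getGlobalOrd` returns the same answer the caller would have obtained from the unsorted table.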

        Adrien Grand added a comment -

        Here is a new attempt. Does it look better?

        Robert Muir added a comment -

        +1, thank you!

        ASF subversion and git services added a comment -

        Commit 1604387 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1604387 ]

        LUCENE-5782: Improve OrdinalMap compression by sorting the supplied terms enums

        ASF subversion and git services added a comment -

        Commit 1604388 from Adrien Grand in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1604388 ]

        LUCENE-5782: Improve OrdinalMap compression by sorting the supplied terms enums

        Adrien Grand added a comment -

        Thanks Robert


          People

          • Assignee: Adrien Grand
          • Reporter: Adrien Grand
          • Votes: 0
          • Watchers: 2
