Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      MemoryDocValues has fallen behind.
      It picks TABLE/GCD encoding even when that won't actually save space or help, writes with BlockPackedWriter even when that won't save space, etc.

      Instead of comparing only PackedInts.bitsRequired, also factor in acceptableOverheadRatio when deciding whether an encoding will save space. Check whether blocking will save space along the same lines (otherwise use regular packed ints).

      Fix a similar bug in the Lucene49 codec along these same lines (it compares PackedInts.bitsRequired instead of what would actually be written).
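
The difference between comparing raw bit counts and comparing what would actually be written can be sketched roughly as follows. This is not the patch: `SpaceHeuristic`, `bitsWritten` and `tableSavesSpace` are hypothetical names, and the rounding rule is a simplified stand-in for how an acceptable overhead ratio can push a packed-ints writer up to an aligned width (8, 16, 32 or 64 bits).

```java
/** Hypothetical sketch of a "will it save space" check; not the Lucene implementation. */
final class SpaceHeuristic {

  /** Bits needed for a non-negative value, at least 1 (same idea as PackedInts.bitsRequired). */
  static int bitsRequired(long maxValue) {
    return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
  }

  /**
   * Simplified assumption: if an aligned width (8/16/32/64) fits within the
   * overhead budget, the writer uses it; otherwise it uses the exact width.
   */
  static int bitsWritten(int bitsRequired, float acceptableOverheadRatio) {
    int maxBits = bitsRequired + (int) (acceptableOverheadRatio * bitsRequired);
    for (int aligned : new int[] {8, 16, 32, 64}) {
      if (bitsRequired <= aligned && maxBits >= aligned) {
        return aligned;
      }
    }
    return bitsRequired;
  }

  /** Table encoding only helps if the ordinals are written with fewer bits than the raw values. */
  static boolean tableSavesSpace(long maxValue, int uniqueValues, float acceptableOverheadRatio) {
    int rawBits = bitsWritten(bitsRequired(maxValue), acceptableOverheadRatio);
    int tableBits = bitsWritten(bitsRequired(uniqueValues - 1), acceptableOverheadRatio);
    return tableBits < rawBits;
  }

  public static void main(String[] args) {
    // Values up to 200 need 8 bits; 10 unique values need 4-bit ordinals.
    // With a large overhead ratio both end up at 8 bits, so the table buys nothing.
    System.out.println(tableSavesSpace(200, 10, 7.0f));
    // With no acceptable overhead, 4 bits really are written, so the table helps.
    System.out.println(tableSavesSpace(200, 10, 0.0f));
  }
}
```

A naive comparison of `bitsRequired(9)` against `bitsRequired(200)` would choose the table in both cases, which is the kind of decision the description calls out.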

      1. LUCENE-5751.patch
        14 kB
        Robert Muir
      2. LUCENE-5751.patch
        14 kB
        Robert Muir

        Activity

        rcmuir Robert Muir added a comment -

        Patch: I see significant performance improvements with this codec, sometimes > 50% for numerics/strings.

        jpountz Adrien Grand added a comment -

        Hmm, should avgBPV be stored as a float? I think the heuristic can favor blocking too much otherwise, e.g. if 8.9 becomes 8. Apart from that, it looks good to me!

        rcmuir Robert Muir added a comment -

        Good point: I updated the patch. In general it shouldn't be too sensitive since it only looks for large differences, but I agree it's better to just use a float avg!
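
The truncation concern raised in this exchange can be illustrated with a small sketch. It is not taken from the patch: `AvgBitsPerValue` and its methods are hypothetical, and the per-block bit counts are made-up inputs chosen so that the average is 8.9.

```java
/** Hypothetical illustration of integer vs. float averaging of bits per value. */
final class AvgBitsPerValue {

  /** Integer division drops the fractional part of the average. */
  static int intAverage(int[] bitsPerBlock) {
    long sum = 0;
    for (int bits : bitsPerBlock) {
      sum += bits;
    }
    return (int) (sum / bitsPerBlock.length);
  }

  /** Float division keeps the fractional part. */
  static float floatAverage(int[] bitsPerBlock) {
    long sum = 0;
    for (int bits : bitsPerBlock) {
      sum += bits;
    }
    return (float) sum / bitsPerBlock.length;
  }

  public static void main(String[] args) {
    // Nine blocks need 9 bits per value and one block needs 8: the true average is 8.9.
    int[] bitsPerBlock = {9, 9, 9, 9, 9, 9, 9, 9, 9, 8};
    System.out.println(intAverage(bitsPerBlock));
    System.out.println(floatAverage(bitsPerBlock));
  }
}
```

With the integer average, blocking looks almost a full bit per value cheaper than it is, which would bias a comparison against the non-blocked width in favor of blocking.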

        mikemccand Michael McCandless added a comment -

        +1

        jpountz Adrien Grand added a comment -

        +1

        jira-bot ASF subversion and git services added a comment -

        Commit 1601929 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1601929 ]

        LUCENE-5751: Bring MemoryDocValues up to speed

        jira-bot ASF subversion and git services added a comment -

        Commit 1601936 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1601936 ]

        LUCENE-5751: Bring MemoryDocValues up to speed


          People

          • Assignee:
            Unassigned
            Reporter:
            rcmuir Robert Muir
          • Votes:
            0
            Watchers:
            4
