Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The packed ints readers are heavily optimized for the in-RAM case (for example, FieldCache uses PackedInts.FAST to make them even faster), but for the docvalues case they are not: we essentially always use COMPACT, we have only one decoder that must handle every case (even the complicated ones), we use BlockPackedWriter for all integers (even ordinals), etc.
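To illustrate the "one decoder for all cases" point, here is a minimal sketch (not Lucene's code) contrasting a generic reader that works for any bits-per-value with a specialized reader for one byte-aligned width. The class and method names are hypothetical, and a big-endian bit layout over a plain byte[] is assumed purely for illustration.

```java
// Illustrative sketch only; names and the big-endian layout are assumptions,
// not the actual DirectPackedReader implementation.
final class PackedDecodeSketch {

    /** Generic decoder: handles any bitsPerValue in 1..64, one bit at a time. */
    static long getGeneric(byte[] data, int bitsPerValue, int index) {
        long bitPos = (long) index * bitsPerValue; // absolute position of the first bit
        long value = 0;
        for (int i = 0; i < bitsPerValue; i++, bitPos++) {
            int b = data[(int) (bitPos >>> 3)] & 0xFF;          // byte holding this bit
            int bit = (b >>> (7 - (int) (bitPos & 7))) & 1;     // most significant bit first
            value = (value << 1) | bit;
        }
        return value;
    }

    /** Specialized decoder for exactly 16 bits per value: two byte reads, no bit loop. */
    static long get16(byte[] data, int index) {
        int off = index * 2;
        return ((data[off] & 0xFFL) << 8) | (data[off + 1] & 0xFFL);
    }
}
```

Both methods return the same value when the width is 16, but the specialized one avoids per-bit shifting and masking, which is the kind of saving a per-width decoder can offer.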

      1. LUCENE-5720.patch
        28 kB
        Robert Muir
      2. LUCENE-5720.patch
        26 kB
        Robert Muir
      3. LUCENE-5720.patch
        22 kB
        Robert Muir

        Activity

        Robert Muir added a comment -

        Here's my first stab. This adds a fastestDirectBits(float overhead) rather than trying to integrate with the existing one, because the logic is different when dealing with the Directory API.

        We can probably improve this further for 5.0. For example, the Directory API was always geared toward sequential access, and we might be able to introduce some API changes later to speed it up more. But this seems like a safe win.

        Robert Muir added a comment -

        I hacked up luceneutil for a performance test. I'm not sure Wikipedia 'title' is the best choice, but I tried it on 1M:

        Size: 500KB increase in docvalues data (5.7MB -> 6.2MB)
        Note that in context, the entire index is 385MB (no stored fields or vectors), so the 500KB docvalues increase is negligible.

        20% improvement in sort performance.

        Robert Muir added a comment -

        With file formats and javadocs.

        Adrien Grand added a comment -

        +1

        Michael McCandless added a comment -

        +1

        Robert Muir added a comment -

        Updated patch: since we are using these for numerics (where high BPV is common, e.g. floats/doubles/etc.), I added 'byte' cases for bpv > 32 (40, 48, 56, 64). I also changed the upgrade logic to try a byte boundary first, then a nibble boundary.

        This means it also works like the in-RAM one: if you pass FASTEST, it always goes to some multiple of a byte.
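The upgrade logic described above can be sketched roughly as follows. This is a hedged illustration, not the committed code: the class name, method name, and signature are hypothetical, and limiting the nibble step to widths below 32 is an assumption inferred from the byte-only cases listed for bpv > 32. The overhead is treated as a ratio of extra bits allowed per value, so 0 means no upgrade and 7 (the value of PackedInts.FASTEST) allows up to 8x.

```java
// Illustrative sketch only; names and exact rules are assumptions.
final class DirectBitsSketch {

    /**
     * Returns a bits-per-value >= bitsPerValue that is cheaper to decode,
     * as long as the extra space stays within the given overhead ratio.
     */
    static int upgradeBits(int bitsPerValue, float overhead) {
        if (bitsPerValue < 1 || bitsPerValue > 64) {
            throw new IllegalArgumentException("bitsPerValue must be in 1..64");
        }
        float clamped = Math.max(0f, overhead);
        // Largest width we are allowed to use, capped at 64 bits.
        int maxBits = (int) Math.min(64L, bitsPerValue + (long) (clamped * bitsPerValue));

        // First try to round up to a byte boundary (8, 16, ..., 64).
        int byteAligned = (bitsPerValue + 7) & ~7;
        if (byteAligned <= maxBits) {
            return byteAligned;
        }

        // Otherwise try a nibble boundary (4, 12, 20, 28); assumed only below 32 bits.
        int nibbleAligned = (bitsPerValue + 3) & ~3;
        if (bitsPerValue < 32 && nibbleAligned <= maxBits) {
            return nibbleAligned;
        }

        // No affordable upgrade: keep the compact width.
        return bitsPerValue;
    }
}
```

Under these assumptions, an overhead of 7 always yields a multiple of 8, because rounding up to a byte adds at most 7 bits and the budget is at least 7 bits for any width.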

        Adrien Grand added a comment -

        +1 to the updated patch

        ASF subversion and git services added a comment -

        Commit 1599180 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1599180 ]

        LUCENE-5720: Optimize DirectPackedReader's decompression

        ASF subversion and git services added a comment -

        Commit 1599182 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1599182 ]

        LUCENE-5720: Optimize DirectPackedReader's decompression


          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0
            Watchers:
            5