Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      MergeState.DocMap could leverage MonotonicAppendingLongBuffer to save memory.
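      To make the idea concrete, here is a hypothetical, self-contained sketch (plain Java, no Lucene dependency; `DocMapSketch` and `buildDocMap` are made-up names, not Lucene API) of the old-to-new doc ID mapping that MergeState.DocMap represents. Live documents keep their relative order, so the mapping over live docs is monotonically non-decreasing — exactly the kind of sequence MonotonicAppendingLongBuffer compresses well.

```java
import java.util.BitSet;

public class DocMapSketch {
    // Builds newDocId[oldDocId] for a segment with deletions. In this sketch
    // deleted docs map to -1; live docs get consecutive new IDs, so the live
    // entries form a strictly increasing (hence highly compressible) sequence.
    static int[] buildDocMap(int maxDoc, BitSet deleted) {
        int[] map = new int[maxDoc];
        int newId = 0;
        for (int oldId = 0; oldId < maxDoc; oldId++) {
            if (deleted.get(oldId)) {
                map[oldId] = -1;       // deleted: no new ID
            } else {
                map[oldId] = newId++;  // live: next consecutive new ID
            }
        }
        return map;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(1);
        deleted.set(3);
        // Docs 0, 2, 4 survive and are renumbered 0, 1, 2.
        int[] map = buildDocMap(5, deleted);
        System.out.println(java.util.Arrays.toString(map)); // [0, -1, 1, -1, 2]
    }
}
```

      The real implementation stores this mapping in a packed structure rather than an `int[]`, which is where the memory savings below come from.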

      1. LUCENE-4792.patch
        10 kB
        Adrien Grand

        Issue Links

          Activity

          Adrien Grand added a comment -

          Patch. I used RamUsageEstimator to estimate the memory savings depending on the ratio of deleted documents on a segment of 100M docs (deleted docs are randomly chosen):

          Deletion ratio   Old size   New size
          0.001            214.6 MB   32.2 MB
          0.01             250.3 MB   53.7 MB
          0.1              298 MB     71.3 MB
          0.3              309.9 MB   79.3 MB
          0.5              321.9 MB   80.8 MB
          0.7              309.9 MB   79.3 MB
          0.9              298 MB     71.3 MB
          0.99             250.3 MB   53.8 MB
          0.999            214.6 MB   32.3 MB
          Robert Muir added a comment -

          +1, this is great.

          Commit Tag Bot added a comment -

          [trunk commit] Adrien Grand
          http://svn.apache.org/viewvc?view=revision&revision=1448853

          LUCENE-4792: Reduction of the memory required to build the doc ID maps used when merging segments.

          Adrien Grand added a comment -

          Committed. Thanks for the review Robert!

          Commit Tag Bot added a comment -

          [branch_4x commit] Adrien Grand
          http://svn.apache.org/viewvc?view=revision&revision=1448861

          LUCENE-4792: Reduction of the memory required to build the doc ID maps used when merging segments (merged from r1448853).

          Michael McCandless added a comment -

          These RAM savings are AWESOME! Where else can we use MonotonicAppendingLongBuffer?

          Robert Muir added a comment -

          We are using the same compression for (as far as I know):

          • stored fields, term vectors, docvalues "disk" addresses
          • multidocvalues ordinal maps

          We could consider trying it out for the FieldCache and other places, for example; I'm not sure what the perf hit would be.
          (I'm not very interested in optimizing the FieldCache myself.)

          Adrien Grand added a comment -

          In case someone would like to use this class, I'd add that:

          • the encoded sequence does not strictly need to be monotonic: it can encode any sequence of values, but it compresses best when the stream contains monotonic sub-sequences of at least 1024 longs (for example, it would still achieve a good compression ratio on 10000 increasing values followed by 5000 decreasing values),
          • it can address up to 2^42 values,
          • there are writer/reader equivalents called MonotonicBlockPackedWriter and MonotonicBlockPackedReader (which can either load values in memory or read from disk).
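          To illustrate why roughly monotonic sequences compress so well, the following toy sketch (not Lucene's actual implementation; `MonotonicSketch`, `deviations`, and `bitsRequired` are hypothetical names) mimics the core trick: model each block of values as a straight line through its endpoints and store only the small deviations from that line.

```java
public class MonotonicSketch {
    // Deviation of each value from the linear model (first value + average
    // slope * index) of its block. For near-monotonic data the deviations
    // stay tiny and can be bit-packed in far fewer than 64 bits per value.
    static long[] deviations(long[] block) {
        int n = block.length;
        double slope = n > 1 ? (double) (block[n - 1] - block[0]) / (n - 1) : 0;
        long[] dev = new long[n];
        for (int i = 0; i < n; i++) {
            long expected = block[0] + (long) (slope * i);
            dev[i] = block[i] - expected;
        }
        return dev;
    }

    // Bits needed to store the largest absolute deviation, plus a sign bit.
    static int bitsRequired(long[] dev) {
        long max = 0;
        for (long d : dev) max = Math.max(max, Math.abs(d));
        return 64 - Long.numberOfLeadingZeros(max) + 1;
    }

    public static void main(String[] args) {
        // A doc ID map after deleting every 10th doc: strictly increasing.
        long[] docMap = new long[100];
        int newId = 0;
        for (int old = 0; old < 100; old++) {
            docMap[old] = (old % 10 == 9) ? newId : newId++;
        }
        long[] dev = deviations(docMap);
        System.out.println(bitsRequired(dev) + " bits/value instead of 64");
    }
}
```

          The actual class additionally bit-packs the deviations block by block (1024 values per block, per the comment above), but the per-value cost is governed by the same quantity `bitsRequired` computes here.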
          Uwe Schindler added a comment -

          Closed after release.


            People

            • Assignee:
              Adrien Grand
            • Reporter:
              Adrien Grand
            • Votes:
              0
            • Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved:

                Development