Lucene - Core
  1. Lucene - Core
  2. LUCENE-6779

Reduce memory allocated by CompressingStoredFieldsWriter to write large strings

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: core/codecs
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      In SOLR-7927, I am trying to reduce the memory required to index very large documents (between 10 to 100MB) and one of the places which allocate a lot of heap is the UTF8 encoding in CompressingStoredFieldsWriter. The same problem existed in JavaBinCodec and we reduced its memory allocation by falling back to a double pass approach in SOLR-7971 when the utf8 size of the string is greater than 64KB.

      I propose to make the same changes to CompressingStoredFieldsWriter as we made to JavaBinCodec in SOLR-7971.

      1. LUCENE-6779_alt.patch
        6 kB
        Robert Muir
      2. LUCENE-6779.patch
        13 kB
        Shalin Shekhar Mangar
      3. LUCENE-6779.patch
        9 kB
        Shalin Shekhar Mangar
      4. LUCENE-6779.patch
        8 kB
        Shalin Shekhar Mangar
      5. LUCENE-6779.patch
        8 kB
        Shalin Shekhar Mangar

        Issue Links

          Activity

          Hide
          Shalin Shekhar Mangar added a comment -

          Patch with the changes.

          Show
          Shalin Shekhar Mangar added a comment - Patch with the changes.
          Hide
          Dawid Weiss added a comment -
          +  /** Writes UTF8 into the given OutputStream by first writing to the given scratch array
          +   * and then writing the contents of the scratch array to the OutputStream. The given scratch byte array
          +   * is used to buffer intermediate data before it is written to the byte buffer.
          +   *
          +   * @return the number of bytes written
          +   */
          +  public static int writeUTF16toUTF8(final CharSequence s, final int offset, final int len, final DataOutput dataOutput, final byte[] scratch) throws IOException {
          

          Isn't this a mix of two things (buffering and coding)? I think it'd be nicer to have the DataOutput (or some decorator) take care of the buffering aspects and the routine could then focus on transcoding from UTF16 to UTF8.

          Also, most of the hardcoded constants/ checks for surrogate pairs, etc. do have counterparts in Character.* methods (and they should inline very well).

          Show
          Dawid Weiss added a comment - + /** Writes UTF8 into the given OutputStream by first writing to the given scratch array + * and then writing the contents of the scratch array to the OutputStream. The given scratch byte array + * is used to buffer intermediate data before it is written to the byte buffer. + * + * @ return the number of bytes written + */ + public static int writeUTF16toUTF8( final CharSequence s, final int offset, final int len, final DataOutput dataOutput, final byte [] scratch) throws IOException { Isn't this a mix of two things (buffering and coding)? I think it'd be nicer to have the DataOutput (or some decorator) take care of the buffering aspects and the routine could then focus on transcoding from UTF16 to UTF8. Also, most of the hardcoded constants/ checks for surrogate pairs, etc. do have counterparts in Character.* methods (and they should inline very well).
          Hide
          Shalin Shekhar Mangar added a comment - - edited

          Isn't this a mix of two things (buffering and coding)? I think it'd be nicer to have the DataOutput (or some decorator) take care of the buffering aspects and the routine could then focus on transcoding from UTF16 to UTF8.

          Yes but that actually has better performance than writing bytes directly to the DataOutput. I tested this with JavaBinCodec and I don't think performance will be very different here (see JMH benchmark results in SOLR-7971). Presumably, the huge amount of invocations of writeByte don't perform well compared to setting a byte in a scratch array directly.

          Also, most of the hardcoded constants/ checks for surrogate pairs, etc. do have counterparts in Character.* methods (and they should inline very well).

          I didn't know about that. The constants here are the same as the ones in the existing UnicodeUtil.UTF16toUTF8 method.

          Show
          Shalin Shekhar Mangar added a comment - - edited Isn't this a mix of two things (buffering and coding)? I think it'd be nicer to have the DataOutput (or some decorator) take care of the buffering aspects and the routine could then focus on transcoding from UTF16 to UTF8. Yes but that actually has better performance than writing bytes directly to the DataOutput. I tested this with JavaBinCodec and I don't think performance will be very different here (see JMH benchmark results in SOLR-7971 ). Presumably, the huge amount of invocations of writeByte don't perform well compared to setting a byte in a scratch array directly. Also, most of the hardcoded constants/ checks for surrogate pairs, etc. do have counterparts in Character.* methods (and they should inline very well). I didn't know about that. The constants here are the same as the ones in the existing UnicodeUtil.UTF16toUTF8 method.
          Hide
          Robert Muir added a comment -

          Yes but that actually has better performance than writing bytes directly to the DataOutput. I tested this with JavaBinCodec and I don't think performance will be very different here (see JMH benchmark results in SOLR-7971). Presumably, the huge amount of invocations of writeByte don't perform well compared to setting a byte in a scratch array directly.

          Its unclear to me the complexity is worth it. The data used in the benchmark is 100% latin-1 (completely english), which certainly isn't representative of reality, so the benchmarks don't mean anything to me.

          One thing to keep in mind is it only affects writes from string fields on flush, during merging, bulk copying can kick in, and even in the worst case where that can't happen, we really shouldn't be taking this codepath anyway, we should just do byte[] -> byte[] for string fields

          So I'm not sure if the optimization (especially for 10MB documents which is a ridiculous case) is really that powerful? We have to be extremely careful here because with these kind of optimizations, any bug -> data corruption.

          Show
          Robert Muir added a comment - Yes but that actually has better performance than writing bytes directly to the DataOutput. I tested this with JavaBinCodec and I don't think performance will be very different here (see JMH benchmark results in SOLR-7971 ). Presumably, the huge amount of invocations of writeByte don't perform well compared to setting a byte in a scratch array directly. Its unclear to me the complexity is worth it. The data used in the benchmark is 100% latin-1 (completely english), which certainly isn't representative of reality, so the benchmarks don't mean anything to me. One thing to keep in mind is it only affects writes from string fields on flush, during merging, bulk copying can kick in, and even in the worst case where that can't happen, we really shouldn't be taking this codepath anyway, we should just do byte[] -> byte[] for string fields So I'm not sure if the optimization (especially for 10MB documents which is a ridiculous case) is really that powerful? We have to be extremely careful here because with these kind of optimizations, any bug -> data corruption.
          Hide
          Robert Muir added a comment -

          Also i still think this doesn't really provide any benefits here to do this buffering, as the output in question is just a byte array anyway (going to compression). I don't think buffering to a separate byte[] really saves anything when we are talking about a ByteArrayDataOutput, this is an entirely different beast than the "FastOutputStream" used in the benchmark.

          So because its already going to a byte[] I think its not so useful to stream it this way for a huge document. For such a huge doc (10MB) case the current scheme probably has a number of disadvantages both in that buffering and in the underlying compression parameters anyway.

          I think it would be good to do real indexing benchmarks to see if this really helps your case. I suspect to improve it would really require other things, as many aspects of the current codec may be suboptimal: not just flush, but merge too.

          Show
          Robert Muir added a comment - Also i still think this doesn't really provide any benefits here to do this buffering, as the output in question is just a byte array anyway (going to compression). I don't think buffering to a separate byte[] really saves anything when we are talking about a ByteArrayDataOutput, this is an entirely different beast than the "FastOutputStream" used in the benchmark. So because its already going to a byte[] I think its not so useful to stream it this way for a huge document. For such a huge doc (10MB) case the current scheme probably has a number of disadvantages both in that buffering and in the underlying compression parameters anyway. I think it would be good to do real indexing benchmarks to see if this really helps your case. I suspect to improve it would really require other things, as many aspects of the current codec may be suboptimal: not just flush, but merge too.
          Hide
          Robert Muir added a comment -

          btw, in my opinion, the way to do this one would be to just call bufferedDocs.writeString() from CompressingStoredFieldsWriter.

          Then, fix override writeString() in GrowableByteArrayDataOutput to have a more efficient implementation than the default one from DataOutput (which makes its own byte[]/BytesRef, thats why its not being used here).

          By pushing it down to that output it would be much easier to test at the very least. Optimizing for large sizes otherwise is very risky because most tests will not cover the changes.

          Show
          Robert Muir added a comment - btw, in my opinion, the way to do this one would be to just call bufferedDocs.writeString() from CompressingStoredFieldsWriter. Then, fix override writeString() in GrowableByteArrayDataOutput to have a more efficient implementation than the default one from DataOutput (which makes its own byte[]/BytesRef, thats why its not being used here). By pushing it down to that output it would be much easier to test at the very least. Optimizing for large sizes otherwise is very risky because most tests will not cover the changes.
          Hide
          Robert Muir added a comment -

          here is my prototype just to show you what i mean. This one just removes the extra buffer entirely.

          So my suggestion would be, just benchmark/optimize/test this GrowableByteArrayDataOutput.writeString() itself.

          If it needs an extra buffer added back to speed up small strings, then lets add it back here. I also think this thing does not need to be in a .util package, since its only used by .compressing package. Lets move it there, and let it make appropriate tradeoffs specific to writing this data for CompressingXXX.

          And test the hell out of it with unit tests if the logic must be any fancier.

          Show
          Robert Muir added a comment - here is my prototype just to show you what i mean. This one just removes the extra buffer entirely. So my suggestion would be, just benchmark/optimize/test this GrowableByteArrayDataOutput.writeString() itself. If it needs an extra buffer added back to speed up small strings, then lets add it back here . I also think this thing does not need to be in a .util package, since its only used by .compressing package. Lets move it there, and let it make appropriate tradeoffs specific to writing this data for CompressingXXX. And test the hell out of it with unit tests if the logic must be any fancier.
          Hide
          Dawid Weiss added a comment -

          I like what Robert suggested. I'd still use Character constants/ methods where applicable though. Not that I don't know what the code means, but for somebody not familiar with UTF8 it may be easier on the eyes.

          Show
          Dawid Weiss added a comment - I like what Robert suggested. I'd still use Character constants/ methods where applicable though. Not that I don't know what the code means, but for somebody not familiar with UTF8 it may be easier on the eyes.
          Hide
          Shalin Shekhar Mangar added a comment -

          Thanks Robert Muir. I like this idea of writing directly to the byte array of GrowableByteArrayDataOutput and doing away with the intermediate buffer entirely.

          I'll benchmark this method and report what I find.

          Show
          Shalin Shekhar Mangar added a comment - Thanks Robert Muir . I like this idea of writing directly to the byte array of GrowableByteArrayDataOutput and doing away with the intermediate buffer entirely. I'll benchmark this method and report what I find.
          Hide
          Shalin Shekhar Mangar added a comment -

          This patch is based on Robert's earlier patch but I fallback to the double pass approach only if string is larger than 64KB. This patch also moves GrowableByteArrayDataOutput from the util to codecs.compressing package as suggested.

          I benchmarked both approaches (i.e. use double pass always vs use single pass below 64KB) against test data generated using TestUtil.randomRealisticUnicodeString between 5 and 64 characters) and for such short fields, double pass is approx 30% slower. I don't think short fields should pay this penalty considering those should be far more common.

          testWriteString1 = Use double pass always
          testWriteString2 = Use double pass if utf8 size is greater than 64KB
          testWriteStringDefault = Use writeString from base DataOutput class
          
          10K Randomly generated strings (5 <= len <= 64)
          ======================================
          java -server -Xmx2048M -Xms2048M -Dtests.seed=18262 -Dtests.datagen.path=./data.txt -Dtests.string.minlen=5 -Dtests.string.maxlen=64 -Dtests.string.num=10000 -jar target/benchmarks.jar -wi 5 -i 50 -gc true -f 2 -prof gc ".*GrowableByteArrayDataOutputBenchmark.*"
          
          # Run complete. Total time: 00:06:41
          
          Benchmark                                                                         Mode  Cnt        Score       Error   Units
          GrowableByteArrayDataOutputBenchmark.testWriteString1                            thrpt  100  2916182.627 ±  5219.401   ops/s
          GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.alloc.rate             thrpt  100        0.001 ±     0.001  MB/sec
          GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.alloc.rate.norm        thrpt  100       ≈ 10⁻⁴                B/op
          GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.count                  thrpt  100          ≈ 0              counts
          GrowableByteArrayDataOutputBenchmark.testWriteString2                            thrpt  100  4226084.451 ±  7188.594   ops/s
          GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.alloc.rate             thrpt  100      596.567 ±     1.016  MB/sec
          GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.alloc.rate.norm        thrpt  100      148.060 ±     0.001    B/op
          GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.count                  thrpt  100          ≈ 0              counts
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault                      thrpt  100  4221729.873 ± 13558.316   ops/s
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.alloc.rate       thrpt  100      595.961 ±     1.916  MB/sec
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.alloc.rate.norm  thrpt  100      148.060 ±     0.001    B/op
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.count            thrpt  100        1.000              counts
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.time             thrpt  100       19.000                  ms
          
          10MB latin-1 field
          =============
          java -server -Xmx2048M -Xms2048M -Dtests.seed=18262 -Dtests.string.num=0 -Dtests.json.path=./input14.json -jar target/benchmarks.jar -wi 5 -i 50 -gc true -f 2 -prof gc ".*GrowableByteArrayDataOutputBenchmark.*
          
          # Run complete. Total time: 00:06:47
          
          Benchmark                                                                                  Mode  Cnt         Score        Error   Units
          GrowableByteArrayDataOutputBenchmark.testWriteString1                                     thrpt  100        27.985 ±      0.074   ops/s
          GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.alloc.rate                      thrpt  100         0.001 ±      0.001  MB/sec
          GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.alloc.rate.norm                 thrpt  100        24.951 ±     20.652    B/op
          GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.count                           thrpt  100           ≈ 0               counts
          GrowableByteArrayDataOutputBenchmark.testWriteString2                                     thrpt  100        28.105 ±      0.090   ops/s
          GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.alloc.rate                      thrpt  100         0.001 ±      0.001  MB/sec
          GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.alloc.rate.norm                 thrpt  100        24.888 ±     20.655    B/op
          GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.count                           thrpt  100           ≈ 0               counts
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault                               thrpt  100        36.185 ±      0.099   ops/s
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.alloc.rate                thrpt  100      1123.864 ±      3.077  MB/sec
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.alloc.rate.norm           thrpt  100  32575891.405 ±     16.168    B/op
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.churn.PS_Eden_Space       thrpt  100       645.241 ±      7.098  MB/sec
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.churn.PS_Eden_Space.norm  thrpt  100  18703213.617 ± 205570.201    B/op
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.count                     thrpt  100       100.000               counts
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.time                      thrpt  100       299.000                   ms
          
          100MB latin-1 string
          ================
          java -server -Xmx2048M -Xms2048M -Dtests.seed=18262 -Dtests.string.num=0 -Dtests.json.path=./input140.json -jar target/benchmarks.jar -wi 5 -i 50 -gc true -f 2 -prof gc ".*GrowableByteArrayDataOutputBenchmark.*
          
          # Run complete. Total time: 00:07:14
          
          Benchmark                                                                         Mode  Cnt          Score     Error   Units
          GrowableByteArrayDataOutputBenchmark.testWriteString1                            thrpt  100          2.814 ±   0.008   ops/s
          GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.alloc.rate             thrpt  100          0.001 ±   0.001  MB/sec
          GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.alloc.rate.norm        thrpt  100        236.853 ± 196.100    B/op
          GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.count                  thrpt  100            ≈ 0            counts
          GrowableByteArrayDataOutputBenchmark.testWriteString2                            thrpt  100          2.811 ±   0.022   ops/s
          GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.alloc.rate             thrpt  100          0.001 ±   0.001  MB/sec
          GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.alloc.rate.norm        thrpt  100        236.853 ± 196.100    B/op
          GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.count                  thrpt  100            ≈ 0            counts
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault                      thrpt  100          3.617 ±   0.009   ops/s
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.alloc.rate       thrpt  100       1123.487 ±   2.667  MB/sec
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.alloc.rate.norm  thrpt  100  325758521.800 ± 147.457    B/op
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.count            thrpt  100            ≈ 0            counts
          
          Show
          Shalin Shekhar Mangar added a comment - This patch is based on Robert's earlier patch but I fallback to the double pass approach only if string is larger than 64KB. This patch also moves GrowableByteArrayDataOutput from the util to codecs.compressing package as suggested. I benchmarked both approaches (i.e. use double pass always vs use single pass below 64KB) against test data generated using TestUtil.randomRealisticUnicodeString between 5 and 64 characters) and for such short fields, double pass is approx 30% slower. I don't think short fields should pay this penalty considering those should be far more common. testWriteString1 = Use double pass always testWriteString2 = Use double pass if utf8 size is greater than 64KB testWriteStringDefault = Use writeString from base DataOutput class 10K Randomly generated strings (5 <= len <= 64) ====================================== java -server -Xmx2048M -Xms2048M -Dtests.seed=18262 -Dtests.datagen.path=./data.txt -Dtests.string.minlen=5 -Dtests.string.maxlen=64 -Dtests.string.num=10000 -jar target/benchmarks.jar -wi 5 -i 50 -gc true -f 2 -prof gc ".*GrowableByteArrayDataOutputBenchmark.*" # Run complete. Total time: 00:06:41 Benchmark Mode Cnt Score Error Units GrowableByteArrayDataOutputBenchmark.testWriteString1 thrpt 100 2916182.627 ± 5219.401 ops/s GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.alloc.rate thrpt 100 0.001 ± 0.001 MB/sec GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.alloc.rate.norm thrpt 100 ≈ 10⁻⁴ B/op GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.count thrpt 100 ≈ 0 counts GrowableByteArrayDataOutputBenchmark.testWriteString2 thrpt 100 4226084.451 ± 7188.594 ops/s GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.alloc.rate thrpt 100 596.567 ± 1.016 MB/sec GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.alloc.rate.norm thrpt 100 148.060 ± 0.001 B/op GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.count thrpt 100 ≈ 0 counts GrowableByteArrayDataOutputBenchmark.testWriteStringDefault thrpt 100 4221729.873 ± 13558.316 ops/s GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.alloc.rate thrpt 100 595.961 ± 1.916 MB/sec GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.alloc.rate.norm thrpt 100 148.060 ± 0.001 B/op GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.count thrpt 100 1.000 counts GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.time thrpt 100 19.000 ms 10MB latin-1 field ============= java -server -Xmx2048M -Xms2048M -Dtests.seed=18262 -Dtests.string.num=0 -Dtests.json.path=./input14.json -jar target/benchmarks.jar -wi 5 -i 50 -gc true -f 2 -prof gc ".*GrowableByteArrayDataOutputBenchmark.* # Run complete. Total time: 00:06:47 Benchmark Mode Cnt Score Error Units GrowableByteArrayDataOutputBenchmark.testWriteString1 thrpt 100 27.985 ± 0.074 ops/s GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.alloc.rate thrpt 100 0.001 ± 0.001 MB/sec GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.alloc.rate.norm thrpt 100 24.951 ± 20.652 B/op GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.count thrpt 100 ≈ 0 counts GrowableByteArrayDataOutputBenchmark.testWriteString2 thrpt 100 28.105 ± 0.090 ops/s GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.alloc.rate thrpt 100 0.001 ± 0.001 MB/sec GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.alloc.rate.norm thrpt 100 24.888 ± 20.655 B/op GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.count thrpt 100 ≈ 0 counts GrowableByteArrayDataOutputBenchmark.testWriteStringDefault thrpt 100 36.185 ± 0.099 ops/s GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.alloc.rate thrpt 100 1123.864 ± 3.077 MB/sec GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.alloc.rate.norm thrpt 100 32575891.405 ± 16.168 B/op GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.churn.PS_Eden_Space thrpt 100 645.241 ± 7.098 MB/sec GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.churn.PS_Eden_Space.norm thrpt 100 18703213.617 ± 205570.201 B/op GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.count thrpt 100 100.000 counts GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.time thrpt 100 299.000 ms 100MB latin-1 string ================ java -server -Xmx2048M -Xms2048M -Dtests.seed=18262 -Dtests.string.num=0 -Dtests.json.path=./input140.json -jar target/benchmarks.jar -wi 5 -i 50 -gc true -f 2 -prof gc ".*GrowableByteArrayDataOutputBenchmark.* # Run complete. Total time: 00:07:14 Benchmark Mode Cnt Score Error Units GrowableByteArrayDataOutputBenchmark.testWriteString1 thrpt 100 2.814 ± 0.008 ops/s GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.alloc.rate thrpt 100 0.001 ± 0.001 MB/sec GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.alloc.rate.norm thrpt 100 236.853 ± 196.100 B/op GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.count thrpt 100 ≈ 0 counts GrowableByteArrayDataOutputBenchmark.testWriteString2 thrpt 100 2.811 ± 0.022 ops/s GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.alloc.rate thrpt 100 0.001 ± 0.001 MB/sec GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.alloc.rate.norm thrpt 100 236.853 ± 196.100 B/op GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.count thrpt 100 ≈ 0 counts GrowableByteArrayDataOutputBenchmark.testWriteStringDefault thrpt 100 3.617 ± 0.009 ops/s GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.alloc.rate thrpt 100 1123.487 ± 2.667 MB/sec GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.alloc.rate.norm thrpt 100 325758521.800 ± 147.457 B/op GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.count thrpt 100 ≈ 0 counts
          Hide
          Robert Muir added a comment -

          I cant read the benchmarks (the table formatting is crazy) but now with this patch, there are allocations for every short string where there was not before. This should be fixed.

          Show
          Robert Muir added a comment - I cant read the benchmarks (the table formatting is crazy) but now with this patch, there are allocations for every short string where there was not before. This should be fixed.
          Hide
          Shalin Shekhar Mangar added a comment -

          This patch adds scratch bytes to GrowableArrayDataOutput itself to reduce allocations for small strings.

          Show
          Shalin Shekhar Mangar added a comment - This patch adds scratch bytes to GrowableArrayDataOutput itself to reduce allocations for small strings.
          Hide
          Shalin Shekhar Mangar added a comment -

          In these results, testWriteString3 is the one which uses scratch bytes to encode small strings:

          java -server -Xmx2048M -Xms2048M -Dtests.seed=18262 -Dtests.datagen.path=./data.txt -Dtests.string.minlen=5 -Dtests.string.maxlen=64 -Dtests.string.num=10000 -jar target/benchmarks.jar -wi 5 -i 50 -gc true -f 2 -prof gc ".*GrowableByteArrayDataOutputBenchmark.*"
          
          # Run complete. Total time: 00:08:55
          
          Benchmark                                                                                  Mode  Cnt        Score       Error   Units
          GrowableByteArrayDataOutputBenchmark.testWriteString1                                     thrpt  100  2915687.179 ±  4981.266   ops/s
          GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.alloc.rate                      thrpt  100        0.001 ±     0.001  MB/sec
          GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.alloc.rate.norm                 thrpt  100       ≈ 10⁻⁴                B/op
          GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.count                           thrpt  100          ≈ 0              counts
          GrowableByteArrayDataOutputBenchmark.testWriteString2                                     thrpt  100  4216428.245 ±  7822.793   ops/s
          GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.alloc.rate                      thrpt  100      595.210 ±     1.103  MB/sec
          GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.alloc.rate.norm                 thrpt  100      148.060 ±     0.001    B/op
          GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.count                           thrpt  100        1.000              counts
          GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.time                            thrpt  100        2.000                  ms
          GrowableByteArrayDataOutputBenchmark.testWriteString3                                     thrpt  100  4581362.617 ± 11683.766   ops/s
          GrowableByteArrayDataOutputBenchmark.testWriteString3:·gc.alloc.rate                      thrpt  100        0.001 ±     0.001  MB/sec
          GrowableByteArrayDataOutputBenchmark.testWriteString3:·gc.alloc.rate.norm                 thrpt  100       ≈ 10⁻⁴                B/op
          GrowableByteArrayDataOutputBenchmark.testWriteString3:·gc.count                           thrpt  100          ≈ 0              counts
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault                               thrpt  100  4277669.423 ± 19742.963   ops/s
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.alloc.rate                thrpt  100      603.855 ±     2.787  MB/sec
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.alloc.rate.norm           thrpt  100      148.060 ±     0.001    B/op
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.churn.PS_Eden_Space       thrpt  100        5.947 ±    20.169  MB/sec
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.churn.PS_Eden_Space.norm  thrpt  100        1.462 ±     4.958    B/op
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.count                     thrpt  100        2.000              counts
          GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.time                      thrpt  100        3.000                  ms
          
          Show
          Shalin Shekhar Mangar added a comment - In these results, testWriteString3 is the one which uses scratch bytes to encode small strings: java -server -Xmx2048M -Xms2048M -Dtests.seed=18262 -Dtests.datagen.path=./data.txt -Dtests.string.minlen=5 -Dtests.string.maxlen=64 -Dtests.string.num=10000 -jar target/benchmarks.jar -wi 5 -i 50 -gc true -f 2 -prof gc ".*GrowableByteArrayDataOutputBenchmark.*" # Run complete. Total time: 00:08:55 Benchmark Mode Cnt Score Error Units GrowableByteArrayDataOutputBenchmark.testWriteString1 thrpt 100 2915687.179 ± 4981.266 ops/s GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.alloc.rate thrpt 100 0.001 ± 0.001 MB/sec GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.alloc.rate.norm thrpt 100 ≈ 10⁻⁴ B/op GrowableByteArrayDataOutputBenchmark.testWriteString1:·gc.count thrpt 100 ≈ 0 counts GrowableByteArrayDataOutputBenchmark.testWriteString2 thrpt 100 4216428.245 ± 7822.793 ops/s GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.alloc.rate thrpt 100 595.210 ± 1.103 MB/sec GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.alloc.rate.norm thrpt 100 148.060 ± 0.001 B/op GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.count thrpt 100 1.000 counts GrowableByteArrayDataOutputBenchmark.testWriteString2:·gc.time thrpt 100 2.000 ms GrowableByteArrayDataOutputBenchmark.testWriteString3 thrpt 100 4581362.617 ± 11683.766 ops/s GrowableByteArrayDataOutputBenchmark.testWriteString3:·gc.alloc.rate thrpt 100 0.001 ± 0.001 MB/sec GrowableByteArrayDataOutputBenchmark.testWriteString3:·gc.alloc.rate.norm thrpt 100 ≈ 10⁻⁴ B/op GrowableByteArrayDataOutputBenchmark.testWriteString3:·gc.count thrpt 100 ≈ 0 counts GrowableByteArrayDataOutputBenchmark.testWriteStringDefault thrpt 100 4277669.423 ± 19742.963 ops/s GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.alloc.rate thrpt 100 603.855 ± 2.787 MB/sec GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.alloc.rate.norm thrpt 100 148.060 ± 0.001 B/op GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.churn.PS_Eden_Space thrpt 100 5.947 ± 20.169 MB/sec GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.churn.PS_Eden_Space.norm thrpt 100 1.462 ± 4.958 B/op GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.count thrpt 100 2.000 counts GrowableByteArrayDataOutputBenchmark.testWriteStringDefault:·gc.time thrpt 100 3.000 ms
          Hide
          Robert Muir added a comment -

          OK this is starting to look good. I personally don't mind the complexity (even though I feel huge documents is not really a use case, here is your search result, somewhere in this 10MB document), as long as we add unit tests for this output class that stress the string writing.

          The thing is, otherwise that logic is basically untested, because tests rarely make long strings. That is ripe for bugs now or in the future.

          Show
          Robert Muir added a comment - OK this is starting to look good. I personally don't mind the complexity (even though I feel huge documents is not really a use case, here is your search result, somewhere in this 10MB document), as long as we add unit tests for this output class that stress the string writing. The thing is, otherwise that logic is basically untested, because tests rarely make long strings. That is ripe for bugs now or in the future.
          Hide
          Shalin Shekhar Mangar added a comment -

          Sure, I'll add a test.

          Show
          Shalin Shekhar Mangar added a comment - Sure, I'll add a test.
          Hide
          Shalin Shekhar Mangar added a comment -

          Patch with a TestGrowableByteArrayDataOutput that stresses small and large strings.

          Show
          Shalin Shekhar Mangar added a comment - Patch with a TestGrowableByteArrayDataOutput that stresses small and large strings.
          Hide
          Robert Muir added a comment -

          +1, thanks for adding those tests.

          Show
          Robert Muir added a comment - +1, thanks for adding those tests.
          Hide
          ASF subversion and git services added a comment -

          Commit 1703219 from shalin@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1703219 ]

          LUCENE-6779: Reduce memory allocated by CompressingStoredFieldsWriter to write strings larger than 64kb by an amount equal to string's utf8 size

          Show
          ASF subversion and git services added a comment - Commit 1703219 from shalin@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1703219 ] LUCENE-6779 : Reduce memory allocated by CompressingStoredFieldsWriter to write strings larger than 64kb by an amount equal to string's utf8 size
          Hide
          ASF subversion and git services added a comment -

          Commit 1703231 from shalin@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1703231 ]

          LUCENE-6779: Reduce memory allocated by CompressingStoredFieldsWriter to write strings larger than 64kb by an amount equal to string's utf8 size

          Show
          ASF subversion and git services added a comment - Commit 1703231 from shalin@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1703231 ] LUCENE-6779 : Reduce memory allocated by CompressingStoredFieldsWriter to write strings larger than 64kb by an amount equal to string's utf8 size
          Hide
          Shalin Shekhar Mangar added a comment -

          Thanks for the reviews Dawid and Robert!

          Show
          Shalin Shekhar Mangar added a comment - Thanks for the reviews Dawid and Robert!

            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Shalin Shekhar Mangar
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development