Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: master (7.0), 6.2
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      1D points already have an optimized merge implementation which works when points come in order. So maybe we could make IndexWriter's PointValuesWriter sort before feeding the PointsFormat and somehow propagate the information to the PointsFormat?

      The benefit is that flushing could directly stream points to disk with little memory usage.
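
      A minimal sketch of the proposed sorting step, assuming a simplified
      in-memory representation (the record and helper below are invented for
      illustration; IndexWriter's real buffer is a ByteBlockPool, not one
      object per point):

        import java.util.Arrays;
        import java.util.Comparator;
        import java.util.List;

        record BufferedPoint(byte[] packedValue, int docID) {}

        class PointSorter {
          // Sort by packed value in unsigned byte order, breaking ties by
          // docID, so a 1D points writer can stream the result to disk.
          static void sortPoints(List<BufferedPoint> points) {
            points.sort(
                Comparator.<BufferedPoint, byte[]>comparing(
                        BufferedPoint::packedValue, Arrays::compareUnsigned)
                    .thenComparingInt(BufferedPoint::docID));
          }
        }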

      Attachments

      1. LUCENE-7396.patch
        74 kB
        Adrien Grand
      2. LUCENE-7396.patch
        62 kB
        Adrien Grand
      3. LUCENE-7396.patch
        45 kB
        Adrien Grand
      4. LUCENE-7396.patch
        21 kB
        Adrien Grand

        Activity

        jpountz Adrien Grand added a comment -

        Here is a patch that demonstrates the idea. PointValuesWriter sorts before feeding the PointsFormat's writer, and Lucene60PointsWriter.addField consumes the PointsReader twice in the one-dimension case: once to detect whether it is sorted, and once to write values similarly to the way merging works. Even though the reader is consumed twice, I saw better indexing performance with IndexAndSearchOpenStreetMaps1D: 3 runs on master gave me 74, 78 and 75 seconds, while 3 runs with this patch gave me 65, 67 and 67 seconds.

        It would be nice if we could somehow propagate the information that the reader is sorted rather than having to iterate it twice.
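
        For reference, the "first pass" could be as simple as the following
        sketch (illustrative, not the actual Lucene60PointsWriter code): scan
        the values once and verify each packed value is >= the previous one.

          import java.util.Arrays;
          import java.util.List;

          class SortedCheck {
            // True if the packed values are already in non-decreasing
            // unsigned byte order, i.e. the optimized 1D write path applies.
            static boolean isSorted(List<byte[]> packedValues) {
              byte[] prev = null;
              for (byte[] value : packedValues) {
                if (prev != null && Arrays.compareUnsigned(prev, value) > 0) {
                  return false; // out of order: use the generic write path
                }
                prev = value;
              }
              return true;
            }
          }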

        mikemccand Michael McCandless added a comment -

        +1, cool!

        jpountz Adrien Grand added a comment -

        Here is a patch that uses a different approach. Flush passes a special implementation of a PointsReader that allows points to be reordered, so that each codec can sort points into whatever order it is interested in. The benefit over the previous patch is that it is no longer specific to one codec, and it also works in the multi-dimensional case. I got the following flush times (as reported by the IndexWriter log) with a 1GB buffer:

        Flush time (ms)                          master   patch
        IndexAndSearchOpenStreetMaps1D (1 dim)   31089    18954 (-39.0%)
        IndexAndSearchOpenStreetMaps (2 dims)    123461   85235 (-30.1%)

        This looks encouraging, especially given that it also uses less memory than the current approach. However, the patch is a bit disappointing in that it has a completely different implementation of tree writing depending on whether the input can be reordered or not. I'll look into whether I can clean this up a bit.
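
        Conceptually, the reorderable reader boils down to something like
        this sketch (an illustrative interface, not the patch's actual API):
        the codec gets random access to values and doc IDs plus a swap
        operation, so it can run any in-place sort or select it likes.

          interface ReorderablePoints {
            int size();                        // number of buffered points
            void getValue(int i, byte[] dest); // copy i-th packed value
            int getDocID(int i);               // doc ID of the i-th point
            void swap(int i, int j);           // reorder points in place
          }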

        mikemccand Michael McCandless added a comment -

        These results are awesome! I tested IndexAndSearchOpenStreetMaps1D and saw good indexing gains.

        However, I also tested with the NYC taxi data (http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml). Each of these docs has ~20 points, and unfortunately writing points (as reported by the IW log) is somewhat (10-20%) slower. I wonder if something about the data distribution affects the performance change here?

        Also, I don't think we should pre-budget the int[] ords we allocate against IW's buffer? That is really a transient allocation, made (and then freed, or at least reclaimable by GC) only for the one field currently writing its points, so I think it's fair not to count it against IW's buffer? IW is allowed to allocate heap beyond its RAM buffer for transient things like this...

        mikemccand Michael McCandless added a comment -

        I added a sop (System.out.println) showing how long it takes to flush each field's points.

        Trunk:

        127.0.0.1: IW 0 [2016-07-26T23:44:01.962Z; LuceneIndexing-1-thread-1]: 2047 msec to write docValues
        127.0.0.1: FIELD surcharge: 2218.62956 msec
        127.0.0.1: FIELD pick_up_lon: 1968.539463 msec
        127.0.0.1: FIELD mta_tax: 1967.196068 msec
        127.0.0.1: FIELD fare_amount: 2179.47771 msec
        127.0.0.1: FIELD trip_distance: 2042.19908 msec
        127.0.0.1: FIELD pick_up_lat: 2002.735026 msec
        127.0.0.1: FIELD drop_off_date_time: 1315.330631 msec
        127.0.0.1: FIELD tolls_amount: 1942.725832 msec
        127.0.0.1: FIELD passenger_count: 797.681094 msec
        127.0.0.1: FIELD drop_off_lon: 1937.206549 msec
        127.0.0.1: FIELD total_amount: 1987.058896 msec
        127.0.0.1: FIELD pick_up_date_time: 1296.005252 msec
        127.0.0.1: FIELD tip_amount: 2063.410329 msec
        127.0.0.1: FIELD drop_off_lat: 2045.020601 msec
        127.0.0.1: IW 0 [2016-07-26T23:44:27.726Z; LuceneIndexing-1-thread-1]: 25764 msec to write points
        

        and patch:

        127.0.0.1: IW 0 [2016-07-26T23:49:54.494Z; LuceneIndexing-1-thread-1]: 2033 msec to write docValues
        127.0.0.1: FIELD surcharge: 2137.926903 msec
        127.0.0.1: FIELD pick_up_lon: 2511.391725 msec
        127.0.0.1: FIELD mta_tax: 2144.822578 msec
        127.0.0.1: FIELD fare_amount: 3232.977894 msec
        127.0.0.1: FIELD trip_distance: 2545.771801 msec
        127.0.0.1: FIELD pick_up_lat: 2939.796276 msec
        127.0.0.1: FIELD drop_off_date_time: 1272.857191 msec
        127.0.0.1: FIELD tolls_amount: 2042.863782 msec
        127.0.0.1: FIELD passenger_count: 565.551751 msec
        127.0.0.1: FIELD drop_off_lon: 2493.79608 msec
        127.0.0.1: FIELD total_amount: 2596.043882 msec
        127.0.0.1: FIELD pick_up_date_time: 1308.397927 msec
        127.0.0.1: FIELD tip_amount: 2316.962831 msec
        127.0.0.1: FIELD drop_off_lat: 2748.935673 msec
        127.0.0.1: IW 0 [2016-07-26T23:50:25.415Z; LuceneIndexing-1-thread-1]: 30920 msec to write points
        

        This is using 1 thread w/ 1 GB IW buffer.

        Some fields seem to take similar time, but others are sizably different (e.g. the lat/lon points) ... odd.

        mikemccand Michael McCandless added a comment -

        Also, for these tests, I fixed IW's buffer to be the same (i.e. didn't count the int[] ords against IW's buffer), so that in both cases we are flushing after the same number of docs.

        jpountz Adrien Grand added a comment -

        I looked into the slowdown with Mike. The radix sort I was using in the 1D case falls back to quicksort when the range gets narrow, and that fallback was pretty slow since it called ByteBlockPool.readBytes for every single byte; it should be better now. I suspect I did not hit the problem with IndexAndSearchOpenStreetMaps1D because most of the time was spent on the first levels of recursion.
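
        For readers unfamiliar with the scheme: an MSB radix sort buckets by
        one byte per recursion level and hands narrow ranges to a comparison
        sort. A compact sketch on int keys (illustrative only; Lucene's
        sorter works on byte[] keys in a ByteBlockPool and falls back to an
        intro sort):

          import java.util.Arrays;

          final class MsbRadixSort {
            private static final int FALLBACK_THRESHOLD = 64;

            // Sort a[from..to) in unsigned order, most significant byte
            // first; the initial call uses shift = 24.
            static void sort(int[] a, int from, int to, int shift) {
              if (to - from <= FALLBACK_THRESHOLD || shift < 0) {
                sortFallback(a, from, to); // narrow range or bytes exhausted
                return;
              }
              int[] start = new int[257];
              for (int i = from; i < to; i++) {
                start[bucket(a[i], shift) + 1]++;
              }
              for (int b = 1; b < 257; b++) {
                start[b] += start[b - 1]; // prefix sums -> bucket offsets
              }
              int[] tmp = Arrays.copyOfRange(a, from, to);
              for (int v : tmp) {
                a[from + start[bucket(v, shift)]++] = v;
              }
              // start[b] now holds the end offset of bucket b; recurse into
              // each bucket on the next byte.
              int begin = 0;
              for (int b = 0; b < 256; b++) {
                int end = start[b];
                if (end - begin > 1) {
                  sort(a, from + begin, from + end, shift - 8);
                }
                begin = end;
              }
            }

            private static int bucket(int v, int shift) {
              return (v >>> shift) & 0xff;
            }

            private static void sortFallback(int[] a, int from, int to) {
              // Flip the sign bit so a signed sort yields unsigned order.
              for (int i = from; i < to; i++) a[i] ^= Integer.MIN_VALUE;
              Arrays.sort(a, from, to);
              for (int i = from; i < to; i++) a[i] ^= Integer.MIN_VALUE;
            }
          }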

        The 2D case was also improved by using a radix select instead of quickselect when recursively building the BKD tree. Tie-breaking on doc IDs was improved by only looking at the relevant bytes, since we know the number of required bits up front thanks to maxDoc. And IW no longer pre-budgets the ords.
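
        The maxDoc trick amounts to computing, once up front, the number of
        bits a doc ID can possibly need (a sketch; the patch presumably uses
        Lucene's packed-ints utilities for this):

          // Doc IDs lie in [0, maxDoc), so tie-breaking never needs to look
          // at more than this many bits.
          static int bitsPerDocId(int maxDoc) {
            return maxDoc <= 1 ? 0
                : 32 - Integer.numberOfLeadingZeros(maxDoc - 1);
          }
          // e.g. maxDoc = 10_000_000 -> 24 bits, i.e. at most 3 bytes to
          // compare per doc ID instead of 4.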

        I got the following IW logs when running IndexTaxis and IndexAndSearchOpenStreetMaps:

        master IndexAndSearchOpenStreetMaps, ramBuffer=128MB
        IW 0 [2016-07-27T15:38:21.308Z; Thread-0]: 17525 msec to write points
        IW 0 [2016-07-27T15:38:44.657Z; Thread-0]: 16746 msec to write points
        IW 0 [2016-07-27T15:39:08.278Z; Thread-0]: 16982 msec to write points
        IW 0 [2016-07-27T15:39:32.613Z; Thread-0]: 17568 msec to write points
        IW 0 [2016-07-27T15:39:56.056Z; Thread-0]: 16684 msec to write points
        IW 0 [2016-07-27T15:40:06.646Z; main]: 7324 msec to write points
        
        master IndexTaxis, first 4 flushes
        IW 0 [2016-07-27T15:42:10.401Z; Thread-0]: 34422 msec to write points
        IW 0 [2016-07-27T15:43:15.561Z; Thread-0]: 32306 msec to write points
        IW 0 [2016-07-27T15:44:20.702Z; Thread-0]: 31753 msec to write points
        IW 0 [2016-07-27T15:45:24.920Z; Thread-0]: 32340 msec to write points
        
        patch IndexAndSearchOpenStreetMaps, ramBuffer=128MB
        IW 0 [2016-07-27T15:55:49.959Z; Thread-0]: 10581 msec to write points
        IW 0 [2016-07-27T15:56:08.098Z; Thread-0]: 10306 msec to write points
        IW 0 [2016-07-27T15:56:25.445Z; Thread-0]: 10226 msec to write points
        IW 0 [2016-07-27T15:56:42.513Z; Thread-0]: 10308 msec to write points
        IW 0 [2016-07-27T15:56:59.898Z; Thread-0]: 10162 msec to write points
        IW 0 [2016-07-27T15:57:08.497Z; main]: 4593 msec to write points
        
        patch IndexTaxis, first 4 flushes
        IW 0 [2016-07-27T15:47:10.906Z; Thread-0]: 25673 msec to write points
        IW 0 [2016-07-27T15:48:06.356Z; Thread-0]: 23615 msec to write points
        IW 0 [2016-07-27T15:49:03.327Z; Thread-0]: 23915 msec to write points
        IW 0 [2016-07-27T15:49:59.424Z; Thread-0]: 23482 msec to write points
        
        mikemccand Michael McCandless added a comment -

        Thanks Adrien Grand, I also saw similar initial speedups when running
        IndexTaxis.java from luceneutil. I'll re-index all 1.2 B points
        w/ N threads and compare performance shortly.

        This is a very impressive change; here's my attempt at the high-level
        summary:

        • Improves the utility APIs for the select algorithm: new abstract
          Selector class, with Quick/Radix/IntroSelector impls
        • At flush, takes advantage of the fact that all values are already
          in heap (IW's RAM buffer), more cleanly than LUCENE-7390 (should
          we revert that?)
        • Fixes 1D BKD flushing to also use the optimized writing that we do
          at merge
        • Fixes N-dim flushing to avoid an up-front sort on each dimension;
          instead it uses the select algorithm at each recursion to find the
          split point, and only sorts within each leaf block
        • The general case (merging, N dims) still requires an up-front sort,
          likely using the offline sorter

        Using select to split on each dim is faster for at least two reasons
        (see the cost sketch after this list):

        • The select algorithm runs in linear time in the typical case, and
          sorting many small chunks is cheaper than one big sort: sorting
          1000 chunks that are each 1000 times smaller costs N * log(N/1000)
          vs N * log(N).
        • The sorting in each leaf block is only in the one dimension needed
          for the run-length compression, vs N dimensions currently.

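        To spell out the arithmetic in the first bullet: sorting one array of
        N values costs on the order of N log N comparisons, while sorting k
        chunks that are each k times smaller costs

          k \cdot \frac{N}{k} \log\frac{N}{k} \;=\; N \log\frac{N}{k}

        so with k = 1000 the leaf sorts do N log(N/1000) work in total, and
        the linear-time selects supply the split points that the big up-front
        sort used to provide.
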
        Maybe MutablePointsReader should expose public byte getByteAt(int i, int bytePos);?
        This would save copying a whole value just because you
        want to see a specific byte.

        That's very clever how you logically "concatenate" the packed docID
        onto the end of the byte[] values that we are selecting in BKDWriter:

            final int shift = bitsPerDocId - ((k - cmpBytes + 1) << 3);
            return (reader.getDocID(i) >>> Math.max(0, shift)) & 0xff;
        
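        To unpack that expression (a hypothetical standalone demo, not Lucene
        code): k is the index of the virtual byte being compared, the first
        cmpBytes bytes come from the packed value, and the rest are carved
        out of the doc ID's bitsPerDocId bits, most significant first; the
        Math.max(0, shift) clamp right-aligns the final partial byte.

          class DocIdByteDemo {
            public static void main(String[] args) {
              int bitsPerDocId = 21; // e.g. maxDoc just above 2^20
              int cmpBytes = 8;      // value bytes compared before doc IDs
              int docID = 1_234_567;
              int docIdBytes = (bitsPerDocId + 7) / 8;
              for (int k = cmpBytes; k < cmpBytes + docIdBytes; k++) {
                int shift = bitsPerDocId - ((k - cmpBytes + 1) << 3);
                int b = (docID >>> Math.max(0, shift)) & 0xff;
                // k=8 -> top 8 of the 21 bits, k=9 -> the next 8,
                // k=10 -> the remaining low bits (shift clamped to 0)
                System.out.println("k=" + k + " virtual byte=" + b);
              }
            }
          }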

        I hope our points test cases are stressful enough in testing this
        tie break case!

        On line 522 of BKDWriter.java can't byte[] pivot = scratch1 be
        final? And also line 1426?

        The javadocs for RadixSelector.getFallbackSelector say sorter
        but should be selector? Can you improve those javadocs,
        e.g. note that the fallback is used when too much recursion has
        happened or when the range is tiny?

        Can we remove RadixSelector.quickSelect and inline it into the one
        place it's called in select?

        Hmm, why are we changing ord from long to int here?:

           // only called from assert
        -  private boolean valueInOrder(long ord, byte[] lastPackedValue, byte[] packedValue, int packedValueOffset) {
        -    if (ord > 0 && StringHelper.compare(bytesPerDim, lastPackedValue, 0, packedValue, packedValueOffset) > 0) {
        -      throw new AssertionError("values out of order: last value=" + new BytesRef(lastPackedValue) + " current value=" + new BytesRef(packedValue, packedValueOffset, packedBytesLength) + " ord=" + ord);
        +  private boolean valueInOrder(int ord, int sortedDim, byte[] lastPackedValue, byte[] packedValue, int packedValueOffset,
        +      int[] docs, int docsOffset) {
        +    int dimOffset = sortedDim * bytesPerDim;
        +    if (ord > 0) {
        +      int cmp = StringHelper.compare(bytesPerDim, lastPackedValue, dimOffset, packedValue, packedValueOffset + dimOffset);
        +      if (cmp > 0) {
        +        throw new AssertionError("values out of order: last value=" + new BytesRef(lastPackedValue) + " current value=" + new BytesRef(packedValue, packedValueOffset, packedBytesLength) + " ord=" + ord);
        +      }
        +      if (cmp == 0 && docs[docsOffset + ord - 1] > docs[docsOffset + ord]) {
        +        throw new AssertionError("docs out of order: last doc=" + docs[docsOffset + ord - 1] + " current doc=" + docs[docsOffset + ord] + " ord=" + ord);
        +      }
             }
        -    System.arraycopy(packedValue, packedValueOffset, lastPackedValue, 0, bytesPerDim);
        +    System.arraycopy(packedValue, packedValueOffset, lastPackedValue, 0, packedBytesLength);
             return true;
           }
        

        It looks like we no longer call this from assert in the merge 1D
        case, except within one leaf block? Was that intentional?

        mikemccand Michael McCandless added a comment -

        I indexed all 1.2 B NYC taxi rides using luceneserver (https://github.com/mikemccand/luceneserver) and this patch is 13% faster overall indexing time (200.7 K docs/sec -> 226.7 K docs/sec): nice!

        jpountz Adrien Grand added a comment -

        This is a great summary of the change.

        more cleanly than LUCENE-7390 (should we revert that?)

        I agree we should only have this or LUCENE-7390, not both.

        Maybe MutablePointsReader should expose public byte getByteAt(int i, int bytePos);? This would save copying a whole value just because you want to see a specific byte.

        I initially wanted to keep the number of methods to a minimum, but since the use of this API is very contained it is probably ok. It also seems to help with flush performance: I am now seeing flush take ~20 seconds instead of 23 with IndexTaxis.

        I hope our points test cases are stressful enough in testing this tie break case!

        Indeed it was not: changing this branch to return a constant does not break the tests. I extracted the sorting/partitioning logic into a helper class with dedicated tests to cover this better (it would require indexing too many docs otherwise).

        It looks like we no longer call this from assert in the merge 1D case, except within one leaf block? Was that intentional?

        Definitely not. I added it back.

        mikemccand Michael McCandless added a comment -

        Thanks Adrien Grand, I'm re-indexing the 1.2B taxi rides!

        Can't we use MutablePointsReader.getByteAt when computing the cardinality of each dim's leading suffix byte in the leaf block writing too?

        {@cod mid} should be {@code mid} in MutablePointsReaderUtils.partition, but hopefully ant precommit would catch that.

        The patch looks great: +1 to commit. We can polish in follow-on issues ... I think it's important to get the builds chewing on this great improvement.

        I'll revert LUCENE-7390 once you've pushed.

        mikemccand Michael McCandless added a comment -

        I made a separate speedup in luceneserver to bring indexing rate to 232.7 K docs/sec, and then with this patch it gets even faster to 261.4 K docs/sec, net/net ~27% speedup!

        jira-bot ASF subversion and git services added a comment -

        Commit 6b8b34ed358287a1cadf848e14089601a3ce1671 in lucene-solr's branch refs/heads/branch_6x from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6b8b34e ]

        LUCENE-7396: Speed flush of points.

        jira-bot ASF subversion and git services added a comment -

        Commit 60975d2dfa0bf855220db6d9755b7e24c14a59bb in lucene-solr's branch refs/heads/master from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=60975d2 ]

        LUCENE-7396: Speed flush of points.

        jpountz Adrien Grand added a comment -

        Michael McCandless I'm curious what this separate speedup is about?

        mikemccand Michael McCandless added a comment -

        Michael McCandless I'm curious what this separate speedup is about?

        It was this change: https://github.com/mikemccand/luceneserver/commit/e69c560d1e22f7b7e03b9ff88fd1d6f6d0079fdc

        Previously luceneserver was always building facets, but I fixed it to skip that if there are no facet fields since it's a somewhat expensive no-op in that case.

        jira-bot ASF subversion and git services added a comment -

        Commit 1aecdd28d130c757770de67bfde52f3c989bd134 in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1aecdd2 ]

        LUCENE-7390: revert this change, since it's obsoleted by the much better LUCENE-7396

        jira-bot ASF subversion and git services added a comment -

        Commit c5c5335c9b36b30b5f49f8d354011d6fa874f383 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c5c5335 ]

        LUCENE-7390: revert this change, since it's obsoleted by the much better LUCENE-7396

        jira-bot ASF subversion and git services added a comment -

        Commit 60975d2dfa0bf855220db6d9755b7e24c14a59bb in lucene-solr's branch refs/heads/apiv2 from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=60975d2 ]

        LUCENE-7396: Speed flush of points.

        jira-bot ASF subversion and git services added a comment -

        Commit 1aecdd28d130c757770de67bfde52f3c989bd134 in lucene-solr's branch refs/heads/apiv2 from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1aecdd2 ]

        LUCENE-7390: revert this change, since it's obsoleted by the much better LUCENE-7396

        mikemccand Michael McCandless added a comment -

        Bulk close resolved issues after 6.2.0 release.


          People

          • Assignee: Unassigned
          • Reporter: jpountz Adrien Grand
          • Votes: 0
          • Watchers: 4
