Lucene - Core
  1. Lucene - Core
  2. LUCENE-6777

Switch GeoPointTermsEnum range list to use a reusable BytesRef

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      GeoPointTermsEnum currently constructs a BytesRef for every computed range, then sorts on this BytesRef. This adds an unnecessary memory overhead since the TermsEnum only requires BytesRef on calls to nextSeekTerm and accept and the ranges only need to be sorted by their long representation. This issue adds the following two improvements:

      1. Lazily compute the BytesRef on demand only when its needed
      2. Add a single, transient BytesRef to GeoPointTermsEnum

      This will further cut back on heap usage when constructing ranges across every segment.

      1. LUCENE-6777.patch
        5 kB
        Nicholas Knize
      2. LUCENE-6777.patch
        5 kB
        Nicholas Knize
      3. LUCENE-6777.patch
        5 kB
        Nicholas Knize
      4. LUCENE-6777.patch
        5 kB
        Nicholas Knize

        Issue Links

          Activity

          Hide
          Nicholas Knize added a comment -

          Patch that converts GeoPointTermsEnum to use a reusable BytesRef for comparing Ranges against indexed terms.

          Will add performance numbers shortly.

          Show
          Nicholas Knize added a comment - Patch that converts GeoPointTermsEnum to use a reusable BytesRef for comparing Ranges against indexed terms. Will add performance numbers shortly.
          Hide
          Michael McCandless added a comment -

          Thanks Nicholas Knize! I don't think we have to perf test before committing ... but can we try to simplify the code a bit? E.g. things like:

          -    if (term.compareTo(rangeBounds.get(0).cell) < 0) {
          +    if (term.compareTo(this.nextSubRange = rangeBounds.get(0).asBytesRef(this.nextSubRange)) < 0) {
          

          break that inner assignment out into its own line before?

          Another example, instead of:

              final int result = Short.compare(this.shift, other.shift);
              return (result == 0) ? Long.compare(this.start, other.start) : result;
          

          can we do e.g.:

              final int result = Short.compare(this.shift, other.shift);
              if (result == 0) {
                return result;
              }
              return Long.compare(this.start, other.start);
          

          Instead of cloning the somewhat hairy
          NumericUtils.longToPrefixCodedBytes, is it possible to e.g. reuse
          a shared BytesRefBuilder?

          Instead of the lazy alloc of reusable can we change the asBytesRef to
          fillBytesRef, and it takes the non-null BytesRef in, and
          returns void? It's confusing to both take a parameter and return a result that
          are sort of doing the same thing ...

          Show
          Michael McCandless added a comment - Thanks Nicholas Knize ! I don't think we have to perf test before committing ... but can we try to simplify the code a bit? E.g. things like: - if (term.compareTo(rangeBounds.get(0).cell) < 0) { + if (term.compareTo(this.nextSubRange = rangeBounds.get(0).asBytesRef(this.nextSubRange)) < 0) { break that inner assignment out into its own line before? Another example, instead of: final int result = Short.compare(this.shift, other.shift); return (result == 0) ? Long.compare(this.start, other.start) : result; can we do e.g.: final int result = Short.compare(this.shift, other.shift); if (result == 0) { return result; } return Long.compare(this.start, other.start); Instead of cloning the somewhat hairy NumericUtils.longToPrefixCodedBytes , is it possible to e.g. reuse a shared BytesRefBuilder ? Instead of the lazy alloc of reusable can we change the asBytesRef to fillBytesRef , and it takes the non-null BytesRef in, and returns void? It's confusing to both take a parameter and return a result that are sort of doing the same thing ...
          Hide
          Nicholas Knize added a comment -

          Attached patch that splits some of the longer ternary operators.

          is it possible to e.g. reuse a shared BytesRefBuilder?

          I would have preferred this to begin with, but longToPrefixCodedBytes calls the following

          bytes.grow(BUF_SIZE_LONG);

          So it continues to grow the byte array for whatever reused buffer is passed. As an alternative I can update NumericUtils to include a version that reuses a BytesRefBuilder? That is if we think it will be useful outside this one use case?

          Show
          Nicholas Knize added a comment - Attached patch that splits some of the longer ternary operators. is it possible to e.g. reuse a shared BytesRefBuilder? I would have preferred this to begin with, but longToPrefixCodedBytes calls the following bytes.grow(BUF_SIZE_LONG); So it continues to grow the byte array for whatever reused buffer is passed. As an alternative I can update NumericUtils to include a version that reuses a BytesRefBuilder? That is if we think it will be useful outside this one use case?
          Hide
          Nicholas Knize added a comment -

          Sorry, missed the last suggestion for refactoring

          asBytesRef

          to

          fillBytesRef

          . Updated patch attached.

          Show
          Nicholas Knize added a comment - Sorry, missed the last suggestion for refactoring asBytesRef to fillBytesRef . Updated patch attached.
          Hide
          Nicholas Knize added a comment -

          Deleted wrong patch... correct one attached.

          Show
          Nicholas Knize added a comment - Deleted wrong patch... correct one attached.
          Hide
          Michael McCandless added a comment -

          So it continues to grow the byte array for whatever reused buffer is passed.

          Hmm but that grow is (effectively: a couple if statements) a no-op if the builder's byte[] is already large enough?

          That NumericUtils encoding code is so hairy, I really don't think we want it in more places than one!

          Also, the check for reusable == null isn't needed? It can become assert reusable != null, and maybe rename reusable to result?

          Show
          Michael McCandless added a comment - So it continues to grow the byte array for whatever reused buffer is passed. Hmm but that grow is (effectively: a couple if statements) a no-op if the builder's byte[] is already large enough? That NumericUtils encoding code is so hairy, I really don't think we want it in more places than one! Also, the check for reusable == null isn't needed? It can become assert reusable != null , and maybe rename reusable to result ?
          Hide
          Nicholas Knize added a comment -

          Updated patch to address comments.

          • Removes duplicate longToPrefixCodedBytes
          • Refactors variable naming
          • Uses reusable BytesRefBuilder
          Show
          Nicholas Knize added a comment - Updated patch to address comments. Removes duplicate longToPrefixCodedBytes Refactors variable naming Uses reusable BytesRefBuilder
          Hide
          Michael McCandless added a comment -

          Thanks Nicholas Knize, new patch looks great! I'll run tests & commit shortly ...

          Show
          Michael McCandless added a comment - Thanks Nicholas Knize , new patch looks great! I'll run tests & commit shortly ...
          Hide
          ASF subversion and git services added a comment -

          Commit 1702307 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1702307 ]

          LUCENE-6777: reuse BytesRef when visiting term ranges in GeoPointTermsEnum

          Show
          ASF subversion and git services added a comment - Commit 1702307 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1702307 ] LUCENE-6777 : reuse BytesRef when visiting term ranges in GeoPointTermsEnum
          Hide
          ASF subversion and git services added a comment -

          Commit 1702308 from Michael McCandless in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1702308 ]

          LUCENE-6777: reuse BytesRef when visiting term ranges in GeoPointTermsEnum

          Show
          ASF subversion and git services added a comment - Commit 1702308 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1702308 ] LUCENE-6777 : reuse BytesRef when visiting term ranges in GeoPointTermsEnum
          Hide
          Michael McCandless added a comment -
          Show
          Michael McCandless added a comment - Thanks Nicholas Knize !

            People

            • Assignee:
              Unassigned
              Reporter:
              Nicholas Knize
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development