Lucene - Core / LUCENE-7071

Can we reduce excessive byte[] copying in OfflineSorter?

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.1, master (7.0)
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      OfflineSorter, which dimensional points uses heavily in the > 1D case,
      works by reading one partition, a set of N unsorted values, from disk
      and sorting it in memory and writing it out again.

      The sort invokes a provided Comparator on two BytesRef values,
      each of which is fully copied from the ByteBlockPool, when it could
      often reference a slice from the pool instead.
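
The idea of comparing slices in place rather than copies can be sketched as follows. This is a minimal illustration, not the code from the patch: `Slice` is a hypothetical stand-in for a BytesRef-like view (array, offset, length), `SliceCompareSketch` and `sortedViews` are invented names, and a single `byte[]` plays the role of one pool block holding fixed-width values.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SliceCompareSketch {

    /** Hypothetical BytesRef-like view: it references bytes, it does not own them. */
    static final class Slice {
        final byte[] bytes;
        final int offset;
        final int length;

        Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Unsigned lexicographic comparison directly on the referenced ranges (Java 9+). */
    static final Comparator<Slice> UNSIGNED = (a, b) ->
        Arrays.compareUnsigned(
            a.bytes, a.offset, a.offset + a.length,
            b.bytes, b.offset, b.offset + b.length);

    /**
     * Builds one view per fixed-width value packed in {@code block} and sorts
     * the views. Only the small Slice objects move; the value bytes stay put.
     */
    static Slice[] sortedViews(byte[] block, int width) {
        int count = block.length / width;
        Slice[] views = new Slice[count];
        for (int i = 0; i < count; i++) {
            views[i] = new Slice(block, i * width, width);
        }
        Arrays.sort(views, UNSIGNED);
        return views;
    }
}
```

Every returned view still points into the original array, so the comparison itself allocates and copies nothing.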

      Another byte[] copy happens when iterating through the sorted values.

      This is an optimization ... I'm targeting 6.1.0 not 6.0.0!

      1. LUCENE-7071.patch
        9 kB
        Michael McCandless

        Activity

        mikemccand Michael McCandless added a comment -

        Patch, avoiding copying bytes when the referenced slice already lies within a single byte[] block from the pool. The new APIs are a bit ugly looking; however, they are package private, so I think it's OK? I also can't think of any cleaner way to pack bytes in so nothing is "wasted", yet avoid copying bytes in the common case.
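
A rough sketch of the "reference when it fits in one block, copy otherwise" decision is shown here. It assumes a pool made of fixed-size blocks with values packed back to back; `BlockSliceSketch`, `Slice`, `setSlice`, and the tiny `BLOCK_SIZE` are all hypothetical names chosen for illustration and are not the package-private APIs the patch adds.

```java
public class BlockSliceSketch {

    /** Deliberately tiny block size so that boundary crossings are easy to see. */
    static final int BLOCK_SIZE = 8;

    /** Hypothetical mutable BytesRef-like view. */
    static final class Slice {
        byte[] bytes;
        int offset;
        int length;
    }

    /**
     * Points {@code out} at the {@code length} bytes starting at global
     * position {@code start} in the pool. Returns true when the zero-copy
     * path was taken, false when the bytes had to be copied.
     */
    static boolean setSlice(byte[][] blocks, int start, int length, Slice out) {
        int block = start / BLOCK_SIZE;
        int within = start % BLOCK_SIZE;

        if (within + length <= BLOCK_SIZE) {
            // Common case: the value lies inside one block, so just reference it.
            out.bytes = blocks[block];
            out.offset = within;
            out.length = length;
            return true;
        }

        // Rare case: the value straddles blocks, so stitch it into a fresh array.
        byte[] copy = new byte[length];
        int copied = 0;
        while (copied < length) {
            int chunk = Math.min(BLOCK_SIZE - within, length - copied);
            System.arraycopy(blocks[block], within, copy, copied, chunk);
            copied += chunk;
            block++;
            within = 0;
        }
        out.bytes = copy;
        out.offset = 0;
        out.length = length;
        return false;
    }
}
```

Because values are packed with no padding at block ends, nothing is wasted, and the copy is only paid for the values that happen to cross a block boundary.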

        I also stumbled upon and fixed some pre-existing "ignore BytesRef.offset" bugs in suggest's SortedInputIterator.
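
For readers unfamiliar with this class of bug, the sketch below shows how ignoring the offset of a byte slice goes wrong once slices stop starting at index 0. It is a generic illustration and does not reproduce the actual SortedInputIterator code; `OffsetBugSketch`, `Slice`, and both read methods are hypothetical names.

```java
import java.nio.ByteBuffer;

public class OffsetBugSketch {

    /** Hypothetical BytesRef-like view into a shared array. */
    static final class Slice {
        final byte[] bytes;
        final int offset;
        final int length;

        Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Buggy: assumes the value starts at index 0 of the backing array. */
    static int readIntIgnoringOffset(Slice s) {
        return ByteBuffer.wrap(s.bytes, 0, 4).getInt();
    }

    /** Correct: starts reading at the slice's own offset. */
    static int readInt(Slice s) {
        return ByteBuffer.wrap(s.bytes, s.offset, s.length).getInt();
    }
}
```

The buggy version works by accident while every slice owns a private array with offset 0, which is why such bugs tend to surface only after a change that hands out views into shared blocks.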

        This gives a ~10% speedup on the time it takes to merge all ~61M 2D lat/lon points in the London, UK benchmark.

        rcmuir Robert Muir added a comment -

        I think it's ok since it's package-private. We should avoid doing this kind of copying in a comparison function!

        jira-bot ASF subversion and git services added a comment -

        Commit 3d633c6e68ec7a2e47d398daae203582537593a4 in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3d633c6 ]

        LUCENE-7071: reduce byte copying costs of OfflineSorter

        jira-bot ASF subversion and git services added a comment -

        Commit 549e6d7c497b1da0368f012ef0a2521cf0548582 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=549e6d7 ]

        LUCENE-7071: reduce byte copying costs of OfflineSorter

        mikemccand Michael McCandless added a comment -

        Yay, a not-for-6.0 issue!

        hossman Hoss Man added a comment -

        Manually correcting fixVersion per Step #S5 of LUCENE-7271


          People

          • Assignee:
            mikemccand Michael McCandless
            Reporter:
            mikemccand Michael McCandless
          • Votes:
            0
            Watchers:
            2
