Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.1, 7.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Currently it always does 4 passes over the data (one per byte, since ints have 4 bytes). However, most of the time, we know maxDoc, so we can use this information to do fewer passes when they are not necessary. For instance, if maxDoc is less than or equal to 2^24, we only need 3 passes, and if maxDoc is less than or equals to 2^16, we only need two passes.

      1. LUCENE-7261.patch
        8 kB
        Adrien Grand
      2. MSBRadixSorter.java
        4 kB
        Adrien Grand

        Activity

        Hide
        jpountz Adrien Grand added a comment -

        Here is a patch. It also removes the offset parameter since we always use 0 in practice. This makes the code much easier to follow.

        Show
        jpountz Adrien Grand added a comment - Here is a patch. It also removes the offset parameter since we always use 0 in practice. This makes the code much easier to follow.
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        +1

        As an aside, I was looking at this stuff a while ago and had decided on trying an MSB sort first (before you added LSB).

        • can be in-place
        • since all buckets are sorted relative to each other, can delegate to a different sorting algorithm when buckets become small
        Show
        yseeley@gmail.com Yonik Seeley added a comment - +1 As an aside, I was looking at this stuff a while ago and had decided on trying an MSB sort first (before you added LSB). can be in-place since all buckets are sorted relative to each other, can delegate to a different sorting algorithm when buckets become small
        Hide
        jpountz Adrien Grand added a comment -

        I implemented LSB because it is very easy to implement. But +1 to explore whether we can make things faster or generate less garbage with MSB sort first.

        Show
        jpountz Adrien Grand added a comment - I implemented LSB because it is very easy to implement. But +1 to explore whether we can make things faster or generate less garbage with MSB sort first.
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        I had prev implemented MSD for integers and I was grepping for that code this morning... can't seem to find it

        Show
        yseeley@gmail.com Yonik Seeley added a comment - I had prev implemented MSD for integers and I was grepping for that code this morning... can't seem to find it
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 8ca6f6651ede19bfaee9051e9b87927685cb9be0 in lucene-solr's branch refs/heads/branch_6x from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8ca6f66 ]

        LUCENE-7261: Speed up LSBRadixSorter.

        Show
        jira-bot ASF subversion and git services added a comment - Commit 8ca6f6651ede19bfaee9051e9b87927685cb9be0 in lucene-solr's branch refs/heads/branch_6x from Adrien Grand [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8ca6f66 ] LUCENE-7261 : Speed up LSBRadixSorter.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit ef45d4b2e1f9c967b62340acb027f50888a00ba2 in lucene-solr's branch refs/heads/master from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ef45d4b ]

        LUCENE-7261: Speed up LSBRadixSorter.

        Show
        jira-bot ASF subversion and git services added a comment - Commit ef45d4b2e1f9c967b62340acb027f50888a00ba2 in lucene-solr's branch refs/heads/master from Adrien Grand [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ef45d4b ] LUCENE-7261 : Speed up LSBRadixSorter.
        Hide
        hossman Hoss Man added a comment -

        Manually correcting fixVersion per Step #S5 of LUCENE-7271

        Show
        hossman Hoss Man added a comment - Manually correcting fixVersion per Step #S5 of LUCENE-7271
        Hide
        jpountz Adrien Grand added a comment -

        Yonik Seeley FYI I played with MSB radix sort to see how it compares with LSB in terms of performance but LSB outperformed it by about 2x. I am attaching the code here in case you want to have a look, it is totally possible that the implementation is not optimal...

        Show
        jpountz Adrien Grand added a comment - Yonik Seeley FYI I played with MSB radix sort to see how it compares with LSB in terms of performance but LSB outperformed it by about 2x. I am attaching the code here in case you want to have a look, it is totally possible that the implementation is not optimal...
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        Thanks Adrien, interesting. Definitely not the results I expected (but I never tested against anything else). IIRC I think reorder looked a bit different in my implementation. When I get some time, I'll take a crack at it just for the fun of it

        Show
        yseeley@gmail.com Yonik Seeley added a comment - Thanks Adrien, interesting. Definitely not the results I expected (but I never tested against anything else). IIRC I think reorder looked a bit different in my implementation. When I get some time, I'll take a crack at it just for the fun of it

          People

          • Assignee:
            jpountz Adrien Grand
            Reporter:
            jpountz Adrien Grand
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development