  1. Lucene - Core
  2. LUCENE-7568

Optimize merge when index sorting is used but the index is already sorted

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0, 6.4
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

      Description

      When index sorting is defined, many optimizations are disabled during merging. For instance, the bulk merge of compressing stored fields is disabled because documents are not merged sequentially. However, index sorting can be enabled while the index is already in sorted order (the sort field is not filled, or is filled with the same value for all documents). In that case we can detect that the sort is not needed and activate the merge optimizations.
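
      To illustrate the detection idea, here is a minimal sketch that does not use any Lucene classes. It assumes each segment is modelled as a list of numeric sort values in doc order, with null standing for a document that has no value; the class name SortedMergeCheck, the method needsIndexSort, and the missingValue parameter are all hypothetical and are not taken from the patch.

      ```java
      import java.util.List;

      /** Simplified, hypothetical model of the "is a sort needed?" check; not the patch's code. */
      final class SortedMergeCheck {
          private SortedMergeCheck() {}

          /**
           * Returns true if reading the segments one after another, in doc order,
           * ever yields a sort value smaller than the one before it. In that case
           * the merged documents would have to be reordered. Returns false when
           * the concatenation is already in ascending order, for example when
           * every document has the same value or no value at all.
           *
           * @param segments     one list of sort values per segment, in doc order;
           *                     a null entry models a document without a value
           * @param missingValue value substituted for documents without a value
           */
          static boolean needsIndexSort(List<List<Long>> segments, long missingValue) {
              Long previous = null;
              for (List<Long> segment : segments) {
                  for (Long value : segment) {
                      long current = (value == null) ? missingValue : value;
                      if (previous != null && current < previous) {
                          return true; // order is broken, a real sort is required
                      }
                      previous = current;
                  }
              }
              return false; // already sorted, sequential merge optimizations can stay on
          }
      }
      ```

      When the method returns false, a merge could copy documents sequentially, which is the situation where optimizations such as the stored fields bulk merge become usable again.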

      1. LUCENE-7568.patch
        23 kB
        Jim Ferenczi
      2. LUCENE-7568.patch
        25 kB
        Jim Ferenczi

        Activity

        jim.ferenczi Jim Ferenczi added a comment -

        Here is a first patch that detects if an index is already sorted and makes this information available through MergeState. This information is then used by all the merge strategies to activate (or not) some optimizations.

        mikemccand Michael McCandless added a comment -

        This looks great! Thanks Jim Ferenczi.

        Maybe we could improve the new tests a bit to:

        • Allow merging, using newLogMergePolicy, which keeps docs in order but randomizes how merges are done; we want to make sure this opto still applies when a newly merged segment is then picked for another merge
        • Assert that the resulting MergeState.needsIndexSort is always false

        I think this would increase test coverage since MultiSorter.sort is only part of the logic in computing that boolean.

        To do that 2nd part ... I think you could make a simple FilterCodec that overrides one of the formats, e.g. PointsFormat, so that it can intercept the merge call at which point it would check the boolean?
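
        The interception idea can be sketched without Lucene as a plain delegating wrapper. Everything below is a hypothetical stand-in: MergeInfo plays the role of the state passed to a format during a merge, FormatWriter plays the role of one format's writer, and AssertingFormatWriter is the wrapper a test could install. The real FilterCodec and PointsFormat APIs are not reproduced here.

        ```java
        /** Hypothetical stand-in for the state handed to each format during a merge. */
        final class MergeInfo {
            final boolean needsIndexSort;

            MergeInfo(boolean needsIndexSort) {
                this.needsIndexSort = needsIndexSort;
            }
        }

        /** Hypothetical stand-in for one format's writer that takes part in a merge. */
        interface FormatWriter {
            void merge(MergeInfo state);
        }

        /** Wraps another writer, checks the flag on every merge call, then delegates. */
        final class AssertingFormatWriter implements FormatWriter {
            private final FormatWriter delegate;
            private final boolean expectedNeedsIndexSort;
            int mergeCalls = 0; // lets a test verify that at least one merge was observed

            AssertingFormatWriter(FormatWriter delegate, boolean expectedNeedsIndexSort) {
                this.delegate = delegate;
                this.expectedNeedsIndexSort = expectedNeedsIndexSort;
            }

            @Override
            public void merge(MergeInfo state) {
                if (state.needsIndexSort != expectedNeedsIndexSort) {
                    throw new AssertionError("unexpected needsIndexSort=" + state.needsIndexSort);
                }
                mergeCalls++;
                delegate.merge(state); // the wrapped writer still does the actual work
            }
        }
        ```

        Because the wrapper only observes the call and then delegates, the merge itself behaves as before; the counter guards against a test that passes simply because no merge ever ran.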

        jim.ferenczi Jim Ferenczi added a comment -

        Thanks for the review, Michael McCandless.
        I've modified the test with your suggestions. I am not sure I use the FilterCodec appropriately though (especially how I choose the delegating codec); can you take a look?

        mikemccand Michael McCandless added a comment -

        Thanks Jim Ferenczi, nice test refactoring! The patch looks great ... I'll push shortly.

        jira-bot ASF subversion and git services added a comment -

        Commit 21735161dcbdfcad52220d0389637c43f0d7989d in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2173516 ]

        LUCENE-7568: Optimize merging when index sorting is used but the index is already sorted

        jira-bot ASF subversion and git services added a comment -

        Commit 8d7c540e2c5da81063acf5b29f2a86b670b4a969 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8d7c540 ]

        LUCENE-7568: Optimize merging when index sorting is used but the index is already sorted

        mikemccand Michael McCandless added a comment -

        Thanks Jim Ferenczi!

        jira-bot ASF subversion and git services added a comment -

        Commit 37b75bef3f39840227f85aa5c330337fd101b003 in lucene-solr's branch refs/heads/master from Jim Ferenczi
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=37b75be ]

        LUCENE-7568: Fix test that should never create segments with a single document.

        jira-bot ASF subversion and git services added a comment -

        Commit 568f130c501c9c04a40a27e7952699490f155759 in lucene-solr's branch refs/heads/branch_6x from Jim Ferenczi
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=568f130 ]

        LUCENE-7568: Fix test that should never create segments with a single document.


          People

          • Assignee: Unassigned
          • Reporter: jim.ferenczi Jim Ferenczi