Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7671

Enhance UpgradeIndexMergePolicy with additional options

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Enhance UpgradeIndexMergePolicy to be a MergePolicy that can be used outside the scope the IndexUpgrader.

      The enhancement aims to allow the UpgradeIndexMergePolicy to:

      1) Delegate normal force merges to the underlying merge policy
      2) Enable a flag that will explicitly tell UpgradeIndexMergePolicy when it should start looking for upgrades.
      3) Allow new segments to be considered to be merged with old segments, depending on underlying MergePolicy.
      4) Be configurable for backwards compatibility such that only segments needing an upgrade would be considered when merging, no explicit upgrades.

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user kelaban opened a pull request:

          https://github.com/apache/lucene-solr/pull/151

          LUCENE-7671 - Enhance UpgradeIndexMergePolicy with additional options

          Also updates Backwards compatibility tests

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/kelaban/lucene-solr jira/master/LUCENE-7671/upgrading-merge-policy

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/151.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #151


          commit c0120745617cbd50a920338cb83e751b9a26963e
          Author: Keith Laban <klaban1@bloomberg.net>
          Date: 2017-01-25T17:08:22Z

          LUCENE-7671 - Enhance UpgradeIndexMergePolicy with additional options


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user kelaban opened a pull request: https://github.com/apache/lucene-solr/pull/151 LUCENE-7671 - Enhance UpgradeIndexMergePolicy with additional options Also updates Backwards compatibility tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/kelaban/lucene-solr jira/master/ LUCENE-7671 /upgrading-merge-policy Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/151.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #151 commit c0120745617cbd50a920338cb83e751b9a26963e Author: Keith Laban <klaban1@bloomberg.net> Date: 2017-01-25T17:08:22Z LUCENE-7671 - Enhance UpgradeIndexMergePolicy with additional options
          Hide
          mikemccand Michael McCandless added a comment -

          Can you describe the overall goal/use case here? E.g. is the goal to reshape UpgradeIndexMergePolicy so that you can use it indefinitely as the merge policy for your IndexWriter?

          Show
          mikemccand Michael McCandless added a comment - Can you describe the overall goal/use case here? E.g. is the goal to reshape UpgradeIndexMergePolicy so that you can use it indefinitely as the merge policy for your IndexWriter ?
          Hide
          k317h Keith Laban added a comment - - edited

          Yes, in a nutshell that is my goal. Ultimately I'd like to build a request handler in solr that can enable a core to upgrade it's segments without first taking it offline or reconfiguring the index writer. When not engaged it should have no effect, and when it is engaged it should do minimal work at a time.

          The additional bells and whistles of this PR are for backwards compatibility.

          Previous the behavior was:
          1) Determine all old segments
          2) Ask wrapping merge policy what to do with the old segments
          3) Merge segments specified by wrapped merge policy
          4) Merge the remaining old segments into a single new segment

          Meaning, if you were to upgrade an old index using TierdMergePolicy or similar as the wrapped MP and said w.forceMerge(Integer.MAX_INT), the TMP would say nothing to do, but the UpgradeIndexMergePolicy would then take it upon itself to merge everything down into a single segment.

          Ideally if max number of segments > number of segments and the wrapped MP is happy, the UIMP should not take it upon itself make any merge decisions and only upgrade segments needing upgrade by rewritting each segment.

          An additional decision to rely on cascading calls from the IW was chosen so that if this was being used as the default MP and an upgrade was in progress, old segments could still be candidates for merges issued during a commit.

          The idea is loosely based on elasticsearch's ElasticsearchMergePolicy.

          There should probably also be support for a Predicate to be passed for determining whether the segment should be upgraded (rewritten), then this MP can be used for things such as deciding to rewrite segments with a different codec.

          Show
          k317h Keith Laban added a comment - - edited Yes, in a nutshell that is my goal. Ultimately I'd like to build a request handler in solr that can enable a core to upgrade it's segments without first taking it offline or reconfiguring the index writer. When not engaged it should have no effect, and when it is engaged it should do minimal work at a time. The additional bells and whistles of this PR are for backwards compatibility. Previous the behavior was: 1) Determine all old segments 2) Ask wrapping merge policy what to do with the old segments 3) Merge segments specified by wrapped merge policy 4) Merge the remaining old segments into a single new segment Meaning, if you were to upgrade an old index using TierdMergePolicy or similar as the wrapped MP and said w.forceMerge(Integer.MAX_INT) , the TMP would say nothing to do, but the UpgradeIndexMergePolicy would then take it upon itself to merge everything down into a single segment. Ideally if max number of segments > number of segments and the wrapped MP is happy, the UIMP should not take it upon itself make any merge decisions and only upgrade segments needing upgrade by rewritting each segment. An additional decision to rely on cascading calls from the IW was chosen so that if this was being used as the default MP and an upgrade was in progress, old segments could still be candidates for merges issued during a commit. The idea is loosely based on elasticsearch's ElasticsearchMergePolicy. There should probably also be support for a Predicate to be passed for determining whether the segment should be upgraded (rewritten), then this MP can be used for things such as deciding to rewrite segments with a different codec.
          Hide
          k317h Keith Laban added a comment -

          Rebased with master to keep up-to-date. Michael McCandless did you ever have an opportunity to review my previous post?

          Show
          k317h Keith Laban added a comment - Rebased with master to keep up-to-date. Michael McCandless did you ever have an opportunity to review my previous post?
          Hide
          k317h Keith Laban added a comment - - edited

          Because i've added some oldSingleSegmentNames with this commit, TestBackwardsCompatibility.testUpgradeOldSingleSegmentIndexWithAdditions now fails after rebasing LUCENE-7703 because of this line https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L2657

          Prior to this commit this test effectively does nothing as there were no old single segment test files. This failure should be addressed outside the scope of this ticket.

          Show
          k317h Keith Laban added a comment - - edited Because i've added some oldSingleSegmentNames with this commit, TestBackwardsCompatibility.testUpgradeOldSingleSegmentIndexWithAdditions now fails after rebasing LUCENE-7703 because of this line https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L2657 Prior to this commit this test effectively does nothing as there were no old single segment test files. This failure should be addressed outside the scope of this ticket.
          Hide
          mikemccand Michael McCandless added a comment -

          Hi Keith Laban, sorry, I changed jobs and got super busy but I will try to have a look at this again soon

          Net/net I think it's reasonable to make it easier to upgrade an index w/o having to close and open a new IndexWriter to set a new merge policy ...

          Show
          mikemccand Michael McCandless added a comment - Hi Keith Laban , sorry, I changed jobs and got super busy but I will try to have a look at this again soon Net/net I think it's reasonable to make it easier to upgrade an index w/o having to close and open a new IndexWriter to set a new merge policy ...
          Hide
          mikemccand Michael McCandless added a comment -

          I think, given the added complexity here, we should make a whole new
          merge policy, and leave the existing UpgradeIndexMergePolicy.java
          as is?

          Can you also add getters when you add a setter (e.g. setMaxUpgradesAtATime?

          Instead of two booleans setUpgradeInProgress and
          setRequireExplicitUpgrades, can't we just have one (i.e., it's
          always explicit)? Maybe rename it to setEnableUpgrades? The CLI
          tool would just enable that.

          If ignoreNewSegments is true, and requireExplicitUpgrades
          is true but upgradeInProgress is false, it looks like with this
          patch we will fail to carry out the wrap'd MP's force merge request?

          Typo in the javadocs for setMaxUpgradesAtATime: natrually -> naturally

          Show
          mikemccand Michael McCandless added a comment - I think, given the added complexity here, we should make a whole new merge policy, and leave the existing UpgradeIndexMergePolicy.java as is? Can you also add getters when you add a setter (e.g. setMaxUpgradesAtATime ? Instead of two booleans setUpgradeInProgress and setRequireExplicitUpgrades , can't we just have one (i.e., it's always explicit)? Maybe rename it to setEnableUpgrades ? The CLI tool would just enable that. If ignoreNewSegments is true , and requireExplicitUpgrades is true but upgradeInProgress is false, it looks like with this patch we will fail to carry out the wrap'd MP's force merge request? Typo in the javadocs for setMaxUpgradesAtATime : natrually -> naturally
          Hide
          k317h Keith Laban added a comment -

          How would you feel about removing some of the complexity if we are going to split it out into a separate merge policy? A lot of added complexity here was to make it compatible with the previous version. E.g. ignoreNewSegments will conform to the old behavior of only considering segments needing upgrade as merge candidates.

          Ideally this merge policy will have the options:

          • setEnableUpgrades
          • setMaxUpgradesAtATime

          And the behavior should be:

          • Delegate to the wrapped MP UNLESS enableUpgrades is set
          • When enabledUpgrades is set first delegate to wrap'd MP then rewrite (no merge) old segments in new format
          Show
          k317h Keith Laban added a comment - How would you feel about removing some of the complexity if we are going to split it out into a separate merge policy? A lot of added complexity here was to make it compatible with the previous version. E.g. ignoreNewSegments will conform to the old behavior of only considering segments needing upgrade as merge candidates. Ideally this merge policy will have the options: setEnableUpgrades setMaxUpgradesAtATime And the behavior should be: Delegate to the wrapped MP UNLESS enableUpgrades is set When enabledUpgrades is set first delegate to wrap'd MP then rewrite (no merge) old segments in new format
          Hide
          mikemccand Michael McCandless added a comment -

          A lot of added complexity here was to make it compatible with the previous version.

          Right, I think as a separate merge policy things can be much cleaner.

          Show
          mikemccand Michael McCandless added a comment - A lot of added complexity here was to make it compatible with the previous version. Right, I think as a separate merge policy things can be much cleaner.
          Hide
          k317h Keith Laban added a comment -

          Pushed a new version that breaks out the behavior outlined above into a new class called LiveUpgradeIndexMergePolicy. The name indicates that a different IndexWriter need not be used.

          Additionally, because the new MP does not have an option to ignore new segments, I've changed the IndexUpgrader script to accept an additional parameter telling it to "include-new-segments" allowing the switch between the old behavior and new behavior.

          The test are randomized between old/new behavior for coverage.

          Show
          k317h Keith Laban added a comment - Pushed a new version that breaks out the behavior outlined above into a new class called LiveUpgradeIndexMergePolicy . The name indicates that a different IndexWriter need not be used. Additionally, because the new MP does not have an option to ignore new segments, I've changed the IndexUpgrader script to accept an additional parameter telling it to "include-new-segments" allowing the switch between the old behavior and new behavior. The test are randomized between old/new behavior for coverage.
          Hide
          mikemccand Michael McCandless added a comment -

          Thanks Keith Laban; I'll have a look.

          Show
          mikemccand Michael McCandless added a comment - Thanks Keith Laban ; I'll have a look.
          Hide
          mikemccand Michael McCandless added a comment -

          Hi Keith Laban, the latest PR looks great! I found only a couple minor things:

          Typo in this test case:

          testUpgradeWithExcplicitUpgrades

          (the first c should be removed).

          In the IndexUpgrader usage output can you explain what the -include-new-segments does?

          Show
          mikemccand Michael McCandless added a comment - Hi Keith Laban , the latest PR looks great! I found only a couple minor things: Typo in this test case: testUpgradeWithExcplicitUpgrades (the first c should be removed). In the IndexUpgrader usage output can you explain what the -include-new-segments does?
          Hide
          k317h Keith Laban added a comment -

          Michael McCandless I updated PR with some extra changes:

          • Fixed typo in testUpgradeWithExcplicitUpgrades
          • Added usage for -include-new-segments options
          • Also added -num-segments option for IndexUpgrader and usage
          • Added random toggle for new options to be added in tests

          Still outstanding: See my earlier comment about the failing test.

          Show
          k317h Keith Laban added a comment - Michael McCandless I updated PR with some extra changes: Fixed typo in testUpgradeWithExcplicitUpgrades Added usage for -include-new-segments options Also added -num-segments option for IndexUpgrader and usage Added random toggle for new options to be added in tests Still outstanding: See my earlier comment about the failing test.
          Hide
          mikemccand Michael McCandless added a comment -

          Hi Keith Laban, thank you. To fix the failing test, can you change it to use the version of IndexWriter.addIndexes that takes CodecReader[] instead? This should have the same effect (some old, some new segments in one index).

          Show
          mikemccand Michael McCandless added a comment - Hi Keith Laban , thank you. To fix the failing test, can you change it to use the version of IndexWriter.addIndexes that takes CodecReader[] instead? This should have the same effect (some old, some new segments in one index).
          Hide
          k317h Keith Laban added a comment -

          The test now passes using the CodeReader interface. I also renamed the class to LiveUpgradeSegmentsMergePolicy because it upgrades on a segment by segment basis verse the UpgradeIndexMergePolicy which tries to ugprade all segments.

          It looks like someone has been touching TestBackwardCompatibility class and this no longer cleanly lands on master. I'll take a look at catching it up later

          Show
          k317h Keith Laban added a comment - The test now passes using the CodeReader interface. I also renamed the class to LiveUpgradeSegmentsMergePolicy because it upgrades on a segment by segment basis verse the UpgradeIndexMergePolicy which tries to ugprade all segments. It looks like someone has been touching TestBackwardCompatibility class and this no longer cleanly lands on master. I'll take a look at catching it up later
          Hide
          k317h Keith Laban added a comment -

          Michael McCandless, I pushed an update which at the moment cleanly applies.

          However is looks like LUCENE-7756 break addIndexes again because of this check which was added:

          +  private void validateMergeReader(CodecReader leaf) {
          +    LeafMetaData segmentMeta = leaf.getMetaData();
          +    if (segmentInfos.getIndexCreatedVersionMajor() != segmentMeta.getCreatedVersionMajor()) {
          +      throw new IllegalArgumentException("Cannot merge a segment that has been created with major version "
          +          + segmentMeta.getCreatedVersionMajor() + " into this index which has been created by major version "
          +          + segmentInfos.getIndexCreatedVersionMajor());
          +    }
          +
          +    if (segmentInfos.getIndexCreatedVersionMajor() >= 7 && segmentMeta.getMinVersion() == null) {
          +      throw new IllegalStateException("Indexes created on or after Lucene 7 must record the created version major, but " + leaf + " hides it");
          +    }
          +
          +    Sort leafIndexSort = segmentMeta.getSort();
          +    if (config.getIndexSort() != null && leafIndexSort != null
          +        && config.getIndexSort().equals(leafIndexSort) == false) {
          +      throw new IllegalArgumentException("cannot change index sort from " + leafIndexSort + " to " + config.getIndexSort());
          +    }
          +  }
          

          Is this mergepolicy working against future goals of lucene such that it will be impossible to upgrade major versions without reIndexing?

          Show
          k317h Keith Laban added a comment - Michael McCandless , I pushed an update which at the moment cleanly applies. However is looks like LUCENE-7756 break addIndexes again because of this check which was added: + private void validateMergeReader(CodecReader leaf) { + LeafMetaData segmentMeta = leaf.getMetaData(); + if (segmentInfos.getIndexCreatedVersionMajor() != segmentMeta.getCreatedVersionMajor()) { + throw new IllegalArgumentException( "Cannot merge a segment that has been created with major version " + + segmentMeta.getCreatedVersionMajor() + " into this index which has been created by major version " + + segmentInfos.getIndexCreatedVersionMajor()); + } + + if (segmentInfos.getIndexCreatedVersionMajor() >= 7 && segmentMeta.getMinVersion() == null ) { + throw new IllegalStateException( "Indexes created on or after Lucene 7 must record the created version major, but " + leaf + " hides it" ); + } + + Sort leafIndexSort = segmentMeta.getSort(); + if (config.getIndexSort() != null && leafIndexSort != null + && config.getIndexSort().equals(leafIndexSort) == false ) { + throw new IllegalArgumentException( "cannot change index sort from " + leafIndexSort + " to " + config.getIndexSort()); + } + } Is this mergepolicy working against future goals of lucene such that it will be impossible to upgrade major versions without reIndexing?

            People

            • Assignee:
              Unassigned
              Reporter:
              k317h Keith Laban
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:

                Development