Solr
  1. Solr
  2. SOLR-2567

Solr should default to TieredMergePolicy

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3, 4.0-ALPHA
    • Component/s: update
    • Labels:
      None

      Description

      even if we set a luceneMatchVersion to >= 3.2 (SOLR-2557),
      Solr still defaults to LogByte

      1. SOLR-2567.patch
        11 kB
        Robert Muir
      2. SOLR-2567.patch
        6 kB
        Robert Muir
      3. SOLR-2567.patch
        5 kB
        Robert Muir

        Issue Links

          Activity

          Hide
          Robert Muir added a comment -

          the patch currently adjusts the default based on luceneMatchVersion, but this is confusing if we release 3.2 that "disagrees" with lucene's actual 3.2 defaults.

          we could check onOrAfter 3.3 at least to be safe so we don't surprise anyone when 3.3 comes out

          Show
          Robert Muir added a comment - the patch currently adjusts the default based on luceneMatchVersion, but this is confusing if we release 3.2 that "disagrees" with lucene's actual 3.2 defaults. we could check onOrAfter 3.3 at least to be safe so we don't surprise anyone when 3.3 comes out
          Hide
          Robert Muir added a comment -

          updated the patch, to check onOrAfter 3.3.

          Also it applies mergefactor and compoundfile to TieredMergePolicy.

          I think this is ready to commit.

          Show
          Robert Muir added a comment - updated the patch, to check onOrAfter 3.3. Also it applies mergefactor and compoundfile to TieredMergePolicy. I think this is ready to commit.
          Hide
          Michael McCandless added a comment -

          Patch looks great – thanks Robert!

          Show
          Michael McCandless added a comment - Patch looks great – thanks Robert!
          Hide
          Yonik Seeley added a comment -

          the patch currently adjusts the default based on luceneMatchVersion, but this is confusing if we release 3.2 that "disagrees" with lucene's actual 3.2 defaults.

          I don't think all of Solr's defaults need to match Lucene's defaults in general - we can take it on a case-by-case basis. In this case, a TieredMergePolicy default certainly does make sense. I'm not even sure it needed to be linked to luceneMatchVersion... it's a rare solr user that would depend on the specifics of merging or docid behavioral changes.

          Show
          Yonik Seeley added a comment - the patch currently adjusts the default based on luceneMatchVersion, but this is confusing if we release 3.2 that "disagrees" with lucene's actual 3.2 defaults. I don't think all of Solr's defaults need to match Lucene's defaults in general - we can take it on a case-by-case basis. In this case, a TieredMergePolicy default certainly does make sense. I'm not even sure it needed to be linked to luceneMatchVersion... it's a rare solr user that would depend on the specifics of merging or docid behavioral changes.
          Hide
          Robert Muir added a comment -

          Well, mostly here I was admitting that I screwed this up for 3.2

          One of the major reasons we respun was to ensure solr used the most recent lucene version configuration so it had features like this... yet it still didnt enable TieredMP.

          I guess what I'm saying is... I'm willing to respin if anyone thinks its a huge deal. In this case I honestly think our 3.2RC is a good release even though this isn't enabled yet, and we should just target 3.3.

          Show
          Robert Muir added a comment - Well, mostly here I was admitting that I screwed this up for 3.2 One of the major reasons we respun was to ensure solr used the most recent lucene version configuration so it had features like this... yet it still didnt enable TieredMP. I guess what I'm saying is... I'm willing to respin if anyone thinks its a huge deal. In this case I honestly think our 3.2RC is a good release even though this isn't enabled yet, and we should just target 3.3.
          Hide
          Uwe Schindler added a comment -

          I just wanted to remember that the matchVersion stuff this time is done without reparsing each time duck

          I agree with Yonik, that in the case of Solr, we can easily change the default without matchVersion, as nothing in solr depends on docids in order. Solr always used real docids.

          Show
          Uwe Schindler added a comment - I just wanted to remember that the matchVersion stuff this time is done without reparsing each time duck I agree with Yonik, that in the case of Solr, we can easily change the default without matchVersion, as nothing in solr depends on docids in order. Solr always used real docids.
          Hide
          Michael McCandless added a comment -

          I don't think we should respin – just make it the default MP for 3.3?

          But, I think we should use luceneMatchVersion to conditionalize it.
          It's a biggish change, that docIDs can now be "shuffled", so in case
          there is some Solr app out there that relies on docIDs staying in order,
          we should respect the luceneMatchVersion.

          Show
          Michael McCandless added a comment - I don't think we should respin – just make it the default MP for 3.3? But, I think we should use luceneMatchVersion to conditionalize it. It's a biggish change, that docIDs can now be "shuffled", so in case there is some Solr app out there that relies on docIDs staying in order, we should respect the luceneMatchVersion.
          Hide
          Uwe Schindler added a comment -

          Additionally, I dont think its a problem to respin because of that. If we want consistency in Solr with Lucene (and I am huge +1 to be consistent in all case - this time against Yonik), we should respin and fix this (along with the small issues brought on the vote thread).

          Show
          Uwe Schindler added a comment - Additionally, I dont think its a problem to respin because of that. If we want consistency in Solr with Lucene (and I am huge +1 to be consistent in all case - this time against Yonik), we should respin and fix this (along with the small issues brought on the vote thread).
          Hide
          Robert Muir added a comment -

          Solr always used real docids.

          Even when you sort only by score? in this case the lucene docid is the "tiebreaker", no?

          Show
          Robert Muir added a comment - Solr always used real docids. Even when you sort only by score? in this case the lucene docid is the "tiebreaker", no?
          Hide
          Mark Miller added a comment -

          My initial reaction was it was not worth re-spinning for this either. But mostly based on the assumption that the user could just configure it, and at worst we could drop a note about it.

          Then I saw that there would be some side affects to that based on this patch? I didn't really look closely, but from memory, I guess it would disable being able to set compound file? And you wouldn't be able to change the tiered policies settings? Or does Solr's config any given method voodoo work in this case?

          Either way, I'd be fine if we didn't respin or if we did. It would still be an improvement, even if it didn't support compound, and then that can be another improvement.

          Show
          Mark Miller added a comment - My initial reaction was it was not worth re-spinning for this either. But mostly based on the assumption that the user could just configure it, and at worst we could drop a note about it. Then I saw that there would be some side affects to that based on this patch? I didn't really look closely, but from memory, I guess it would disable being able to set compound file? And you wouldn't be able to change the tiered policies settings? Or does Solr's config any given method voodoo work in this case? Either way, I'd be fine if we didn't respin or if we did. It would still be an improvement, even if it didn't support compound, and then that can be another improvement.
          Hide
          Robert Muir added a comment -

          Additionally, I dont think its a problem to respin because of that.

          You are right, its no problem at all to respin, I don't mind doing this at all. The only problem is that committers must spend their time to re-review the release candidate, but I suppose if they watch the commits list they can be more confident in their checking and not have to waste a lot of time, since its only this issue and maybe two other small things that were found.

          In fact, Mike committed a "HuperDuper" automated-checker to do some of the grunt work of checking a release to luceneutil today: http://code.google.com/a/apache-extras.org/p/luceneutil/source/detail?r=5bf3a0ea4922bbeb38a0a16c503b42e07c028eab

          So maybe its a good opportunity to use this utility to aid in easier review of release candidates.

          Show
          Robert Muir added a comment - Additionally, I dont think its a problem to respin because of that. You are right, its no problem at all to respin, I don't mind doing this at all. The only problem is that committers must spend their time to re-review the release candidate, but I suppose if they watch the commits list they can be more confident in their checking and not have to waste a lot of time, since its only this issue and maybe two other small things that were found. In fact, Mike committed a "HuperDuper" automated-checker to do some of the grunt work of checking a release to luceneutil today: http://code.google.com/a/apache-extras.org/p/luceneutil/source/detail?r=5bf3a0ea4922bbeb38a0a16c503b42e07c028eab So maybe its a good opportunity to use this utility to aid in easier review of release candidates.
          Hide
          Robert Muir added a comment -

          Then I saw that there would be some side affects to that based on this patch? I didn't really look closely, but from memory, I guess it would disable being able to set compound file? And you wouldn't be able to change the tiered policies settings? Or does Solr's config any given method voodoo work in this case?

          Right, this is definitely a downside to not having this patch, they cannot set compoundFile or mergeFactor.

          And you are also right about tiered policy settings... you will notice I didn't actually enable these in the patch. I guess thats because I thought it would be confusing to add "MP-specific" settings all piled in to the config. I was planning on opening another issue for this to allow for a factory configuration like other things, especially so I can pass a 'long' random seed so that tests can use MockRandomMergePolicy

          Show
          Robert Muir added a comment - Then I saw that there would be some side affects to that based on this patch? I didn't really look closely, but from memory, I guess it would disable being able to set compound file? And you wouldn't be able to change the tiered policies settings? Or does Solr's config any given method voodoo work in this case? Right, this is definitely a downside to not having this patch, they cannot set compoundFile or mergeFactor. And you are also right about tiered policy settings... you will notice I didn't actually enable these in the patch. I guess thats because I thought it would be confusing to add "MP-specific" settings all piled in to the config. I was planning on opening another issue for this to allow for a factory configuration like other things, especially so I can pass a 'long' random seed so that tests can use MockRandomMergePolicy
          Hide
          Robert Muir added a comment -

          Anyway, I'm interested in what others think the right thing to do here is... we can bring it to the dev@ list if it will grab more attention.

          the choices to me seem to be:

          1. release 3.2 as is, enable TieredMergePolicy in 3.3, using luceneMatchVersion >= 3.3
          2. release 3.2 as is, enable TieredMergePolicy in 3.3, just changing the default completely.
          3. respin and use this patch, with TieredMergePolicy defaulted on for luceneMatchVersion >= 3.2 (consistent with lucene), but you don't have total configuration over all its settings (just mergefactor and compoundfile).
          4. respin like the above, but also quickly try to add support for MergePolicyFactory or similar, so that all tieredMP settings are exposed

          and maybe there are other ideas I haven't mentioned.

          Show
          Robert Muir added a comment - Anyway, I'm interested in what others think the right thing to do here is... we can bring it to the dev@ list if it will grab more attention. the choices to me seem to be: release 3.2 as is, enable TieredMergePolicy in 3.3, using luceneMatchVersion >= 3.3 release 3.2 as is, enable TieredMergePolicy in 3.3, just changing the default completely. respin and use this patch, with TieredMergePolicy defaulted on for luceneMatchVersion >= 3.2 (consistent with lucene), but you don't have total configuration over all its settings (just mergefactor and compoundfile). respin like the above, but also quickly try to add support for MergePolicyFactory or similar, so that all tieredMP settings are exposed and maybe there are other ideas I haven't mentioned.
          Hide
          Steve Rowe added a comment -

          +1 for #1 (release 3.2 as is, enable full TMP support in 3.3)

          Show
          Steve Rowe added a comment - +1 for #1 (release 3.2 as is, enable full TMP support in 3.3)
          Hide
          Michael McCandless added a comment -

          +1 for #1 too

          Show
          Michael McCandless added a comment - +1 for #1 too
          Hide
          Yonik Seeley added a comment -

          Yeah, our release process can be painful enough (esp given the 72 hr thing) that I don't think this should keep us from releasing 3.2.
          I'm fine with #1 or #2

          Show
          Yonik Seeley added a comment - Yeah, our release process can be painful enough (esp given the 72 hr thing) that I don't think this should keep us from releasing 3.2. I'm fine with #1 or #2
          Hide
          Robert Muir added a comment -

          OK, I created SOLR-2572 to improve mergepolicy configuration.

          This way when we enable TieredMP, the user will be able to configure all of its options

          Show
          Robert Muir added a comment - OK, I created SOLR-2572 to improve mergepolicy configuration. This way when we enable TieredMP, the user will be able to configure all of its options
          Hide
          Robert Muir added a comment -

          ok, here's a patch for committing.

          After reviewing SOLR-2572, I think its overkill at the moment.

          This is because SolrPluginUtils.applySetters takes care of all possible parameters we have in all of our mergepolicies (including contrib) available today: all the parameters are setters that take primitive types.

          So, I'd like to commit this shortly and cancel SOLR-2572 as "won't fix".

          I added a test that configures some tiered-mp specific stuff.

          Show
          Robert Muir added a comment - ok, here's a patch for committing. After reviewing SOLR-2572 , I think its overkill at the moment. This is because SolrPluginUtils.applySetters takes care of all possible parameters we have in all of our mergepolicies (including contrib) available today: all the parameters are setters that take primitive types. So, I'd like to commit this shortly and cancel SOLR-2572 as "won't fix". I added a test that configures some tiered-mp specific stuff.
          Hide
          Robert Muir added a comment -

          Bulk close for 3.3

          Show
          Robert Muir added a comment - Bulk close for 3.3

            People

            • Assignee:
              Robert Muir
              Reporter:
              Robert Muir
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development