Lucene - Core
  1. Lucene - Core
  2. LUCENE-3454

rename optimize to a less cool-sounding name

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 3.4, 4.0-ALPHA
    • Fix Version/s: 3.5, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I think users see the name optimize and feel they must do this, because who wants a suboptimal system? but this probably just results in wasted time and resources.

      maybe rename to collapseSegments or something?

      1. LUCENE-3454.patch
        274 kB
        Michael McCandless
      2. LUCENE-3454.patch
        203 kB
        Michael McCandless

        Issue Links

          Activity

          Hide
          Robert Muir added a comment -

          if anyone wants to take this, don't hesitate!

          Show
          Robert Muir added a comment - if anyone wants to take this, don't hesitate!
          Hide
          Otis Gospodnetic added a comment -

          Would it be wise to stick with a name less specific than collapseSegments for example, in order not to have an incorrect name that requires another renaming when this "command" ends up doing something new in the future?

          Show
          Otis Gospodnetic added a comment - Would it be wise to stick with a name less specific than collapseSegments for example, in order not to have an incorrect name that requires another renaming when this "command" ends up doing something new in the future?
          Hide
          Robert Muir added a comment -

          Otis: thats a good point, currently optimize is just a "request", we should probably figure out what it "should be".

          Should it be collapseSegments, which is a well-defined request without a cool sounding name?

          Or, should it be <something else>, which gives you a more optimal configuration for search performance... (i still think optimize is a bad name even for this)? Personally i suspect its going to be hard to support this case, e.g. you would really need to know things like, if the user has executionService set on the IndexSearcher and how big the threadpool is and things like that to make an 'optimal' configuration... and we don't have a nice way of knowing that information today.

          Show
          Robert Muir added a comment - Otis: thats a good point, currently optimize is just a "request", we should probably figure out what it "should be". Should it be collapseSegments, which is a well-defined request without a cool sounding name? Or, should it be <something else>, which gives you a more optimal configuration for search performance... (i still think optimize is a bad name even for this)? Personally i suspect its going to be hard to support this case, e.g. you would really need to know things like, if the user has executionService set on the IndexSearcher and how big the threadpool is and things like that to make an 'optimal' configuration... and we don't have a nice way of knowing that information today.
          Hide
          Shai Erera added a comment -

          I personally find optimize() clear, and people got used to it already. However I do agree it sounds like it's doing magic, but really it just merges segments.

          We already have maybeMerge(), how about consolidating the two under merge(MergeLevel) where MergeLevel is either AGGRESSIVE (optimize) or DELICATE (maybeMerge)? Or something in that spirit.

          Anyway, if we end up renaming, we should probably rename MergePolicy.findSegmentsForOptimize as well. If we consolidate, we should consolidate it as well.

          Show
          Shai Erera added a comment - I personally find optimize() clear, and people got used to it already. However I do agree it sounds like it's doing magic, but really it just merges segments. We already have maybeMerge(), how about consolidating the two under merge(MergeLevel) where MergeLevel is either AGGRESSIVE (optimize) or DELICATE (maybeMerge)? Or something in that spirit. Anyway, if we end up renaming, we should probably rename MergePolicy.findSegmentsForOptimize as well. If we consolidate, we should consolidate it as well.
          Hide
          Uwe Schindler added a comment -

          I like Shai's idea, let's remove optimize completely. maybeMerge has the ideal name.

          MergePolicy should then have only one method also getting this aggresiveness parameter.

          Show
          Uwe Schindler added a comment - I like Shai's idea, let's remove optimize completely. maybeMerge has the ideal name. MergePolicy should then have only one method also getting this aggresiveness parameter.
          Hide
          Michael McCandless added a comment -

          How about maybeMerge() and maybeMerge(int maxSegCount) (since we have to absorb optimize(int maxSegCount) too)?

          This way what used to be optimize is now maybeMerge(1).

          I like that this forces the user to make an explicit decision about how many segments they require the index to merge down to, so they realize by picking 1 they are asking for tons of merging.

          Show
          Michael McCandless added a comment - How about maybeMerge() and maybeMerge(int maxSegCount) (since we have to absorb optimize(int maxSegCount) too)? This way what used to be optimize is now maybeMerge(1). I like that this forces the user to make an explicit decision about how many segments they require the index to merge down to, so they realize by picking 1 they are asking for tons of merging.
          Hide
          Shai Erera added a comment -

          Fine. Can we use the opportunity to remove 'maybe'? Just merge(). If it ends up doing nothing, it's still ok, with me at least.

          Show
          Shai Erera added a comment - Fine. Can we use the opportunity to remove 'maybe'? Just merge(). If it ends up doing nothing, it's still ok, with me at least.
          Hide
          Michael McCandless added a comment -

          Hmm... I like maybeMerge() better because it makes it clear that there may be no merging done?

          But maybe we can find a different word from maybe instead mergeIfNeeded?

          Show
          Michael McCandless added a comment - Hmm... I like maybeMerge() better because it makes it clear that there may be no merging done? But maybe we can find a different word from maybe instead mergeIfNeeded?
          Hide
          Shai Erera added a comment -

          It's just that maybeMerge feels like asking a question, to which you expect an answer. Why is it so important to have this 'maybe' or 'ifNeeded' in the API? just like optimize(), MP decides what to merge, and optimize() can end up doing nothing as well ...

          Why do we have IR.reopen and not maybeReopen? That that it returns something is not much different than IW.merge(), IMO.

          Show
          Shai Erera added a comment - It's just that maybeMerge feels like asking a question, to which you expect an answer. Why is it so important to have this 'maybe' or 'ifNeeded' in the API? just like optimize(), MP decides what to merge, and optimize() can end up doing nothing as well ... Why do we have IR.reopen and not maybeReopen? That that it returns something is not much different than IW.merge(), IMO.
          Hide
          Eks Dev added a comment -

          as a user,

          • +1 to parameter, if you do not know what NooSegments mean, you shouldn't be invoking this method.
          • what about tryMerge(int ), attempMerge
          Show
          Eks Dev added a comment - as a user, +1 to parameter, if you do not know what NooSegments mean, you shouldn't be invoking this method. what about tryMerge(int ), attempMerge
          Hide
          DM Smith added a comment -

          When I started w/ Lucene, I read the docs and was drawn to call optimize because of its "cool name." However, it was reading the documentation at the time that convinced me that it was appropriate for my use case:

          Creating an index that once created would never be modified. It needed to be as fast as possible for search on low performance computing devices (old laptops, ancient computers, netbooks, phones, ...).

          Maybe I misunderstood, but wasn't it and isn't it still appropriate for that?

          And I have no idea what NooSegments means.

          If you want a really uncool name how about dumbDown()?

          But either way, please document the appropriate use cases for it.

          Show
          DM Smith added a comment - When I started w/ Lucene, I read the docs and was drawn to call optimize because of its "cool name." However, it was reading the documentation at the time that convinced me that it was appropriate for my use case: Creating an index that once created would never be modified. It needed to be as fast as possible for search on low performance computing devices (old laptops, ancient computers, netbooks, phones, ...). Maybe I misunderstood, but wasn't it and isn't it still appropriate for that? And I have no idea what NooSegments means. If you want a really uncool name how about dumbDown()? But either way, please document the appropriate use cases for it.
          Hide
          Michael McCandless added a comment -

          Why do we have IR.reopen and not maybeReopen?

          That's a good point... and I actually don't like that naming either!!

          I think it trips users up because of it's hidden maybe-ness, ie, users who always close the old reader on calling this API (which, if no reopen actually occurred, closes the very same reader it had just returned).

          Also, IW already has a merge method, taking a OneMerge parameter; that naming I think is correct because it's unconditional: IW will carry out the merge you passed to it.

          Naming is the hardest part!

          Show
          Michael McCandless added a comment - Why do we have IR.reopen and not maybeReopen? That's a good point... and I actually don't like that naming either!! I think it trips users up because of it's hidden maybe-ness, ie, users who always close the old reader on calling this API (which, if no reopen actually occurred, closes the very same reader it had just returned). Also, IW already has a merge method, taking a OneMerge parameter; that naming I think is correct because it's unconditional: IW will carry out the merge you passed to it. Naming is the hardest part!
          Hide
          Mark Miller added a comment -

          In spitball land... requestAggressiveMerge?

          Show
          Mark Miller added a comment - In spitball land... requestAggressiveMerge?
          Hide
          Mike Sokolov added a comment -

          squeeze()? You apply some pressure - maybe it merges, and maybe it doesn't. Depending how hard you squeeze. squeeze(1) is squeezing harder than squeeze(10), which is a bit weird, but that's just down to units of squishiness.

          Show
          Mike Sokolov added a comment - squeeze()? You apply some pressure - maybe it merges, and maybe it doesn't. Depending how hard you squeeze. squeeze(1) is squeezing harder than squeeze(10), which is a bit weird, but that's just down to units of squishiness.
          Hide
          Jan Høydahl added a comment -

          I've seen people optimize after every single add(). Even the Django search framework Solango did this terrible mistake.
          So I think renaming is welcome, and now that Lucene does such a good merging job for you most people won't need optimize. How about:

          reduceSegments(int numSegments)

          Does it convey the "maybeness" well enough? If I use it I need to know what a segment is and have some feeling for how many segments there are and how many I want, so its usage will imply some knowledge. But maybeMerge() works as well.

          Show
          Jan Høydahl added a comment - I've seen people optimize after every single add(). Even the Django search framework Solango did this terrible mistake. So I think renaming is welcome, and now that Lucene does such a good merging job for you most people won't need optimize. How about: reduceSegments(int numSegments) Does it convey the "maybeness" well enough? If I use it I need to know what a segment is and have some feeling for how many segments there are and how many I want, so its usage will imply some knowledge. But maybeMerge() works as well.
          Hide
          Doron Cohen added a comment -

          To me merge(num) doing nothing "because there are already no more than n segments" is as fine as close() doing nothing "because of already being closed" so +1 for merge(num).

          Show
          Doron Cohen added a comment - To me merge(num) doing nothing "because there are already no more than n segments" is as fine as close() doing nothing "because of already being closed" so +1 for merge(num).
          Hide
          Mindaugas Žakšauskas added a comment -

          What about "defragment"?

          Show
          Mindaugas Žakšauskas added a comment - What about "defragment"?
          Hide
          Robert Muir added a comment -

          setting affects 3.x:

          Here's my +1 for a hard backwards break to remove this name.

          Show
          Robert Muir added a comment - setting affects 3.x: Here's my +1 for a hard backwards break to remove this name.
          Hide
          Grant Ingersoll added a comment -

          How about compactIndex()?

          Show
          Grant Ingersoll added a comment - How about compactIndex()?
          Hide
          Michael McCandless added a comment -

          Patch.

          I picked the name 'mergeIfNeeded', and it requires you pass in the maxSegmentCount (so mergeIfNeeded(1) == what optimize() does today).

          All tests pass, including Solr, but Solr has tons of user-facing references still to "optimize" that I'm not sure how to fix (I tried to remove all references to optimize in Lucene).

          I put nocommits for tests that need to be renamed... I'll do that at the last second so patching is easier.

          Show
          Michael McCandless added a comment - Patch. I picked the name 'mergeIfNeeded', and it requires you pass in the maxSegmentCount (so mergeIfNeeded(1) == what optimize() does today). All tests pass, including Solr, but Solr has tons of user-facing references still to "optimize" that I'm not sure how to fix (I tried to remove all references to optimize in Lucene). I put nocommits for tests that need to be renamed... I'll do that at the last second so patching is easier.
          Hide
          Yonik Seeley added a comment -

          but Solr has tons of user-facing references still to "optimize" that I'm not sure how to fix

          That's fine. We probably shouldn't be changing Solr's external interfaces (and configuration options) for renamings like this.

          Show
          Yonik Seeley added a comment - but Solr has tons of user-facing references still to "optimize" that I'm not sure how to fix That's fine. We probably shouldn't be changing Solr's external interfaces (and configuration options) for renamings like this.
          Hide
          Robert Muir added a comment -

          Yonik: I'm not sure? Jan's comment really hits home at why I opened this issue.

          I think its worth considering "optimize" command to throw an exception in Solr for this reason... we can add an expert option like mergeIfNeeded instead?

          I really think optimize is very dangerous because it sounds magical, the kind of thing that tempts new users into getting terrible performance,
          instead it should be an expert thing.

          Show
          Robert Muir added a comment - Yonik: I'm not sure? Jan's comment really hits home at why I opened this issue. I think its worth considering "optimize" command to throw an exception in Solr for this reason... we can add an expert option like mergeIfNeeded instead? I really think optimize is very dangerous because it sounds magical, the kind of thing that tempts new users into getting terrible performance, instead it should be an expert thing.
          Hide
          Hoss Man added a comment -

          I think its worth considering "optimize" command to throw an exception in Solr for this reason... we can add an expert option like mergeIfNeeded instead?

          I have mixed feelings on this ... Solr has really tried to ensure that the user level APIs (ie: HTTP params and the xml syntax) are either unaffected by under the cover changes, or provide good backcompat with warnings logged when people use "deprecated" syntax.

          But I would suggest that we start by keeping this issue focused on nailing down the java level issues in the "lucene" layer and get that committed. Then when we're sure it's done and settled and happy, create a new issue for tracking the solr level API changes, with considerations like:

          • should we log a deprecation warning to 3.x if <optimize> or param optimize=true is specified, and ignore in 4.x?
          • should the existing 'maxSegments' attribute became an option on "commit" or add a new explicit "merge" command?
          • what do we do with postOptimize events listener registrations? log a deprecation warning, or rename and fire them if/when a merge to a single segment happens?

          (my personal opinion, for any general change not just optimize, is to add new syntax to reflect the new world order, but as long as we can support the old syntax we should – with anoying warning logs)

          Show
          Hoss Man added a comment - I think its worth considering "optimize" command to throw an exception in Solr for this reason... we can add an expert option like mergeIfNeeded instead? I have mixed feelings on this ... Solr has really tried to ensure that the user level APIs (ie: HTTP params and the xml syntax) are either unaffected by under the cover changes, or provide good backcompat with warnings logged when people use "deprecated" syntax. But I would suggest that we start by keeping this issue focused on nailing down the java level issues in the "lucene" layer and get that committed. Then when we're sure it's done and settled and happy, create a new issue for tracking the solr level API changes, with considerations like: should we log a deprecation warning to 3.x if <optimize> or param optimize=true is specified, and ignore in 4.x? should the existing 'maxSegments' attribute became an option on "commit" or add a new explicit "merge" command? what do we do with postOptimize events listener registrations? log a deprecation warning, or rename and fire them if/when a merge to a single segment happens? (my personal opinion, for any general change not just optimize, is to add new syntax to reflect the new world order, but as long as we can support the old syntax we should – with anoying warning logs)
          Hide
          Yonik Seeley added a comment -

          It's at least an order of magnitude more painful to change common external interfaces in Solr than in Lucene (or any other type-safe library). "optimize" has been with us a very long time, and it should stay. It should be both simple and effective enough to clarify the documentation if needed. Changing the name would cause far more collective confusion and pain than the very small percent of people who might not understand when it's appropriate to call (and changing the name won't help them understand when it is appropriate!).

          On a side note: optimize still does just that - an index with fewer segments and deleted documents is still more efficient to search. We just need to make sure to document the downsides as well.

          Show
          Yonik Seeley added a comment - It's at least an order of magnitude more painful to change common external interfaces in Solr than in Lucene (or any other type-safe library). "optimize" has been with us a very long time, and it should stay. It should be both simple and effective enough to clarify the documentation if needed. Changing the name would cause far more collective confusion and pain than the very small percent of people who might not understand when it's appropriate to call (and changing the name won't help them understand when it is appropriate!). On a side note: optimize still does just that - an index with fewer segments and deleted documents is still more efficient to search. We just need to make sure to document the downsides as well.
          Hide
          Robert Muir added a comment -

          Changing the name would cause far more collective confusion and pain than the very small percent of people who might not understand when it's appropriate to call (and changing the name won't help them understand when it is appropriate!).

          And thats exactly my goal here: whatever it takes. I want to cause confusion so that people think twice before calling optimize. Because I totally disagree with you that its a small percentage of people that know when its appropriate to call, when integrations such as Jan's example call it after every commit: thats horrible.

          Thats why I think we should rename it in lucene, rename it in solr, but if people start pushing back thats fine.

          I can totally take this up myself with the blogging/email list route instead and intentionally cause so much fucking confusion that nobody will ever use this again.

          Show
          Robert Muir added a comment - Changing the name would cause far more collective confusion and pain than the very small percent of people who might not understand when it's appropriate to call (and changing the name won't help them understand when it is appropriate!). And thats exactly my goal here: whatever it takes. I want to cause confusion so that people think twice before calling optimize. Because I totally disagree with you that its a small percentage of people that know when its appropriate to call, when integrations such as Jan's example call it after every commit: thats horrible. Thats why I think we should rename it in lucene, rename it in solr, but if people start pushing back thats fine. I can totally take this up myself with the blogging/email list route instead and intentionally cause so much fucking confusion that nobody will ever use this again.
          Hide
          Yonik Seeley added a comment -

          There should be extremely good reasons for changing Solr's external APIs - and this simply doesn't come close to rising to that level.

          Show
          Yonik Seeley added a comment - There should be extremely good reasons for changing Solr's external APIs - and this simply doesn't come close to rising to that level.
          Hide
          Robert Muir added a comment -

          fine, then i'm +1 to mike's patch.

          If solr wants to be slow, then it can stay slow, I'm gonna stop fighting it.

          Show
          Robert Muir added a comment - fine, then i'm +1 to mike's patch. If solr wants to be slow, then it can stay slow, I'm gonna stop fighting it.
          Hide
          Uwe Schindler added a comment -

          My proposal would be to leave "optimize" in Solr, but make it a no-op. No code will break and Solr will just drop a warning not in the log.

          Show
          Uwe Schindler added a comment - My proposal would be to leave "optimize" in Solr, but make it a no-op. No code will break and Solr will just drop a warning not in the log.
          Hide
          Uwe Schindler added a comment -

          The same could be done in 3.x of Lucene to prevent hard backwards breaks. As optimize does nothing to your index thats visible to the API/IndexReader/..., we could also meke IndexWriter.optimize() a no-op in Lucene 3.x and document it in the javadocs to do nothing anymore and that its only there to support binary backwards.

          Show
          Uwe Schindler added a comment - The same could be done in 3.x of Lucene to prevent hard backwards breaks. As optimize does nothing to your index thats visible to the API/IndexReader/..., we could also meke IndexWriter.optimize() a no-op in Lucene 3.x and document it in the javadocs to do nothing anymore and that its only there to support binary backwards.
          Hide
          Robert Muir added a comment -

          very small percent of people who might not understand when it's appropriate to call

          Are you saying only a small percentage of people use DataImportHandler or SpellChecker?

          Because both of these optimize by default on any operation...

          Show
          Robert Muir added a comment - very small percent of people who might not understand when it's appropriate to call Are you saying only a small percentage of people use DataImportHandler or SpellChecker? Because both of these optimize by default on any operation...
          Hide
          Uwe Schindler added a comment -

          Mike, about your patch:

          Would it make sense to remove the extra method completely from MergePolicy and simply do all in one method? By default, IW will call merge(maxSegments=Integer.MAX_VALUE) and when you call mergeIfNeeded it would pass merge(maxSegments=n).

          Otherwise +1 to commmit.

          The stupid and horrible default optimizes in Solr should be handled in another SOLR-xxx issue.

          Show
          Uwe Schindler added a comment - Mike, about your patch: Would it make sense to remove the extra method completely from MergePolicy and simply do all in one method? By default, IW will call merge(maxSegments=Integer.MAX_VALUE) and when you call mergeIfNeeded it would pass merge(maxSegments=n). Otherwise +1 to commmit. The stupid and horrible default optimizes in Solr should be handled in another SOLR-xxx issue.
          Hide
          Michael McCandless added a comment -

          OK let's leave this issue focused on Lucene only since changing Solr is contentious.

          Would it make sense to remove the extra method completely from MergePolicy and simply do all in one method?

          +1, good idea! I'll rework the patch.

          Show
          Michael McCandless added a comment - OK let's leave this issue focused on Lucene only since changing Solr is contentious. Would it make sense to remove the extra method completely from MergePolicy and simply do all in one method? +1, good idea! I'll rework the patch.
          Hide
          Mark Miller added a comment -

          Perhaps I am the only one, but I find these ifNeeded, mabyeThis, mabyeThat method names so ugly. I prefer JavaDoc for trying to catch the subtleties.

          +1 on compact as suggested by Grant or something like that.

          Show
          Mark Miller added a comment - Perhaps I am the only one, but I find these ifNeeded, mabyeThis, mabyeThat method names so ugly. I prefer JavaDoc for trying to catch the subtleties. +1 on compact as suggested by Grant or something like that.
          Hide
          Robert Muir added a comment -

          But names like compact/vacuum/etc imply that its some sort of necessary maintenance routine...

          Show
          Robert Muir added a comment - But names like compact/vacuum/etc imply that its some sort of necessary maintenance routine...
          Hide
          Mark Miller added a comment -

          Vacuum may because it's part of the db vocabulary - I don't think compact does at all. Beyond that, that is what javadocs are for. Personally, I don't think we need to try and design the api for complete morons that just call methods without reading the javadoc for what they do.

          Show
          Mark Miller added a comment - Vacuum may because it's part of the db vocabulary - I don't think compact does at all. Beyond that, that is what javadocs are for. Personally, I don't think we need to try and design the api for complete morons that just call methods without reading the javadoc for what they do.
          Hide
          Mark Miller added a comment -

          And if we want to keep thinking of the complete moron, I'd make the same argument about mergeIfNeeded - the user might think these needs to be called all the time! My, it doesn't merge when needed? I better call this every couple seconds to make sure it merges when its needed! It's a ridiculous never ending path IMO.

          Show
          Mark Miller added a comment - And if we want to keep thinking of the complete moron, I'd make the same argument about mergeIfNeeded - the user might think these needs to be called all the time! My, it doesn't merge when needed? I better call this every couple seconds to make sure it merges when its needed! It's a ridiculous never ending path IMO.
          Hide
          Robert Muir added a comment -

          I think a lot of damage has been done by the name being optimize:
          e.g.: why does an incremental DIH update trigger this by default?

          what sense does this make?

          You can keep lying to yourself that the only problem is complete morons, i'm not buying it.

          Show
          Robert Muir added a comment - I think a lot of damage has been done by the name being optimize: e.g.: why does an incremental DIH update trigger this by default? what sense does this make? You can keep lying to yourself that the only problem is complete morons, i'm not buying it.
          Hide
          Shai Erera added a comment -

          So now we ended up with mergeIfNeeded and maybeMerge()? At the start of this issue, it looks like we agreed to consolidate all methods under a single maybeMerge(). Mike suggested to have two variants of this, one that doesn't take maxNumSegments and one that does ... I'm fine with that too, as long as we have a single name.

          I also agree with Mark, all these maybe's/ifNeeded (IR has these now too !) should be part of the Javadocs, not the method names. For instance, IW.rollback() closes the IndexWriter, but the method is not called rollbackAndClose. IMO, that is even more confusing than the IfNeeded versions, because I do not anticipate the instance to be closed just because I rolled-back changes.

          IndexReader.openIfNeeded in fact does reopen() (the old version), but it's not called reopenIfNeeded. So we force the users to read the javadocs in order to understand that openIfNeeded reuses the unchanged segments from the given IndexReader and only opens the new ones ...

          Names are hard, and I think we should opt for simple and intuitive ones. Javadocs should be used to clarify what the method does in more details. I personally was never confused by optimize(), so I don't mind if it's kept. But apparently others were confused by it (have no idea why) ...

          I don't mind maybeMerge() (perhaps because it's not a new API), but if we want to remove the maybe-ness, let's call it something like invokeMergePolicy() (with and without maxNumSegments)? We can replace invoke by some other verb, maybe consultMP / runMerges ...?

          Show
          Shai Erera added a comment - So now we ended up with mergeIfNeeded and maybeMerge()? At the start of this issue, it looks like we agreed to consolidate all methods under a single maybeMerge(). Mike suggested to have two variants of this, one that doesn't take maxNumSegments and one that does ... I'm fine with that too, as long as we have a single name. I also agree with Mark, all these maybe's/ifNeeded (IR has these now too !) should be part of the Javadocs, not the method names. For instance, IW.rollback() closes the IndexWriter, but the method is not called rollbackAndClose. IMO, that is even more confusing than the IfNeeded versions, because I do not anticipate the instance to be closed just because I rolled-back changes. IndexReader.openIfNeeded in fact does reopen() (the old version), but it's not called reopenIfNeeded. So we force the users to read the javadocs in order to understand that openIfNeeded reuses the unchanged segments from the given IndexReader and only opens the new ones ... Names are hard, and I think we should opt for simple and intuitive ones. Javadocs should be used to clarify what the method does in more details. I personally was never confused by optimize(), so I don't mind if it's kept. But apparently others were confused by it (have no idea why) ... I don't mind maybeMerge() (perhaps because it's not a new API), but if we want to remove the maybe-ness, let's call it something like invokeMergePolicy() (with and without maxNumSegments)? We can replace invoke by some other verb, maybe consultMP / runMerges ...?
          Hide
          Yonik Seeley added a comment -

          e.g.: why does an incremental DIH update trigger this by default?

          Not sure... If they actually meant for this to happen, this just reinforces my point that the name of the call is not the issue. The guys who wrote DIH absolutely know what optimize does - hence if it was renamed they would have simply called maybeMerge(1) or whatever today.

          At the time it was written, and depending on what it was used for, maybe it did make sense.

          You can keep lying to yourself that the only problem is complete morons, i'm not buying it.

          That's exactly it - it's not morons, so a simple name change won't really help. We need to document the current tradeoffs so people can make more informed decisions about when to optimize. And some people will still chose to maybeMerge(1) when others think maybe they shouldn't. That's OK.

          Show
          Yonik Seeley added a comment - e.g.: why does an incremental DIH update trigger this by default? Not sure... If they actually meant for this to happen, this just reinforces my point that the name of the call is not the issue. The guys who wrote DIH absolutely know what optimize does - hence if it was renamed they would have simply called maybeMerge(1) or whatever today. At the time it was written, and depending on what it was used for, maybe it did make sense. You can keep lying to yourself that the only problem is complete morons, i'm not buying it. That's exactly it - it's not morons, so a simple name change won't really help. We need to document the current tradeoffs so people can make more informed decisions about when to optimize. And some people will still chose to maybeMerge(1) when others think maybe they shouldn't. That's OK.
          Hide
          Robert Muir added a comment -

          At the time it was written, and depending on what it was used for, maybe it did make sense.

          You are missing the whole point, now it defaults to optimize on each incremental update, basically O(n^2) indexing performance,
          unless the user knows enough to know that optimize = shitty perf here and explicitly turn it off. and given its cool-sounding name,
          it discourages them from even questioning that it could be making things slow.

          Show
          Robert Muir added a comment - At the time it was written, and depending on what it was used for, maybe it did make sense. You are missing the whole point, now it defaults to optimize on each incremental update, basically O(n^2) indexing performance, unless the user knows enough to know that optimize = shitty perf here and explicitly turn it off. and given its cool-sounding name, it discourages them from even questioning that it could be making things slow.
          Hide
          Michael McCandless added a comment -

          How about the name "forceMerge(int)" instead?

          Fundamentally, this is a different operation from maybeMerge() because
          that method only does "natural" merges, ie ones that the MP has
          selected on its own.

          Whereas forceMerge means you are forcing the MP to do merging that it
          otherwise would not have naturally chosen to do.

          I don't like names like compact/defragment since they still imply this
          is a sort of necessary periodic maintenance that you are expected / need
          to call.

          The fact is, Lucene has made excellent progress on getting good
          performance on multi-segment indexes: Query rewriting (eg MTQ) and
          searching is per-segment. TieredMP now targets segments with
          deletions, and can merge out-of-order, etc. Reducing the index down
          to 1 segment is rarely justified given the cost (yes, there are times,
          like a fully static index, but this is rare).

          The goal here is to discourage "typical" users from calling
          optimize ("expert" users will of course find the method and use it,
          hopefully in the "right" cases).

          The API is badly trappy today; we've seen this over and over now (I
          just got a private email a few days ago... when I asked why they
          optimize after every "batch" they said "because it just seemed like
          the right thing to do"). We've all seen many users fall into this
          trap.

          We can try to debate why this is so... I don't think it's because they
          are "morons". I think there are many other explanations. EG, our own
          FAQs, javadocs, the Lucene in Action book, tutorials, etc., all
          frequently "suggested" optimize in the past. I think, also, users
          often don't realize Lucene has "segments" and that optimize means
          these segments are "fully rewritten" and that this then implies O(N^2)
          cost if you call after every doc/batch, etc. These things are obvious
          to Lucene developers, but not so to users.

          Show
          Michael McCandless added a comment - How about the name "forceMerge(int)" instead? Fundamentally, this is a different operation from maybeMerge() because that method only does "natural" merges, ie ones that the MP has selected on its own. Whereas forceMerge means you are forcing the MP to do merging that it otherwise would not have naturally chosen to do. I don't like names like compact/defragment since they still imply this is a sort of necessary periodic maintenance that you are expected / need to call. The fact is, Lucene has made excellent progress on getting good performance on multi-segment indexes: Query rewriting (eg MTQ) and searching is per-segment. TieredMP now targets segments with deletions, and can merge out-of-order, etc. Reducing the index down to 1 segment is rarely justified given the cost (yes, there are times, like a fully static index, but this is rare). The goal here is to discourage "typical" users from calling optimize ("expert" users will of course find the method and use it, hopefully in the "right" cases). The API is badly trappy today; we've seen this over and over now (I just got a private email a few days ago... when I asked why they optimize after every "batch" they said "because it just seemed like the right thing to do"). We've all seen many users fall into this trap. We can try to debate why this is so... I don't think it's because they are "morons". I think there are many other explanations. EG, our own FAQs, javadocs, the Lucene in Action book, tutorials, etc., all frequently "suggested" optimize in the past. I think, also, users often don't realize Lucene has "segments" and that optimize means these segments are "fully rewritten" and that this then implies O(N^2) cost if you call after every doc/batch, etc. These things are obvious to Lucene developers, but not so to users.
          Hide
          Michael McCandless added a comment -

          For instance, IW.rollback() closes the IndexWriter, but the method is not called rollbackAndClose.

          I agree, this is bad, because it's unexpected that the IW is closed: I
          think we should rename it to rollbackAndClose (I'll open a separate
          issue).

          Show
          Michael McCandless added a comment - For instance, IW.rollback() closes the IndexWriter, but the method is not called rollbackAndClose. I agree, this is bad, because it's unexpected that the IW is closed: I think we should rename it to rollbackAndClose (I'll open a separate issue).
          Hide
          Doron Cohen added a comment -

          Perhaps I am the only one, but I find these ifNeeded, mabyeThis, mabyeThat method names so ugly. I prefer JavaDoc for trying to catch the subtleties.

          I feel that way too.

          But a name change here seems in place, because as pointed above, there is an issue with current catchy name optimize().

          My personal preference between the names suggested above is Mike's last one: forceMerge(int):

          • it describes what's done
          • does not suggest to do wonders
          • requires caller to think twice because of deciding to force a certain behavior
          Show
          Doron Cohen added a comment - Perhaps I am the only one, but I find these ifNeeded, mabyeThis, mabyeThat method names so ugly. I prefer JavaDoc for trying to catch the subtleties. I feel that way too. But a name change here seems in place, because as pointed above, there is an issue with current catchy name optimize() . My personal preference between the names suggested above is Mike's last one: forceMerge(int) : it describes what's done does not suggest to do wonders requires caller to think twice because of deciding to force a certain behavior
          Hide
          Shai Erera added a comment -

          How about the name "forceMerge(int)" instead?

          Fundamentally, this is a different operation from maybeMerge() because
          that method only does "natural" merges, ie ones that the MP has
          selected on its own.

          Whereas forceMerge means you are forcing the MP to do merging that it
          otherwise would not have naturally chosen to do.

          I'm not sure that I agree ... I could set MP in such a way that forceMerge(1) would still do nothing. That's very simple in fact, and I do this today. I set LogMP's maxMergeMB(ForOptimize) to 4GB, which means that I never end up merging segments larger than that. I call optimize() whenever I can, but at some point, optimize will do nothing (if the indexing process stops), or after my index grew a lot, many segments won't be merged, and optimize/forceMerge(1) will actually end up with X+1 segments, where X is the number of segments that are too large for me to merge.

          Therefore I'm not sure that trading optimize for forceMerge is much better. Sure, it has a less cool name, but now I think it will be even more confusing, because I'll call forceMerge(1) and that won't do what I asked.

          I think that the problem is that we try to come up with names that reflect what API we IndexWriter should call on MP. That's why we try to distinguish between maybeMerge() and optimize(int). So maybe we should go for a more extreme change – how about having one method merge() which takes a MergePolicy with a single method findSegmentsForMerge(). We will provide MPs that are good for 'regular' merges and 'optimize' and the user can pass whatever he wishes to do. The user can also pass the same MP instance to IWC, and that will control the regular merges IW does from time to time (we default to a 'regular' merging MP).

          Just a thought.

          Show
          Shai Erera added a comment - How about the name "forceMerge(int)" instead? Fundamentally, this is a different operation from maybeMerge() because that method only does "natural" merges, ie ones that the MP has selected on its own. Whereas forceMerge means you are forcing the MP to do merging that it otherwise would not have naturally chosen to do. I'm not sure that I agree ... I could set MP in such a way that forceMerge(1) would still do nothing. That's very simple in fact, and I do this today. I set LogMP's maxMergeMB(ForOptimize) to 4GB, which means that I never end up merging segments larger than that. I call optimize() whenever I can, but at some point, optimize will do nothing (if the indexing process stops), or after my index grew a lot, many segments won't be merged, and optimize/forceMerge(1) will actually end up with X+1 segments, where X is the number of segments that are too large for me to merge. Therefore I'm not sure that trading optimize for forceMerge is much better. Sure, it has a less cool name, but now I think it will be even more confusing, because I'll call forceMerge(1) and that won't do what I asked. I think that the problem is that we try to come up with names that reflect what API we IndexWriter should call on MP. That's why we try to distinguish between maybeMerge() and optimize(int). So maybe we should go for a more extreme change – how about having one method merge() which takes a MergePolicy with a single method findSegmentsForMerge(). We will provide MPs that are good for 'regular' merges and 'optimize' and the user can pass whatever he wishes to do. The user can also pass the same MP instance to IWC, and that will control the regular merges IW does from time to time (we default to a 'regular' merging MP). Just a thought.
          Hide
          Mark Miller added a comment -

          I like forceMerge. I'm not opposed to renaming optimize, just one of the candidates so far and some of the new method names in general. Optimize is not the best description IMO regardless of how catchy it is. It's not really optimal for NRT as just one of many points...

          Show
          Mark Miller added a comment - I like forceMerge. I'm not opposed to renaming optimize, just one of the candidates so far and some of the new method names in general. Optimize is not the best description IMO regardless of how catchy it is. It's not really optimal for NRT as just one of many points...
          Show
          Michael McCandless added a comment - Some quick googling uncovers depressing examples of over-optimizing: https://jira.duraspace.org/browse/FCREPO-155 http://stackoverflow.com/questions/3912253/is-it-mandatory-to-optimize-the-lucene-index-after-write http://issues.liferay.com/browse/LPS-2944 http://download.oracle.com/docs/cd/E19316-01/820-7054/girqf/index.html https://issues.sonatype.org/browse/MNGECLIPSE-2359 http://blog.inflinx.com/tag/lucene That last one has this fun comment: // Lucene recommends calling optimize upon completion of indexing writer.optimize();
          Hide
          Mark Miller added a comment -

          I'm not sure sure that is a strong case.

          The ones that mention optimizing after loading all your data is practically what had been often recommended for some time. Can't say the same about optimizing after every add.

          However, up to most of Lucene 2, we still had JavaDoc that said:

          If an index will not have more documents added for a while and optimal search performance is desired, then the optimize method should be called before the index is closed.

          Based on that, I'd likely think I should optimize after bulk loading up all my data like one of those links asks about.

          The optimize javadoc itself even simply said:

          Requests an "optimize" operation on an index, priming the index for the fastest available search. Traditionally this has meant merging all segments into a single segment as is done in the default merge policy, but individaul merge policies may implement optimize in different ways.

          Since, much of this javadoc had gotten better. But it's no surprise that there are cases of confusion out there? Most of those are from before this javadoc was fixed - and even then the old code and javaodoc/advice are out there reverberating around on google.

          The situation with the javadoc is much better today - someone shouldn't need to ask those questions, or have those problems, but the javadoc used to be a trap in showing this great optimize method and not properly explaining or warning about its use.

          Creating method names for cowboy method calling coders that don't read javadoc seems like the wrong approach to me.

          Though I'm still +1 on renaming optimize to something more fitting.

          Show
          Mark Miller added a comment - I'm not sure sure that is a strong case. The ones that mention optimizing after loading all your data is practically what had been often recommended for some time. Can't say the same about optimizing after every add. However, up to most of Lucene 2, we still had JavaDoc that said: If an index will not have more documents added for a while and optimal search performance is desired, then the optimize method should be called before the index is closed. Based on that, I'd likely think I should optimize after bulk loading up all my data like one of those links asks about. The optimize javadoc itself even simply said: Requests an "optimize" operation on an index, priming the index for the fastest available search. Traditionally this has meant merging all segments into a single segment as is done in the default merge policy, but individaul merge policies may implement optimize in different ways. Since, much of this javadoc had gotten better. But it's no surprise that there are cases of confusion out there? Most of those are from before this javadoc was fixed - and even then the old code and javaodoc/advice are out there reverberating around on google. The situation with the javadoc is much better today - someone shouldn't need to ask those questions, or have those problems, but the javadoc used to be a trap in showing this great optimize method and not properly explaining or warning about its use. Creating method names for cowboy method calling coders that don't read javadoc seems like the wrong approach to me. Though I'm still +1 on renaming optimize to something more fitting.
          Hide
          Shai Erera added a comment -

          Well ... I don't like forceMerge for two reasons: (1) it may not actually force anything and (2) it takes a parameter maxNumSegments which is just one of the factors one would want to consider when doing optimize/merge. For instance, when I do index optimization, I cap the process by a time constraint and let it run until the time is exhausted. Given that today I can either call maybeMerge() or optimize(), I call optimize(), and I like it that I don't need to pass any 'fake' parameter, even though I'm aware that under the covers it does optimize(1).

          What about expungeDeletes? Why is it not called maybeExpungeDeletes? Because by tweaking MP settings I can make it leave some deletes in segments. And why isn't it called expungeDeletesThenMerge or expungeDeletesNoMerge (ala the now gone addIndexesNoOptimize)? Don't get me wrong (before anyone opens an issue to rename it too) - I like expungeDeletes!

          I feel that we mask from the user what happens under the covers when he calls any of the 3 methods (maybMerge, optimize, expungeDeletes). maybeMerge relate to mergeFactor as a bounding criteria (don't merge if there are aren't X segments at the same level), while optimize just uses it to determine how many segments to merge at once, and expungeDeletes ignores it altogether.

          So if MP determines so much what will happen when IndexWriter calls it, why hide it from the method call? Instead of setting an MP once on IWC and hope that the settings I've done will match any future call to one of these methods, why not allow the user to pass the desired MP for the action he wants to perform? That way we can focus MP implementations on specific tasks.

          Show
          Shai Erera added a comment - Well ... I don't like forceMerge for two reasons: (1) it may not actually force anything and (2) it takes a parameter maxNumSegments which is just one of the factors one would want to consider when doing optimize/merge. For instance, when I do index optimization, I cap the process by a time constraint and let it run until the time is exhausted. Given that today I can either call maybeMerge() or optimize(), I call optimize(), and I like it that I don't need to pass any 'fake' parameter, even though I'm aware that under the covers it does optimize(1). What about expungeDeletes? Why is it not called maybeExpungeDeletes? Because by tweaking MP settings I can make it leave some deletes in segments. And why isn't it called expungeDeletesThenMerge or expungeDeletesNoMerge (ala the now gone addIndexesNoOptimize)? Don't get me wrong (before anyone opens an issue to rename it too) - I like expungeDeletes! I feel that we mask from the user what happens under the covers when he calls any of the 3 methods (maybMerge, optimize, expungeDeletes). maybeMerge relate to mergeFactor as a bounding criteria (don't merge if there are aren't X segments at the same level), while optimize just uses it to determine how many segments to merge at once, and expungeDeletes ignores it altogether. So if MP determines so much what will happen when IndexWriter calls it, why hide it from the method call? Instead of setting an MP once on IWC and hope that the settings I've done will match any future call to one of these methods, why not allow the user to pass the desired MP for the action he wants to perform? That way we can focus MP implementations on specific tasks.
          Hide
          Michael McCandless added a comment -

          I could set MP in such a way that forceMerge(1) would still do nothing. That's very simple in fact, and I do this today.

          Sure, but remember: 1) this is the exception case (not the rule), and
          2) you are an expert user.

          We should name our APIs according to what they "typically" do, not by
          the exceptional cases, which can and should be handled by javadocs
          (for example, that you must close your IW if you hit OOME during
          addDocument).

          The expert users (like you) who are bumping up against these
          exceptions can easily understand and handle them.

          I think forceMerge(int) does a pretty good job explaining what the MP
          is going to try to do.

          I think that the problem is that we try to come up with names that reflect what API we IndexWriter should call on MP.

          Right, optimize() is really sugar for "invoking the current MP and do
          whatever it says".

          So maybe we should go for a more extreme change – how about having one method merge() which takes a MergePolicy with a single method findSegmentsForMerge().

          This is a great idea! But I think we should pursue it under a new
          issue, after renaming the optimize method? It's a bigger change.

          If we took this approach... I think IW would still need a "default MP"
          that it uses to kick off natural merges over time? (Ie, after a new
          segment is flushed).

          Alternatively, we could have this extra MP sit fully "outside" of IW,
          and so instead of calling IW.merge(MP) you'd call MP.merge(IW), and
          that "foreign" MP would register merges with IW? Still, it's gonna
          get tricky, how the "natural" MP interacts with this foreign MP.

          Or... maybe we remove IW.optimize, and instead open up a method on
          each MP impl (eg MP.forceMerge(int)), and you invoke this method on
          the MP instead? This way you still have the one MP, but IW doesn't
          need to expose hard-to-name sugar? Still sounds tricky though... the
          MP would ask IW to maybeMerge

          Show
          Michael McCandless added a comment - I could set MP in such a way that forceMerge(1) would still do nothing. That's very simple in fact, and I do this today. Sure, but remember: 1) this is the exception case (not the rule), and 2) you are an expert user. We should name our APIs according to what they "typically" do, not by the exceptional cases, which can and should be handled by javadocs (for example, that you must close your IW if you hit OOME during addDocument). The expert users (like you) who are bumping up against these exceptions can easily understand and handle them. I think forceMerge(int) does a pretty good job explaining what the MP is going to try to do. I think that the problem is that we try to come up with names that reflect what API we IndexWriter should call on MP. Right, optimize() is really sugar for "invoking the current MP and do whatever it says". So maybe we should go for a more extreme change – how about having one method merge() which takes a MergePolicy with a single method findSegmentsForMerge(). This is a great idea! But I think we should pursue it under a new issue, after renaming the optimize method? It's a bigger change. If we took this approach... I think IW would still need a "default MP" that it uses to kick off natural merges over time? (Ie, after a new segment is flushed). Alternatively, we could have this extra MP sit fully "outside" of IW, and so instead of calling IW.merge(MP) you'd call MP.merge(IW), and that "foreign" MP would register merges with IW? Still, it's gonna get tricky, how the "natural" MP interacts with this foreign MP. Or... maybe we remove IW.optimize, and instead open up a method on each MP impl (eg MP.forceMerge(int)), and you invoke this method on the MP instead? This way you still have the one MP, but IW doesn't need to expose hard-to-name sugar? Still sounds tricky though... the MP would ask IW to maybeMerge
          Hide
          Shai Erera added a comment -

          Sure, but remember: 1) this is the exception case (not the rule)

          I disagree ... I find myself more and more these days telling people to limit their merge size because of performance issues, whether it's for optimize/maybeMerge. Therefore I don't think it's the exception case, or will remain like that for long.

          I think forceMerge(int) does a pretty good job explaining what the MP is going to try to do.

          Is that a Javadoc statement? Because we could have just fixed optimize() javadocs without adding API that sort of commits to something that may not happen.

          How about naming it doMaintenance?

          If we took this approach... I think IW would still need a "default MP"
          that it uses to kick off natural merges over time? (Ie, after a new
          segment is flushed).
          

          Sure, we will provide the best MP for doing natural/regular merges which will be the default of IWC.

          I agree this route is bigger than just renaming optimize(), and I don't think that we need to change the interaction between IW and MP. But let's handle that in a separate issue.

          Show
          Shai Erera added a comment - Sure, but remember: 1) this is the exception case (not the rule) I disagree ... I find myself more and more these days telling people to limit their merge size because of performance issues, whether it's for optimize/maybeMerge. Therefore I don't think it's the exception case, or will remain like that for long. I think forceMerge(int) does a pretty good job explaining what the MP is going to try to do. Is that a Javadoc statement? Because we could have just fixed optimize() javadocs without adding API that sort of commits to something that may not happen. How about naming it doMaintenance? If we took this approach... I think IW would still need a " default MP" that it uses to kick off natural merges over time? (Ie, after a new segment is flushed). Sure, we will provide the best MP for doing natural/regular merges which will be the default of IWC. I agree this route is bigger than just renaming optimize(), and I don't think that we need to change the interaction between IW and MP. But let's handle that in a separate issue.
          Hide
          Michael McCandless added a comment -

          Sure, but remember: 1) this is the exception case (not the rule)

          I disagree ... I find myself more and more these days telling people to limit their merge size because of performance issues, whether it's for optimize/maybeMerge. Therefore I don't think it's the exception case, or will remain like that for long.

          But today, in 3.x or trunk (ie TieredMergePolicy), if you call
          forceMerge(N) this will in fact merge away until you have <= N
          segments.

          I think if you use either LogDoc/ByteSizeMergePolicy, forceMerge also
          does what it says. It's only if you change their maxMBForOptimize from
          the default, and you have a large enough index to hit that limit, that
          forceMerge(1) may in fact produce more than one segment.

          So, sure, if you go and change the settings, swap in a different
          MergePolicy, etc., you can make it so IW.forceMerge(int) does
          something totally different. But that's the exception, not the rule;
          that's what the "experts" do, not the normal users who use the
          defaults.

          I think forceMerge(int) does a pretty good job explaining what the MP is going to try to do.

          Is that a Javadoc statement? Because we could have just fixed optimize() javadocs without adding API that sort of commits to something that may not happen.

          I think in the javadocs we should explain that forceMerge just asks
          the MP to pick merges, passing the minSegmentCount, ie explain the
          "exception case" via javadocs and let the method name explain the
          common case. I think this is in general how we should name our
          methods...

          How about naming it doMaintenance?

          I don't really like that choice, for the same reason I don't like
          defragment/compact: it implies you (the app) are expected to
          periodically call it, whereas forced merging is very much an optional
          operation since Lucene works so well against multi-segment indexes
          these days.

          If we took this approach... I think IW would still need a "default MP" that it uses to kick off natural merges over time? (Ie, after a new segment is flushed).

          Sure, we will provide the best MP for doing natural/regular merges which will be the default of IWC.

          I agree this route is bigger than just renaming optimize(), and I don't think that we need to change the interaction between IW and MP. But let's handle that in a separate issue.

          Can you open a new issue so we can explore the foreign-MP idea?
          Replacing optimize/forceMerge and expungeDeletes with a new
          "merge(MP)" seems compelling.

          Let's leave this issue on the simple renaming....

          Show
          Michael McCandless added a comment - Sure, but remember: 1) this is the exception case (not the rule) I disagree ... I find myself more and more these days telling people to limit their merge size because of performance issues, whether it's for optimize/maybeMerge. Therefore I don't think it's the exception case, or will remain like that for long. But today, in 3.x or trunk (ie TieredMergePolicy), if you call forceMerge(N) this will in fact merge away until you have <= N segments. I think if you use either LogDoc/ByteSizeMergePolicy, forceMerge also does what it says. It's only if you change their maxMBForOptimize from the default, and you have a large enough index to hit that limit, that forceMerge(1) may in fact produce more than one segment. So, sure, if you go and change the settings, swap in a different MergePolicy, etc., you can make it so IW.forceMerge(int) does something totally different. But that's the exception, not the rule; that's what the "experts" do, not the normal users who use the defaults. I think forceMerge(int) does a pretty good job explaining what the MP is going to try to do. Is that a Javadoc statement? Because we could have just fixed optimize() javadocs without adding API that sort of commits to something that may not happen. I think in the javadocs we should explain that forceMerge just asks the MP to pick merges, passing the minSegmentCount, ie explain the "exception case" via javadocs and let the method name explain the common case. I think this is in general how we should name our methods... How about naming it doMaintenance? I don't really like that choice, for the same reason I don't like defragment/compact: it implies you (the app) are expected to periodically call it, whereas forced merging is very much an optional operation since Lucene works so well against multi-segment indexes these days. If we took this approach... I think IW would still need a "default MP" that it uses to kick off natural merges over time? (Ie, after a new segment is flushed). Sure, we will provide the best MP for doing natural/regular merges which will be the default of IWC. I agree this route is bigger than just renaming optimize(), and I don't think that we need to change the interaction between IW and MP. But let's handle that in a separate issue. Can you open a new issue so we can explore the foreign-MP idea? Replacing optimize/forceMerge and expungeDeletes with a new "merge(MP)" seems compelling. Let's leave this issue on the simple renaming....
          Hide
          Shai Erera added a comment -

          Can you open a new issue so we can explore the foreign-MP idea?

          I will.

          Show
          Shai Erera added a comment - Can you open a new issue so we can explore the foreign-MP idea? I will.
          Hide
          Michael McCandless added a comment -

          OK, LUCENE-3569 will explore the foreign MergePolicy approach.

          Back to this issue... we won't be able to find a name that everyone
          loves, of course (this is why naming is the hardest part!).

          But forceMerge got at least some traction (3 people OK'd it), and it
          does explain what you get from Lucene today, out of the box. I think
          it's a good improvement over what we have today (optimize). Progress
          not perfection...

          Shai, are you absolutely dead set against the name "forceMerge"? I
          mean it's clear you'd like to do a bigger change (LUCENE-3569), but in
          the mean time, forceMerge is at least better than optimize?

          And if you are dead set against it, can you enumerate some alternative
          names? We need to find a name that nobody hates (hopefully
          possible)... not one that everybody loves (not possible).

          Naming is the hardest part

          Show
          Michael McCandless added a comment - OK, LUCENE-3569 will explore the foreign MergePolicy approach. Back to this issue... we won't be able to find a name that everyone loves, of course (this is why naming is the hardest part!). But forceMerge got at least some traction (3 people OK'd it), and it does explain what you get from Lucene today, out of the box. I think it's a good improvement over what we have today (optimize). Progress not perfection... Shai, are you absolutely dead set against the name "forceMerge"? I mean it's clear you'd like to do a bigger change ( LUCENE-3569 ), but in the mean time, forceMerge is at least better than optimize? And if you are dead set against it, can you enumerate some alternative names? We need to find a name that nobody hates (hopefully possible)... not one that everybody loves (not possible). Naming is the hardest part
          Hide
          Shai Erera added a comment -

          Shai, are you absolutely dead set against the name "forceMerge"?

          No, I am not dead set against it. Feel free to proceed with forceMerge for now, because I don't want to hold up this issue.

          Obviously I'm on the minority side ...

          Show
          Shai Erera added a comment - Shai, are you absolutely dead set against the name "forceMerge"? No, I am not dead set against it. Feel free to proceed with forceMerge for now, because I don't want to hold up this issue. Obviously I'm on the minority side ...
          Hide
          Michael McCandless added a comment -

          OK thanks Shai. I'll work up a new patch...

          Show
          Michael McCandless added a comment - OK thanks Shai. I'll work up a new patch...
          Hide
          Michael McCandless added a comment -

          OK, new patch with "forceMerge". I think it's ready to commit!

          I left the MergePolicy method separate (renamed to findForcedMerges);
          I'm not sure we should merge it with the findMerges since that one
          finds "natural" merges. But we can explore on a new issue... this
          patch is immense and "rote" renaming so I plan commit soon...

          The patch should be applyable; I generated with "svn diff
          --show-copies-as-adds" (svn 1.7).

          Show
          Michael McCandless added a comment - OK, new patch with "forceMerge". I think it's ready to commit! I left the MergePolicy method separate (renamed to findForcedMerges); I'm not sure we should merge it with the findMerges since that one finds "natural" merges. But we can explore on a new issue... this patch is immense and "rote" renaming so I plan commit soon... The patch should be applyable; I generated with "svn diff --show-copies-as-adds" (svn 1.7).
          Hide
          Uwe Schindler added a comment -

          Bulk close after release of 3.5

          Show
          Uwe Schindler added a comment - Bulk close after release of 3.5

            People

            • Assignee:
              Michael McCandless
              Reporter:
              Robert Muir
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development