Lucene - Core
  1. Lucene - Core
  2. LUCENE-4246

Fix IndexWriter.close() to not commit or wait for pending merges

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New
    1. LUCENE-4246.patch
      554 kB
      Michael McCandless
    2. LUCENE-4246.patch
      553 kB
      Michael McCandless

      Issue Links

        Activity

        Hide
        Shai Erera added a comment -

        Moving my comment from LUCENE-4245:

        Why is that different from OutputStream.close() calling flush()? Would you like to call flush() yourself everytime before you close()?

        I don't think there's anything we should 'fix' ...

        Perhaps, if you want to simplify, close() should just commit and NEVER wait for merges. If someone wants that, he can call commit() + waitForMerges().

        Show
        Shai Erera added a comment - Moving my comment from LUCENE-4245 : Why is that different from OutputStream.close() calling flush()? Would you like to call flush() yourself everytime before you close()? I don't think there's anything we should 'fix' ... Perhaps, if you want to simplify, close() should just commit and NEVER wait for merges. If someone wants that, he can call commit() + waitForMerges().
        Hide
        Uwe Schindler added a comment -

        Perhaps, if you want to simplify, close() should just commit and NEVER wait for merges. If someone wants that, he can call commit() + waitForMerges().

        Yes!

        I just repeat from the other issue: As CMS is involved, e.g. CMS.close() is interruptible, as it calls join() on all spawned merge threads after informing them to stop working. But as they are separate threads, there is some wait operation involved. We can simplify here, but we should still make IndexWriter.close() not interruptible.

        Show
        Uwe Schindler added a comment - Perhaps, if you want to simplify, close() should just commit and NEVER wait for merges. If someone wants that, he can call commit() + waitForMerges(). Yes! I just repeat from the other issue: As CMS is involved, e.g. CMS.close() is interruptible, as it calls join() on all spawned merge threads after informing them to stop working. But as they are separate threads, there is some wait operation involved. We can simplify here, but we should still make IndexWriter.close() not interruptible.
        Hide
        Robert Muir added a comment -

        Because is transactional. Outputstream is not

        Show
        Robert Muir added a comment - Because is transactional. Outputstream is not
        Hide
        Shai Erera added a comment -

        Still, I don't want to call commit(), then close() every time I want to close. And I think it will confuse users, i.e. we have close(), commit() and rollback(). We know what commit() and rollback() do, but what does close() without commit() does? Is it like rollback()? If so, should we remove rollback()?

        I think that close() doing commit() is fine. Even if it is transactional. Someone can always call rollback() if he wants to close w/o commit. That's the purpose of rollback().

        Show
        Shai Erera added a comment - Still, I don't want to call commit(), then close() every time I want to close. And I think it will confuse users, i.e. we have close(), commit() and rollback(). We know what commit() and rollback() do, but what does close() without commit() does? Is it like rollback()? If so, should we remove rollback()? I think that close() doing commit() is fine. Even if it is transactional. Someone can always call rollback() if he wants to close w/o commit. That's the purpose of rollback().
        Hide
        Michael McCandless added a comment -

        Perhaps, if you want to simplify, close() should just commit and NEVER wait for merges. If someone wants that, he can call commit() + waitForMerges().

        I think you mean waitForMerges(), then commit()? Else the merges aren't committed...

        I think this idea (close simply closes) makes sense! But:

        Would close() throw an exception if there were pending (uncommitted, un-rolled-back) changes? Ie to avoid the obvious trap of users on upgrading assuming close still implicitly calls commit else they silently lose their changes.

        Also if merges are still running I think close() should throw an exception, else CMS will be silently starved (never able to complete a large merge) for apps that open IW, add/update some docs, commit, close. We'd need to open up a "stopAllMerges" call.

        Show
        Michael McCandless added a comment - Perhaps, if you want to simplify, close() should just commit and NEVER wait for merges. If someone wants that, he can call commit() + waitForMerges(). I think you mean waitForMerges(), then commit()? Else the merges aren't committed... I think this idea (close simply closes) makes sense! But: Would close() throw an exception if there were pending (uncommitted, un-rolled-back) changes? Ie to avoid the obvious trap of users on upgrading assuming close still implicitly calls commit else they silently lose their changes. Also if merges are still running I think close() should throw an exception, else CMS will be silently starved (never able to complete a large merge) for apps that open IW, add/update some docs, commit, close. We'd need to open up a "stopAllMerges" call.
        Hide
        Robert Muir added a comment -

        Close should alias to rollback. Otherwise its broken. We
        should fix this before we have to support backwards... Committing shit on close is a massive bug, no other transactional apis act this way.

        If you are against this, fine, another option is to remove the transactional API.

        But the current situation is broken beyond words.

        Show
        Robert Muir added a comment - Close should alias to rollback. Otherwise its broken. We should fix this before we have to support backwards... Committing shit on close is a massive bug, no other transactional apis act this way. If you are against this, fine, another option is to remove the transactional API. But the current situation is broken beyond words.
        Hide
        Robert Muir added a comment -

        I don't think we need to worry too much about apps that close indexwriter each time they add a doc being starved. Such apps are likely optimizing each time too

        Show
        Robert Muir added a comment - I don't think we need to worry too much about apps that close indexwriter each time they add a doc being starved. Such apps are likely optimizing each time too
        Hide
        Uwe Schindler added a comment -

        I am fine with either proposal, but this has nothing to do with the interruptible bug, because close() has to still "wait" for CMS to shutdown (the mergeScheduler.close() call would keep alive, and that one is interruptible at the moment). My patch in the other issue ensures that the close of the scheduler and shutting down all its threads works, although some threads were interrupted by Jetty or any other automatic shutdown. Thats all this issue was opened, this here is not related at all.

        By this issue we just have less interruptible calls on close, but that can be done after discussion, so I want to fix issue LUCENE-4245.

        Show
        Uwe Schindler added a comment - I am fine with either proposal, but this has nothing to do with the interruptible bug, because close() has to still "wait" for CMS to shutdown (the mergeScheduler.close() call would keep alive, and that one is interruptible at the moment). My patch in the other issue ensures that the close of the scheduler and shutting down all its threads works, although some threads were interrupted by Jetty or any other automatic shutdown. Thats all this issue was opened, this here is not related at all. By this issue we just have less interruptible calls on close, but that can be done after discussion, so I want to fix issue LUCENE-4245 .
        Hide
        Robert Muir added a comment -

        I think its related: I think close() is too complicated today.

        In my opinion close() should just close resources: we should try to minimize what it does so it can easily be
        used in e.g. finally block.

        As far as commit() goes, I think you should always be explicit about that and not mix it up with close().

        As far as pending merges, if we want to have an option to do that, I would prefer it not be the default
        Closeable behavior of close(), instead something separate like shutdown().

        Show
        Robert Muir added a comment - I think its related: I think close() is too complicated today. In my opinion close() should just close resources: we should try to minimize what it does so it can easily be used in e.g. finally block. As far as commit() goes, I think you should always be explicit about that and not mix it up with close(). As far as pending merges, if we want to have an option to do that, I would prefer it not be the default Closeable behavior of close(), instead something separate like shutdown().
        Hide
        Robert Muir added a comment -

        By this issue we just have less interruptible calls on close, but that can be done after discussion, so I want to fix issue LUCENE-4245.

        I opened a separate issue because I think its related but separate. I think it would be good to fix LUCENE-4245 first as its easier, I just want an issue open for the bigger issue of what close() does.

        Show
        Robert Muir added a comment - By this issue we just have less interruptible calls on close, but that can be done after discussion, so I want to fix issue LUCENE-4245 . I opened a separate issue because I think its related but separate. I think it would be good to fix LUCENE-4245 first as its easier, I just want an issue open for the bigger issue of what close() does.
        Hide
        Uwe Schindler added a comment -

        OK. That's all I wanted to make the test failures go away!

        Show
        Uwe Schindler added a comment - OK. That's all I wanted to make the test failures go away!
        Hide
        Robert Muir added a comment -

        Would close() throw an exception if there were pending (uncommitted, un-rolled-back) changes? Ie to avoid the obvious trap of users on upgrading assuming close still implicitly calls commit else they silently lose their changes.

        I'm torn here mostly because rollback() closes the IndexWriter. if it didn't do this then the path to me seems clear.
        But since it does, its confusing.

        E.g., just looking at the JDBC Connection api as a parallel, its a little vague but I think some impls actually throw Exception here? officially its of course...

        "It is strongly recommended that an application explicitly commits or rolls back an active transaction prior to calling the close method. If the close method is called and there is an active transaction, the results are implementation-defined."

        Seeing that we used to commit() on close(), I think its probably best that we would throw an Exception (if possible, i have no idea at a glance how). Otherwise the behavior change would be a trap... this would make it a lot easier.

        We already do this if you entered the first phase of a two-phase commit.

        But I really don't think we should do anything with pending merges: like i said we could have a separate 'graceful slow shutdown' for that that waits for merges: its unrelated to transactional semantics.

        Is it like rollback()? If so, should we remove rollback()?

        I think we should leave ourselves open to the possibility that one day, maybe rollback() can be done without closing.

        Show
        Robert Muir added a comment - Would close() throw an exception if there were pending (uncommitted, un-rolled-back) changes? Ie to avoid the obvious trap of users on upgrading assuming close still implicitly calls commit else they silently lose their changes. I'm torn here mostly because rollback() closes the IndexWriter. if it didn't do this then the path to me seems clear. But since it does, its confusing. E.g., just looking at the JDBC Connection api as a parallel, its a little vague but I think some impls actually throw Exception here? officially its of course... "It is strongly recommended that an application explicitly commits or rolls back an active transaction prior to calling the close method. If the close method is called and there is an active transaction, the results are implementation-defined." Seeing that we used to commit() on close(), I think its probably best that we would throw an Exception (if possible, i have no idea at a glance how). Otherwise the behavior change would be a trap... this would make it a lot easier. We already do this if you entered the first phase of a two-phase commit. But I really don't think we should do anything with pending merges: like i said we could have a separate 'graceful slow shutdown' for that that waits for merges: its unrelated to transactional semantics. Is it like rollback()? If so, should we remove rollback()? I think we should leave ourselves open to the possibility that one day, maybe rollback() can be done without closing.
        Hide
        Shai Erera added a comment -

        I still don't understand why change behavior that is out there for ages. Everyone is used to calling close() and that commits things. I agree with you that if from the beginning people would need to call commit() before close(), then that's better. But I don't think there is a good justification to change current behavior.

        Now you're talking about detecting whether there were any changes or not – that doesn't simplify close() logic IMO.

        In short, I don't feel that changing current behavior is going to do any good, on the contrary, it would confuse people.

        Show
        Shai Erera added a comment - I still don't understand why change behavior that is out there for ages. Everyone is used to calling close() and that commits things. I agree with you that if from the beginning people would need to call commit() before close(), then that's better. But I don't think there is a good justification to change current behavior. Now you're talking about detecting whether there were any changes or not – that doesn't simplify close() logic IMO. In short, I don't feel that changing current behavior is going to do any good, on the contrary, it would confuse people.
        Hide
        Robert Muir added a comment -

        I still don't understand why change behavior that is out there for ages. Everyone is used to calling close() and that commits things.

        I'm not used to it: its awkward for a transactional API.

        I agree with you that if from the beginning people would need to call commit() before close(), then that's better.

        Then we should fix it! Just because "well its already this broken way", is no reason at all to not change things.
        We should fix things so they work in a sane way.

        Previous behavior doesn't matter at all here.

        Show
        Robert Muir added a comment - I still don't understand why change behavior that is out there for ages. Everyone is used to calling close() and that commits things. I'm not used to it: its awkward for a transactional API. I agree with you that if from the beginning people would need to call commit() before close(), then that's better. Then we should fix it! Just because "well its already this broken way", is no reason at all to not change things. We should fix things so they work in a sane way. Previous behavior doesn't matter at all here.
        Hide
        Yonik Seeley added a comment -

        But I don't think there is a good justification to change current behavior.

        +1

        We have a rollback call already. Changing the semantics of close() does not simplify anything (i.e. it doesn't make the issue that spawned this one any easier).

        Show
        Yonik Seeley added a comment - But I don't think there is a good justification to change current behavior. +1 We have a rollback call already. Changing the semantics of close() does not simplify anything (i.e. it doesn't make the issue that spawned this one any easier).
        Hide
        Robert Muir added a comment -

        Thats ok: resistance to change is just encouragement to me.

        "it used to work this way" is no good reason not to fix broken stuff.

        Show
        Robert Muir added a comment - Thats ok: resistance to change is just encouragement to me. "it used to work this way" is no good reason not to fix broken stuff.
        Hide
        Shai Erera added a comment -

        I disagree with you about OutStream.close not being transactional ... you know what, that's not even the point. You got used to the data being flushed when you call OS.close(). Same as you (or anyone else if you would like to pretend you're not used to it) got used to documents being committed when you close IndexWriter.

        There is no good justification changing that behavior for IndexWriter. It's not broken ... at least, I didn't hear anyone besides you claiming it is - on contrary, I hear more voices against this change.

        I think that that we've made changes to the API and Lucene code behavior too lightly in the last couple of years. When it's justified because e.g. a user might run himself into troubles, then you're right – that that the program worked like that doesn't mean it shouldn't change. But the change you're proposing here is likely to cause more damage (even if the damage is mostly less convenient coding) than be useful to anyone.

        IndexWriter.close() won't be simplified a lot, yet users will need to change their code and think what to do with exceptions like YouHaveUncommittedChangesException ...

        I repeat my proposal again – let's change close() to never wait for merges (i.e. remove the close(boolean) variant) – that IMO is behavior I doubt if people rely on (merges are a form of optimization, not program correctness and potential data loss).

        Show
        Shai Erera added a comment - I disagree with you about OutStream.close not being transactional ... you know what, that's not even the point. You got used to the data being flushed when you call OS.close(). Same as you (or anyone else if you would like to pretend you're not used to it) got used to documents being committed when you close IndexWriter. There is no good justification changing that behavior for IndexWriter. It's not broken ... at least, I didn't hear anyone besides you claiming it is - on contrary, I hear more voices against this change. I think that that we've made changes to the API and Lucene code behavior too lightly in the last couple of years. When it's justified because e.g. a user might run himself into troubles, then you're right – that that the program worked like that doesn't mean it shouldn't change. But the change you're proposing here is likely to cause more damage (even if the damage is mostly less convenient coding) than be useful to anyone. IndexWriter.close() won't be simplified a lot, yet users will need to change their code and think what to do with exceptions like YouHaveUncommittedChangesException ... I repeat my proposal again – let's change close() to never wait for merges (i.e. remove the close(boolean) variant) – that IMO is behavior I doubt if people rely on (merges are a form of optimization, not program correctness and potential data loss).
        Hide
        Uwe Schindler added a comment -

        I repeat my proposal again – let's change close() to never wait for merges (i.e. remove the close(boolean) variant) – that IMO is behavior I doubt if people rely on (merges are a form of optimization, not program correctness and potential data loss).

        +1 - If you look at my changes in LUCENE-4245 you will see that waitForMerges is a pain. I finally got it working that IW.close() now for sure cleans up (means IOUtils.close MergePolicy and MergeScheduler, whatever happened). The implicit flush() and commit() are hairy, but doable.

        Show
        Uwe Schindler added a comment - I repeat my proposal again – let's change close() to never wait for merges (i.e. remove the close(boolean) variant) – that IMO is behavior I doubt if people rely on (merges are a form of optimization, not program correctness and potential data loss). +1 - If you look at my changes in LUCENE-4245 you will see that waitForMerges is a pain. I finally got it working that IW.close() now for sure cleans up (means IOUtils.close MergePolicy and MergeScheduler, whatever happened). The implicit flush() and commit() are hairy, but doable.
        Hide
        Michael McCandless added a comment -

        I'm torn here mostly because rollback() closes the IndexWriter. if it didn't do this then the path to me seems clear.

        I think we could change rollback() to not close ... I can't remember offhand why it closes as a side effect.

        Show
        Michael McCandless added a comment - I'm torn here mostly because rollback() closes the IndexWriter. if it didn't do this then the path to me seems clear. I think we could change rollback() to not close ... I can't remember offhand why it closes as a side effect.
        Hide
        Michael McCandless added a comment -

        Pain or not, waiting for merges is important, to avoid the starvation
        case where the user never waits for merges and so the biggish ones
        never get a chance to complete.

        We can't make that the default: it's too trappy. Apps that open an
        IW, index the current batch of new or changed docs, close IW (a
        reasonable way to use IW), would at some point get waaay too many
        segments in their index.

        Or: maybe, close() could throw an exception if merges are still in
        progress (just like it should (I think) if pending changes weren't
        committed nor rolled back)? And, add a public killMerges method (just
        calls waitForMerges(false)). This way if you try to close without
        having first killed or waited for merges, the close fails and you see
        why.

        Show
        Michael McCandless added a comment - Pain or not, waiting for merges is important, to avoid the starvation case where the user never waits for merges and so the biggish ones never get a chance to complete. We can't make that the default: it's too trappy. Apps that open an IW, index the current batch of new or changed docs, close IW (a reasonable way to use IW), would at some point get waaay too many segments in their index. Or: maybe, close() could throw an exception if merges are still in progress (just like it should (I think) if pending changes weren't committed nor rolled back)? And, add a public killMerges method (just calls waitForMerges(false)). This way if you try to close without having first killed or waited for merges, the close fails and you see why.
        Hide
        Robert Muir added a comment -

        Shai: look at the JDBC connection API.

        I dont know what you are talking about saying OutputStream is transactional: Are you seriously joking?!

        Show
        Robert Muir added a comment - Shai: look at the JDBC connection API. I dont know what you are talking about saying OutputStream is transactional: Are you seriously joking?!
        Hide
        Robert Muir added a comment -

        Pain or not, waiting for merges is important, to avoid the starvation
        case where the user never waits for merges and so the biggish ones
        never get a chance to complete.

        We can't make that the default: it's too trappy. Apps that open an
        IW, index the current batch of new or changed docs, close IW (a
        reasonable way to use IW), would at some point get waaay too many
        segments in their index.

        I think its also trappy to wait for merges by default: many
        people have rapidly changing content and time to visibility
        is important.

        Unfortunately we know some apps close() the IndexWriter whenever
        they make changes visible: you can tell me these apps are broken,
        etc, etc, but our PMC just released one of those this weekend.

        Show
        Robert Muir added a comment - Pain or not, waiting for merges is important, to avoid the starvation case where the user never waits for merges and so the biggish ones never get a chance to complete. We can't make that the default: it's too trappy. Apps that open an IW, index the current batch of new or changed docs, close IW (a reasonable way to use IW), would at some point get waaay too many segments in their index. I think its also trappy to wait for merges by default: many people have rapidly changing content and time to visibility is important. Unfortunately we know some apps close() the IndexWriter whenever they make changes visible: you can tell me these apps are broken, etc, etc, but our PMC just released one of those this weekend.
        Hide
        Shai Erera added a comment -

        Shai: look at the JDBC connection API.

        I'm arguing about changing current behavior with no apparent gains. And I'm not sure that I want to compare IW API to JDBC ... to me they are not the same things, but this may be just semantics.

        I dont know what you are talking about saying OutputStream is transactional

        I didn't say OS is transactional, but rather that it's not the point. The point is changing API that works one way for years, which users rely on its behavior for years, their apps will be broken in who knows how many ways etc etc ... again, for no apparent gains.

        This is what I object to – changing semantics of API which people are happy with and used to it. If it's so important to you, you can make a new API closeNoCommit(). Today it's rollback(), but if that ever changes, we have an explicit API.

        And I don't understand if you argue for the sake arguing, or you really believe there is anyone out there intending to close IW without committing the changes? Do you intend to do it?

        Show
        Shai Erera added a comment - Shai: look at the JDBC connection API. I'm arguing about changing current behavior with no apparent gains. And I'm not sure that I want to compare IW API to JDBC ... to me they are not the same things, but this may be just semantics. I dont know what you are talking about saying OutputStream is transactional I didn't say OS is transactional, but rather that it's not the point. The point is changing API that works one way for years, which users rely on its behavior for years, their apps will be broken in who knows how many ways etc etc ... again, for no apparent gains. This is what I object to – changing semantics of API which people are happy with and used to it. If it's so important to you, you can make a new API closeNoCommit(). Today it's rollback(), but if that ever changes, we have an explicit API. And I don't understand if you argue for the sake arguing, or you really believe there is anyone out there intending to close IW without committing the changes? Do you intend to do it?
        Hide
        Robert Muir added a comment -

        rmuir20120906-bulk-40-change

        Show
        Robert Muir added a comment - rmuir20120906-bulk-40-change
        Hide
        Mark Miller added a comment -

        you really believe there is anyone out there intending to close IW without committing the changes?

        I do it, but I use rollback for this...

        Show
        Mark Miller added a comment - you really believe there is anyone out there intending to close IW without committing the changes? I do it, but I use rollback for this...
        Hide
        Michael McCandless added a comment -

        I think a good compromise here is for close to throw an exception if merges are still running or if there are uncommitted changes. We'd also add an "abortRunningMerges()".

        This way the app can be explicit about discarding changes, aborting merges, and separate that from close(), which becomes a fast operation.

        Show
        Michael McCandless added a comment - I think a good compromise here is for close to throw an exception if merges are still running or if there are uncommitted changes. We'd also add an "abortRunningMerges()". This way the app can be explicit about discarding changes, aborting merges, and separate that from close(), which becomes a fast operation.
        Hide
        Shai Erera added a comment -

        Close() throws an exception – that's just for back-compat right? I.e., the app can't do anything with that exception except either calling commit() (that's the back-compat part) or rollback() (if it needs to close fast). And abortRunningMerges is just like close(false) today (without the commit())?

        I mean, if an app wants to commit + wait + close, the 1 line close() becomes wait(), commit() and close(). And just like I made a mistake saying you should call commit() then wait(), I'm sure others will make that mistake too.

        So again, why go to great lengths to complicate the lives of the innocent developer? Can't we just decide (I think that's the 3rd time I'm proposing it) to never wait for merges on close(), and keep close() committing changes? Would close() be really simplified if it will need to detect running merges and uncommitted changes?

        Maybe someone puts up a patch with the changes to IW only, so we can review? I personally don't see the gain in changing the behavior of close(), but maybe a patch will clarify the gains ...

        Show
        Shai Erera added a comment - Close() throws an exception – that's just for back-compat right? I.e., the app can't do anything with that exception except either calling commit() (that's the back-compat part) or rollback() (if it needs to close fast). And abortRunningMerges is just like close(false) today (without the commit())? I mean, if an app wants to commit + wait + close, the 1 line close() becomes wait(), commit() and close(). And just like I made a mistake saying you should call commit() then wait(), I'm sure others will make that mistake too. So again, why go to great lengths to complicate the lives of the innocent developer? Can't we just decide (I think that's the 3rd time I'm proposing it) to never wait for merges on close(), and keep close() committing changes? Would close() be really simplified if it will need to detect running merges and uncommitted changes? Maybe someone puts up a patch with the changes to IW only, so we can review? I personally don't see the gain in changing the behavior of close(), but maybe a patch will clarify the gains ...
        Hide
        Michael McCandless added a comment -

        Can't we just decide (I think that's the 3rd time I'm proposing it) to never wait for merges on close(), and keep close() committing changes?

        I don't think we can never wait for merges on close.

        That can easily lead to an accidental "denial of service attack on big merges", which would be an awful trap. Ie, a big merge kicks off, but never has a chance to finish because the app closes/opens new IWs frequently. Then every IW that's opened will restart the merge, spend CPU/IO resources, only to abort when the IW is closed.

        I've never liked that close "hides" this wait-for-massive-merge to finish, but I also don't like this "just abort the massive merge" solution: it would create a nasty trap. I would prefer that it's explicit (abortMerges or waitForMerges).

        Show
        Michael McCandless added a comment - Can't we just decide (I think that's the 3rd time I'm proposing it) to never wait for merges on close(), and keep close() committing changes? I don't think we can never wait for merges on close. That can easily lead to an accidental "denial of service attack on big merges", which would be an awful trap. Ie, a big merge kicks off, but never has a chance to finish because the app closes/opens new IWs frequently. Then every IW that's opened will restart the merge, spend CPU/IO resources, only to abort when the IW is closed. I've never liked that close "hides" this wait-for-massive-merge to finish, but I also don't like this "just abort the massive merge" solution: it would create a nasty trap. I would prefer that it's explicit (abortMerges or waitForMerges).
        Hide
        Shai Erera added a comment -

        Today a simple app just calls close(). Now you propose that it will call close(), but try-catch two exceptions RunningMerges and UncommittedChanges. Then the poor developer will need to decide what to do with them .. should he call rollback()? should he call commit()? then close(), only to try-catch those two ex, now documenting "it's fine now"?

        While the app today can choose to call waitForMerges, or close(false), or commit, then close.

        The changes will make the code more verbose and I'm afraid will raise the bar for simple apps. Aren't you the one that always pushes for simple, more approachable API?

        And with that solution, what would rollback() do? Unless we change rollback to not close, it's just an alias, as Robert put it, to close, only it doesn't throw the two new exceptions?

        Show
        Shai Erera added a comment - Today a simple app just calls close(). Now you propose that it will call close(), but try-catch two exceptions RunningMerges and UncommittedChanges. Then the poor developer will need to decide what to do with them .. should he call rollback()? should he call commit()? then close(), only to try-catch those two ex, now documenting "it's fine now"? While the app today can choose to call waitForMerges, or close(false), or commit, then close. The changes will make the code more verbose and I'm afraid will raise the bar for simple apps. Aren't you the one that always pushes for simple, more approachable API? And with that solution, what would rollback() do? Unless we change rollback to not close, it's just an alias, as Robert put it, to close, only it doesn't throw the two new exceptions?
        Hide
        Michael McCandless added a comment -

        I agree it'd make IndexWriter more verbose for simple apps, which is not great, but it'd also make it more transparent, which may be the right tradeoff here. It's important for apps to know that IW.waitForMerges can sometimes take a very long time. Also, in issues like LUCENE-4638, it'd be clearer what went wrong, ie Mark would have hit the exception during commit().

        Those two exceptions from IW.close() are there to catch coding errors, ie the app was not explicit about whether it wanted to abort merges and discard changes. What to do about those exceptions would be clear: you either rollback or commit, and you either waitForMerges or abortMerges, before calling close.

        close() does too many things now, and they are surprising. Maybe we should rename it to a new method to make it clear it "waits for merges, commits, closes", and then shift to the new semantics for close.

        If we did this change, rollback() would discard changes since the last commit but NOT close.

        Show
        Michael McCandless added a comment - I agree it'd make IndexWriter more verbose for simple apps, which is not great, but it'd also make it more transparent, which may be the right tradeoff here. It's important for apps to know that IW.waitForMerges can sometimes take a very long time. Also, in issues like LUCENE-4638 , it'd be clearer what went wrong, ie Mark would have hit the exception during commit(). Those two exceptions from IW.close() are there to catch coding errors, ie the app was not explicit about whether it wanted to abort merges and discard changes. What to do about those exceptions would be clear: you either rollback or commit, and you either waitForMerges or abortMerges, before calling close. close() does too many things now, and they are surprising. Maybe we should rename it to a new method to make it clear it "waits for merges, commits, closes", and then shift to the new semantics for close. If we did this change, rollback() would discard changes since the last commit but NOT close.
        Hide
        Mark Miller added a comment -

        What I'm really dying for is just a close that is sure to close. Whatever problems it encounters, whatever goes wrong - sure, I'd like to know about it - but please still close and release the writer lock. I know my index will still be good up till the last commit and I can open a new IndexWriter. I really need this for some tests - mainly because jetty and/or the test framework can interrupt threads - this can cause interruption exceptions and channel interruption exceptions that kill the close. For a std interruption exception I seem to be able to retry close and succeed - but not a channel interruption exception.

        Show
        Mark Miller added a comment - What I'm really dying for is just a close that is sure to close. Whatever problems it encounters, whatever goes wrong - sure, I'd like to know about it - but please still close and release the writer lock. I know my index will still be good up till the last commit and I can open a new IndexWriter. I really need this for some tests - mainly because jetty and/or the test framework can interrupt threads - this can cause interruption exceptions and channel interruption exceptions that kill the close. For a std interruption exception I seem to be able to retry close and succeed - but not a channel interruption exception.
        Hide
        Robert Muir added a comment -

        What I'm really dying for is just a close that is sure to close.

        And thats all close() should do!

        close() should throw IOException (not other things)
        close() shouldnt wait on background merges
        close() definitely shouldnt commit anything.

        I don't understand why this is so controversial. Just look at the java.io.Closeable semantics if you want to understand how users expect close() to work.

        Show
        Robert Muir added a comment - What I'm really dying for is just a close that is sure to close. And thats all close() should do! close() should throw IOException (not other things) close() shouldnt wait on background merges close() definitely shouldnt commit anything. I don't understand why this is so controversial. Just look at the java.io.Closeable semantics if you want to understand how users expect close() to work.
        Hide
        Mark Miller added a comment -

        See LUCENE-4638 - I really want a close that means close. If this issue is too tied up, I'd like to start another issue that at least accomplish that goal.

        Show
        Mark Miller added a comment - See LUCENE-4638 - I really want a close that means close. If this issue is too tied up, I'd like to start another issue that at least accomplish that goal.
        Hide
        Robert Muir added a comment -

        Well i think this issue compounds the situation. close() is doing so many things that its actually buggy:
        its not releasing resources (the lock). Releasing resources is the purpose of close()

        Show
        Robert Muir added a comment - Well i think this issue compounds the situation. close() is doing so many things that its actually buggy: its not releasing resources (the lock). Releasing resources is the purpose of close()
        Hide
        Michael McCandless added a comment -

        Maybe we could rename the current "close" to "commitAndClose", which would wait on merges (maybe take a boolean to instead abort merges), commit changes, close (for the simplicty/convenience for basic apps).

        And then "close" simply closes.

        The only issue is this is a trappy migration path for apps ... if they miss the back-compat-break advertisement that IW.close no longer commits then they silently lose changes.

        Show
        Michael McCandless added a comment - Maybe we could rename the current "close" to "commitAndClose", which would wait on merges (maybe take a boolean to instead abort merges), commit changes, close (for the simplicty/convenience for basic apps). And then "close" simply closes. The only issue is this is a trappy migration path for apps ... if they miss the back-compat-break advertisement that IW.close no longer commits then they silently lose changes.
        Hide
        Robert Muir added a comment -

        No, the issue is that close() doesn't do its #1 job.

        The rest of what its doing, it does not need to do.

        Show
        Robert Muir added a comment - No, the issue is that close() doesn't do its #1 job. The rest of what its doing, it does not need to do.
        Hide
        Mark Miller added a comment -

        +1 on addressing this issue!

        Can I just point out that the 'proper' way to close an IndexWriter is perhaps:

         while (true) {
              try {
                super.close();
              } catch (ThreadInterruptedException e) {
                // don't allow interruption
                continue;
              } catch (Exception e) {
                iw.rollback(); // mabye
              }
              break;
            }
        

        Except we don't know if rollback is good enough there yet, and it's a little scary to have the while loop on the interrupted exception without a timeout. Writing a line of code to close the IW is an expert task if you want to get it right. If its currently possible to get it right.

        So perhaps thats what you need for a safe close. How many people do you think are doing that? Can we agree that it's insane that you do have to do that? Have you ever seen anything like it?

        Let's come to some compromise on exactly what close does so that we can un crazify it.

        Show
        Mark Miller added a comment - +1 on addressing this issue! Can I just point out that the 'proper' way to close an IndexWriter is perhaps: while ( true ) { try { super .close(); } catch (ThreadInterruptedException e) { // don't allow interruption continue ; } catch (Exception e) { iw.rollback(); // mabye } break ; } Except we don't know if rollback is good enough there yet, and it's a little scary to have the while loop on the interrupted exception without a timeout. Writing a line of code to close the IW is an expert task if you want to get it right. If its currently possible to get it right. So perhaps thats what you need for a safe close. How many people do you think are doing that? Can we agree that it's insane that you do have to do that? Have you ever seen anything like it? Let's come to some compromise on exactly what close does so that we can un crazify it.
        Hide
        Mark Miller added a comment -

        Sorry - that's not even right for close - I forgot to put in an IW#unlock call a finally around this. I think thats all you need...perhaps also a hammer...and a chainsaw...

        Show
        Mark Miller added a comment - Sorry - that's not even right for close - I forgot to put in an IW#unlock call a finally around this. I think thats all you need...perhaps also a hammer...and a chainsaw...
        Hide
        Mark Miller added a comment -

        Wait...you also need a try catch around the IW#unlock in case its a native lock and you cant actually release it...

        Show
        Mark Miller added a comment - Wait...you also need a try catch around the IW#unlock in case its a native lock and you cant actually release it...
        Hide
        Michael McCandless added a comment -

        The proper way to close IW today is:

        try {
          writer.close();
        } catch (Throwable t) {
          writer.rollback();
        }
        
        Show
        Michael McCandless added a comment - The proper way to close IW today is: try { writer.close(); } catch (Throwable t) { writer.rollback(); }
        Hide
        Mark Miller added a comment - - edited

        Why don't the docs say that then? I'm not even confident that will work.

        But look at the docs - it's telling you to keep calling close, its trying to get you to force unlock in a finally - it doesnt even mention rollback.

        And are you sure rollback will not be interrupted either? Rollback throws IOException and you are not dealing with it. If it's really that simple, why has the javadocs evolved into a mess of bad recommendations and information?

        And if that is the proper way, why doesnt writer.close do that for the user and then throw an exception?

        These are crazy semantics.

        Show
        Mark Miller added a comment - - edited Why don't the docs say that then? I'm not even confident that will work. But look at the docs - it's telling you to keep calling close, its trying to get you to force unlock in a finally - it doesnt even mention rollback. And are you sure rollback will not be interrupted either? Rollback throws IOException and you are not dealing with it. If it's really that simple, why has the javadocs evolved into a mess of bad recommendations and information? And if that is the proper way, why doesnt writer.close do that for the user and then throw an exception? These are crazy semantics.
        Hide
        Robert Muir added a comment -

        These are crazy semantics.

        +1

        The proper way to close IW today is:

        try

        Unknown macro: { writer.close(); }

        catch (Throwable t)

        Unknown macro: { writer.rollback(); }

        Then we should remove java.io.Closeable interface from IndexWriter. This is especially bad for java7 users who have syntactical support from the language for doing things "the wrong way".

        The proper way to close IndexWriter should be: writer.close()

        Show
        Robert Muir added a comment - These are crazy semantics. +1 The proper way to close IW today is: try Unknown macro: { writer.close(); } catch (Throwable t) Unknown macro: { writer.rollback(); } Then we should remove java.io.Closeable interface from IndexWriter. This is especially bad for java7 users who have syntactical support from the language for doing things "the wrong way". The proper way to close IndexWriter should be: writer.close()
        Hide
        Uwe Schindler added a comment - - edited

        Then we should remove java.io.Closeable interface from IndexWriter. This is especially bad for java7 users who have syntactical support from the language for doing things "the wrong way".

        Let's make IndexWrite do nothing additional like merging or similar stuff. I already fixed some leaks in the past, but the ones discussed here are more crazy (like InterruptedException handling).

        -1 to remove Closeable, if the semantics of closeable are correct. This is currently not the case, but one we fix to close without committing and so on, we are fine. The syntactic sugar here is good, because it warns you (when writing tests) that you miss to close. Somebody using a try-with-resources block is a different story, but if somebody wants to do this he can really do that (and its useful for lots of tasks). It is just the "one long running thread keeps IndexWriter open and indexes documents that you may think of. But when its run() method ends, the IndexWriter should be closed, too. So Closeable is perfectly fine. And we cannot prevent people from writing ineffective code, but those people would not use try-with-resources at all.

        Show
        Uwe Schindler added a comment - - edited Then we should remove java.io.Closeable interface from IndexWriter. This is especially bad for java7 users who have syntactical support from the language for doing things "the wrong way". Let's make IndexWrite do nothing additional like merging or similar stuff. I already fixed some leaks in the past, but the ones discussed here are more crazy (like InterruptedException handling). -1 to remove Closeable, if the semantics of closeable are correct. This is currently not the case, but one we fix to close without committing and so on, we are fine. The syntactic sugar here is good, because it warns you (when writing tests) that you miss to close. Somebody using a try-with-resources block is a different story, but if somebody wants to do this he can really do that (and its useful for lots of tasks). It is just the "one long running thread keeps IndexWriter open and indexes documents that you may think of. But when its run() method ends, the IndexWriter should be closed, too. So Closeable is perfectly fine. And we cannot prevent people from writing ineffective code, but those people would not use try-with-resources at all.
        Hide
        Robert Muir added a comment -

        I agree Uwe. I'm just saying that we should either:

        1. fix IndexWriter semantics to be sane so close() actually and only close()s
        2. don't fix crazy semantics, and remove Closeable.

        The first solution is greatly preferred.
        But we shouldn't leave it like today: with the current code I don't think having Closeable is appropriate.

        Show
        Robert Muir added a comment - I agree Uwe. I'm just saying that we should either: fix IndexWriter semantics to be sane so close() actually and only close()s don't fix crazy semantics, and remove Closeable. The first solution is greatly preferred. But we shouldn't leave it like today: with the current code I don't think having Closeable is appropriate.
        Hide
        Uwe Schindler added a comment -

        Robert: Yes it was not 100% clear to me what your comment wanted to say. close() should do at least as possible. And if you use try-with-resources and forget to commit, the cleanup (similar to our IOUtils) would close IW correctly, but don't commit. The use then knows that something is wrong. By the way, this is similar to SQL and transactions with try-with-resources. There were lengthly discussion about that on the JDK mailing list before Java 7. Although close() may do more than just closing (if autocommit is enabled, it may also commit), they decided to add Closeable (e.g., see http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html) - but the discussion here was similar.

        Show
        Uwe Schindler added a comment - Robert: Yes it was not 100% clear to me what your comment wanted to say. close() should do at least as possible. And if you use try-with-resources and forget to commit, the cleanup (similar to our IOUtils) would close IW correctly, but don't commit. The use then knows that something is wrong. By the way, this is similar to SQL and transactions with try-with-resources. There were lengthly discussion about that on the JDK mailing list before Java 7. Although close() may do more than just closing (if autocommit is enabled, it may also commit), they decided to add Closeable (e.g., see http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html ) - but the discussion here was similar.
        Hide
        Michael McCandless added a comment -

        The proper way to close IndexWriter should be: writer.close()

        But I don't like the trap this sets up on apps that upgrade ... so if
        there are pending uncommitted changes, IW.close should throw an exc
        (IllegalStateExc?). Ie, the app must be explicit about discarding or
        committing those changes, before calling close.

        Separately, if there are running merges then we should throw a similar
        exception, otherwise we set up a denial-of-service on biggish merges
        trap: for apps that open a writer, index a batch of docs, and close
        the writer, a given biggish merge would forever kick off when the
        writer is opened and then be aborted when it's closed, wasting CPU/IO
        and never finishing the merge.

        If we handle those two traps then I'm OK with making close "just
        close". The "proper" shutdown sequence for IW would then be:

        w.waitForMerges() OR w.abortMerges()
        w.commit() OR w.rollback()
        w.close()
        

        (And we'd fix rollback to not close).

        Shai objected to how verbose this is for "normal" usage, and I agree,
        so I proposed adding a sugar mehod "commitAndClose" that would just
        call waitForMerges(), commit(), close(). I was hoping to hear from
        Shai whether that's an OK compromise...

        Although close() may do more than just closing (if autocommit is enabled, it may also commit), they decided to add Closeable (e.g., see http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html) - but the discussion here was similar.

        That's an interesting precedent. Uwe do you a pointer to the
        discussion?

        So I think another option here is to leave close as is (it waits for
        merges, commits) except, if an exception is thrown, it suppresses that
        exception, finishes closing, and then throws it. Like IOUtils.close
        ...

        This way on calling IW.close(), if an exception is thrown, the IW will
        in fact have been closed / relased its lock / etc., and then the app
        sees the first exception that was hit while closing.

        Hmm... does java.io.Closeable document semantics on exception? Ie if
        I call RAF.close and hit an exception, is it "really closed"? It does
        document that calling it more than once is fine ...

        Show
        Michael McCandless added a comment - The proper way to close IndexWriter should be: writer.close() But I don't like the trap this sets up on apps that upgrade ... so if there are pending uncommitted changes, IW.close should throw an exc (IllegalStateExc?). Ie, the app must be explicit about discarding or committing those changes, before calling close. Separately, if there are running merges then we should throw a similar exception, otherwise we set up a denial-of-service on biggish merges trap: for apps that open a writer, index a batch of docs, and close the writer, a given biggish merge would forever kick off when the writer is opened and then be aborted when it's closed, wasting CPU/IO and never finishing the merge. If we handle those two traps then I'm OK with making close "just close". The "proper" shutdown sequence for IW would then be: w.waitForMerges() OR w.abortMerges() w.commit() OR w.rollback() w.close() (And we'd fix rollback to not close). Shai objected to how verbose this is for "normal" usage, and I agree, so I proposed adding a sugar mehod "commitAndClose" that would just call waitForMerges(), commit(), close(). I was hoping to hear from Shai whether that's an OK compromise... Although close() may do more than just closing (if autocommit is enabled, it may also commit), they decided to add Closeable (e.g., see http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html ) - but the discussion here was similar. That's an interesting precedent. Uwe do you a pointer to the discussion? So I think another option here is to leave close as is (it waits for merges, commits) except, if an exception is thrown, it suppresses that exception, finishes closing, and then throws it. Like IOUtils.close ... This way on calling IW.close(), if an exception is thrown, the IW will in fact have been closed / relased its lock / etc., and then the app sees the first exception that was hit while closing. Hmm... does java.io.Closeable document semantics on exception? Ie if I call RAF.close and hit an exception, is it "really closed"? It does document that calling it more than once is fine ...
        Hide
        Uwe Schindler added a comment -

        Although close() may do more than just closing (if autocommit is enabled, it may also commit), they decided to add Closeable (e.g., see http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html) - but the discussion here was similar.

        That's an interesting precedent. Uwe do you a pointer to the discussion?

        I am digging... I have no pointer at the moment, I just read blog posts and followed some mailing lists at that time, but that's now almost 2 years ago. The SQL stuff had more problems, which lead finally to the new interface AutoCloseable in Java 7 (java.io.Closeable extends java.lang.AutoCloseable). The reason is: SQLException does not extend IOException so the already existent java.io.Closeable was not applicable, so the new java.lang.Autocloseable was defined to throw any Exception subclass on its close() method, just java.io.Closeable limits this to IOException by subclassing. The try-with-resources magic in Java 7 only affects AUtocloseable, but as Closeable extends Autocloseable, it also applies to Closeable.

        Unless we move to Java 7 we are also limited to only throw IOException on close(), no other checked Exception. In Java 7 we could make IndexWriter implement AutoCloseable and let it throw any other Exception, too.

        One important information about AutoCloseable from JavaDocs: "Implementers of this interface are also strongly advised to not have the close method throw InterruptedException. This exception interacts with a thread's interrupted status, and runtime misbehavior is likely to occur if an InterruptedException is suppressed. More generally, if it would cause problems for an exception to be suppressed, the AutoCloseable.close method should not throw it. Note that unlike the close method of Closeable, this close method is not required to be idempotent. In other words, calling this close method more than once may have some visible side effect, unlike Closeable.close which is required to have no effect if called more than once. However, implementers of this interface are strongly encouraged to make their close methods idempotent."

        This last sentence explains why java.sql.Connection is not required to be idempotent (see below).

        So I think another option here is to leave close as is (it waits for merges, commits) except, if an exception is thrown, it suppresses that exception, finishes closing, and then throws it. Like IOUtils.close.

        I am fine with that, only that it must record suppressed Exceptions (unfortunately the IOUtils in Lucene is in most cases used not in the optimal way, so the supressed exceptions get lost).

        Hmm... does java.io.Closeable document semantics on exception? Ie if I call RAF.close and hit an exception, is it "really closed"? It does document that calling it more than once is fine ...

        It says that you may call close() multiple times, but later calls mst be side-effect free, so after the first call to close it must be closed (otherwise later calls would have side-effects), although an IOException is thrown.

        Show
        Uwe Schindler added a comment - Although close() may do more than just closing (if autocommit is enabled, it may also commit), they decided to add Closeable (e.g., see http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html ) - but the discussion here was similar. That's an interesting precedent. Uwe do you a pointer to the discussion? I am digging... I have no pointer at the moment, I just read blog posts and followed some mailing lists at that time, but that's now almost 2 years ago. The SQL stuff had more problems, which lead finally to the new interface AutoCloseable in Java 7 (java.io.Closeable extends java.lang.AutoCloseable). The reason is: SQLException does not extend IOException so the already existent java.io.Closeable was not applicable, so the new java.lang.Autocloseable was defined to throw any Exception subclass on its close() method, just java.io.Closeable limits this to IOException by subclassing. The try-with-resources magic in Java 7 only affects AUtocloseable, but as Closeable extends Autocloseable, it also applies to Closeable. Unless we move to Java 7 we are also limited to only throw IOException on close(), no other checked Exception. In Java 7 we could make IndexWriter implement AutoCloseable and let it throw any other Exception, too. One important information about AutoCloseable from JavaDocs: "Implementers of this interface are also strongly advised to not have the close method throw InterruptedException. This exception interacts with a thread's interrupted status, and runtime misbehavior is likely to occur if an InterruptedException is suppressed. More generally, if it would cause problems for an exception to be suppressed, the AutoCloseable.close method should not throw it. Note that unlike the close method of Closeable, this close method is not required to be idempotent. In other words, calling this close method more than once may have some visible side effect, unlike Closeable.close which is required to have no effect if called more than once. However, implementers of this interface are strongly encouraged to make their close methods idempotent." This last sentence explains why java.sql.Connection is not required to be idempotent (see below). So I think another option here is to leave close as is (it waits for merges, commits) except, if an exception is thrown, it suppresses that exception, finishes closing, and then throws it. Like IOUtils.close. I am fine with that, only that it must record suppressed Exceptions (unfortunately the IOUtils in Lucene is in most cases used not in the optimal way, so the supressed exceptions get lost). Hmm... does java.io.Closeable document semantics on exception? Ie if I call RAF.close and hit an exception, is it "really closed"? It does document that calling it more than once is fine ... It says that you may call close() multiple times, but later calls mst be side-effect free, so after the first call to close it must be closed (otherwise later calls would have side-effects), although an IOException is thrown.
        Hide
        Shai Erera added a comment -

        Shai objected to how verbose this is for "normal" usage, and I agree,
        so I proposed adding a sugar mehod "commitAndClose" that would just
        call waitForMerges(), commit(), close(). I was hoping to hear from
        Shai whether that's an OK compromise...

        I am ok with commitAndClose. I think it will be valuable for simple apps, not to mention our tests. Complex apps can probably add 2-3 more lines of code to their "close" logic.

        Also, I think that the separation to the different methods give apps more control over what happens on close(). I.e. today, app can choose between close(true) and close(false), both always commit and optionally wait for merges. With the separation, apps could distinguish between "I need to close in a hurry" to "I need to close quickly, but commit changes" to "I have time to close, so finish merges, commit and close". Maybe we should even give these recipes in IW#close or IW class javadocs.

        Show
        Shai Erera added a comment - Shai objected to how verbose this is for "normal" usage, and I agree, so I proposed adding a sugar mehod "commitAndClose" that would just call waitForMerges(), commit(), close(). I was hoping to hear from Shai whether that's an OK compromise... I am ok with commitAndClose. I think it will be valuable for simple apps, not to mention our tests. Complex apps can probably add 2-3 more lines of code to their "close" logic. Also, I think that the separation to the different methods give apps more control over what happens on close(). I.e. today, app can choose between close(true) and close(false), both always commit and optionally wait for merges. With the separation, apps could distinguish between "I need to close in a hurry" to "I need to close quickly, but commit changes" to "I have time to close, so finish merges, commit and close". Maybe we should even give these recipes in IW#close or IW class javadocs.
        Hide
        Yonik Seeley added a comment -

        I'd suggest that close() should remain "commit and close" since it's been that way forever and certainly doesn't make sense to change in 4x, esp since if you go back far enough the only way to commit was to call close(). Many older apps that have been upgraded will start breaking. IMO, the mistake was introduced when close() became partial (i.e. if it throws an exception it doesn't fully clean up). That's the only thing that really needs fixing here.

        We already have rollback for "close immediately even if it means dropping uncommitted changes".

        Show
        Yonik Seeley added a comment - I'd suggest that close() should remain "commit and close" since it's been that way forever and certainly doesn't make sense to change in 4x, esp since if you go back far enough the only way to commit was to call close(). Many older apps that have been upgraded will start breaking. IMO, the mistake was introduced when close() became partial (i.e. if it throws an exception it doesn't fully clean up). That's the only thing that really needs fixing here. We already have rollback for "close immediately even if it means dropping uncommitted changes".
        Hide
        Steve Rowe added a comment -

        I'd like to push this to 4.2. Any objections?

        Show
        Steve Rowe added a comment - I'd like to push this to 4.2. Any objections?
        Hide
        Steve Rowe added a comment -

        Bulk move 4.4 issues to 4.5 and 5.0

        Show
        Steve Rowe added a comment - Bulk move 4.4 issues to 4.5 and 5.0
        Hide
        Simon Willnauer added a comment -

        I want to revisit this for 4.8! We recently fixed rollback and given what rollback looks like now I think it should actually be called close since this is what it does it flushes all stuff and shuts IW down. IMO that is exactly what I would expect close to do. Yet, the simplest solution to this issue would be rename close to commitAndClose() and rollback to close() and we are good to go. I guess it's not going to be that simple but after I ran into the issue today that somebody used a NoMergeScheduler and IW#close() was waiting forever right here:

        - timed waiting on org.apache.lucene.index.IndexWriter@439740e3
        	at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4366)
        	at org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2289)
        	at org.apache.lucene.index.IndexWriter.finishMerges(IndexWriter.java:2273)
        	at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1015)
        	at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:932)
        

        I am even more convinced that we have to fix this soon though.

        Show
        Simon Willnauer added a comment - I want to revisit this for 4.8! We recently fixed rollback and given what rollback looks like now I think it should actually be called close since this is what it does it flushes all stuff and shuts IW down. IMO that is exactly what I would expect close to do. Yet, the simplest solution to this issue would be rename close to commitAndClose() and rollback to close() and we are good to go. I guess it's not going to be that simple but after I ran into the issue today that somebody used a NoMergeScheduler and IW#close() was waiting forever right here: - timed waiting on org.apache.lucene.index.IndexWriter@439740e3 at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4366) at org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2289) at org.apache.lucene.index.IndexWriter.finishMerges(IndexWriter.java:2273) at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1015) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:932) I am even more convinced that we have to fix this soon though.
        Hide
        Shai Erera added a comment -

        I ran into the issue today that somebody used a NoMergeScheduler and IW#close() was waiting forever

        I've been struck by that too!! and therefore I stopped using NoMergeScheduler, and I use NoMergePolicy instead.

        We recently fixed rollback and given what rollback looks like now I think it should actually be called close since this is what it does it flushes all stuff and shuts IW down

        I think we should separate rollback()/commit() from close(). When you call close, you should not silently lose changes – you have to be explicit about losing them (rollback()) or committing them (commit()). And we should definitely not break existing apps who call close() relying on committing stuff. So if we:

        • Detect in close() that you have uncommitted changes or running merges and throw an exception
        • Let you separately call commit/rollback, waitForMerges/abortMerges and close
        • Add a sugar commitAndClose (which waits for merges? or takes a boolean?)

        Then I think we can make close() just close and releases resources, including the lock, at all costs (basically what was done just recently). We give existing apps a very easy migration path – call commitAndClose(). And we let more expert apps to decide what to do about merges and pending changes separately from each other and closing the writer.

        Show
        Shai Erera added a comment - I ran into the issue today that somebody used a NoMergeScheduler and IW#close() was waiting forever I've been struck by that too!! and therefore I stopped using NoMergeScheduler, and I use NoMergePolicy instead. We recently fixed rollback and given what rollback looks like now I think it should actually be called close since this is what it does it flushes all stuff and shuts IW down I think we should separate rollback()/commit() from close(). When you call close, you should not silently lose changes – you have to be explicit about losing them (rollback()) or committing them (commit()). And we should definitely not break existing apps who call close() relying on committing stuff. So if we: Detect in close() that you have uncommitted changes or running merges and throw an exception Let you separately call commit/rollback, waitForMerges/abortMerges and close Add a sugar commitAndClose (which waits for merges? or takes a boolean?) Then I think we can make close() just close and releases resources, including the lock, at all costs (basically what was done just recently). We give existing apps a very easy migration path – call commitAndClose(). And we let more expert apps to decide what to do about merges and pending changes separately from each other and closing the writer.
        Hide
        Michael McCandless added a comment -

        I want to revisit this for 4.8!

        +1 to have close "really close nomatter what", just like rollback "really rollsback nomatter what" (throw first exc), except I think the user must separately (explicitly) discard pending changes / abort running merges.

        Show
        Michael McCandless added a comment - I want to revisit this for 4.8! +1 to have close "really close nomatter what", just like rollback "really rollsback nomatter what" (throw first exc), except I think the user must separately (explicitly) discard pending changes / abort running merges.
        Hide
        Michael McCandless added a comment -

        Here's a patch with hopefully a workable compromise:

        • IW.close now just calls rollback, with the one difference that if
          1) there were uncommitted changes or still-running merges, and 2)
          IWC's matchVersion is < 5.0, it throws an exception saying so,
          after the close has finished; else, the changes are silently
          dropped. So the IW is closed, but you'll hit an exc explaining
          that you lost stuff... I think this will notify apps that are
          upgrading from 4.x that IW.close() semantics have changed.
        • I added IW.shutdown sugar method, which just flushes, finishes
          merges, commits, closes.

        I also found & fixed & added a test case for a pre-existing issue,
        where DocumentsWriter.numDocsInRAM could become negative, causing
        IW.hasUncommittedChanges to forever return true even right after you
        commit.

        And, I removed IW.closeInternal.

        Show
        Michael McCandless added a comment - Here's a patch with hopefully a workable compromise: IW.close now just calls rollback, with the one difference that if 1) there were uncommitted changes or still-running merges, and 2) IWC's matchVersion is < 5.0, it throws an exception saying so, after the close has finished; else, the changes are silently dropped. So the IW is closed, but you'll hit an exc explaining that you lost stuff... I think this will notify apps that are upgrading from 4.x that IW.close() semantics have changed. I added IW.shutdown sugar method, which just flushes, finishes merges, commits, closes. I also found & fixed & added a test case for a pre-existing issue, where DocumentsWriter.numDocsInRAM could become negative, causing IW.hasUncommittedChanges to forever return true even right after you commit. And, I removed IW.closeInternal.
        Hide
        Simon Willnauer added a comment -

        hey mike I like the compromise we are using here! I have a few comments on this "small" patch

        • In IW#shutdown should we do commit() and close in a try finally manner or rollback if commit throws an exception?
        • I don't think we should use a string to message if we have uncommitted can you use a boolean for this?
        • the pendingCommit check in IW#shutdown() is just double safety / fail early logic right ?
        • so IW#shutdown will wait for merges by default if you don't want this you have to do that yourself, call abort merges, commit & close?

        I think this change clarifies the semantics a lot! Thanks for working on it.

        Show
        Simon Willnauer added a comment - hey mike I like the compromise we are using here! I have a few comments on this "small" patch In IW#shutdown should we do commit() and close in a try finally manner or rollback if commit throws an exception? I don't think we should use a string to message if we have uncommitted can you use a boolean for this? the pendingCommit check in IW#shutdown() is just double safety / fail early logic right ? so IW#shutdown will wait for merges by default if you don't want this you have to do that yourself, call abort merges, commit & close? I think this change clarifies the semantics a lot! Thanks for working on it.
        Hide
        Michael McCandless added a comment -

        Thanks Simon.

        In IW#shutdown should we do commit() and close in a try finally manner or rollback if commit throws an exception?

        OK I'll do that; this way, on exception, the writer is closed and you may have lost stuff.

        I don't think we should use a string to message if we have uncommitted can you use a boolean for this?

        OK I'll cutover to boolean and just throw a generic "pending changes or running merges" message, i.e. I won't distinguish the two.

        the pendingCommit check in IW#shutdown() is just double safety / fail early logic right ?

        Actually, this is a very important check, because otherwise you can silently lose changes on shutdown; see LUCENE-3872. Basically, if you call prepareCommit, then index a bunch of docs, then call shutdown, those docs will be lost because prepareCommit took a point-in-time snapshot before you indexed the docs, and it's that snapshot that is committed by shutdown.

        so IW#shutdown will wait for merges by default if you don't want this you have to do that yourself, call abort merges, commit & close?

        Yeah, I think that's better than adding a boolean to shutdown?

        Show
        Michael McCandless added a comment - Thanks Simon. In IW#shutdown should we do commit() and close in a try finally manner or rollback if commit throws an exception? OK I'll do that; this way, on exception, the writer is closed and you may have lost stuff. I don't think we should use a string to message if we have uncommitted can you use a boolean for this? OK I'll cutover to boolean and just throw a generic "pending changes or running merges" message, i.e. I won't distinguish the two. the pendingCommit check in IW#shutdown() is just double safety / fail early logic right ? Actually, this is a very important check, because otherwise you can silently lose changes on shutdown; see LUCENE-3872 . Basically, if you call prepareCommit, then index a bunch of docs, then call shutdown, those docs will be lost because prepareCommit took a point-in-time snapshot before you indexed the docs, and it's that snapshot that is committed by shutdown. so IW#shutdown will wait for merges by default if you don't want this you have to do that yourself, call abort merges, commit & close? Yeah, I think that's better than adding a boolean to shutdown?
        Hide
        Simon Willnauer added a comment -

        Actually, this is a very important check, because otherwise you can silently lose changes on shutdown; see LUCENE-3872. Basically, if you call prepareCommit, then index a bunch of docs, then call shutdown, those docs will be lost because prepareCommit took a point-in-time snapshot before you indexed the docs, and it's that snapshot that is committed by shutdown.

        ok so this is actually why I ask... in that case I think adding the boolean to shutDown is a good idea since writing this yourself seems more complicated than just call flush(true, true) && abortMerges() && commit && close ?

        Show
        Simon Willnauer added a comment - Actually, this is a very important check, because otherwise you can silently lose changes on shutdown; see LUCENE-3872 . Basically, if you call prepareCommit, then index a bunch of docs, then call shutdown, those docs will be lost because prepareCommit took a point-in-time snapshot before you indexed the docs, and it's that snapshot that is committed by shutdown. ok so this is actually why I ask... in that case I think adding the boolean to shutDown is a good idea since writing this yourself seems more complicated than just call flush(true, true) && abortMerges() && commit && close ?
        Hide
        Michael McCandless added a comment -

        ok so this is actually why I ask... in that case I think adding the boolean to shutDown is a good idea since writing this yourself seems more complicated than just call flush(true, true) && abortMerges() && commit && close ?

        You mean add boolean waitForMerges? We could do that I guess. But app could also just do IW.abortMerges() then call shutdown? I was leaning to that option.

        Show
        Michael McCandless added a comment - ok so this is actually why I ask... in that case I think adding the boolean to shutDown is a good idea since writing this yourself seems more complicated than just call flush(true, true) && abortMerges() && commit && close ? You mean add boolean waitForMerges? We could do that I guess. But app could also just do IW.abortMerges() then call shutdown? I was leaning to that option.
        Hide
        Simon Willnauer added a comment -

        aaah you are right! nevermind that was too obvious I think we should document that on the shutdown method! Other than that +1 to commit. Patch looks good to me!

        Show
        Simon Willnauer added a comment - aaah you are right! nevermind that was too obvious I think we should document that on the shutdown method! Other than that +1 to commit. Patch looks good to me!
        Hide
        Michael McCandless added a comment -

        Actually, it's not so simple, because shutdown does a flush which could trigger a merge even if you aborted merges before ... so I'll add the boolean waitForMerges to shutdown.

        Show
        Michael McCandless added a comment - Actually, it's not so simple, because shutdown does a flush which could trigger a merge even if you aborted merges before ... so I'll add the boolean waitForMerges to shutdown.
        Hide
        Michael McCandless added a comment -

        New patch, folding in feedback. I think it's ready.

        Show
        Michael McCandless added a comment - New patch, folding in feedback. I think it's ready.
        Hide
        ASF subversion and git services added a comment -

        Commit 1585759 from mikemccand@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1585759 ]

        LUCENE-4246: fix IW.close to just close, even on exception

        Show
        ASF subversion and git services added a comment - Commit 1585759 from mikemccand@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1585759 ] LUCENE-4246 : fix IW.close to just close, even on exception
        Hide
        ASF subversion and git services added a comment -

        Commit 1585767 from mikemccand@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1585767 ]

        LUCENE-4246: fix test bug

        Show
        ASF subversion and git services added a comment - Commit 1585767 from mikemccand@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1585767 ] LUCENE-4246 : fix test bug
        Hide
        ASF subversion and git services added a comment -

        Commit 1626151 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1626151 ]

        LUCENE-4246: clarify IW.close change

        Show
        ASF subversion and git services added a comment - Commit 1626151 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1626151 ] LUCENE-4246 : clarify IW.close change
        Hide
        ASF subversion and git services added a comment -

        Commit 1626152 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1626152 ]

        LUCENE-4246: clarify IW.close change

        Show
        ASF subversion and git services added a comment - Commit 1626152 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1626152 ] LUCENE-4246 : clarify IW.close change
        Hide
        Anshum Gupta added a comment -

        Bulk close after 5.0 release.

        Show
        Anshum Gupta added a comment - Bulk close after 5.0 release.

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development