Lucene - Core
  1. Lucene - Core
  2. LUCENE-2960

Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. It would be great to be able to control that on a live IndexWriter. Other possible two methods that would be great to bring back are setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other setters can actually be set on the MergePolicy itself, so no need for setters for those (I think).

      1. LUCENE-2960.patch
        28 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Pulling back into 3.1 as blocker.

        Show
        Michael McCandless added a comment - Pulling back into 3.1 as blocker.
        Hide
        Michael McCandless added a comment -

        If you call IWC.setRAMBufferSizeMB even after the IW is init'd, this almost works; I think the only problem is that DocumentsWriter has a private copy of the RAM buffer size. We can simply fix that (it can pull from config, too), and then re-enable the test we used to have that asserts that live changes to the RAM buffer take effect.

        Show
        Michael McCandless added a comment - If you call IWC.setRAMBufferSizeMB even after the IW is init'd, this almost works; I think the only problem is that DocumentsWriter has a private copy of the RAM buffer size. We can simply fix that (it can pull from config, too), and then re-enable the test we used to have that asserts that live changes to the RAM buffer take effect.
        Hide
        Yonik Seeley added a comment -

        IMO, this is such an obscure usecase that we shouldn't introduce any extra complexity to implement it.
        For example, we should not guarantee exactly when a change to setRAMBufferSizeMB will be seen... that could introduce the need for extra synchronization / memory barriers.

        Show
        Yonik Seeley added a comment - IMO, this is such an obscure usecase that we shouldn't introduce any extra complexity to implement it. For example, we should not guarantee exactly when a change to setRAMBufferSizeMB will be seen... that could introduce the need for extra synchronization / memory barriers.
        Hide
        Robert Muir added a comment -

        my opinion (thanks for setting to 3.1) is solely based on how we do releases:

        I just don't think we should deprecate things in 3.1, then un-deprecate in 3.2...

        Really this is our mistake, we shouldn't have committed IWC and left TestIndexWriter.testChangingRamBufferSize or whatever using the deprecated setter without hashing this out further: because this test would fail if it tried to use IWC.

        Show
        Robert Muir added a comment - my opinion (thanks for setting to 3.1) is solely based on how we do releases: I just don't think we should deprecate things in 3.1, then un-deprecate in 3.2... Really this is our mistake, we shouldn't have committed IWC and left TestIndexWriter.testChangingRamBufferSize or whatever using the deprecated setter without hashing this out further: because this test would fail if it tried to use IWC.
        Hide
        Michael McCandless added a comment -

        IMO, this is such an obscure usecase that we shouldn't introduce any extra complexity to implement it.

        That this comes from Shay makes it not an "obscure" use case, IMO.

        Ie, Shay (Elastic Search) is doing awesome things with Lucene, so if some change in Lucene is adversely impacting Elastic Search, we should listen.

        We already don't hear from Shay nearly as often as we should

        For example, we should not guarantee exactly when a change to setRAMBufferSizeMB will be seen... that could introduce the need for extra synchronization / memory barriers.

        That's really an impl. detail. Ie, what really matters is whether or not this change results in any real change to index throughput. If a volatile read per indexed doc (w/ rare volatile write) really costs too much (unlikely) then we can make this "best effort".

        Show
        Michael McCandless added a comment - IMO, this is such an obscure usecase that we shouldn't introduce any extra complexity to implement it. That this comes from Shay makes it not an "obscure" use case, IMO. Ie, Shay (Elastic Search) is doing awesome things with Lucene, so if some change in Lucene is adversely impacting Elastic Search, we should listen. We already don't hear from Shay nearly as often as we should For example, we should not guarantee exactly when a change to setRAMBufferSizeMB will be seen... that could introduce the need for extra synchronization / memory barriers. That's really an impl. detail. Ie, what really matters is whether or not this change results in any real change to index throughput. If a volatile read per indexed doc (w/ rare volatile write) really costs too much (unlikely) then we can make this "best effort".
        Hide
        Michael McCandless added a comment -

        Really this is our mistake, we shouldn't have committed IWC and left TestIndexWriter.testChangingRamBufferSize or whatever using the deprecated setter without hashing this out further: because this test would fail if it tried to use IWC.

        I agree. Really we should not commit API changes w/o also cutting over all but a few (so we still test the deprecated path) usages of the deprecated API to the new one (and addressing whatever issues that uncovers).

        Show
        Michael McCandless added a comment - Really this is our mistake, we shouldn't have committed IWC and left TestIndexWriter.testChangingRamBufferSize or whatever using the deprecated setter without hashing this out further: because this test would fail if it tried to use IWC. I agree. Really we should not commit API changes w/o also cutting over all but a few (so we still test the deprecated path) usages of the deprecated API to the new one (and addressing whatever issues that uncovers).
        Hide
        Uwe Schindler added a comment -

        For the depreciated path we should have backwards tests...

        Show
        Uwe Schindler added a comment - For the depreciated path we should have backwards tests...
        Hide
        Earwin Burrfoot added a comment -

        As I said on the list - if one needs to change IW config, he can always recreate IW with new settings.
        Such changes cannot happen often enough for recreation to affect indexing performance.

        The fact that you can change IW's behaviour post-construction by modifying unrelated IWC instance is frightening. IW should either make a private copy of IWC when constructing, or IWC should be made immutable.

        Show
        Earwin Burrfoot added a comment - As I said on the list - if one needs to change IW config, he can always recreate IW with new settings. Such changes cannot happen often enough for recreation to affect indexing performance. The fact that you can change IW's behaviour post-construction by modifying unrelated IWC instance is frightening. IW should either make a private copy of IWC when constructing, or IWC should be made immutable.
        Hide
        Michael McCandless added a comment -

        As I said on the list - if one needs to change IW config, he can always recreate IW with new settings.

        That's not really true, in general. If you have a large merge running
        then closing the IW can take an unpredictable amount of time. You
        could abort the merges on close, but that's obviously not great.

        Furthermore, closing the IW also forces you to commit, and I don't
        like tying changing of configuration to forcing a commit.

        In fact, it doesn't make sense to me to arbitrarily prevent settings
        from being live, just because we've factored out IWC as a separate
        class. Many of these settings were "naturally" live before the IWC
        cutover, and have no particular reason not to be (besides this API
        change).

        We could also rollback the IWC change. I'm not saying that's a great
        option, but, it should be on the table.

        InfoStream, for example, should remain live: eg, maybe I'm having
        trouble w/ optimize, so, I turn on infoStream and then call optimize.

        The flushing params (maxBufferedDocs/Deletes/RAM) should also remain
        live, since we have a very real user/data point (Shay) relying on
        this.

        But take MergedSegmentWarmer (used to be live but is now unchangeable).
        This is a setting that obviously can easily remain live; there's no
        technical reason for it not to be. So why should we force it to be
        unchangeable? That can only remove freedom, freedom that is perhaps
        valuable to an app somewhere.

        Show
        Michael McCandless added a comment - As I said on the list - if one needs to change IW config, he can always recreate IW with new settings. That's not really true, in general. If you have a large merge running then closing the IW can take an unpredictable amount of time. You could abort the merges on close, but that's obviously not great. Furthermore, closing the IW also forces you to commit, and I don't like tying changing of configuration to forcing a commit. In fact, it doesn't make sense to me to arbitrarily prevent settings from being live, just because we've factored out IWC as a separate class. Many of these settings were "naturally" live before the IWC cutover, and have no particular reason not to be (besides this API change). We could also rollback the IWC change. I'm not saying that's a great option, but, it should be on the table. InfoStream, for example, should remain live: eg, maybe I'm having trouble w/ optimize, so, I turn on infoStream and then call optimize. The flushing params (maxBufferedDocs/Deletes/RAM) should also remain live, since we have a very real user/data point (Shay) relying on this. But take MergedSegmentWarmer (used to be live but is now unchangeable). This is a setting that obviously can easily remain live; there's no technical reason for it not to be. So why should we force it to be unchangeable? That can only remove freedom, freedom that is perhaps valuable to an app somewhere.
        Hide
        Yonik Seeley added a comment -

        InfoStream, for example, should remain live

        Agree - it's logging.

        But take MergedSegmentWarmer (used to be live but is now unchangeable). This is a setting that obviously can easily remain live; there's no technical reason for it not to be.

        Anyone's implementation can be live (i.e. the impl could change it's behavior over time for whatever reason).

        Show
        Yonik Seeley added a comment - InfoStream, for example, should remain live Agree - it's logging. But take MergedSegmentWarmer (used to be live but is now unchangeable). This is a setting that obviously can easily remain live; there's no technical reason for it not to be. Anyone's implementation can be live (i.e. the impl could change it's behavior over time for whatever reason).
        Hide
        Michael McCandless added a comment -

        Anyone's implementation can be live (i.e. the impl could change it's behavior over time for whatever reason).

        Well, that's really cheating. I mean, yes, technically it's an out, so
        it's certainly possible that an app can do the switching inside its
        class... but that's not nice

        EG if an app has LoadsAllDocsWarmer and VisitsAllPostingsWarmer (say)
        and they want to switch between (for some reason)... they'd like have
        to make a SegmentWarmerSwitcher class or something.

        Seems silly because IW could care less if you switch up your warmer.
        It just needs to get the current warmer every time it goes and warms
        a segment...

        Show
        Michael McCandless added a comment - Anyone's implementation can be live (i.e. the impl could change it's behavior over time for whatever reason). Well, that's really cheating. I mean, yes, technically it's an out, so it's certainly possible that an app can do the switching inside its class... but that's not nice EG if an app has LoadsAllDocsWarmer and VisitsAllPostingsWarmer (say) and they want to switch between (for some reason)... they'd like have to make a SegmentWarmerSwitcher class or something. Seems silly because IW could care less if you switch up your warmer. It just needs to get the current warmer every time it goes and warms a segment...
        Hide
        Earwin Burrfoot added a comment -

        Furthermore, closing the IW also forces you to commit, and I don't like tying changing of configuration to forcing a commit.

        Like I said, one isn't going to change his configuration five times a second. It's ok to commit from time to time?

        So why should we force it to be unchangeable? That can only remove freedom, freedom that is perhaps valuable to an app somewhere.

        Each and every live reconfigurable setting adds to complexity.
        At the very least it requires proper synchronization. Take your SegmentWarmer example - you should make the field volatile.
        While it's possible to chicken out on primitive fields (except long/double), as Yonik mentioned earlier, making nonvolatile mutable references introduces you to a world of hard-to-catch unsafe publication bugs (yes, infoStream is currently broken!).
        For more complex cases, certain on-change logic is required. And then you have to support this logic across all possible code rewrites and refactorings.

        Show
        Earwin Burrfoot added a comment - Furthermore, closing the IW also forces you to commit, and I don't like tying changing of configuration to forcing a commit. Like I said, one isn't going to change his configuration five times a second. It's ok to commit from time to time? So why should we force it to be unchangeable? That can only remove freedom, freedom that is perhaps valuable to an app somewhere. Each and every live reconfigurable setting adds to complexity. At the very least it requires proper synchronization. Take your SegmentWarmer example - you should make the field volatile. While it's possible to chicken out on primitive fields ( except long/double ), as Yonik mentioned earlier, making nonvolatile mutable references introduces you to a world of hard-to-catch unsafe publication bugs (yes, infoStream is currently broken!). For more complex cases, certain on-change logic is required. And then you have to support this logic across all possible code rewrites and refactorings.
        Hide
        Shay Banon added a comment -

        Heya,

        If I had to choose between being able to change things in real time to better concurrency thanks to immutability, I would definitely go with better concurrency. I have no problems with closing the writers and reopening them, though, as Mike said, this can come with a big cost.

        The funny thing is that a lot of the setters that were already there on the IndexWriter are still exposed, basically, through settings on the relevant MergePolicy, so I don't think we are talking about that many setter to begin with (I don't think we should bring those back to the IndexWriter).

        I think that the notion of IWC is a good one, and should remain, but only to provide construction time parameters to IW. It should not be consulted once the construction phase of IW is done. If explicit real time parameters are to be set, then IW should expose it as a setter. Now, the question is which, if any, setters should be exposed.

        Going through the list of current setters on IW, my vote is for the setRAMBufferSizeMB one. I am not sure that its that obscure use case. I believe Solr for example has a notion of cores (or something like that), so it can also be adaptive in terms of indexing buffer size dependent on the number of cores running in the VM. Also, one can easily run a system where it does bulk indexing, and then lowers the indexing buffer size for more "streamline" work. Its just a shame to close the writer for that (and having to pause all indexing work while this happens).

        The term interval and divisor, I agree, are such obscure (funnily, I use the divisor quite a lot), that closing the writer and opening it again make sense.

        Show
        Shay Banon added a comment - Heya, If I had to choose between being able to change things in real time to better concurrency thanks to immutability, I would definitely go with better concurrency. I have no problems with closing the writers and reopening them, though, as Mike said, this can come with a big cost. The funny thing is that a lot of the setters that were already there on the IndexWriter are still exposed, basically, through settings on the relevant MergePolicy, so I don't think we are talking about that many setter to begin with (I don't think we should bring those back to the IndexWriter). I think that the notion of IWC is a good one, and should remain, but only to provide construction time parameters to IW. It should not be consulted once the construction phase of IW is done. If explicit real time parameters are to be set, then IW should expose it as a setter. Now, the question is which, if any, setters should be exposed. Going through the list of current setters on IW, my vote is for the setRAMBufferSizeMB one. I am not sure that its that obscure use case. I believe Solr for example has a notion of cores (or something like that), so it can also be adaptive in terms of indexing buffer size dependent on the number of cores running in the VM. Also, one can easily run a system where it does bulk indexing, and then lowers the indexing buffer size for more "streamline" work. Its just a shame to close the writer for that (and having to pause all indexing work while this happens). The term interval and divisor, I agree, are such obscure (funnily, I use the divisor quite a lot), that closing the writer and opening it again make sense.
        Hide
        Michael McCandless added a comment -

        If I had to choose between being able to change things in real time to better concurrency thanks to immutability, I would definitely go with better concurrency.

        We're not talking about better concurrency here – making these fields
        volatile will be in the noise.

        If ever it were not in the noise, I agree better concurrency should
        win.

        I don't think we are talking about that many setter to begin with (I don't think we should bring those back to the IndexWriter).

        True, but we are setting a precedent here. Ie this will apply to
        further settings we add to IWC, apply to any other classes that we
        pull config out of (eg IR), etc. I don't like that factoring out
        config means loss of functionality.

        It should not be consulted once the construction phase of IW is done.

        Why such purity? What do we gain?

        I'm all for purity, but only if it doesn't interfere w/ functionality.
        Here, it's taking away freedom...

        I would prefer to have the config you passed into IW remain "live".
        You can also do IW.getConfig().setXXX later too.

        In fact it should be fine to share an IWC across multiple writers; you
        can change the RAM buffer for all of them at once.

        If explicit real time parameters are to be set, then IW should expose it as a setter.

        But then whenever we change our mind about liveness, we have to change
        the API? I'd like to decouple liveness of a setter (really a semantic
        aspect of that API, documented in the jdocs) from which API is used to
        set it.

        I think a config param should be live by default and only if there's
        some hardship / reason to not have it so, should we make an
        exception. Most of these params are trivial to be live (they were
        before the IWC change).

        making nonvolatile mutable references introduces you to a world of hard-to-catch unsafe publication bugs (yes, infoStream is currently broken!).

        Well, in theory, yes, in practice, no. This is like stating your HTML is
        buggy because it fails to put a closing </html> tag and so some
        browser could fail to render it

        I doubt there's any JVM out there where our lack-of-volatile
        infoStream causes any problems.

        But, of course, we should make them volatile to be safe...

        Each and every live reconfigurable setting adds to complexity.

        That's the exception not the rule. Most of these settings are used at
        certain times – on flushing a new seg, on warming a seg, etc. – and
        there's no added complexity to simply pulling their current value from
        the config.

        For more complex cases, certain on-change logic is required.

        For such cases we can state that changes to the config do not take
        effect; eg IndexingChain is a good example I think. But I think the
        default should be that changes are live, unless otherwise stated

        I don't think we should go out of our way to be making settings live,
        if there's any hair involved. But for the settings where there's no
        hair involved, they should be live.

        Show
        Michael McCandless added a comment - If I had to choose between being able to change things in real time to better concurrency thanks to immutability, I would definitely go with better concurrency. We're not talking about better concurrency here – making these fields volatile will be in the noise. If ever it were not in the noise, I agree better concurrency should win. I don't think we are talking about that many setter to begin with (I don't think we should bring those back to the IndexWriter). True, but we are setting a precedent here. Ie this will apply to further settings we add to IWC, apply to any other classes that we pull config out of (eg IR), etc. I don't like that factoring out config means loss of functionality. It should not be consulted once the construction phase of IW is done. Why such purity? What do we gain? I'm all for purity, but only if it doesn't interfere w/ functionality. Here, it's taking away freedom... I would prefer to have the config you passed into IW remain "live". You can also do IW.getConfig().setXXX later too. In fact it should be fine to share an IWC across multiple writers; you can change the RAM buffer for all of them at once. If explicit real time parameters are to be set, then IW should expose it as a setter. But then whenever we change our mind about liveness, we have to change the API? I'd like to decouple liveness of a setter (really a semantic aspect of that API, documented in the jdocs) from which API is used to set it. I think a config param should be live by default and only if there's some hardship / reason to not have it so, should we make an exception. Most of these params are trivial to be live (they were before the IWC change). making nonvolatile mutable references introduces you to a world of hard-to-catch unsafe publication bugs (yes, infoStream is currently broken!). Well, in theory, yes, in practice, no. This is like stating your HTML is buggy because it fails to put a closing </html> tag and so some browser could fail to render it I doubt there's any JVM out there where our lack-of-volatile infoStream causes any problems. But, of course, we should make them volatile to be safe... Each and every live reconfigurable setting adds to complexity. That's the exception not the rule. Most of these settings are used at certain times – on flushing a new seg, on warming a seg, etc. – and there's no added complexity to simply pulling their current value from the config. For more complex cases, certain on-change logic is required. For such cases we can state that changes to the config do not take effect; eg IndexingChain is a good example I think. But I think the default should be that changes are live, unless otherwise stated I don't think we should go out of our way to be making settings live, if there's any hair involved. But for the settings where there's no hair involved, they should be live.
        Hide
        Shay Banon added a comment -

        Just a note regarding the IWC and being able to consult it for live changes, it feels strange to me that settings something on the config will affect the IW in real time. Maybe its just me, but it feels nicer to have the "live" setters on IW compared to IWC.

        I also like the ability to decouple construction time configuration through IWC, and live settings through setters on IW. It is then very clear what can be set on construction time, and what can be set on a live IW. It also allows for compile time / static check for the code what can be changed at what lifecycle phase.

        Show
        Shay Banon added a comment - Just a note regarding the IWC and being able to consult it for live changes, it feels strange to me that settings something on the config will affect the IW in real time. Maybe its just me, but it feels nicer to have the "live" setters on IW compared to IWC. I also like the ability to decouple construction time configuration through IWC, and live settings through setters on IW. It is then very clear what can be set on construction time, and what can be set on a live IW. It also allows for compile time / static check for the code what can be changed at what lifecycle phase.
        Hide
        Earwin Burrfoot added a comment -

        Why such purity? What do we gain?

        I'm all for purity, but only if it doesn't interfere w/ functionality.
        Here, it's taking away freedom...

        We gain consistency and predictability. And there are a lot of freedoms dangerous for developers.

        In fact it should be fine to share an IWC across multiple writers; you
        can change the RAM buffer for all of them at once.

        You've brought up a purrfect example of how NOT to do things.
        This is called 'action at a distance' and is a damn bug. Very annoying one.
        I've thoroughly experienced it with previous major version of Apache HTTPClient - they had an API that suggested you can set per-request timeouts, while these were actually global for a single Client instance.
        I fried my brain trying to understand why the hell random user requests timeout at hundred times their intended duration.
        Oh! It was an occasional admin request changing the global.

        <irony> You know, you can actually instantiate some DateRangeFilter with a couple of Dates, and then change these Dates (they are writeable) before each request. Isn't it an exciting kind of programming freedom?
        Or, back to our current discussion - we can pass RAMBufferSizeMB as an AtomicDouble, instead of current double, then we can use .set() on an instance we passed, and have our live reconfigurability. What's more - AtomicDouble protects us from word tearing! </irony>

        I doubt there's any JVM out there where our lack-of-volatile infoStream causes any problems.

        Er.. While I have never personally witnessed unsynchronized long/double tearing,
        I've seen the consequence of unsafely publishing a HashMap - an endless loop on get().
        It happened on your run off the mill Sun 1.6 JVM.
        So the bug is there, lying in wait. Maybe nobody ever actually used the freedom to change infoStream in-flight, or the guy was lucky, or in his particular situation the field was guarded by some unrelated sync.

        While I see banishing live reconfiguration from IW as a lost cause, I ask to make IWC immutable at the very least. As Shay said - this will provide a clear barrier between mutable and immutable properties.

        Show
        Earwin Burrfoot added a comment - Why such purity? What do we gain? I'm all for purity, but only if it doesn't interfere w/ functionality. Here, it's taking away freedom... We gain consistency and predictability. And there are a lot of freedoms dangerous for developers. In fact it should be fine to share an IWC across multiple writers; you can change the RAM buffer for all of them at once. You've brought up a purrfect example of how NOT to do things. This is called 'action at a distance' and is a damn bug. Very annoying one. I've thoroughly experienced it with previous major version of Apache HTTPClient - they had an API that suggested you can set per-request timeouts, while these were actually global for a single Client instance. I fried my brain trying to understand why the hell random user requests timeout at hundred times their intended duration. Oh! It was an occasional admin request changing the global. <irony> You know, you can actually instantiate some DateRangeFilter with a couple of Dates, and then change these Dates (they are writeable) before each request. Isn't it an exciting kind of programming freedom? Or, back to our current discussion - we can pass RAMBufferSizeMB as an AtomicDouble, instead of current double, then we can use .set() on an instance we passed, and have our live reconfigurability. What's more - AtomicDouble protects us from word tearing! </irony> I doubt there's any JVM out there where our lack-of-volatile infoStream causes any problems. Er.. While I have never personally witnessed unsynchronized long/double tearing, I've seen the consequence of unsafely publishing a HashMap - an endless loop on get(). It happened on your run off the mill Sun 1.6 JVM. So the bug is there, lying in wait. Maybe nobody ever actually used the freedom to change infoStream in-flight, or the guy was lucky, or in his particular situation the field was guarded by some unrelated sync. While I see banishing live reconfiguration from IW as a lost cause, I ask to make IWC immutable at the very least. As Shay said - this will provide a clear barrier between mutable and immutable properties.
        Hide
        Mark Miller added a comment -

        Maybe its just me, but it feels nicer to have the "live" setters on IW compared to IWC.

        +1.

        I agree that live settings on IWC feels too much 'action at a distance'. It's not a great pattern.

        FWIW, live config on the IW and a static IWC feels right to me. Earwin makes a strong argument for dropping live altogether - though I'm still liking the idea of allowing it where it's easy and especially where it has existed.

        And I've always felt unsafe publication was scary...

        Show
        Mark Miller added a comment - Maybe its just me, but it feels nicer to have the "live" setters on IW compared to IWC. +1. I agree that live settings on IWC feels too much 'action at a distance'. It's not a great pattern. FWIW, live config on the IW and a static IWC feels right to me. Earwin makes a strong argument for dropping live altogether - though I'm still liking the idea of allowing it where it's easy and especially where it has existed. And I've always felt unsafe publication was scary...
        Hide
        Michael McCandless added a comment -

        Er.. While I have never personally witnessed unsynchronized long/double tearing,
        I've seen the consequence of unsafely publishing a HashMap - an endless loop on get().

        I've also seen JMM strike too – it caused one of our unit tests to
        spin forever, because a "volatile" was missing.

        But this will never impact rarely used fields (infoStream,
        termIndexInterval, segmentWarmer, etc.), in practice.

        Really we need an anal Java impl. (or, maybe, CPU) that randomly
        asserts its "rights" under JMM, to hold a cached copy of any field
        that's not volatile for unusual/random lengths of time (basically an
        "adversary" yet still playing by the JMM rules). Such an impl would
        find TONS of JMM bugs in Lucene (and I imagine any other Java
        app/library tested).

        Yet, no "real" Java impl out there will ever do this since doing so
        will simply make that Java impl appear buggy. (Well, and, it'd be bad
        for perf. – obviously the Java impl, CPU cache levels, should cache
        only frequently used things).

        It's exactly why all web browsers today are tolerant to a missing
        </html> tag and no browser could afford to suddenly refuse to render
        because you're missing the </html> tag.

        I'm not saying we shouldn't put in our </html> tags in Lucene; we
        definitely should... we have no choice. But, in practice, these
        missing </html> tags all throughout Lucene are not a problem.

        I ask to make IWC immutable at the very least

        IWC cannot be made immutable – you build it up incrementally (new
        IWC(...).setThis(...).setThat(...)). Its fields cannot be
        final. (Well, one field can and is: analyzer).

        How about this as a compromise: IW continues cloning the incoming IWC
        on init, as it does today. This means any changes to the IWC instance
        you passed to IW will have no effect on IW.

        But, if you want to change something live, you can
        IW.getConfig().setFoo(...). The config instance is a private clone to
        that IW.

        Show
        Michael McCandless added a comment - Er.. While I have never personally witnessed unsynchronized long/double tearing, I've seen the consequence of unsafely publishing a HashMap - an endless loop on get(). I've also seen JMM strike too – it caused one of our unit tests to spin forever, because a "volatile" was missing. But this will never impact rarely used fields (infoStream, termIndexInterval, segmentWarmer, etc.), in practice. Really we need an anal Java impl. (or, maybe, CPU) that randomly asserts its "rights" under JMM, to hold a cached copy of any field that's not volatile for unusual/random lengths of time (basically an "adversary" yet still playing by the JMM rules). Such an impl would find TONS of JMM bugs in Lucene (and I imagine any other Java app/library tested). Yet, no "real" Java impl out there will ever do this since doing so will simply make that Java impl appear buggy. (Well, and, it'd be bad for perf. – obviously the Java impl, CPU cache levels, should cache only frequently used things). It's exactly why all web browsers today are tolerant to a missing </html> tag and no browser could afford to suddenly refuse to render because you're missing the </html> tag. I'm not saying we shouldn't put in our </html> tags in Lucene; we definitely should... we have no choice. But, in practice, these missing </html> tags all throughout Lucene are not a problem. I ask to make IWC immutable at the very least IWC cannot be made immutable – you build it up incrementally (new IWC(...).setThis(...).setThat(...)). Its fields cannot be final. (Well, one field can and is: analyzer). How about this as a compromise: IW continues cloning the incoming IWC on init, as it does today. This means any changes to the IWC instance you passed to IW will have no effect on IW. But, if you want to change something live, you can IW.getConfig().setFoo(...). The config instance is a private clone to that IW.
        Hide
        Yonik Seeley added a comment -

        How about this as a compromise: IW continues cloning the incoming IWC on init, as it does today. This means any changes to the IWC instance you passed to IW will have no effect on IW.

        +1

        Show
        Yonik Seeley added a comment - How about this as a compromise: IW continues cloning the incoming IWC on init, as it does today. This means any changes to the IWC instance you passed to IW will have no effect on IW. +1
        Hide
        Michael McCandless added a comment -

        Patch, implementing the proposed compromise.

        I brought back the two prior unit tests, verifying changes to ram buffer take effect live; they pass.

        Show
        Michael McCandless added a comment - Patch, implementing the proposed compromise. I brought back the two prior unit tests, verifying changes to ram buffer take effect live; they pass.
        Hide
        Yonik Seeley added a comment -

        Hmmm, infoStream is just for debugging... should we really make it volatile?
        Although volatile reads are cheap on current x86, they do prevent optimizations/reorderings across them.
        Too many volatile reads in inner loops can add up.

        infoStream is a PrintStream, which synchronizes anyway, so it should be safe to omit the volatile.

        Show
        Yonik Seeley added a comment - Hmmm, infoStream is just for debugging... should we really make it volatile? Although volatile reads are cheap on current x86, they do prevent optimizations/reorderings across them. Too many volatile reads in inner loops can add up. infoStream is a PrintStream, which synchronizes anyway, so it should be safe to omit the volatile.
        Hide
        Earwin Burrfoot added a comment -

        infoStream is a PrintStream, which synchronizes anyway, so it should be safe to omit the volatile

        You're absolutely right here.

        Yet, no "real" Java impl out there will ever do this since doing so will simply make that Java impl appear buggy.

        Sorry, but "real" Java impls do this. The case with endless get() happened on a map that was never modified after being created and set. Just one of the many JVM instances on many machines got unlucky after restart.

        Well, and, it'd be bad for perf. – obviously the Java impl, CPU cache levels, should cache only frequently used things

        Java impls don't cache things. They do reorderings, they also keep final fields on registers, omitting reloads that happen for non-final ones, but no caching in JMM-related cases. Caching here is done by CPU, and it caches all data read from memory.

        IWC cannot be made immutable – you build it up incrementally (new IWC(...).setThis(...).setThat(...)). Its fields cannot be final.

        Setters can return modified immutable copy of 'this'. So you get both incremental building and immutability.

        How about this as a compromise: IW continues cloning the incoming IWC on init, as it does today. This means any changes to the IWC instance you passed to IW will have no effect on IW.

        What about earlier compromise mentioned by Shay, Mark, me? Keep setters for 'live' properties on IW.
        This clearly draws the line, and you don't have to consult Javadocs for each and every setting to know if you can change it live or not.

        Show
        Earwin Burrfoot added a comment - infoStream is a PrintStream, which synchronizes anyway, so it should be safe to omit the volatile You're absolutely right here. Yet, no "real" Java impl out there will ever do this since doing so will simply make that Java impl appear buggy. Sorry, but "real" Java impls do this. The case with endless get() happened on a map that was never modified after being created and set. Just one of the many JVM instances on many machines got unlucky after restart. Well, and, it'd be bad for perf. – obviously the Java impl, CPU cache levels, should cache only frequently used things Java impls don't cache things. They do reorderings, they also keep final fields on registers, omitting reloads that happen for non-final ones, but no caching in JMM-related cases. Caching here is done by CPU, and it caches all data read from memory. IWC cannot be made immutable – you build it up incrementally (new IWC(...).setThis(...).setThat(...)). Its fields cannot be final. Setters can return modified immutable copy of 'this'. So you get both incremental building and immutability. How about this as a compromise: IW continues cloning the incoming IWC on init, as it does today. This means any changes to the IWC instance you passed to IW will have no effect on IW. What about earlier compromise mentioned by Shay, Mark, me? Keep setters for 'live' properties on IW. This clearly draws the line, and you don't have to consult Javadocs for each and every setting to know if you can change it live or not.
        Hide
        Michael McCandless added a comment -

        Hmmm, infoStream is just for debugging... should we really make it volatile?

        I'll remove its volatile...

        IWC cannot be made immutable – you build it up incrementally (new IWC(...).setThis(...).setThat(...)). Its fields cannot be final.

        Setters can return modified immutable copy of 'this'. So you get both incremental building and immutability.

        Oh yeah. But then we'd clone the full IWC on every set... this seems
        like overkill in the name of "purity".

        What about earlier compromise mentioned by Shay, Mark, me? Keep setters for 'live' properties on IW.
        This clearly draws the line, and you don't have to consult Javadocs for each and every setting to know if you can change it live or not.

        I really don't like that this approach would split IW configuration
        into two places. Like you look at the javadocs for IWC and think that
        you cannot change the RAM buffer size.

        IWC should be the one place you go to see which settings you can
        change about the IW. That some of these settings take effect "live"
        while others do not is really an orthogonal (and I think, secondary,
        ie handled fine w/ jdocs) aspect/concern.

        Show
        Michael McCandless added a comment - Hmmm, infoStream is just for debugging... should we really make it volatile? I'll remove its volatile... IWC cannot be made immutable – you build it up incrementally (new IWC(...).setThis(...).setThat(...)). Its fields cannot be final. Setters can return modified immutable copy of 'this'. So you get both incremental building and immutability. Oh yeah. But then we'd clone the full IWC on every set... this seems like overkill in the name of "purity". What about earlier compromise mentioned by Shay, Mark, me? Keep setters for 'live' properties on IW. This clearly draws the line, and you don't have to consult Javadocs for each and every setting to know if you can change it live or not. I really don't like that this approach would split IW configuration into two places. Like you look at the javadocs for IWC and think that you cannot change the RAM buffer size. IWC should be the one place you go to see which settings you can change about the IW. That some of these settings take effect "live" while others do not is really an orthogonal (and I think, secondary, ie handled fine w/ jdocs) aspect/concern.
        Hide
        Mark Miller added a comment -

        I really don't like that this approach would split IW configuration
        into two places. Like you look at the javadocs for IWC and think that
        you cannot change the RAM buffer size.

        IWC should be the one place you go to see which settings you can
        change about the IW. That some of these settings take effect "live"
        while others do not is really an orthogonal (and I think, secondary,
        ie handled fine w/ jdocs) aspect/concern.

        You can just as easily argue that the javadocs for IWC could explain that live settings are on the IW.

        That pattern just smells wrong.

        But, if you want to change something live, you can
        IW.getConfig().setFoo(...). The config instance is a private clone to
        that IW.

        This is better than nothing.

        Another thought is to offer all settings on the IWC for init convenience and exposure and then add javadoc about updaters on IW for those settings that can be changed on the fly - or one update method and enums...

        Show
        Mark Miller added a comment - I really don't like that this approach would split IW configuration into two places. Like you look at the javadocs for IWC and think that you cannot change the RAM buffer size. IWC should be the one place you go to see which settings you can change about the IW. That some of these settings take effect "live" while others do not is really an orthogonal (and I think, secondary, ie handled fine w/ jdocs) aspect/concern. You can just as easily argue that the javadocs for IWC could explain that live settings are on the IW. That pattern just smells wrong. But, if you want to change something live, you can IW.getConfig().setFoo(...). The config instance is a private clone to that IW. This is better than nothing. Another thought is to offer all settings on the IWC for init convenience and exposure and then add javadoc about updaters on IW for those settings that can be changed on the fly - or one update method and enums...
        Hide
        Steve Rowe added a comment -

        How about an IWC base class, extended by IWCinit and IWClive. IWCinit has setters for everything, and IW.getConfig() returns IWClive, which has no setters for things you can't set on the fly.

        Show
        Steve Rowe added a comment - How about an IWC base class, extended by IWCinit and IWClive. IWCinit has setters for everything, and IW.getConfig() returns IWClive, which has no setters for things you can't set on the fly.
        Hide
        Earwin Burrfoot added a comment -

        Oh yeah. But then we'd clone the full IWC on every set... this seems like overkill in the name of "purity".

        So what? What exactly is overkill? Few wasted bytes and CPU ns for an object that's created a couple of times during application lifetime?
        There are also builders, which are very similar to what Steven is proposing.

        Another thought is to offer all settings on the IWC for init convenience and exposure and then add javadoc about updaters on IW for those settings that can be changed on the fly

        That's exactly how I'd like to see it.

        Show
        Earwin Burrfoot added a comment - Oh yeah. But then we'd clone the full IWC on every set... this seems like overkill in the name of "purity". So what? What exactly is overkill? Few wasted bytes and CPU ns for an object that's created a couple of times during application lifetime? There are also builders, which are very similar to what Steven is proposing. Another thought is to offer all settings on the IWC for init convenience and exposure and then add javadoc about updaters on IW for those settings that can be changed on the fly That's exactly how I'd like to see it.
        Hide
        Robert Muir added a comment -

        Its exactly the lack of consensus we see here, thats why I am 100% against having the setter approach.

        I'm totally against some deprecation/undeprecation loop because we in future releases another setting
        wants to be "live".

        It seems the only way we can avoid this, is for javadoc to be the only specification as to whether a setting
        does or does not take effect "live".

        Show
        Robert Muir added a comment - Its exactly the lack of consensus we see here, thats why I am 100% against having the setter approach. I'm totally against some deprecation/undeprecation loop because we in future releases another setting wants to be "live". It seems the only way we can avoid this, is for javadoc to be the only specification as to whether a setting does or does not take effect "live".
        Hide
        Earwin Burrfoot added a comment -

        You avoid deprecation/undeprecation and binary incompatibility, while incompatibly changing semantics. What do you win?

        Show
        Earwin Burrfoot added a comment - You avoid deprecation/undeprecation and binary incompatibility, while incompatibly changing semantics. What do you win?
        Hide
        Robert Muir added a comment -

        You win the fact that this is such an expert thing, and it should not confuse 99% of users who won't need to change these settings in a live way.

        This is a central API to using lucene, sorry i would rather see IWConfig be reverted completely than see this deprecation/undeprecation loop, it would just cause too much confusion.

        Show
        Robert Muir added a comment - You win the fact that this is such an expert thing, and it should not confuse 99% of users who won't need to change these settings in a live way. This is a central API to using lucene, sorry i would rather see IWConfig be reverted completely than see this deprecation/undeprecation loop, it would just cause too much confusion.
        Hide
        Steve Rowe added a comment -

        How about an IWC base class, extended by IWCinit and IWClive. IWCinit has setters for everything, and IW.getConfig() returns IWClive, which has no setters for things you can't set on the fly.

        I tried to implement this, but couldn't figure out a way to avoid code and javadoc duplication and/or separation for the live setters, which need to be on both the init and live versions. Duplication/separation of this sort would be begging for trouble. (The live setters can't be on the base class because the init and live versions would have to return different types to allow for proper chaining.)

        Show
        Steve Rowe added a comment - How about an IWC base class, extended by IWCinit and IWClive. IWCinit has setters for everything, and IW.getConfig() returns IWClive, which has no setters for things you can't set on the fly. I tried to implement this, but couldn't figure out a way to avoid code and javadoc duplication and/or separation for the live setters, which need to be on both the init and live versions. Duplication/separation of this sort would be begging for trouble. (The live setters can't be on the base class because the init and live versions would have to return different types to allow for proper chaining.)
        Hide
        Steve Rowe added a comment -

        How about an IWC base class, extended by IWCinit and IWClive. IWCinit has setters for everything, and IW.getConfig() returns IWClive, which has no setters for things you can't set on the fly.

        I tried to implement this, but couldn't figure out a way to avoid code and javadoc duplication and/or separation for the live setters, which need to be on both the init and live versions.

        An annotation processor that looks for @Live annotations on setters, then generates source for a LiveIWC class, an instance of which would be returned by IW.getConfig(), would solve the duplication/separation problem. No extension required: LiveIWC could forward all getters and the live setters to a cloned IWC.

        Show
        Steve Rowe added a comment - How about an IWC base class, extended by IWCinit and IWClive. IWCinit has setters for everything, and IW.getConfig() returns IWClive, which has no setters for things you can't set on the fly. I tried to implement this, but couldn't figure out a way to avoid code and javadoc duplication and/or separation for the live setters, which need to be on both the init and live versions. An annotation processor that looks for @Live annotations on setters, then generates source for a LiveIWC class, an instance of which would be returned by IW.getConfig(), would solve the duplication/separation problem. No extension required: LiveIWC could forward all getters and the live setters to a cloned IWC.
        Hide
        Michael McCandless added a comment -

        Oh yeah. But then we'd clone the full IWC on every set... this seems like overkill in the name of "purity".

        So what? What exactly is overkill? Few wasted bytes and CPU ns for an object that's created a couple of times during application lifetime?
        There are also builders, which are very similar to what Steven is proposing.

        I don't like that this'd be an O(N*M) cost API when you use it. Sure,
        N and M are tiny, and you use this API very rarely, but I still don't
        like it

        In addition, because this is all in the name of "purity" which as far
        as I can see has no real value besides "purity". It's... incestuous.
        And, I'm a pragmatist, I guess.

        An annotation processor that looks for @Live annotations on setters, then generates source for a LiveIWC class, an instance of which would be returned by IW.getConfig(), would solve the duplication/separation problem. No extension required: LiveIWC could forward all getters and the live setters to a cloned IWC.

        I think this is overkill? (Ie to have @Live compile to LiveIWC vs
        InitIWC). Though, @Live would be nice for jdocs?

        You win the fact that this is such an expert thing, and it should not confuse 99% of users who won't need to change these settings in a live way.

        Right – simple things should be simple and complex things should be
        possible.

        The current patch achieves this – the 99% of "simple" users that just
        want to config IW and create it find all of the config in one place.
        The 1% complex cases (need to change live settings) are able to do so,
        but must read the jdocs for each setter to verify it's supported. The
        API should be designed around the simple users not the complex ones,
        as the current patch does.

        So... I think the current patch is ready to commit (except, I'll
        remove the </html> tag for infoStream & defaultInfoStream).

        Show
        Michael McCandless added a comment - Oh yeah. But then we'd clone the full IWC on every set... this seems like overkill in the name of "purity". So what? What exactly is overkill? Few wasted bytes and CPU ns for an object that's created a couple of times during application lifetime? There are also builders, which are very similar to what Steven is proposing. I don't like that this'd be an O(N*M) cost API when you use it. Sure, N and M are tiny, and you use this API very rarely, but I still don't like it In addition, because this is all in the name of "purity" which as far as I can see has no real value besides "purity". It's... incestuous. And, I'm a pragmatist, I guess. An annotation processor that looks for @Live annotations on setters, then generates source for a LiveIWC class, an instance of which would be returned by IW.getConfig(), would solve the duplication/separation problem. No extension required: LiveIWC could forward all getters and the live setters to a cloned IWC. I think this is overkill? (Ie to have @Live compile to LiveIWC vs InitIWC). Though, @Live would be nice for jdocs? You win the fact that this is such an expert thing, and it should not confuse 99% of users who won't need to change these settings in a live way. Right – simple things should be simple and complex things should be possible. The current patch achieves this – the 99% of "simple" users that just want to config IW and create it find all of the config in one place. The 1% complex cases (need to change live settings) are able to do so, but must read the jdocs for each setter to verify it's supported. The API should be designed around the simple users not the complex ones, as the current patch does. So... I think the current patch is ready to commit (except, I'll remove the </html> tag for infoStream & defaultInfoStream).
        Hide
        Mark Miller added a comment -

        The current patch achieves this – the 99% of "simple" users that just
        want to config IW and create it find all of the config in one place.
        The 1% complex cases (need to change live settings) are able to do so,
        but must read the jdocs for each setter to verify it's supported.

        The proposed alternatives sound just as good as this? In the proposed compromises, the 99% of simple users will see have one place to config IW as well for the avg 'set up front' use case. Perhaps the complex users could also have an API with a better pattern and it doesn't have to be either or as you seem to lead...

        An IWC that is 'partially' live and can be changed externally after passing to the IW is just an inferior pattern plain and simple, and doesn't gain you much.

        The
        API should be designed around the simple users not the complex ones,
        as the current patch does.

        This really depends. If the simple users can be satisfied, and the API can also be decent for complex users, win/win.

        I guess I would place my bets that there will not be a ton of deprecations loops of settings wanting to be live.

        Show
        Mark Miller added a comment - The current patch achieves this – the 99% of "simple" users that just want to config IW and create it find all of the config in one place. The 1% complex cases (need to change live settings) are able to do so, but must read the jdocs for each setter to verify it's supported. The proposed alternatives sound just as good as this? In the proposed compromises, the 99% of simple users will see have one place to config IW as well for the avg 'set up front' use case. Perhaps the complex users could also have an API with a better pattern and it doesn't have to be either or as you seem to lead... An IWC that is 'partially' live and can be changed externally after passing to the IW is just an inferior pattern plain and simple, and doesn't gain you much. The API should be designed around the simple users not the complex ones, as the current patch does. This really depends. If the simple users can be satisfied, and the API can also be decent for complex users, win/win. I guess I would place my bets that there will not be a ton of deprecations loops of settings wanting to be live.
        Hide
        Mark Miller added a comment -

        Though don't take that I don't agree as a hold up to finishing 3.1.

        Show
        Mark Miller added a comment - Though don't take that I don't agree as a hold up to finishing 3.1.
        Hide
        Steve Rowe added a comment -

        Though don't take that I don't agree as a hold up to finishing 3.1.

        +1

        Show
        Steve Rowe added a comment - Though don't take that I don't agree as a hold up to finishing 3.1. +1
        Hide
        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Show
        Grant Ingersoll added a comment - Bulk close for 3.1

          People

          • Assignee:
            Unassigned
            Reporter:
            Shay Banon
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development