Solr
  1. Solr
  2. SOLR-647

Do SolrCore.close() in a refcounted way

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.3
    • Component/s: None
    • Labels:
      None

      Description

      The method SolrCore.close() directly closes the core . It can cause Exceptions for in-flight requests. The close() method should just do a decrement on refcount and the actual close must happen when the last request being processed by that core instance is completed

      1. refcount_example.patch
        3 kB
        Yonik Seeley
      2. solr-647.patch
        41 kB
        Yonik Seeley
      3. solr-647.patch
        41 kB
        Yonik Seeley
      4. solr-647.patch
        41 kB
        Yonik Seeley
      5. solr-647.patch
        21 kB
        Henri Biestro
      6. solr-647.patch
        17 kB
        Henri Biestro
      7. solr-647.patch
        16 kB
        Henri Biestro
      8. solr-647.patch
        22 kB
        Henri Biestro
      9. solr-647.patch
        16 kB
        Henri Biestro
      10. solr-647.patch
        13 kB
        Henri Biestro
      11. solr-647.patch
        9 kB
        Henri Biestro
      12. solr-647.patch
        8 kB
        Henri Biestro
      13. solr-647.patch
        8 kB
        Henri Biestro
      14. SOLR-647.patch
        4 kB
        Noble Paul
      15. SOLR-647.patch
        4 kB
        Noble Paul

        Issue Links

          Activity

          Hide
          Shalin Shekhar Mangar added a comment -

          Yonik, what do you feel about adding to to 1.3? Although it's immediate use is in SOLR-561 but it is still an unsafe existing API call.

          Show
          Shalin Shekhar Mangar added a comment - Yonik, what do you feel about adding to to 1.3? Although it's immediate use is in SOLR-561 but it is still an unsafe existing API call.
          Hide
          Yonik Seeley added a comment -

          I took a quick peek.... looks like there are probably race conditions.
          A core could be obtained in thread A, then decRef() could be called in thread B that triggers a real close, then incRef() would be called in thread A (oops).

          Show
          Yonik Seeley added a comment - I took a quick peek.... looks like there are probably race conditions. A core could be obtained in thread A, then decRef() could be called in thread B that triggers a real close, then incRef() would be called in thread A (oops).
          Hide
          Noble Paul added a comment -

          I hope this fixes the race condition

          Show
          Noble Paul added a comment - I hope this fixes the race condition
          Hide
          Henri Biestro added a comment - - edited

          Looking at both versions of the patch, it seems you did not upload the intended one..

          Show
          Henri Biestro added a comment - - edited Looking at both versions of the patch, it seems you did not upload the intended one..
          Hide
          Henri Biestro added a comment -

          Being hit by the same core issue swapping/closing cores, here is another take at it.

          Show
          Henri Biestro added a comment - Being hit by the same core issue swapping/closing cores, here is another take at it.
          Hide
          Grant Ingersoll added a comment -

          Quick glance suggests it needs unit tests.

          Show
          Grant Ingersoll added a comment - Quick glance suggests it needs unit tests.
          Hide
          Henri Biestro added a comment -

          Will do (tomorrow); might be the opportunity to align EmbeddedSolrServer & SolrDispatchFilter behaviors.

          Show
          Henri Biestro added a comment - Will do (tomorrow); might be the opportunity to align EmbeddedSolrServer & SolrDispatchFilter behaviors.
          Hide
          Henri Biestro added a comment -

          A few modifications to make things (hopefully) a little clearer & tests (single & multi threaded).
          This new patch version solely deals with reference counting implementation (not its usage).
          SolrDispatchFilter/EmbeddedSolrServer should be patched through solr-545.
          CoreDescriptor (the reloadCore() method) might be better patched through solr-561 (this is easy to reintroduce if needed though).

          Show
          Henri Biestro added a comment - A few modifications to make things (hopefully) a little clearer & tests (single & multi threaded). This new patch version solely deals with reference counting implementation (not its usage). SolrDispatchFilter/EmbeddedSolrServer should be patched through solr-545. CoreDescriptor (the reloadCore() method) might be better patched through solr-561 (this is easy to reintroduce if needed though).
          Hide
          Henri Biestro added a comment -

          Hopefully last version of it.
          I goofed the MT test on the previous one.
          Re-added CoreDescriptor.reloadCore().
          Note that the same code is included in solr-545.

          Show
          Henri Biestro added a comment - Hopefully last version of it. I goofed the MT test on the previous one. Re-added CoreDescriptor.reloadCore(). Note that the same code is included in solr-545.
          Hide
          Henri Biestro added a comment -

          new version based on Yonik's comment in solr-545.

          Show
          Henri Biestro added a comment - new version based on Yonik's comment in solr-545.
          Hide
          Grant Ingersoll added a comment -

          This doesn't compile. The TimeUnit.MINUTES in SolrCoreTest is a 1.6 constant.

          Show
          Grant Ingersoll added a comment - This doesn't compile. The TimeUnit.MINUTES in SolrCoreTest is a 1.6 constant.
          Hide
          Grant Ingersoll added a comment -

          But, I changed it to awaitTermination(60, TimeUnit.SECONDS) and that's just fine...

          Show
          Grant Ingersoll added a comment - But, I changed it to awaitTermination(60, TimeUnit.SECONDS) and that's just fine...
          Hide
          Henri Biestro added a comment -

          fixed 1.6 dependency (sorry & thanks Grant);
          updated to trunk;
          fixed a bug in SolrDispatchFilter (Multicore.getAdminCore can return a null core);
          introduced calls to acquire/release instead and modified EmbeddedSolrServer (for completeness).

          Show
          Henri Biestro added a comment - fixed 1.6 dependency (sorry & thanks Grant); updated to trunk; fixed a bug in SolrDispatchFilter (Multicore.getAdminCore can return a null core); introduced calls to acquire/release instead and modified EmbeddedSolrServer (for completeness).
          Hide
          Grant Ingersoll added a comment -

          I'd like to see some more documentation on this. Namely, when should SolrCore.Reference be used, etc. This may just be for my own edification. For instance, what's the relationship in MultiCore between getCore and acquireCore and openCore? (Since openCore just calls acquireCore w/ a null reference, we probably should just call it acquireCore as well.)

          Also, when do would I call Reference.release() versus core.close()?

          How does CoreDescriptor.reloadCore factor in? I don't see that it is used. Why would I call that instead of MultiCore.reload()?

          Part of these questions may just be bigger questions related to Multicore all together.

          Show
          Grant Ingersoll added a comment - I'd like to see some more documentation on this. Namely, when should SolrCore.Reference be used, etc. This may just be for my own edification. For instance, what's the relationship in MultiCore between getCore and acquireCore and openCore? (Since openCore just calls acquireCore w/ a null reference, we probably should just call it acquireCore as well.) Also, when do would I call Reference.release() versus core.close()? How does CoreDescriptor.reloadCore factor in? I don't see that it is used. Why would I call that instead of MultiCore.reload()? Part of these questions may just be bigger questions related to Multicore all together.
          Hide
          Noble Paul added a comment -

          How does CoreDescriptor.reloadCore factor in? I don't see that it is used. Why would I call that instead of MultiCore.reload()?

          How else can I reload the core in a single core deployment?

          Show
          Noble Paul added a comment - How does CoreDescriptor.reloadCore factor in? I don't see that it is used. Why would I call that instead of MultiCore.reload()? How else can I reload the core in a single core deployment?
          Hide
          Henri Biestro added a comment -

          Grant;
          getCore() does not protect the core from being closed by another thread (it must either be protected by another mean or earlier or it does not matter).

          For the other cases, calls should be used in pairs with the following convention; if you call an "open", you should call a "close", if you call an "acquire", you should call a "release".

          openCore()/openAdminCore() do protect from the core from a // close and core.close() should be used to decrease the refcount.
          acquireCore()/acquireAdminCore() use a SolrCore.Reference and decreasing the refcount should be done through the ref via ref.release().

          The only difference between openCore()/close() or acquireCore()/release() is the usage of a reference which is just a convenience (at least to me: avoiding to test for null, assigning to final, etc).

          Hope this makes sense.

          Show
          Henri Biestro added a comment - Grant; getCore() does not protect the core from being closed by another thread (it must either be protected by another mean or earlier or it does not matter). For the other cases, calls should be used in pairs with the following convention; if you call an "open", you should call a "close", if you call an "acquire", you should call a "release". openCore()/openAdminCore() do protect from the core from a // close and core.close() should be used to decrease the refcount. acquireCore()/acquireAdminCore() use a SolrCore.Reference and decreasing the refcount should be done through the ref via ref.release(). The only difference between openCore()/close() or acquireCore()/release() is the usage of a reference which is just a convenience (at least to me: avoiding to test for null, assigning to final, etc). Hope this makes sense.
          Hide
          Grant Ingersoll added a comment -

          How else can I reload the core in a single core deployment?

          In email, I said, "OK, I'll buy that, maybe I just don't get Multicore/Single Core relationship. Also, reload is not used anywhere and we've never had the notion of reloading in the single core case, AFAICT."

          But, thinking about it some more, I guess I don't. The descriptor up until this patch, was merely a container for storing core information, hence the name. Now all of a sudden it has reload capabilities (but, it doesn't have open/close capabilities to go with it) so it just doesn't fit in my mind.

          Show
          Grant Ingersoll added a comment - How else can I reload the core in a single core deployment? In email, I said, "OK, I'll buy that, maybe I just don't get Multicore/Single Core relationship. Also, reload is not used anywhere and we've never had the notion of reloading in the single core case, AFAICT." But, thinking about it some more, I guess I don't. The descriptor up until this patch, was merely a container for storing core information, hence the name. Now all of a sudden it has reload capabilities (but, it doesn't have open/close capabilities to go with it) so it just doesn't fit in my mind.
          Hide
          Grant Ingersoll added a comment -

          Henri,

          At a minimum, all that should be documented, but I must admit to still being confused (but I'm no expert in the multicore stuff so, please bear with me)

          getCore() does not protect the core from being closed by another thread (it must either be protected by another mean or earlier or it does not matter).

          Yes, but there appears to be mixed uses, the MultiCoreHandler uses getCore() in places, while the SolrDispatchFilter uses the Reference. Isn't it possible for the core to then be closed behind the MultiCoreHandler?

          Show
          Grant Ingersoll added a comment - Henri, At a minimum, all that should be documented, but I must admit to still being confused (but I'm no expert in the multicore stuff so, please bear with me) getCore() does not protect the core from being closed by another thread (it must either be protected by another mean or earlier or it does not matter). Yes, but there appears to be mixed uses, the MultiCoreHandler uses getCore() in places, while the SolrDispatchFilter uses the Reference. Isn't it possible for the core to then be closed behind the MultiCoreHandler?
          Hide
          Henri Biestro added a comment - - edited

          Grant,

          At a minimum, all that should be documented

          I'll adapt the previous comments accordingly to make the Javadoc more obvious.

          the MultiCoreHandler uses getCore() in places, while the SolrDispatchFilter uses the Reference

          Good catch; the admin core (the one used to generate output) is protected but is this enough?
          The MultiCoreHandler only deals with CoreDescriptor (not SolrCore); it does not perform "real" queries although getStatus comes close to it.
          However, it would seem preferable to avoid manipulating the (multicore) cores map while we are are processing such requests.

          I'll fix it in the next upload.
          Thanks for reviewing this.

          Show
          Henri Biestro added a comment - - edited Grant, At a minimum, all that should be documented I'll adapt the previous comments accordingly to make the Javadoc more obvious. the MultiCoreHandler uses getCore() in places, while the SolrDispatchFilter uses the Reference Good catch; the admin core (the one used to generate output) is protected but is this enough? The MultiCoreHandler only deals with CoreDescriptor (not SolrCore); it does not perform "real" queries although getStatus comes close to it. However, it would seem preferable to avoid manipulating the (multicore) cores map while we are are processing such requests. I'll fix it in the next upload. Thanks for reviewing this.
          Hide
          Yonik Seeley added a comment -

          It seems like some simplifications could be made... see attached refcount_example.patch.

          I think we need to be able to describe in simple terms how the mechanism works:
          Cores are created with a reference count of 1 and put in multicore. As long as a core can
          be obtained from the map, it will have at least a reference count of 1. Thus, if we increment the core's reference during a time when we know that no other core is modifying multicore, we know we have a core that is safe from being asynchronously closed.

          To destroy a core, simply remove it from the map and call close on it.

          There are other issues that are still a bit muddled IMO... like the whole role of CoreDescriptor.

          Show
          Yonik Seeley added a comment - It seems like some simplifications could be made... see attached refcount_example.patch. I think we need to be able to describe in simple terms how the mechanism works: Cores are created with a reference count of 1 and put in multicore. As long as a core can be obtained from the map, it will have at least a reference count of 1. Thus, if we increment the core's reference during a time when we know that no other core is modifying multicore, we know we have a core that is safe from being asynchronously closed. To destroy a core, simply remove it from the map and call close on it. There are other issues that are still a bit muddled IMO... like the whole role of CoreDescriptor.
          Hide
          Noble Paul added a comment -

          This patch is very clear as to what it is doing (and simple).

          Show
          Noble Paul added a comment - This patch is very clear as to what it is doing (and simple).
          Hide
          Henri Biestro added a comment -

          My apologies to all for cluttering the issue.

          New simplified version based on Yonik's example for trunk 685913 (post solr-695 commit), dont let the patch size fool you:
          CoreContainer.getCore() & CoreContainer.getAdminCore() now return an incref-ed ("opened") core;
          core.close() must be called when these 2 have been used.
          SolrCore.Reference is gone.

          SolrCore.open() & SolrCore.close() method are kept because we can retrieve cores in 3 "close-unprotected" ways:
          1 - a call to CoreDescriptor.getCore(), descriptors that can be retrieved through CoreContainer.getCoreDescriptors())
          2 - a list of close-unprotected cores through CoreContainer.getCores().
          3 - SolrCore.getCore() - which is deprecated
          The first 2 can be used in a user-defined filter/servlet after the SolrDispatchFilter falls through the filter-chain, the CoreContainer being
          set as an attribute of the request ("org.apache.solr.CoreContainer").

          Because of this, we are not always synchronized by the CoreContainer#cores when we incref/decref.
          We can thus try to open() a core which is closed and cant use a simple refCount.incrementAndGet().
          Thus the refCount.get()/refCount.compareAndSet() pattern in both open & close.

          The TestHarness is modified to always create a CoreContainer that contains the "unnamed" core so testCoreMT uses
          CoreContainer.getCore("").
          Also touched some tests that were using SolrCore.getCore() when they can use the TestHarness core.

          Show
          Henri Biestro added a comment - My apologies to all for cluttering the issue. New simplified version based on Yonik's example for trunk 685913 (post solr-695 commit), dont let the patch size fool you: CoreContainer.getCore() & CoreContainer.getAdminCore() now return an incref-ed ("opened") core; core.close() must be called when these 2 have been used. SolrCore.Reference is gone. SolrCore.open() & SolrCore.close() method are kept because we can retrieve cores in 3 "close-unprotected" ways: 1 - a call to CoreDescriptor.getCore(), descriptors that can be retrieved through CoreContainer.getCoreDescriptors()) 2 - a list of close-unprotected cores through CoreContainer.getCores(). 3 - SolrCore.getCore() - which is deprecated The first 2 can be used in a user-defined filter/servlet after the SolrDispatchFilter falls through the filter-chain, the CoreContainer being set as an attribute of the request ("org.apache.solr.CoreContainer"). Because of this, we are not always synchronized by the CoreContainer#cores when we incref/decref. We can thus try to open() a core which is closed and cant use a simple refCount.incrementAndGet(). Thus the refCount.get()/refCount.compareAndSet() pattern in both open & close. The TestHarness is modified to always create a CoreContainer that contains the "unnamed" core so testCoreMT uses CoreContainer.getCore(""). Also touched some tests that were using SolrCore.getCore() when they can use the TestHarness core.
          Hide
          Noble Paul added a comment - - edited

          I fail to see the need for this code in SolrCore#close()

              int count;
              do {
                count = refCount.get();
                if (count <= 0)
                  return;
              } while (!refCount.compareAndSet(count, count - 1));
          

          If you do refCount.decrementAndGet() it should be fine.

          similarly in SolrCore$open() also

          you can do a refCount.incrementAndGet() and will be fine.
          What you have written is a duplication of code in AtomicInteger

          Even the RefCounted class does increment and decrement in the same way

          Yonik's patch does it simply and I do not see anything wrong with that.

          Show
          Noble Paul added a comment - - edited I fail to see the need for this code in SolrCore#close() int count; do { count = refCount.get(); if (count <= 0) return ; } while (!refCount.compareAndSet(count, count - 1)); If you do refCount.decrementAndGet() it should be fine. similarly in SolrCore$open() also you can do a refCount.incrementAndGet() and will be fine. What you have written is a duplication of code in AtomicInteger Even the RefCounted class does increment and decrement in the same way Yonik's patch does it simply and I do not see anything wrong with that.
          Hide
          Henri Biestro added a comment -

          Based on Ryan's comment, simplified the patch to strictly focus on the matter at hand.

          Show
          Henri Biestro added a comment - Based on Ryan's comment, simplified the patch to strictly focus on the matter at hand.
          Hide
          Henri Biestro added a comment - - edited

          About the SolrCore.open() & close():

          If we were to do an incrementAndGet(), we could end up opening a closed core;
          We thus must check the refcount is not <=0 first.
          The close could use a decrementAndGet() but the current code ensures the count will never go < 0 and is symmetrical to open.
          In both cases, it is the test in between the get() & compareAndSet() that makes the whole difference with {in,de]crementAndGet.

          My understanding of Yonik's version is that, as a premise, opening a core is always performed under the CoreContainer#cores synchronized protection; as I explained in a previous comment, the assumption can not be strictly met.

          Show
          Henri Biestro added a comment - - edited About the SolrCore.open() & close(): If we were to do an incrementAndGet(), we could end up opening a closed core; We thus must check the refcount is not <=0 first. The close could use a decrementAndGet() but the current code ensures the count will never go < 0 and is symmetrical to open. In both cases, it is the test in between the get() & compareAndSet() that makes the whole difference with {in,de]crementAndGet. My understanding of Yonik's version is that, as a premise, opening a core is always performed under the CoreContainer#cores synchronized protection; as I explained in a previous comment, the assumption can not be strictly met.
          Hide
          Yonik Seeley added a comment -

          he close could use a decrementAndGet() but the current code ensures the count will never go < 0.

          I don't think we should cover this error case up since it could lead to premature closure in other cases.
          I think we should do a simple decrementAndGet() followed by a logged error if the current count is less than zero.

          Show
          Yonik Seeley added a comment - he close could use a decrementAndGet() but the current code ensures the count will never go < 0. I don't think we should cover this error case up since it could lead to premature closure in other cases. I think we should do a simple decrementAndGet() followed by a logged error if the current count is less than zero.
          Hide
          Henri Biestro added a comment -

          Yonik;
          About the close(), you're right: the refcount < 0 being an error, shouldn't we go all the way and throw a runtime exception ?
          Are you in agreement about open()?

          Show
          Henri Biestro added a comment - Yonik; About the close(), you're right: the refcount < 0 being an error, shouldn't we go all the way and throw a runtime exception ? Are you in agreement about open()?
          Hide
          Henri Biestro added a comment -

          updated to reflect Yonik's last suggestion.
          fixed test attempting to close closed core.
          (throwing an exception was useful for quick detection).

          Show
          Henri Biestro added a comment - updated to reflect Yonik's last suggestion. fixed test attempting to close closed core. (throwing an exception was useful for quick detection).
          Hide
          Yonik Seeley added a comment -

          Are you in agreement about open()?

          Yes, if cores obtained through other methods need to be open for some reason, then open() would need to be called. I'm not sure if we have any cases like that.

          Show
          Yonik Seeley added a comment - Are you in agreement about open()? Yes, if cores obtained through other methods need to be open for some reason, then open() would need to be called. I'm not sure if we have any cases like that.
          Hide
          Noble Paul added a comment -

          About the close(), you're right: the refcount < 0 being an error, shouldn't we go all the way and throw a runtime exception ?

          refcount < 0 is an error . but should it throw an exception? I am not too sure. logging a severe error should be fine. because that should not be a reason to crash a server

          Are you in agreement about open()?

          even open() does not need a compareAndSet. just do a refCount.incrementAndget(). if count <=1 return null .because it is already closed or being closed .

          Show
          Noble Paul added a comment - About the close(), you're right: the refcount < 0 being an error, shouldn't we go all the way and throw a runtime exception ? refcount < 0 is an error . but should it throw an exception? I am not too sure. logging a severe error should be fine. because that should not be a reason to crash a server Are you in agreement about open()? even open() does not need a compareAndSet. just do a refCount.incrementAndget(). if count <=1 return null .because it is already closed or being closed .
          Hide
          Yonik Seeley added a comment -

          refcount < 0 is an error . but should it throw an exception? I am not too sure. logging a severe error should be fine. because that should not be a reason to crash a server

          Right... on one hand an exception brings more visibility, but on the other hand it's a recoverable error that's not necessarily tied to the request that would get the exception.

          even open() does not need a compareAndSet. just do a refCount.incrementAndget(). if count <=1 return null .because it is already closed or being closed .

          And then if another thread calls open() at the same time, count will be 2 and it would incorrectly return a closing core. If open() is needed, Henri has the right impl there.

          Show
          Yonik Seeley added a comment - refcount < 0 is an error . but should it throw an exception? I am not too sure. logging a severe error should be fine. because that should not be a reason to crash a server Right... on one hand an exception brings more visibility, but on the other hand it's a recoverable error that's not necessarily tied to the request that would get the exception. even open() does not need a compareAndSet. just do a refCount.incrementAndget(). if count <=1 return null .because it is already closed or being closed . And then if another thread calls open() at the same time, count will be 2 and it would incorrectly return a closing core. If open() is needed, Henri has the right impl there.
          Hide
          Otis Gospodnetic added a comment -

          Right... on one hand an exception brings more visibility, but on the other hand it's a recoverable error that's not necessarily tied to the request that would get the exception.

          Should we then do:
          LOG.severe("XYZ happened. Please report this problem on solr-user@lucene.apache.org");

          Show
          Otis Gospodnetic added a comment - Right... on one hand an exception brings more visibility, but on the other hand it's a recoverable error that's not necessarily tied to the request that would get the exception. Should we then do: LOG.severe("XYZ happened. Please report this problem on solr-user@lucene.apache.org");
          Hide
          Noble Paul added a comment -

          The CoreContainer#create(CoreDescriptor dcore) must close the old core after creating the new one

          Show
          Noble Paul added a comment - The CoreContainer#create(CoreDescriptor dcore) must close the old core after creating the new one
          Hide
          Henri Biestro added a comment -

          Yes, if cores obtained through other methods need to be open for some reason, then open() would need to be called.

          I'm not sure if we have any cases like that.

          No, we don't have any cases today. It could have been useful to someone implementing another filter or servlet.
          Since the same effect can be obtained with:

          CoreDescriptor dcore;
          CoreContainer container;
          ...
          SolrCore opened = container.getCore(dcore.getName());
          

          open() can just be incrementAndGet(). (finally!)

          LOG.severe("XYZ happened. Please report this problem on solr-user@lucene.apache.org")

          Seems to be the consensus; updated.

          The CoreContainer#create(CoreDescriptor dcore) must close the old core after creating the new one

          Correct, fixed.

          Show
          Henri Biestro added a comment - Yes, if cores obtained through other methods need to be open for some reason, then open() would need to be called. I'm not sure if we have any cases like that. No, we don't have any cases today. It could have been useful to someone implementing another filter or servlet. Since the same effect can be obtained with: CoreDescriptor dcore; CoreContainer container; ... SolrCore opened = container.getCore(dcore.getName()); open() can just be incrementAndGet(). (finally! ) LOG.severe("XYZ happened. Please report this problem on solr-user@lucene.apache.org") Seems to be the consensus; updated. The CoreContainer#create(CoreDescriptor dcore) must close the old core after creating the new one Correct, fixed.
          Hide
          Yonik Seeley added a comment -

          I'm thinking of changing the cores map to <String,SolrCore> (from <String,CoreDescriptor>)
          and removing getCore() from CoreDescriptor. Thoughts?

          Show
          Yonik Seeley added a comment - I'm thinking of changing the cores map to <String,SolrCore> (from <String,CoreDescriptor>) and removing getCore() from CoreDescriptor. Thoughts?
          Hide
          Yonik Seeley added a comment -

          Also, is there any reason not to allow the same core to be registered more than once if desired? It really seems like CoreContainer should just map from a String to a SolrCore and not worry about some of the other stuff.

          Show
          Yonik Seeley added a comment - Also, is there any reason not to allow the same core to be registered more than once if desired? It really seems like CoreContainer should just map from a String to a SolrCore and not worry about some of the other stuff.
          Hide
          Ryan McKinley added a comment -

          I'm thinking of changing the cores map to <String,SolrCore> (from <String,CoreDescriptor>) and removing getCore() from CoreDescriptor. Thoughts?

          since every core has a CoreDescriptor, this seems much cleaner then have the (existing) double link.

          is there any reason not to allow the same core to be registered more than once if desired?

          sounds good.

          Show
          Ryan McKinley added a comment - I'm thinking of changing the cores map to <String,SolrCore> (from <String,CoreDescriptor>) and removing getCore() from CoreDescriptor. Thoughts? since every core has a CoreDescriptor, this seems much cleaner then have the (existing) double link. is there any reason not to allow the same core to be registered more than once if desired? sounds good.
          Hide
          Yonik Seeley added a comment -

          FYI, I'm also overhauling some of the synchronization... the lock is held far too long in some cases (like reload when it's held the whole time a new core is being created).

          Show
          Yonik Seeley added a comment - FYI, I'm also overhauling some of the synchronization... the lock is held far too long in some cases (like reload when it's held the whole time a new core is being created).
          Hide
          Yonik Seeley added a comment -

          Also, persistFile looks like it would take enough time that I'm going to change it to synchronize on a separate lock so it doesn't block requests in the meantime.

          Show
          Yonik Seeley added a comment - Also, persistFile looks like it would take enough time that I'm going to change it to synchronize on a separate lock so it doesn't block requests in the meantime.
          Hide
          Yonik Seeley added a comment -

          Here's an updated patch:

          • cores is a Map<String,SolrCore>
          • no prohibition of adding a SolrCore to the CoreContainer multiple times (to be able to access it from multiple URLs, for back compatible migration for example)

          I was going to remove the SolrCore name altogether.... but things like JMX and logging use it.
          That's a weakness in both the way multicore worked (core swap wouldn't swap JMX) and the current patch (core name is independent of how it's mapped via CoreContainer).

          One resolution would be to have a callback on SolrCore whenever it's name is changed, so JMX and logging strings could be appropriately adjusted.

          Show
          Yonik Seeley added a comment - Here's an updated patch: cores is a Map<String,SolrCore> no prohibition of adding a SolrCore to the CoreContainer multiple times (to be able to access it from multiple URLs, for back compatible migration for example) I was going to remove the SolrCore name altogether.... but things like JMX and logging use it. That's a weakness in both the way multicore worked (core swap wouldn't swap JMX) and the current patch (core name is independent of how it's mapped via CoreContainer). One resolution would be to have a callback on SolrCore whenever it's name is changed, so JMX and logging strings could be appropriately adjusted.
          Hide
          Ryan McKinley added a comment -

          I was going to remove the SolrCore name altogether.... but things like JMX and logging use it.

          Can we remove it and always access the name via the CoreDescriptor? Having a name on the core made sense before we added the descriptor...

          Show
          Ryan McKinley added a comment - I was going to remove the SolrCore name altogether.... but things like JMX and logging use it. Can we remove it and always access the name via the CoreDescriptor? Having a name on the core made sense before we added the descriptor...
          Hide
          Yonik Seeley added a comment -

          Here's a slight update:

          • removes the SolrCore finalizer to prevent too many closes exception (we could do a refcount check too)
          • calls SolrCore.setName() when a core is added or renamed in the CoreContainer (it was just too wierd to do a core swap and not have the names change). This does not affect JMX.

          I was going to remove the SolrCore name altogether.... but things like JMX and logging use it.

          Can we remove it and always access the name via the CoreDescriptor? Having a name on the core made sense before we added the descriptor...

          If we did want to change JMX or other things on a name change, it seems like the call needs to be on the SolrCore. Currently this does change the logging string too (probably good for avoiding confusion on a core swap)

          Unless there are objections, I think this is close enough to commit soon and then propose smaller patches off of trunk.

          Show
          Yonik Seeley added a comment - Here's a slight update: removes the SolrCore finalizer to prevent too many closes exception (we could do a refcount check too) calls SolrCore.setName() when a core is added or renamed in the CoreContainer (it was just too wierd to do a core swap and not have the names change). This does not affect JMX. I was going to remove the SolrCore name altogether.... but things like JMX and logging use it. Can we remove it and always access the name via the CoreDescriptor? Having a name on the core made sense before we added the descriptor... If we did want to change JMX or other things on a name change, it seems like the call needs to be on the SolrCore. Currently this does change the logging string too (probably good for avoiding confusion on a core swap) Unless there are objections, I think this is close enough to commit soon and then propose smaller patches off of trunk.
          Hide
          Noble Paul added a comment -

          Also, is there any reason not to allow the same core to be registered more than once if desired?

          How will SolrCore#getCoreDescriptor() work. what will it return?

          should the getCore() have synchronized block?

          can we manage with a ConcurrentHashMap? A lock to be obtained per request looks like too much of a price . The only problem with concurrent map is that if you remove while an iteration is going on you may get an exception (correct me if I am wrong). If we can have a separate lock for just that we can have synchronization only in those places and we can spare the getCore()

          Show
          Noble Paul added a comment - Also, is there any reason not to allow the same core to be registered more than once if desired? How will SolrCore#getCoreDescriptor() work. what will it return? should the getCore() have synchronized block? can we manage with a ConcurrentHashMap? A lock to be obtained per request looks like too much of a price . The only problem with concurrent map is that if you remove while an iteration is going on you may get an exception (correct me if I am wrong). If we can have a separate lock for just that we can have synchronization only in those places and we can spare the getCore()
          Hide
          Henri Biestro added a comment - - edited

          Small err that would affect reload() in CoreContainer.register, ~line 261, it seems this should be:

          log.info( "replacing core: "+name );
          if (!returnPrev) {
                  old.close(); // not core.close()
          }
          

          Harmless, the "alias" attribute feature is folded into "name" so:

          String aliasesStr = DOMUtil.getAttr(node, "aliases", null);
          

          can go.

          Show
          Henri Biestro added a comment - - edited Small err that would affect reload() in CoreContainer.register, ~line 261, it seems this should be: log.info( "replacing core: " +name ); if (!returnPrev) { old.close(); // not core.close() } Harmless, the "alias" attribute feature is folded into "name" so: String aliasesStr = DOMUtil.getAttr(node, "aliases" , null ); can go.
          Hide
          Yonik Seeley added a comment -

          How will SolrCore#getCoreDescriptor() work. what will it return?

          The original core descriptor (the original name).

          should the getCore() have synchronized block?

          CoreContainer.getCore() does.
          I removed CoreDescriptor.getCore()

          can we manage with a ConcurrentHashMap? A lock to be obtained per request looks like too much of a price .

          It could be reworked in the future, but a single synchronized map lookup per top-level request is certainly nothing to worry about - there are probably at least hundreds of synchronized calls per average request.

          Show
          Yonik Seeley added a comment - How will SolrCore#getCoreDescriptor() work. what will it return? The original core descriptor (the original name). should the getCore() have synchronized block? CoreContainer.getCore() does. I removed CoreDescriptor.getCore() can we manage with a ConcurrentHashMap? A lock to be obtained per request looks like too much of a price . It could be reworked in the future, but a single synchronized map lookup per top-level request is certainly nothing to worry about - there are probably at least hundreds of synchronized calls per average request.
          Hide
          Yonik Seeley added a comment -

          I've committed this patch.
          There is probably some more multicore work to be done, but they can get their own issues.

          Show
          Yonik Seeley added a comment - I've committed this patch. There is probably some more multicore work to be done, but they can get their own issues.

            People

            • Assignee:
              Yonik Seeley
              Reporter:
              Noble Paul
            • Votes:
              1 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development