Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.3
    • Fix Version/s: 6.1, 7.0
    • Component/s: Server
    • Labels:
      None

      Description

      In some cases search components or analysis classes may utilize a large dictionary or other in-memory structure. When multiple cores are loaded with identical configurations utilizing this large in-memory structure, each core holds its own copy in memory. This has been noted in the past, and a specific case was reported in SOLR-3443. This patch provides a generalized capability; if accepted, this capability will then be used to fix SOLR-3443.

      1. SOLR-8349.patch
        39 kB
        Gus Heck
      2. SOLR-8349.patch
        26 kB
        Gus Heck
      3. SOLR-8349.patch
        26 kB
        Gus Heck
      4. SOLR-8349.patch
        26 kB
        Gus Heck
      5. SOLR-8349.patch
        10 kB
        Noble Paul
      6. SOLR-8349.patch
        23 kB
        Gus Heck
      7. SOLR-8349.patch
        16 kB
        Gus Heck

        Issue Links

          Activity

          Gus Heck added a comment - edited

          Patch implementing the general feature (on 6.x). A patch for SOLR-3443 will be attached there shortly.

          Gus Heck added a comment -

          Further background for this feature... a precursor version of this patch (which does not have the interface and thus can't fix SOLR-3443) is in use at a client where we had a ~900MB hash map for looking up lat/lon from custom query parameters. This map was needed for all our cores. This is anticipated to save >35GB of RAM, since 40+ cores will live on a machine. The original implementation of this lat/lon lookup feature for the client attempted to use a static field, but the independent class loaders (each core gets its own class loader) loaded fresh copies of the class, each with its own static map.
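
          A minimal, hypothetical illustration of the class-loader pitfall described above (the jar path and the example.LatLonLookup class are made up): the same class loaded by two independent loaders yields two distinct Class objects, each with its own static fields.

          import java.net.URL;
          import java.net.URLClassLoader;

          public class PerLoaderStatics {
            public static void main(String[] args) throws Exception {
              URL[] jar = { new URL("file:///path/to/lookup.jar") }; // hypothetical jar

              // Two independent loaders with no shared parent, as each Solr core gets its own.
              ClassLoader a = new URLClassLoader(jar, null);
              ClassLoader b = new URLClassLoader(jar, null);

              Class<?> c1 = a.loadClass("example.LatLonLookup"); // hypothetical class
              Class<?> c2 = b.loadClass("example.LatLonLookup");

              // Distinct Class objects mean distinct statics: two full copies of the map.
              System.out.println(c1 == c2); // prints false
            }
          }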

          It's worth noting that the analyzers, such as the hunspell one in SOLR-3443, are not loaded by the core's class loaders, and the excess memory there is held in a member field per instance, so a static-variable-based solution would be possible there. I thought it was better to provide a uniform solution.

          Another possible follow-on feature (or perhaps enhancement to this one) would be a means of reference counting the shared resources and removing them. In the present (initial) patch, a long-running Solr instance where lots of cores are added and removed would potentially have unused container resources hanging around (though they would become used again, with no loading time, if a core that required them were re-installed). I didn't go into the complexity of removal because I wasn't sure it would be deemed necessary.

          Gus Heck added a comment -

          I'm sure folks have been busy getting stuff into 5.4, but with the RC nearly produced, perhaps someone can review this now? It's intended for 6.x, but I'd like feedback/review so that I have something to chew on (unless of course it's already perfect). I would definitely like someone who's comfortable with CoreContainer and class loading in Solr to take a peek. I put the definition for ContainerResourceSharing in Lucene package space. It needed to be there so that the hunspell analyzer for SOLR-3443 could leverage it without depending on Solr classes (as could any other Lucene class that might load a large file that doesn't change from core to core). This patch adds no implementation in the Lucene world, only for SolrResourceLoader. Folks who are using Lucene in a container other than Solr can implement it for their container or not, as they desire.

          The intended pattern for usage within Solr (i.e. SearchComponent implementations) is to either navigate to the CoreContainer from a core if CoreAware, or cast the ResourceLoader if only ResourceLoaderAware. Either path leads to the same place, as SolrResourceLoader merely delegates to its CoreContainer anyway.
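
          A hedged sketch of those two paths, not code from the patch: getContainerResource(key) is borrowed from the HunspellStemFilterFactory example later in this thread, and is assumed here to exist on both CoreContainer and SolrResourceLoader.

          import org.apache.lucene.analysis.util.ResourceLoader;
          import org.apache.solr.core.CoreContainer;
          import org.apache.solr.core.SolrCore;
          import org.apache.solr.core.SolrResourceLoader;

          public class SharedResourceAccess {

            // Path 1: a SolrCoreAware component navigates from the core.
            static Object viaCore(SolrCore core, String key) {
              CoreContainer container = core.getCoreDescriptor().getCoreContainer();
              return container.getContainerResource(key); // assumed patch API
            }

            // Path 2: a ResourceLoaderAware component casts its loader, which
            // delegates to the CoreContainer under the patch.
            static Object viaLoader(ResourceLoader loader, String key) {
              SolrResourceLoader srl = (SolrResourceLoader) loader;
              return srl.getContainerResource(key); // assumed patch API
            }
          }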

          At the Lucene level, code would want to test instanceof ContainerResourceSharing and do the right thing based on the result, retaining existing behavior to support use of Lucene outside of containers or in containers without support. Let me know if there are better ideas here...

          Gus Heck added a comment -

          Has anyone had time to look at this?

          Gus Heck added a comment -

          Can this make it into 6.0? I'm slightly surprised that a feature that can save memory, especially for nodes with large numbers of cores, hasn't generated more interest.

          David Smiley added a comment -

          Instead of a hand-rolled cache with separate per-key locks, I suggest Guava's cache, which already does this. See: https://code.google.com/p/guava-libraries/wiki/CachesExplained
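
          A minimal sketch of the suggested pattern, assuming Guava is on the classpath; Cache.get(key, loader) serializes concurrent loads per key, which provides the per-key locking referred to above.

          import com.google.common.cache.Cache;
          import com.google.common.cache.CacheBuilder;
          import java.util.concurrent.Callable;
          import java.util.concurrent.ExecutionException;

          public class GuavaSharedCache {
            // One cache per container; Guava locks per key, so concurrent loads
            // of the same key collapse into a single load.
            private final Cache<String, Object> cache = CacheBuilder.newBuilder().build();

            public Object getOrLoad(String key, Callable<Object> loader) throws ExecutionException {
              return cache.get(key, loader); // blocks only callers asking for this key
            }
          }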

          I'll commit your patch to 6 if you use that.

          I'm slightly surprised that a feature that can save memory, especially for nodes with large numbers of cores hasn't generated more interest.

          I think internal plumbing issues like this capture less interest, since it's something another Solr hacker can work around. I've solved what you're solving here before for a client, but without hacking Solr. It wasn't elegant... but I recall I put a lib on the top-level classpath (not per-core) and used a static member to hold the cache.

          David Smiley added a comment -

          I take back my claim that I'll commit it if only it used Guava. I forgot about the Lucene layer part. I think you should file a Lucene issue on introducing that interface. It'll get input from more core Lucene devs.

          Gus Heck added a comment -

          Thanks for the feedback, Dave. I generally like Guava caches, so this sounds like a good idea. I seem to recall that the Lucene part was motivated by the desire to have this fix SOLR-3443. I suspect it could be pulled out if needed. Should the Lucene ticket simply point to this, or do I need to generate a separate patch that breaks things out?

          David Smiley added a comment -

          I put a little more thought into this issue now. The cache at the CoreContainer makes sense, but I'm not convinced that the Lucene layer needs a new abstraction. Instead, I think Solr could be enhanced to load some analysis components into this CoreContainer cache. The question is which ones? Not all... some components will refer to resources that are local to a SolrCore. I'm not sure how easy it would be for Solr to detect that automatically; probably not easy. That leaves the possibility of a new attribute to designate a component as globally shared. What do you think?

          Gus Heck added a comment - edited

          I am not sure I understand what you mean by loading the analysis component into the container cache. The analysis component in Lucene needs some way to reference the container and ask for the shared resource. My assumption was that it was not OK to add dependencies on Solr classes to Lucene. Otherwise I could just cast to SolrResourceLoader in the analysis component in question. Did you look at SOLR-3443, where I use this patch to implement the feature for HunspellStemFilterFactory.java? There I do this:

          if (loader instanceof ContainerResourceSharing) {
            resourceSharing = (ContainerResourceSharing) loader;
          }


          This then allows me to have a method like:

            public Dictionary getDictionary() {
              return this.dictionary == null ? (Dictionary) resourceSharing.getContainerResource(resourceKey) : this.dictionary;
            }
          

          If I could use a Solr class, that could just as easily say

          if (loader instanceof SolrResourceLoader) {
            resourceSharing = (SolrResourceLoader) loader;
          }


          One could also add methods to ResourceLoader itself, but then every form of resource loader would have to implement them.

          David Smiley added a comment -

          First: I propose the container-level cache be committed by itself (no Lucene abstraction or anything explicitly using it). It will enable us Solr hackers to at least explicitly use it in our components without resorting to a static field on a top-level class-loaded class. A positive step forward.

          Now to clear something up: I definitely do not propose that Lucene have any sort of dependency on Solr. And I further propose that no change of any kind is needed to any of Lucene. I'll be more specific now that I see the relevant part of Solr. I propose that Solr's FieldTypePluginLoader.readAnalyzer (called by IndexSchema) detect a flag attribute on a TokenizerFactory, TokenFilterFactory, or CharFilterFactory that declares it to be global, and if so, load it with a CoreContainer ResourceLoader (not a SolrCore one) into the shared cache (if it's not already there). This doesn't affect the ResourceLoader abstraction, and components don't need to be written any differently.

          Gus Heck added a comment -

          Certainly we can punt on enabling this for Lucene-level things to start with, and then tackle that in SOLR-3443 or in a separate Lucene ticket if appropriate. I'm totally fine with that. My motivating use case that got me started on this was a SearchComponent anyway.

          Now that I understand it, your idea is interesting. It generally sounds like it would allow a similar memory savings, though I had assumed promoting things to a higher class loader level would be undesirable. Here are some things that occurred to me as I pondered your suggestion...

          1. My (potentially erroneous) assumption is that the whole reason for the class loader separation is to allow long-term stability when loading/unloading cores, and to avoid retaining references to classes and objects that aren't needed anymore. In the extreme, if cores were loaded in the same class loader as the container, this issue would be less complex.
          2. Putting the entire complexity of the custom Filter/Analyzer etc. into the cache greatly enhances the chance of a class loader memory leak. Minimizing the complexity of what's cached puts the programmer in the best position to ensure core-level classes don't become referenced by objects held in the container. Note that a follow-on to this (or perhaps something required for it?) might be to reference count and unload unused keys.
          3. That said, why would we do it one way for analysis classes and another for components? If your direction is selected for analysis classes, perhaps we should do components that way too?
          4. If we do that, what's left to be loaded in the core-level loader? The non-increasing set of classes never previously loaded as global, and whatever is not referenced by any component/analysis class, I guess...
          5. I'm a little concerned about how we will manage to automatically create appropriate keys in the cache. The same analysis class or component may be configured multiple times, so we need a key that hashes the important configuration parameters to distinguish identical instances from variants. Automatic determination of "important" seems dicey. We could simply be pessimistic and use every configuration parameter we can find, but then we need to know which fields are representative of configuration parameters (an annotation? but then we're modifying Lucene again, drat), or intercept this information as we read the configuration, before we create the instance of the class. Do we have classes that configure complex sub-components and hold those as fields?

          In SOLR-3443 I do the following to generate a key for the resource cache:

          md5.update(cs.encode(dictionaryFiles).array());
          md5.update(cs.encode(affixFile).array());
          md5.update(cs.encode(String.valueOf(ignoreCase)).array());
          md5.update(cs.encode(String.valueOf(longestOnly)).array());
          
          // ** SNIP ** //
          
          resourceKey = "org.apache.lucene.analysis.hunspell.dictionary." + configHash;
          
          David Smiley added a comment -

          1. The classLoader separation is simply because there are libs and conf/ resources per-core, as cores might be for entirely separate purposes with different dependencies that might not even be compatible if they were all together.
          2. RE memory leak: I propose the weakValues() feature of Guava's cache. It's a nice cache. No reference counting needed; we're in a VM that GCs.
          3. You're right; this isn't just about analysis components. But it would probably be incompatible with anything that implements CoreAware or SchemaAware, as both those things are tied to a SolrCore.
          4. RE "what's left to be loaded in the core level loader": I think this re-use feature should be explicitly opt-in. Most stuff will continue as-is – core-level loader. Docs for HunspellFactory in the ref guide could be augmented to suggest flagging global reuse, because we know it uses a lot of memory.
          5. RE keys: I think just the W3C DOM Node.java instance is probably fine. We might want to double-check that its equals() method makes sense and doesn't refer to the parent, or if it does, find a way to clone/detach it. Actually I think it does; so we'd call cloneNode(true) and use the result of that. I'm not concerned about the efficiency of the keys; this cache isn't going to be hit hard.

          Anyway, let's continue this discussion, if need be, on a follow-on issue for actually using the cache that this issue will create. Cool?

          Gus Heck added a comment -

          Guava sounded like a great idea. I love Guava caches and have used them frequently in the past. When you mentioned it I thought to myself, why didn't I think of that... eventually I remembered that I had, but decided not to go there because I didn't want to give control of memory issues to a library. On reflection that may have been a bit too draconian, so I gave Guava caches a whirl. For the first time in my experience, I think I've found a use case they don't cover well. My design for this feature desires the following behavior (sketched in code after the list):

          1. get should return null if the resource is requested when no attempt has been made to load it.
          2. Load a resource only once; no provision for update or replacement is presently required, so first one in wins is just fine.
          3. Subsequent attempts to load the resource are a non-blocking no-op, allowing cores 2 through N that require the resource to continue configuring themselves while the resource is being loaded (possibly starting the loading of resources for other components).
          4. Loading will be complete before the server completes startup and begins servicing requests. If a resource was supposed to be loaded and there was no error during loading, the component can rely on the existence of the resource at query (or update) time.
          5. Never allow a query or update to Solr to initiate the loading, for obvious latency reasons.
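
          A hypothetical sketch of those five goals (not the patch code; class and method names are invented): registrations happen during core loading, only the first registration per key actually loads, and a startup barrier guarantees completion before requests are served.

          import java.util.Map;
          import java.util.concurrent.Callable;
          import java.util.concurrent.ConcurrentHashMap;
          import java.util.concurrent.ExecutionException;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;
          import java.util.concurrent.Future;

          public class ContainerResourceMap {
            private final Map<String, Future<Object>> loads = new ConcurrentHashMap<>();
            private final ExecutorService loadExecutor = Executors.newCachedThreadPool();

            // Goals 2 and 3: only the first registration per key loads; later calls return at once.
            public void load(String key, Callable<Object> loader) {
              loads.computeIfAbsent(key, k -> loadExecutor.submit(loader));
            }

            // Goal 1: null when no load was ever registered; blocks only if that load is still running.
            public Object get(String key) throws ExecutionException, InterruptedException {
              Future<Object> f = loads.get(key);
              return f == null ? null : f.get();
            }

            // Goals 4 and 5: call once at the end of startup so queries never trigger or wait on loading.
            public void awaitAll() throws ExecutionException, InterruptedException {
              for (Future<Object> f : loads.values()) f.get();
            }
          }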

          I realized that the unit test I supplied doesn't fully test #3 above, so I modified it like this:

          +    final String[] lastThread = new String[1];
               // now add something to the map while it's under load
               try {
                 Thread[] set1 = new Thread[100];
          
          Then
          
                     calls.incrementAndGet();
          -        });
          +          lastThread[0] = Thread.currentThread().getName();
          +        }, "thread"+i); // give the threads names like thread0 thread1 thread2 etc
                 }
                 for (Thread thread : set1) {
                   thread.start();
          +        Thread.sleep(10);
                 }
                 while (calls.get() < 100) {
                   Thread.sleep(100);
                 }
          
          Then
          
                 Thread.sleep(100);
                 long count1b = counter[0];
                 long count2b = counter2[0];
          +      
          +      // make sure other loaders didn't wait and thread0 completed last
          +      assertEquals("thread0", lastThread[0]);
          

          This modified test still passes just fine with my original code. But so far I haven't made it pass with Guava. The naive first attempt was to use get(key,loader), ignore the return value, and use getIfPresent(key) to service get requests (handling case #1), but this was not good:

          1. get(key,loader) ignoring the return value will block all cores that depend on this resource. The test fails with loader99 updating the array last.
          2. Wrapping the get(key,loader) in a thread fails because now they all complete before loading completes, and the test gets null when it checks the assert that the value was set. (It would eventually become set, but this is just like completing server startup before loading is complete and then failing queries, probably with an NPE.) Also, failing server startup when loading fails becomes problematic, since we have to get the exception back out of the thread.

          After those attempts I found myself getting into tracking which keys have already had loaders supplied and trying to coordinate blocking/pass-through on my own, and of course the access to the collection recording what I've seen has to be synchronized... etc. This winds up negating most of the benefit of the Guava cache. This is too bad, because at first it looked like Guava would replace a bunch of my code with one-liners. My guess is that the Guava folks didn't supply put(k,loader) because most of the time you aren't accepting arbitrary loaders in the first place. If the loader is known in advance, then LoadingCache works nicely. Unfortunately, for us, the loader is not known in advance. I don't see any reasonable way to write "one loader to rule them all" to allow us to use a LoadingCache either; this would just move all my tracking code into TheOneLoader.java. Possibly a good reorganization code-wise, but we are still writing threaded code and not getting much simplification out of using Guava caches that way. If we are going to write synchronized code, we should just make it do the right thing directly, I think.

          Finally, weakValues() (or some equivalent) sounds good, but the problem is that the component supplies a loader and the container loads it at time A, and then at time B the component asks for the result of the load. Even if the component caches the result of that first request, the time between A and B leaves the object just loaded vulnerable to GC, since the only thing referencing it is the weak-values cache. One could require the loader to set a reference, but that just presumes that that particular core won't be unloaded before some other core finally receives a request that causes it to fetch and cache the resource (remember, only one loader ever runs, so only one component would get the hard reference initially). It seems that softValues() could be used, but without a hard reference too we are still potentially in trouble:

          1. All cores with hard references are unloaded; core X remains, which has yet to acquire a hard reference.
          2. When memory gets low, the soft resource is GC'd.
          3. A request arrives at core X requiring the resource.
          4. Either goal #4 or #5 must be violated now.

          It's clearly the case that my design goals make things difficult. If I drop goal #3 (not blocking other cores), Guava will drop right in nicely by using get(key,loader) and ignoring the return value. Which do you feel is more important: using Guava and keeping the code simple, or preserving the startup efficiency of the server when this feature is in use? I found this comment in CoreContainer, which clearly indicates that the intent is to load in parallel, but also that there's some degree of waiting going on already in cloud mode:

              // setup executor to load cores in parallel
              // do not limit the size of the executor in zk mode since cores may try and wait for each other.
              ExecutorService coreLoadExecutor = ExecutorUtil.newMDCAwareFixedThreadPool(
          
          Gus Heck added a comment - edited

          Executive summary of the last overly long comment:

          1. Guava's cache is indeed a cool cache.
          2. It doesn't support arbitrary loaders in a way that is consistent with my design.
          3. We can do one of these things (AFAICT):
            1. Use the working code I wrote (no Guava cache).
            2. Change our goals, and use Guava (allow cores loading the same resource to all block each other until loading is done).
            3. Use Guava and wrap it in additional loader-management code of similar complexity to my original code.
          4. Weak/soft values require someone to hold the strong reference.

          New thought this morning: I could probably add methods and a list to the SolrCore object for the purpose of giving it a reference to the resource at load time, thus tying the life-cycle of the resource to the object we want it to live and die with. Then weak values would probably work fine.

          David Smiley added a comment -

          Change our goals, and use guava (allow cores loading the same resource to all block each other until loading is done)

          Seems totally fine to me, provided this blocking is limited to the particular resource being loaded – which is what Guava gives you via this:
          http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/CacheBuilder.html#build(com.google.common.cache.CacheLoader)

          It's a shame you spent so long on that looong write-up; as I read it I wanted to say: just stop please, stop already, just go see CacheBuilder.build(CacheLoader).

          Gus Heck added a comment -

          I did see .build(CacheLoader). My writeup was an explanation of why that (and other tactics) don't work (for the defined goals).

          What would you pass into build(CacheLoader)? How would it know how to load the data for a custom component that hasn't even been written yet? Furthermore, that pattern is meant to load on the first get(String) for a key, which (without further code to cause a warming get before core init completes) violates Goal 5, and could block a query or update for the time it takes to load (~15-20 sec if I employ all 16 CPUs with parallel streams, in my case).

          If you do use build(CacheLoader), you will have to make your implementation of CacheLoader a wrapper object that can accept arbitrary loaders from yet-to-be-written components, and then manage the acceptance and invocation of those individual loaders on its own. One of the reasons I had the goal of not blocking is that the case that motivated this, for a client of mine, involved 40 cores in a node all using this one resource, but I can accept that this might be an unusual case.

          I'll submit a patch backing off on the "don't block loading other cores" goal soon.

          David Smiley added a comment -

          I see your point on CacheLoader needing to define its loading up-front; the caller of get() doesn't specify it. I was confusing it with the JDK's ConcurrentHashMap, which I think in JDK 8 has an overload taking a Function argument. One way around this is for the key to be some new class, hypothetically named say LoadingCacheKey, that contains the real intended key (e.g. Node.java) plus a reference to a JDK 8 Function (or Supplier or something like that) that the CacheLoader will call. Then the CacheLoader would null out that function in the key. Admittedly that's a bit hacky, but it nonetheless puts most of the tricky concurrency code into a dependent library where others maintain it; not us. That's my intent with this.
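
          A hypothetical sketch of that key-carries-its-own-loader hack (LoadingCacheKey is the invented name from the comment above; the field and method names are assumptions): equality is based only on the real key, and the one-shot Supplier is dropped after use so retained keys don't pin their loaders.

          import java.util.function.Supplier;

          public class LoadingCacheKey {
            private final Object realKey;    // e.g. a detached DOM Node
            private Supplier<Object> loader; // consulted once by the CacheLoader, then nulled

            public LoadingCacheKey(Object realKey, Supplier<Object> loader) {
              this.realKey = realKey;
              this.loader = loader;
            }

            // Called by the cache's single CacheLoader to produce the value.
            Object loadOnce() {
              Object value = loader.get();
              loader = null; // drop the reference so the key doesn't hold the loader
              return value;
            }

            @Override
            public boolean equals(Object o) {
              return o instanceof LoadingCacheKey && realKey.equals(((LoadingCacheKey) o).realKey);
            }

            @Override
            public int hashCode() {
              return realKey.hashCode();
            }
          }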

          I'll comment more later; gotta go right now.

          David Smiley added a comment -

          One of the reasons I had the goal of not blocking is the case that motivated this for a client of mine was one with 40 cores in a node all using this one resource, but I can accept that this might be an unusual case.

          I think you could work around that by having the SearchComponent lazily load the resource in a background thread, returning a Future. Then when the SearchComponent is invoked in a search, it grabs what's in the Future, blocking if necessary.
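
          A minimal sketch of that workaround (a hypothetical class, not from the patch): the load starts at init time in a background thread, and only the first request that needs the resource may block.

          import java.util.concurrent.Callable;
          import java.util.concurrent.ExecutionException;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;
          import java.util.concurrent.Future;

          public class LazyResourceHolder<T> {
            private final Future<T> future;

            public LazyResourceHolder(Callable<T> loader) {
              ExecutorService executor = Executors.newSingleThreadExecutor();
              future = executor.submit(loader); // kick off the load at component init
              executor.shutdown();              // the submitted task still runs to completion
            }

            // Called at query time: returns immediately once loading has finished.
            public T get() throws ExecutionException, InterruptedException {
              return future.get();
            }
          }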

          Gus Heck added a comment -

          Actually, in my particular case, since all cores need it anyway, I'm just loading using parallel streaming and enough threads to soak up all the CPU on the box until processing finishes... That's probably not a good solution for the general use case though.

          Anywho, after punting Goal 3, Guava works easy as pie. Updating now to merge with the latest changes, if any.

          Gus Heck added a comment -

          Changes since the original patch:

          1. No interface added to Lucene, and no support for Lucene analyzers to use this feature at this time (deferred to a subsequent enhancement).
          2. Caching is now done using a Guava cache. This simplifies the code, but means that a core loading a given resource will now block the loading of other cores that also require that resource.
          3. As suggested by Dave, the cache now uses weak references to the values, avoiding the possibility of a ClassLoader memory leak. The SolrCore object now gets a hard reference to the object. This means that there is now some chance that a resource will be unloaded if the last core using it is unloaded and then loaded again. This differs from the previous code, where it was guaranteed to not be unloaded. To guarantee deterministic behavior we would likely need to use hard references in the cache as before, or add something along the lines of a reference counting scheme to know when the last core stopped using it.
          4. To ensure that the SolrCore is given a reference to the loaded object, a ContainerResourceAware interface was added and is treated similarly to SolrCoreAware and ResourceLoaderAware. This allows components to give us the loading code before we give them the core, and thus we can coordinate the loading such that there is always a hard reference to the resource (until the last core using it is unloaded). A sketch of this weak-values-plus-hard-reference pattern follows.
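
          A hypothetical sketch of the pattern described in items 3 and 4 (class and parameter names are invented; the real patch wires the hard reference through the SolrCore): the cache holds values weakly, and each core pins the resources it uses, so they live exactly as long as some core needs them.

          import com.google.common.cache.Cache;
          import com.google.common.cache.CacheBuilder;
          import java.util.List;
          import java.util.concurrent.Callable;
          import java.util.concurrent.ExecutionException;

          public class ContainerResourceCache {
            // Weak values: an entry becomes collectable once no core holds a hard reference.
            private final Cache<String, Object> cache =
                CacheBuilder.newBuilder().weakValues().build();

            public Object getOrLoad(String key, Callable<Object> loader,
                                    List<Object> coreHardRefs) throws ExecutionException {
              Object resource = cache.get(key, loader); // per-key blocking load, as above
              coreHardRefs.add(resource);               // e.g. a list owned by the SolrCore
              return resource;
            }
          }
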
          David Smiley added a comment -

          I now see Guava Cache.get(key,Callable loader) (and you use it in the patch), so that helps a lot to keep things simple.

          In #3 what do you mean by "deterministic behavior"? Do you mean having code that implements finalize() or uses a ReferenceQueue, or basically needs to be closed in some way? Normally we don't care about the exact moment when the VM literally frees memory, so I doubt you mean that. For objects that implement AutoCloseable, we might want to throw an exception to state that we don't support closing the object. Support for that would be nice; maybe via putting hard references to just those in the CoreContainer and having them be iterated and closed when the CoreContainer is closed.

          #4: I'd rather avoid new abstractions unless they're clearly useful, and so I question the utility of the interfaces ContainerResourceAware & ContainerResourceProvider. A component that wants the cache could implement SolrCoreAware and call core.getCoreDescriptor().getCoreContainer(), which could expose one method like getOrLoadCached(key,lambdaLoader). For components that can't implement SolrCoreAware, they could implement ResourceLoaderAware and cast the loader to SolrResourceLoader, and we could expose access to the CoreContainer from there with a new method. Is there some sequencing issue that makes this impossible?

          Furthermore, there's some complexity in your patch in trying to guarantee a hard reference, but I don't see that as necessary. If some component calls this cache method, it will then immediately put the result some place like a field of the component (be it SearchComponent or TokenFilterFactory or whatever). Similar story for the hypothetical future issue proposing that the SolrCore put entire SearchComponents and/or TokenFilters (etc.) into this cache.

          Can we use Object keys, please? Remember the Node.java example earlier.

          Gus Heck added a comment -

          WRT #3 / deterministic behavior: here's the use case:

          1. The server is started; it loads a component that loads a file and creates version 1 of resource A in memory.
          2. Some time later the file is updated, and these updates need to be deployed.
          3. The new version 2 of the file is deployed to the server, and the core is unloaded.
          4. The core is then loaded again, brought online, and made available to users.

          We now cannot predict which version of the resource is available to the users. If GC occurred and the resource was collected between steps 3 and 4, the new resource will become available, as the user would expect. If not, the old resource will show up on calls to getResource() until a GC occurs in which the JVM decides to clear the weak reference to it. If the component caches a (hard) reference to the resource, the new version of the resource will never get loaded. The previous system without weak references did not allow the old resource to ever be unloaded (and hence was deterministic). Now the behavior is a product of GC timing and the internal aspects of how the component was programmed. I would like to subsequently (in some later patch) make it possible to refresh the resource in a predictable manner without restarting the whole node.

          WRT hard references: I want people to have success, not missteps and re-implementation, when using my feature. For this reason I really like the weak-references suggestion you made, but I want to manage it for them and not burden them with handling it properly. The submitted approach was meant to not bite the user who writes a component that never holds a reference to the resource. That would be a reasonable naive implementation for someone who knows nothing about the internals of Solr and assumed they shouldn't hold the reference, to ensure that the same resource was always seen everywhere.

          WRT the abstraction: it's there to get the loading code added to the deferredCallables list. SolrResourceLoader has no knowledge of the SolrCore until the core calls inform(core) on it. Unfortunately, inform(resourceLoader) gets called before that. So any attempt to cast and do ((SolrResourceLoader)loader).getCore().getContainer() in the implementation of ResourceLoader#inform(loader) will throw an NPE. That's why the deferredCallables list exists. I chose to add the abstraction to enable the loader/core to manage hard references and to make the processing uniform, with all loads being deferred. I wanted the folks attempting to use this to have a clear, intuitive path to do so, and the interfaces are meant to guide them into doing the right thing without needing to know all the details.

          It's worth noting that if the goal is a simple patch, the way to eliminate the MOST complexity from the patch is to have the component author manage references, and change:

                resourceLoader.inform(resourceLoader);
                resourceLoader.inform(this); // last call before the latch is released.
          

          to

                resourceLoader.inform(this); 
                resourceLoader.inform(resourceLoader); // last call before the latch is released.
          

In that case, casting and navigating to the container in inform(ResourceLoader) will work, and we can lose the abstractions, the deferred callables and the associated latch/synchronization, and the object reference code goes away too... but I definitely don't feel qualified to change the order in which components are made aware of things. I have no idea if any code out there relies on this order of inform() calls in some way.

Lastly, Object keys are certainly possible, though this does reintroduce a vector for class loader memory leaks as previously discussed. I left this out because we were not supporting the Lucene analyzers yet, and I wasn't yet adding "automatic" keys from configuration nodes. Automatic keys would be a nice enhancement and would ensure implementors don't need to think so hard to use this. I'm amenable to trying to add that now if you like, though the option to supply one's own key should remain.

          noble.paul Noble Paul added a comment - - edited

Gus Heck There is another easy, existing solution for this problem.

How to do this:

1) Store your large file in the blob store
2) Use blobRef = JarRepository.getJarIncRef(name) to get the content (we will change the method names so they make sense for you)
3) Make your component register a close hook
4) In the postClose(), do a blobRef.decrementJarRefCount() (see the sketch after this list)

The advantages of this solution are:

1) You get a version-managed store for your large files without screwing up your ZK
2) There are APIs to manage the blob
3) It is already refcounted etc.

The caveats are:

1) It does not work for standalone mode. We can extend it to do that
2) You will have to make your component startup=lazy
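A minimal sketch of steps 2-4 as described, assuming the method names from the comment above (which, as noted, may be renamed) and a hypothetical accessor for reaching the repository:

      import org.apache.solr.core.CloseHook;
      import org.apache.solr.core.JarRepository;
      import org.apache.solr.core.SolrCore;
      import org.apache.solr.handler.component.SearchComponent;
      import org.apache.solr.util.plugin.SolrCoreAware;

      // Sketch of the suggested wiring; getJarRepository() and the exact
      // signatures are assumptions, not verified API.
      public abstract class MyBlobBackedComponent extends SearchComponent implements SolrCoreAware {
        private JarRepository.JarContentRef blobRef;

        @Override
        public void inform(SolrCore core) {
          // 2) fetch the blob content, incrementing its reference count
          blobRef = core.getCoreDescriptor().getCoreContainer()
              .getJarRepository().getJarIncRef("myBlob/1");
          // 3) register a close hook so the reference is released with the core
          core.addCloseHook(new CloseHook() {
            @Override public void preClose(SolrCore c) { }
            @Override public void postClose(SolrCore c) {
              // 4) release our reference when the core goes away
              blobRef.decrementJarRefCount();
            }
          });
        }
      }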

          gus_heck Gus Heck added a comment -

That sounds like a very cool way to store the source data, but this issue isn't about access to the blob/file itself. It's about access to the in-memory Java object created from the file/blob/whatever, so that each core does not have to hold its own identical copy in RAM. In our case we had a CSV file that was translated into a HashMap (for fast access during query processing) that had >900MB of memory usage. When we fired up 40 cores (20 collections, each with a main core and one replica) that used this component, RAM usage was through the roof with no data in the index yet.

A similar situation is faced by Analyzers that have dictionaries etc. (see SOLR-3443). The current patch no longer supports a solution to that issue, as support for Lucene-layer stuff has been deferred per the discussion above.

It is interesting to see that there is an implementation of the ref counting to use as a model, and your notes about hooks on the component will be useful too. That may come in handy if/when a refresh mechanism is desired.

          noble.paul Noble Paul added a comment - - edited

OK, I got the point. The framework can be easily extended to make the decoder pluggable, so the cache can keep the decoded object in memory instead of a ByteBuffer.

So the API would look like:

      MyCsvDecoder<MyCustomObject> csvDecoder = null; // instantiate your decoder that converts your csv to MyCustomObject
      ObjectRef<MyCustomObject> ref = BlobRepository.getObjectIncRef(name, csvDecoder);
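For concreteness, such a decoder might look roughly like this (a sketch: the decoder interface shape is still being worked out later in this thread, and the id,lat,lon parsing is illustrative):

      import java.io.BufferedReader;
      import java.io.IOException;
      import java.io.InputStream;
      import java.io.InputStreamReader;
      import java.nio.charset.StandardCharsets;
      import java.util.HashMap;
      import java.util.Map;

      // Illustrative only: turns CSV bytes (id,lat,lon) into the shared
      // lookup map; assumes a decoder interface with a single
      // decode(InputStream) method, roughly the shape discussed below.
      public class MyCsvDecoder implements BlobRepository.Decoder<Map<String, double[]>> {
        @Override
        public Map<String, double[]> decode(InputStream inputStream) {
          Map<String, double[]> latLonById = new HashMap<>();
          try (BufferedReader reader = new BufferedReader(
              new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
              String[] cols = line.split(",");
              latLonById.put(cols[0], new double[] {
                  Double.parseDouble(cols[1]), Double.parseDouble(cols[2])});
            }
          } catch (IOException e) {
            throw new RuntimeException("Cannot decode csv blob", e);
          }
          return latLonById;
        }
      }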
          
          noble.paul Noble Paul added a comment - - edited

          That may come in handy if/when a refresh mechanism is desired.

The refresh mechanism is built into the system. The name is a combination of name and version, for example myCsvFile/1 (where '1' is the first version). Every component can be updated using the API. If you add a new version of the blob, simply update your component using the config API to use myCsvFile/2; the rest is taken care of automatically.

          gus_heck Gus Heck added a comment -

I do see how what you suggest could be made to work if a decoder was supplied and the decoded version was what was cached.

So are you proposing changing JarRepository, or adding a similar class called BlobRepository? JarRepository does seem very jar-focused (returning JarContent, JarContentRef, etc.). It would, for example, no longer be possible to return a ByteBuffer type from getByteBuffer, since returning the bytes from which to decode a HashMap won't help. Seems like fairly major surgery is required to make the JarRepository class fully generic.

I do like the versioned resource and config API bit though, that's nice.

I need to better understand the lazy="true" bit you mentioned, because I very much don't ever want a user query to be the thing that causes the resource to load (a process that could take minutes). The resource should be available and ready to go when the system starts accepting queries.

          gus_heck Gus Heck added a comment - - edited

Hmm, looks like I didn't realize my IDE had navigated me to another class; getByteBuffer is on MemClassLoader. Ah, I see now: I confused the method name when I wrote the comment. I meant to say that getFileContent (on JarContent, or its equivalent) couldn't return a ByteBuffer.

          noble.paul Noble Paul added a comment - - edited

So are you proposing changing JarRepository, or adding a similar class called BlobRepository?

No, I would rename it to BlobRepository.

Seems like fairly major surgery is required to make the JarRepository class fully generic.

I shall put up a patch which can do this.

I need to better understand the lazy="true" bit you mentioned,

I understand the problem with startup=lazy. We probably should make a new interface called BlobStoreAware which loads the component when the BlobStore is available. But let's keep that separate.

          noble.paul Noble Paul added a comment -

Gus Heck Untested patch; should be good enough for you to go ahead.

          dsmiley David Smiley added a comment -

          This is a nice suggestion Noble Paul; it looks remarkably close. +1 to enhancing it to be more generic and subsume this use-case.

          gus_heck Gus Heck added a comment -

Looked at the patch some today. Mostly it looks easy enough to adapt to my client's code, but I think there might be some holes WRT decoders decoding the same content more than once when multiple cores are loaded, and it seems we hold onto the ByteBuffer after the decoding, which doubles memory usage. Will comment more and provide suggestions (+ patch) tomorrow. Since decoding our file takes significant time and pegs the CPUs, I really don't want that repeating itself for all 40 cores.

Loading code in my decoder will look something like this:

      // decode the CSV in parallel; each row goes through processSimpleCsvRow
      ForkJoinPool pool = new ForkJoinPool(Runtime.getRuntime().availableProcessors());
      try (Stream<String> lines = new BufferedReader(new InputStreamReader(inputStream, Charset.forName("UTF-8"))).lines()) {
        try {
          // submit to our own pool so the parallel stream doesn't hog the common pool
          pool.submit(() -> lines.parallel().forEach(this::processSimpleCsvRow)).get();
        } catch (InterruptedException | ExecutionException e) {
          throw new IOException(e);
        }
      } catch (IOException e) {
        throw new RuntimeException("Cannot load csv", e);
      } finally {
        pool.shutdownNow();
      }
          
          noble.paul Noble Paul added a comment -

but I think there might be some holes WRT decoders decoding the same content more than once when multiple cores are loaded,

You can easily solve that by synchronizing the decode() code at your end.
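One way a component author could do that at their end is a memoizing wrapper around the expensive parse (a sketch, under the assumption that a single decoder instance is shared across cores; names are illustrative):

      import java.io.InputStream;

      // Sketch: memoize the decoded object inside the decoder so concurrent
      // core loads pay the decode cost at most once.
      public abstract class MemoizingDecoder<T> implements BlobRepository.Decoder<T> {
        private volatile T decoded;

        @Override
        public T decode(InputStream inputStream) {
          T local = decoded;
          if (local == null) {
            synchronized (this) {
              local = decoded;
              if (local == null) {
                local = doDecode(inputStream); // the expensive parse
                decoded = local;
              }
            }
          }
          return local;
        }

        /** The actual (expensive) decoding, run at most once. */
        protected abstract T doDecode(InputStream inputStream);
      }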

          gus_heck Gus Heck added a comment -

Sure I could, but other folks who try to use this after us will likely stumble into that pitfall. I've played around with it and simplified BlobContent, moved the PluginBag-specific stuff to PluginBag, and provided some javadoc and a user-friendly method on SolrCore which will ensure that a close hook is also created. It seems all my goals except #3 are satisfied, and we've exceeded expectations for #2, making it possible to reliably update the content on the fly with no danger of memory leaks (yay). I definitely like this approach better. Attaching patch #4. The patch contains a class named org.apache.solr.handler.component.XXCustomComponent demonstrating usage (a redacted and cleansed version of what I'm using for a client's server, adapted to this patch). This class obviously should not be included in the commit.
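The shape of that SolrCore convenience method is roughly this (a sketch of the intent; exact names and generics may differ from the patch):

      // Sketch of the idea behind the convenience method (inside SolrCore);
      // getBlobRepository() and the generics are approximations, not
      // necessarily the committed code.
      public <T> BlobRepository.BlobContentRef<T> loadDecodeAndCacheBlob(
          String key, BlobRepository.Decoder<T> decoder) {
        BlobRepository repo = getCoreDescriptor().getCoreContainer().getBlobRepository();
        BlobRepository.BlobContentRef<T> ref = repo.getBlobIncRef(key, decoder);
        // register the close hook on behalf of the caller so the refcount can't leak
        addCloseHook(new CloseHook() {
          @Override public void preClose(SolrCore core) { }
          @Override public void postClose(SolrCore core) {
            repo.decrementBlobRefCount(ref);
          }
        });
        return ref;
      }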

          noble.paul Noble Paul added a comment -

          Are you working on the latest trunk? I have already committed my other patch in SOLR-8719

          gus_heck Gus Heck added a comment -

I had updated on the 22nd. I should have pulled before I created the patch, but usually a one-day lag isn't a problem. I didn't know you were going to create another issue and check it in. I'll recreate the patch.

          gus_heck Gus Heck added a comment -

Patch vs. up-to-date master.

          noble.paul Noble Paul added a comment - - edited

One problem I see with the patch is decoding the ByteBuffer in two different ways. What if core1 has decoder1 and core2 has decoder2? Then the second call gets the output of the first decoder. That's why I kept a map internally, so that it is possible to deal with that use case. It may be unusual to do so, but for the sake of correctness we have to handle it.

          gus_heck Gus Heck added a comment -

          Patch with support for multiple decodings of the same blob

          noble.paul Noble Paul added a comment -

Really? I still think it doesn't take care of multiple decoders.

          gus_heck Gus Heck added a comment -

Perhaps you were looking for a second hash map? We don't need one if we have more descriptive keys. I should have explained my latest patch; I was hasty.

          Implementations of the Decoder interface can now (optionally) give their decoders names.

           public interface Decoder<T> {
          
              /**
               * A name by which to distinguish this decoding. This only needs to be implemented if you want to support
               * decoding the same blob content with more than one decoder.
               * 
               * @return The name of the decoding, defaults to empty string.
               */
              default String getName() { return ""; }
          
              /**
               * A routine that knows how to convert the stream of bytes from the blob into a Java object.
               * 
               * @param inputStream the bytes from a blob
               * @return A Java object of the specified type.
               */
              T decode(InputStream inputStream);
            }
          
          

The internal hashmap that holds blob content objects (be they decoded or not) will append this name to the key for put and get operations in getBlobIncRef, and will store the appropriate value in the key field of BlobContent:

            BlobContentRef<Object> getBlobIncRef(String key, Decoder<Object> decoder) {
              return getBlobIncRef(key.concat(decoder.getName()), () -> addBlob(key,decoder));
            }
          
and in private <T> BlobContentRef<T> getBlobIncRef(String key, Callable<BlobContent<T>> blobCreator) ...
          
                  aBlob = blobs.get(key);
                  if (aBlob == null) {
                    try {
                      aBlob = blobCreator.call();
          
          

          Note that the unmodified key was supplied to the lambda invoking addBlob...

            // for use cases sharing java objects
            private BlobContent<Object> addBlob(String key, Decoder<Object> decoder) {  
              ByteBuffer b = fetchBlob(key);                    
              String  keyPlusName = key + decoder.getName();
              BlobContent<Object> aBlob = new BlobContent<>(keyPlusName, b, decoder);
              blobs.put(keyPlusName, aBlob);
              return aBlob;
            }
          

          Thus the BlobContent object is holding the more descriptive key when we get to decrementBlobRefCount and do this:

                if (ref.blob.references.isEmpty()) {
                  blobs.remove(ref.blob.key);
                }
          

Also note that the differing method signatures distinguish the case where we don't have a decoder because we are interested in caching the raw bytes, not the decoded form, and a different path is taken... In this case the key in the map matches the key in the blob store, ensuring that we can't cache the raw bytes twice.

            public BlobContentRef<ByteBuffer> getBlobIncRef(String key) {
             return getBlobIncRef(key, () -> addBlob(key));
            }
          
            // For use cases sharing raw bytes
            private BlobContent<ByteBuffer> addBlob(String key) {
              ByteBuffer b = fetchBlob(key);
              BlobContent<ByteBuffer> aBlob  = new BlobContent<>(key, b);
              blobs.put(key, aBlob);
              return aBlob;
            }
          

Looking at it again today, it occurs to me that there is a small chance that if someone wants to share the raw bytes and also share a decoded form of the same blob, and they fail to name their Decoder, we would have a race condition (both would cache under the same key). This should be avoided by defaulting to "!D!" (or some other "reserved" string) instead of the empty string in the above Decoder interface (and documenting this in the javadoc). It is intentionally left up to the implementor to decide whether or not they provide names that allow/prevent collisions.
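To restate the naming contract as a usage sketch (generics simplified; parseToMap/parseToList and the blobRepository reference are hypothetical):

      // Two decodings of one blob coexist because getName() keeps the
      // internal cache keys distinct: "myCsvFile/1map" vs "myCsvFile/1list".
      BlobRepository.Decoder<Object> asMap = new BlobRepository.Decoder<Object>() {
        @Override public String getName() { return "map"; }
        @Override public Object decode(InputStream in) { return parseToMap(in); }
      };
      BlobRepository.Decoder<Object> asList = new BlobRepository.Decoder<Object>() {
        @Override public String getName() { return "list"; }
        @Override public Object decode(InputStream in) { return parseToList(in); }
      };
      BlobContentRef<Object> mapRef = blobRepository.getBlobIncRef("myCsvFile/1", asMap);
      BlobContentRef<Object> listRef = blobRepository.getBlobIncRef("myCsvFile/1", asList);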

          noble.paul Noble Paul added a comment -

It's fine. But the patch doesn't have this extra method for the decoder.

          noble.paul Noble Paul added a comment -

Gus Heck can you post the updated patch?

          gus_heck Gus Heck added a comment -

Oops! Sorry, it seems I uploaded the wrong file (and missed your response). Thanks for following up!

          gus_heck Gus Heck added a comment -

          Actual patch that supports multiple decodings.

          noble.paul Noble Paul added a comment -

The patch looks fine. If we have a test case for this, we could commit it right away.

          gus_heck Gus Heck added a comment -

Hmm, I've been wondering how I might write a good test case for it. Are there some tests out there that fire up a cloud-mode Solr and interact with multiple cores that I can use as an example? I can probably come up with mock-object-based tests that exercise some of the methods, but that rather misses the cross-core sharing aspect of this and the usage by components. I expect SolrTestCaseJ4 will probably hit the else block in this code:

              if (this.coreContainer.isZooKeeperAware()) {
                // stuff we want to test is here!
              } else {
                throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Blob loading is not supported in non-cloud mode");
              }
          
          gus_heck Gus Heck added a comment -

Ah, it seems a brand-new class for cloud unit testing has magically appeared in SOLR-8758 just today.
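For reference, cluster setup with the new base class looks roughly like this (a sketch; the config name/path and the exact helper methods are assumptions based on the SolrCloudTestCase API):

      import org.apache.solr.cloud.SolrCloudTestCase;
      import org.junit.BeforeClass;

      // Sketch against the new SolrCloudTestCase base class.
      public class BlobRepositoryCloudTest extends SolrCloudTestCase {

        @BeforeClass
        public static void setupCluster() throws Exception {
          // start a small MiniSolrCloudCluster with one uploaded configset
          configureCluster(2)
              .addConfig("configname",
                  TEST_PATH().resolve("configsets").resolve("cloud-minimal").resolve("conf"))
              .configure();
          // .system must exist (and hold the blob) before creating any
          // collection whose config reads from the blob store
        }
      }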

          gus_heck Gus Heck added a comment -

Unit tests, both cloud-style and mock. Everything passed precommit before I updated; after updating, the changes to the spatial stuff broke my IDE and precommit, but there were no conflicts with my code, so this should still be OK.

          noble.paul Noble Paul added a comment - - edited

Thanks, the tests are fine. But one thing:

The blob store (the .system collection) is not guaranteed to be available at core load time. So your component should implement BlobStoreAware and try to load resources only in the callback for that interface. The BlobStoreAware interface is not yet implemented (SOLR-8772).

          gus_heck Gus Heck added a comment -

Right, I got around that by creating .system first and uploading the blob before I created a collection with a config that uses the blob store. I think some configurations that NEED the blob to function properly will want to fail if the blob store (and/or the blob) is not available. Both using and not using BlobStoreAware might be reasonable? Or maybe the interface should have a flag identifying whether it is allowable to open the server without it? (A hypothetical sketch follows below.)
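One hypothetical shape for such a flag (names invented for illustration; BlobStoreAware is not yet implemented, per SOLR-8772):

      // Purely hypothetical sketch -- BlobStoreAware does not exist yet.
      public interface BlobStoreAware {
        /** Called once the .system collection (blob store) is available. */
        void informBlobStore(BlobRepository blobs);

        /**
         * If true, core startup fails fast when the blob store (or a required
         * blob) is unavailable, rather than coming up without the resource.
         */
        default boolean isBlobRequired() { return false; }
      }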

          gus_heck Gus Heck added a comment -

It would be good to sort out what needs to be done for SOLR-8772 in time for 6.1.

          noble.paul Noble Paul added a comment -

If you think it is good enough, I can review and go ahead.

          gus_heck Gus Heck added a comment -

Re-applied the patch to trunk (no conflicts) and ran tests, and hit this... I had seen something like this before, but I hadn't noted the seed at the time, and it hadn't happened in a while, so I assumed it had been fixed by something I did. It looks a little like it might be the blob store not being ready (SOLR-8772). This seed seems to reproduce it reliably, though the test passes most of the time for me. (Ran it 10 times, saw this twice.)

           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=BlobRepositoryCloudTest -Dtests.seed=387A9881AC04D1FB -Dtests.slow=true -Dtests.locale=mk -Dtests.timezone=America/Lower_Princes -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] ERROR   0.00s | BlobRepositoryCloudTest (suite) <<<
             [junit4]    > Throwable #1: java.net.SocketException: Connection reset
             [junit4]    >        at __randomizedtesting.SeedInfo.seed([387A9881AC04D1FB]:0)
             [junit4]    >        at java.net.SocketInputStream.read(SocketInputStream.java:209)
             [junit4]    >        at java.net.SocketInputStream.read(SocketInputStream.java:141)
             [junit4]    >        at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:139)
             [junit4]    >        at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:155)
             [junit4]    >        at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:284)
             [junit4]    >        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
             [junit4]    >        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
             [junit4]    >        at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
             [junit4]    >        at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165)
             [junit4]    >        at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167)
             [junit4]    >        at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
             [junit4]    >        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
             [junit4]    >        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
             [junit4]    >        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
             [junit4]    >        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
             [junit4]    >        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
             [junit4]    >        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
             [junit4]    >        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
             [junit4]    >        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
             [junit4]    >        at org.apache.solr.core.BlobRepositoryCloudTest.postBlob(BlobRepositoryCloudTest.java:104)
             [junit4]    >        at org.apache.solr.core.BlobRepositoryCloudTest.setupCluster(BlobRepositoryCloudTest.java:59)
             [junit4]    >        at java.lang.Thread.run(Thread.java:745)
             [junit4] Completed [1/1 (1!)] in 8.94s, 0 tests, 1 error <<< FAILURES!
          
          gus_heck Gus Heck added a comment -

Actually, looking closer, it seems to be happening when I try to add the blob, and not in the code that relies on the blob, so this might be a separate issue. Adding a 2-second sleep after startup of the cluster in the test does not alleviate it, so this test seems to be creating a condition where the blob store is not able to receive a blob. Whether that is an artifact of what's going on in SolrCloudTestCase or not, I don't know.

          noble.paul Noble Paul added a comment -

The patch is not up to date with master. Can you please update it?

          gus_heck Gus Heck added a comment -

Patch vs. origin/master. The only conflict was a signature change for an updated library version in code I am replacing anyway.

          gus_heck Gus Heck added a comment -

Also deleted the misnamed patch, since it seems to sort to the top.

          jira-bot ASF subversion and git services added a comment -

          Commit 9a1880aee821d4e6e96a8ff2fb15062b1e4c9eb1 in lucene-solr's branch refs/heads/master from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9a1880a ]

          SOLR-8349: Allow sharing of large in memory data structures across cores

          jira-bot ASF subversion and git services added a comment -

          Commit 489acdb5092676965e1dda067d35b58aeee8cf7d in lucene-solr's branch refs/heads/branch_6x from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=489acdb ]

          SOLR-8349: Allow sharing of large in memory data structures across cores

          jira-bot ASF subversion and git services added a comment -

          Commit 5b680de13a40021825b71d35c04a790dde2d7f6a in lucene-solr's branch refs/heads/branch_6x from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5b680de ]

          SOLR-8349: Allow sharing of large in memory data structures across cores

          jira-bot ASF subversion and git services added a comment -

          Commit 456d5c04c87744659a241f85d0fa04c683c81a2c in lucene-solr's branch refs/heads/master from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=456d5c0 ]

          SOLR-8349: trying to address test failures

          jira-bot ASF subversion and git services added a comment -

          Commit ffbbfbbe107cf42de6299c6c94b032bb21fe716f in lucene-solr's branch refs/heads/branch_6x from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ffbbfbb ]

          SOLR-8349: trying to address test failures

          hossman Hoss Man added a comment -

          Manually correcting fixVersion per Step #S5 of LUCENE-7271


People

  • Assignee: noble.paul Noble Paul
  • Reporter: gus_heck Gus Heck