Solr
  1. Solr
  2. SOLR-182

register SolrRequestHandlers at runtime / lazy loading

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2
    • Component/s: None
    • Labels:
      None

      Description

      It would be useful to be able to register handlers after SolrCore has been initialized initialized. It is also useful to be able to ask what handlers are registered and to where. This patch adds the following functions to SolrCore:

      SolrRequestHandler registerRequestHandler(String handlerName, SolrRequestHandler handler);
      Collection<SolrRequestHandler> getRequestHandlers(Class<? extends SolrRequestHandler> clazz);

      It also guarantees that request handlers will be initialized with an argument saying what path it is registered to. RequestHandlerBase gets a bean for the registered path.

      While discussing this, Yonik suggested making it possible to defer initialization of some handlers that will be infrequently used. I added the 'LazyRequestHandlerWrapper' (if taking this out makes the patch any easier to commit - it can get its own issue)

      check:
      http://www.nabble.com/Dynamic-RequestHandler-loading-tf3351707.html

      1. SOLR-182-RuntimeRequestHandlers.patch
        59 kB
        Ryan McKinley
      2. SOLR-182-RuntimeRequestHandlers.patch
        22 kB
        Ryan McKinley
      3. SOLR-182-RuntimeRequestHandlers.patch
        18 kB
        Ryan McKinley
      4. SOLR-182-RuntimeRequestHandlers.patch
        22 kB
        Ryan McKinley
      5. SOLR-182-RuntimeRequestHandlers.patch
        23 kB
        Ryan McKinley
      6. SOLR-182-RuntimeRequestHandlers.patch
        23 kB
        Ryan McKinley
      7. SOLR-182-RuntimeRequestHandlers.patch
        23 kB
        Ryan McKinley

        Issue Links

          Activity

          Hide
          Ryan McKinley added a comment -

          If you all are more comfortable with

          Collection<SolrRequestHandler> getRequestHandlers()

          then:

          Collection<SolrRequestHandler> getRequestHandlers(Class<? extends SolrRequestHandler> clazz)

          that is an easy change. Likewise we can postpone the Lazy bit if it makes anything easier.

          I included tests for everything i think is testable about these changes, and added nice javadocs.

          Show
          Ryan McKinley added a comment - If you all are more comfortable with Collection<SolrRequestHandler> getRequestHandlers() then: Collection<SolrRequestHandler> getRequestHandlers(Class<? extends SolrRequestHandler> clazz) that is an easy change. Likewise we can postpone the Lazy bit if it makes anything easier. I included tests for everything i think is testable about these changes, and added nice javadocs.
          Hide
          Ryan McKinley added a comment -

          changed this so that LazyRequestHandlerWrapper is not a public class. That seems cleaner as it is not something that should be used directly

          Show
          Ryan McKinley added a comment - changed this so that LazyRequestHandlerWrapper is not a public class. That seems cleaner as it is not something that should be used directly
          Hide
          Ryan McKinley added a comment -

          Updated in response to Hoss' comments:
          http://www.nabble.com/Re%3A-Dynamic-RequestHandler-loading-p9337978.html

          1. gets rid of the get by class thing
          2. adds Map<> getRequestHandlers()
          3. gets rid of the extra param to init()

          Show
          Ryan McKinley added a comment - Updated in response to Hoss' comments: http://www.nabble.com/Re%3A-Dynamic-RequestHandler-loading-p9337978.html 1. gets rid of the get by class thing 2. adds Map<> getRequestHandlers() 3. gets rid of the extra param to init()
          Hide
          Hoss Man added a comment -

          two outstanding questions from the email discussion...

          1) yonik seemed to be concerned about synchronization issues involved with geting a request handler by name now that they can be registered dynamicly on the fly ... flarify that.

          2) if we want request handlers to be able to find out what nam(es) they are registered with (now that anyone can call core.registerRequestHandler I might give the exact same identicle instance multiple names) they should be able to do that from the init method ... since we aren't changing the API of the init method, we should probably make sure that registering a handler causes...
          handler = clazz.newInstance();
          this.register( name, handler );
          handler.init( args );
          ...to happen in that order.

          i would even argue that when registering multiple handlers (ie: from the config) we may want the psuedocode to be...

          foreach (handlerconfig)

          { handler = clazz.newInstance(); this.register( name, handler ); }

          foreach (key in registry)

          { lookup(key).init( args ); }

          ...so that all handlernames are defined before any init methods are called.

          Show
          Hoss Man added a comment - two outstanding questions from the email discussion... 1) yonik seemed to be concerned about synchronization issues involved with geting a request handler by name now that they can be registered dynamicly on the fly ... flarify that. 2) if we want request handlers to be able to find out what nam(es) they are registered with (now that anyone can call core.registerRequestHandler I might give the exact same identicle instance multiple names) they should be able to do that from the init method ... since we aren't changing the API of the init method, we should probably make sure that registering a handler causes... handler = clazz.newInstance(); this.register( name, handler ); handler.init( args ); ...to happen in that order. i would even argue that when registering multiple handlers (ie: from the config) we may want the psuedocode to be... foreach (handlerconfig) { handler = clazz.newInstance(); this.register( name, handler ); } foreach (key in registry) { lookup(key).init( args ); } ...so that all handlernames are defined before any init methods are called.
          Hide
          Ryan McKinley added a comment -

          #2, smart! yes

          Show
          Ryan McKinley added a comment - #2, smart! yes
          Hide
          Ryan McKinley added a comment -

          This update loads handlers in the style suggested by Hoss.

          It makes sure everything is instanciated and registered before calling init() on any handlres registered in solrconfig.xml

          It calls init() on handlers in the order they were defined.

          The only open issue is if SolrCore.getRequestHandler() should be synchronized. I can't think of any potential problems if it isn't but i could be missing something.

          I'll let whoever commits this decide if it should be synchronized or not.

          Show
          Ryan McKinley added a comment - This update loads handlers in the style suggested by Hoss. It makes sure everything is instanciated and registered before calling init() on any handlres registered in solrconfig.xml It calls init() on handlers in the order they were defined. The only open issue is if SolrCore.getRequestHandler() should be synchronized. I can't think of any potential problems if it isn't but i could be missing something. I'll let whoever commits this decide if it should be synchronized or not.
          Hide
          Yonik Seeley added a comment -

          w.r.t. synchronization of getRequestHandler(), I was just thinking ahead to when registerRequestHandler() may be called after the constructor for SolrCore.

          Registration before initialization is interesting, but again, this only works easily if registerRequestHandler() is restricted to the SolrCore constructor. If this were to change in the future, it would expose un-initialized handlers to requests.

          Show
          Yonik Seeley added a comment - w.r.t. synchronization of getRequestHandler(), I was just thinking ahead to when registerRequestHandler() may be called after the constructor for SolrCore. Registration before initialization is interesting, but again, this only works easily if registerRequestHandler() is restricted to the SolrCore constructor. If this were to change in the future, it would expose un-initialized handlers to requests.
          Hide
          Ryan McKinley added a comment -

          The API in this patch lets you call SolrCofe.registerRequestHandler( path, handler ) at any point. If you use this api, you are responsible to make sure the handler is initalized – this may or may not require calling init( NamedList ) depending on the handler implementation.

          The "Registration before initialization" is only save for SolrCore to use within its constructor. RequestHandlers.registerHandlers( NodeList nodes ) - is package private and only called from the SolrCore constructor.

          I still don't see how synchronization becomes an issue - unless someone is trying a bizarro use case where someone registers a handler within SolrRequestHandler.handleRequest() and expects that exactly the next request use the newly registered handler.

          In my use case, I have a custom filter that extends SolrRequestDispatcher. This filter initializes solr normally, then inspects what was registered and automatically sets up the rest of the environment.

          Show
          Ryan McKinley added a comment - The API in this patch lets you call SolrCofe.registerRequestHandler( path, handler ) at any point. If you use this api, you are responsible to make sure the handler is initalized – this may or may not require calling init( NamedList ) depending on the handler implementation. The "Registration before initialization" is only save for SolrCore to use within its constructor. RequestHandlers.registerHandlers( NodeList nodes ) - is package private and only called from the SolrCore constructor. I still don't see how synchronization becomes an issue - unless someone is trying a bizarro use case where someone registers a handler within SolrRequestHandler.handleRequest() and expects that exactly the next request use the newly registered handler. In my use case, I have a custom filter that extends SolrRequestDispatcher. This filter initializes solr normally, then inspects what was registered and automatically sets up the rest of the environment.
          Hide
          Yonik Seeley added a comment -

          If you modify the map in one thread and read from it in another thread, that requires synchronization to work correctly (even if it's a different entry being accessed).

          Show
          Yonik Seeley added a comment - If you modify the map in one thread and read from it in another thread, that requires synchronization to work correctly (even if it's a different entry being accessed).
          Hide
          Ryan McKinley added a comment -

          But what is the failure mode?

          Suppose, in thread A, I call:
          SolrCore.getSolrCore().registerRequestHandler( "/path/to/handler", handler );

          then 5 seconds later in thread D, I call:
          SolrCore.getSolrCore().getRequestHandler( "/path/to/handler" )

          Can we be sure the new handler will be returned? Is there any chance of anything exploding? Is it only in the microseconds around touching the map that things are undefined?

          If it is a graceful error (null or whatever was there before), i don't think this case needs to be synchronized. If it is something else could happen, it should be.

          Show
          Ryan McKinley added a comment - But what is the failure mode? Suppose, in thread A, I call: SolrCore.getSolrCore().registerRequestHandler( "/path/to/handler", handler ); then 5 seconds later in thread D, I call: SolrCore.getSolrCore().getRequestHandler( "/path/to/handler" ) Can we be sure the new handler will be returned? Is there any chance of anything exploding? Is it only in the microseconds around touching the map that things are undefined? If it is a graceful error (null or whatever was there before), i don't think this case needs to be synchronized. If it is something else could happen, it should be.
          Hide
          Yonik Seeley added a comment -

          > But what is the failure mode?

          Any number of modes of failure.... it's very tough to predict them (I think you'd have to be Doug Lea

          1)
          thread #1 does map.put("/path/to/handler", handler)
          thread #2 iterates over the map and gets a ConcurrentModificationException

          2)
          thread #1 does map.put("/path/to/handler", handler)
          thread #2 does map.put("/path/to/handler2", handler2)
          a) If they hash to the same bucket, one could overwrite the other
          b) one or both could cause resize() to be invoked... ouch! many different modes of failure there

          3)
          thread #1 does map.put("/path/to/handler", handler) causing resize to be called()
          thread #2 does a map.get("/myexistinghandler") and it gets back null

          I'd agree with you if the only mode of failure was to get back null for the current object being put in the map (since it's a race between threads anyway, null is a valid view - one should synchronize externally in that case anyway). But, any insert can mess up all other reads.

          Show
          Yonik Seeley added a comment - > But what is the failure mode? Any number of modes of failure.... it's very tough to predict them (I think you'd have to be Doug Lea 1) thread #1 does map.put("/path/to/handler", handler) thread #2 iterates over the map and gets a ConcurrentModificationException 2) thread #1 does map.put("/path/to/handler", handler) thread #2 does map.put("/path/to/handler2", handler2) a) If they hash to the same bucket, one could overwrite the other b) one or both could cause resize() to be invoked... ouch! many different modes of failure there 3) thread #1 does map.put("/path/to/handler", handler) causing resize to be called() thread #2 does a map.get("/myexistinghandler") and it gets back null I'd agree with you if the only mode of failure was to get back null for the current object being put in the map (since it's a race between threads anyway, null is a valid view - one should synchronize externally in that case anyway). But, any insert can mess up all other reads.
          Hide
          Ryan McKinley added a comment -

          synchronized it is!

          Rather then synchronizing each function call, I'm using a synchonized map:

          private final Map<String, SolrRequestHandler> handlers = Collections.synchronizedMap(
          new HashMap<String,SolrRequestHandler>() );

          Show
          Ryan McKinley added a comment - synchronized it is! Rather then synchronizing each function call, I'm using a synchonized map: private final Map<String, SolrRequestHandler> handlers = Collections.synchronizedMap( new HashMap<String,SolrRequestHandler>() );
          Hide
          Hoss Man added a comment -

          related note i'm typing before i forget...

          in SOLR-81 i tried to call SolrCore.getSolrCore.getDataDir() in the init method of a requestHandler and got an infinite loop. I can't remember if this type of situation would be prevented by this patch or not ... if it isn't that doesn't mean this patch shouldn't be committed, it just means we should probably open a separate bug to try and detect/prevent/error in that situation.

          Show
          Hoss Man added a comment - related note i'm typing before i forget... in SOLR-81 i tried to call SolrCore.getSolrCore.getDataDir() in the init method of a requestHandler and got an infinite loop. I can't remember if this type of situation would be prevented by this patch or not ... if it isn't that doesn't mean this patch shouldn't be committed, it just means we should probably open a separate bug to try and detect/prevent/error in that situation.
          Hide
          Ryan McKinley added a comment -

          yes, that situation is handled by this patch. This was one of my primary reasons for writing it!

          This patch lets you do call SolrCore.getCore() and inspect the schema/index/whatever. Without it, you need to do some sort of lazy loading after the first request.

          Show
          Ryan McKinley added a comment - yes, that situation is handled by this patch. This was one of my primary reasons for writing it! This patch lets you do call SolrCore.getCore() and inspect the schema/index/whatever. Without it, you need to do some sort of lazy loading after the first request.
          Hide
          Yonik Seeley added a comment -

          A couple of comments...

          • For lazy loading, you don't even want to load the class if it's not used (loaded classes take up resources, and there may be optional jars that will cause errors).
          • it really seems like init() must be called before any calls to handleRequest. To ensure this, I don't think we can do the registration inbetween. This isn't just a hypothetical problem... think about when a new web page is published that causes a new type of request to start hitting an existing Solr server... 10s to 100s of requests per second for a new hander all of a sudden. The likelihood becomes very high that another request will cause handleRequest() to be called before or concurrently with init().
          Show
          Yonik Seeley added a comment - A couple of comments... For lazy loading, you don't even want to load the class if it's not used (loaded classes take up resources, and there may be optional jars that will cause errors). it really seems like init() must be called before any calls to handleRequest. To ensure this, I don't think we can do the registration inbetween. This isn't just a hypothetical problem... think about when a new web page is published that causes a new type of request to start hitting an existing Solr server... 10s to 100s of requests per second for a new hander all of a sudden. The likelihood becomes very high that another request will cause handleRequest() to be called before or concurrently with init().
          Hide
          Ryan McKinley added a comment -

          > - For lazy loading, you don't even want to load the class if it's not used (loaded classes take up
          > resources, and there may be optional jars that will cause errors).

          Ok - I'm a little nervous about that because I like things to fail loudly at startup rather then wait to tell you they were configured incorrectly. (SOLR-179) But if you are using lazy loading, that is probably the behavior you would expect.

          I'll change it so that the LazyRequestHandlerWrapper stores the string value for the class name rather then the Class itself.

          >
          > - it really seems like init() must be called before any calls to handleRequest.
          >

          yes, init() must be called before any call to handleRequest() - absolutely

          Correct me if I have the lifecycle wrong, but I think it is ok:

          1. SolrDispatchFilter.init() calls SolrCore.getSolrCore()
          2. SolrCore.getSolrCore() calls SolrCore constructor
          3. SolrCore constructor initalizes schema, listeners, index and writers
          4. then calls reqHandlers.initHandlersFromConfig( SolrConfig.config )

          this function:
          a. creates each handler for solrconfig.xml and registers it
          b. calls init() on each handler - (since register was called first, each handler knows what else exists, but it may or may not be initialized)

          5. initialize searcher / updateHandler
          6. SolrDispatchFilter.init() finishes and solr starts accepting requests.

          All handlers call init() before they could possibly receive any requests. No requests can hit solr during the limbo period (a-b), It is only in the "unstable" state in the SolrCore constructor - I think the benefits of handlers knowing what else is registered during their init() method is worth the slightly awkward construction.

          The public interface:
          SolrCore.register( String handlerName, SolrRequestHandler handler )
          assumes that the handler is properly initialized. As soon as this is called it can immediately start accepting requests. I will make the javadoc more clear on this point.

          The only potentially dangerous function is (4) initHandlersFromConfig. This is a package private function that defiantly should not be called anywhere else. Calling this function twice is not normal, if someone does it, they are asking for trouble.

          Show
          Ryan McKinley added a comment - > - For lazy loading, you don't even want to load the class if it's not used (loaded classes take up > resources, and there may be optional jars that will cause errors). Ok - I'm a little nervous about that because I like things to fail loudly at startup rather then wait to tell you they were configured incorrectly. ( SOLR-179 ) But if you are using lazy loading, that is probably the behavior you would expect. I'll change it so that the LazyRequestHandlerWrapper stores the string value for the class name rather then the Class itself. > > - it really seems like init() must be called before any calls to handleRequest. > yes, init() must be called before any call to handleRequest() - absolutely Correct me if I have the lifecycle wrong, but I think it is ok: 1. SolrDispatchFilter.init() calls SolrCore.getSolrCore() 2. SolrCore.getSolrCore() calls SolrCore constructor 3. SolrCore constructor initalizes schema, listeners, index and writers 4. then calls reqHandlers.initHandlersFromConfig( SolrConfig.config ) this function: a. creates each handler for solrconfig.xml and registers it b. calls init() on each handler - (since register was called first, each handler knows what else exists, but it may or may not be initialized) 5. initialize searcher / updateHandler 6. SolrDispatchFilter.init() finishes and solr starts accepting requests. All handlers call init() before they could possibly receive any requests. No requests can hit solr during the limbo period (a-b), It is only in the "unstable" state in the SolrCore constructor - I think the benefits of handlers knowing what else is registered during their init() method is worth the slightly awkward construction. The public interface: SolrCore.register( String handlerName, SolrRequestHandler handler ) assumes that the handler is properly initialized. As soon as this is called it can immediately start accepting requests. I will make the javadoc more clear on this point. The only potentially dangerous function is (4) initHandlersFromConfig. This is a package private function that defiantly should not be called anywhere else. Calling this function twice is not normal, if someone does it, they are asking for trouble.
          Hide
          Ryan McKinley added a comment -

          1. Changed the LazyRequestHandlerWrapper to hang on to a string rather then a class and does not access the class until it is needed. (saves memory, but delays errors)

          2. Added more explicit documentation

          initHandlersFromConfig still registers all handlers before initializing them - i am confident this is ok unless it is called outside of the solr core constructor.

          Show
          Ryan McKinley added a comment - 1. Changed the LazyRequestHandlerWrapper to hang on to a string rather then a class and does not access the class until it is needed. (saves memory, but delays errors) 2. Added more explicit documentation initHandlersFromConfig still registers all handlers before initializing them - i am confident this is ok unless it is called outside of the solr core constructor.
          Hide
          Yonik Seeley added a comment -

          > (saves memory, but delays errors)
          Delays errors can also be a feature (if things need to be configured first, or jars need to be dropped in the right spot, etc).

          I think getWrappedHandler() needs to by synchronized or else

          • multiple instances could be instantiated
          • an instantiated instance could be handed back to a different thread before or during the handler's init()
          • general spookiness even after init() finishes due to lack of synchronization (initialized data won't necessarily be seen correctly in a different thread)
          Show
          Yonik Seeley added a comment - > (saves memory, but delays errors) Delays errors can also be a feature (if things need to be configured first, or jars need to be dropped in the right spot, etc). I think getWrappedHandler() needs to by synchronized or else multiple instances could be instantiated an instantiated instance could be handed back to a different thread before or during the handler's init() general spookiness even after init() finishes due to lack of synchronization (initialized data won't necessarily be seen correctly in a different thread)
          Hide
          Ryan McKinley added a comment -

          One line change adding synchronized to:
          public synchronized SolrRequestHandler getWrappedHandler()

          thanks yonik

          • - - - - - - - - -

          >> (saves memory, but delays errors)
          > Delays errors can also be a feature (if things need to be configured first, or jars need to be dropped in the right spot, etc).
          >

          I'm convinced. With SOLR-179 you can configure things to stop after errors - if you want some things to stop while otheres continue, you can make them lazy loaded.

          Show
          Ryan McKinley added a comment - One line change adding synchronized to: public synchronized SolrRequestHandler getWrappedHandler() thanks yonik - - - - - - - - - >> (saves memory, but delays errors) > Delays errors can also be a feature (if things need to be configured first, or jars need to be dropped in the right spot, etc). > I'm convinced. With SOLR-179 you can configure things to stop after errors - if you want some things to stop while otheres continue, you can make them lazy loaded.
          Hide
          Yonik Seeley added a comment -

          Committed. Thanks Ryan!

          Show
          Yonik Seeley added a comment - Committed. Thanks Ryan!

            People

            • Assignee:
              Unassigned
              Reporter:
              Ryan McKinley
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development