Solr
  1. Solr
  2. SOLR-972

EventListener-s creation changed from a per request ( full / delta-imports) scenario to once through the lifetime of the DIH plugin.

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Labels:
      None
    • Environment:

      Java 6, Tomcat 6

      Description

      The EventListener plugin for notification of start / end import events (SOLR-938) creates an instance of EventListener before every notification. This has 2 drawbacks.

      • No state is stored between successive invocations of events as it is a new object
      • When writing plugins for delta imports - it is very inefficient to do a class loader lookup by reflection / instantiate an instance and call a method on the same.

      Attached patch has one EventListener through the lifetime of the DIH plugin .

      Also EventListener is changed to an interface rather than an abstract class for better decoupling (especially painful when the start/end eventlistener has an independent hierarchy by itself ).

      By default, a no-op listener is registered to avoid boiler plate code to check if there is a start / end listener specified. Efficient JRE impls should be able to optimize the no-op for minimum overhead compared to checking the reference for null and branching out.

      Specifying an onImportStart / onImportEnd overrides the default handler though.

        Issue Links

          Activity

          Karthik K created issue -
          Karthik K made changes -
          Field Original Value New Value
          Fix Version/s 1.4 [ 12313351 ]
          Karthik K made changes -
          Attachment SOLR-972.patch [ 12398266 ]
          Karthik K made changes -
          Link This issue is related to SOLR-938 [ SOLR-938 ]
          Karthik K made changes -
          Summary EventListener are not created per request ( full / delta-imports) but rather instantiated once per lifetime of the application - maintaining state + efficient EventListener-s creation changed from a per request ( full / delta-imports) scenario to once through the lifetime of the DIH plugin.
          Karthik K made changes -
          Link This issue is related to SOLR-935 [ SOLR-935 ]
          Hide
          Noble Paul added a comment -

          Also EventListener is changed to an interface rather than an abstract class for better decoupling

          interfaces are generally discouraged in Lucene/Solr because it can make the API hard to modify. Though there is a slight inconvenience of creating a separate class i guess it should be fine.

          No state is stored between successive invocations of events as it is a new object

          We must find a better way to do this than making it a part of the object lifecycle. We can add an extra scope to the Context#setSessionAttribute which can persist between multiple invocations. This can hekp all the components to share the same object

          - it is very inefficient to do a class loader lookup by reflection / instantiate an instance and call a method on the same.

          can we quantify that ?

          Show
          Noble Paul added a comment - Also EventListener is changed to an interface rather than an abstract class for better decoupling interfaces are generally discouraged in Lucene/Solr because it can make the API hard to modify. Though there is a slight inconvenience of creating a separate class i guess it should be fine. No state is stored between successive invocations of events as it is a new object We must find a better way to do this than making it a part of the object lifecycle. We can add an extra scope to the Context#setSessionAttribute which can persist between multiple invocations. This can hekp all the components to share the same object - it is very inefficient to do a class loader lookup by reflection / instantiate an instance and call a method on the same. can we quantify that ?
          Hide
          Karthik K added a comment -
          interfaces are generally discouraged in Lucene/Solr because it can make the API hard to modify. Though there is a slight inconvenience of creating a separate class i guess it should be fine.

          The rules about modification apply well enough for interfaces and abstract classes. Having an abstract class limits the object hierarchy of the handler unnecessary when it could be more flexible through the interfaces.

          We must find a better way to do this than making it a part of the object lifecycle. We can add an extra scope to the Context#setSessionAttribute which can persist between multiple invocations. This can hekp all the components to share the same object

          The persistence that I am concerned about is about the EventListener and less about Context ( which is bound to be specific to be an event for most of the time ). If the code in EventListener cannot store any state (fields ) - it is not so intuitive to the developer of the plugin and not very efficient either.

          can we quantify that ?

          Use of the reflection API ( Class.forName() , Class.newInstance() ) in any piece of code that is going to be executed frequently ( delta import in this case ) is hardly the best way to scale. I can come up with a sample unit test case for that - but I think it is moot.

          Show
          Karthik K added a comment - interfaces are generally discouraged in Lucene/Solr because it can make the API hard to modify. Though there is a slight inconvenience of creating a separate class i guess it should be fine. The rules about modification apply well enough for interfaces and abstract classes. Having an abstract class limits the object hierarchy of the handler unnecessary when it could be more flexible through the interfaces. We must find a better way to do this than making it a part of the object lifecycle. We can add an extra scope to the Context#setSessionAttribute which can persist between multiple invocations. This can hekp all the components to share the same object The persistence that I am concerned about is about the EventListener and less about Context ( which is bound to be specific to be an event for most of the time ). If the code in EventListener cannot store any state (fields ) - it is not so intuitive to the developer of the plugin and not very efficient either. can we quantify that ? Use of the reflection API ( Class.forName() , Class.newInstance() ) in any piece of code that is going to be executed frequently ( delta import in this case ) is hardly the best way to scale. I can come up with a sample unit test case for that - but I think it is moot.
          Hide
          Noble Paul added a comment -

          The persistence that I am concerned about is about the EventListener and less about Context

          The context is the single interaction point for all components with the system. So it is better to be there. DIH does not yet guarantee any state persistence between two imports . If it is a part of the Context API and documented it becomes a standard way of storing data for all components. EventListener cannot be seen as a special component.

          All components are created on a per import basis. If we must cache , then we must think of it for or all the the components

          Show
          Noble Paul added a comment - The persistence that I am concerned about is about the EventListener and less about Context The context is the single interaction point for all components with the system. So it is better to be there. DIH does not yet guarantee any state persistence between two imports . If it is a part of the Context API and documented it becomes a standard way of storing data for all components. EventListener cannot be seen as a special component. All components are created on a per import basis. If we must cache , then we must think of it for or all the the components
          Hide
          Karthik K added a comment -
          All components are created on a per import basis. If we must cache , then we must think of it for or all the the components

          I am not sure if we can apply such generality since Context is bound to be different for different invocations (at least , most of the time) whereas EventListener usually take the form of register / unregister but are not created every time. If EventListener-s are going to be recreated everytime then it would mean that the impl. has no state ( fields ) in it - which seems not so intuitive to the programmer.

          Show
          Karthik K added a comment - All components are created on a per import basis. If we must cache , then we must think of it for or all the the components I am not sure if we can apply such generality since Context is bound to be different for different invocations (at least , most of the time) whereas EventListener usually take the form of register / unregister but are not created every time. If EventListener-s are going to be recreated everytime then it would mean that the impl. has no state ( fields ) in it - which seems not so intuitive to the programmer.
          Hide
          Noble Paul added a comment - - edited

          I am not sure if we can apply such generality since Context is bound to be different for different invocations

          The Context itself will not be same for different invocations but the behavior of the new one is going to be consistent

          imagine we have a scope "dataimporter" and you can set an attribute at that scope by calling

          context.setAttribute("any_name",the_value,"dataimporter");
          //some time later you can get back the same object
          Object the_value =  context.getAttribute("any_name","dataimporter");
          //do my ops here
          

          It is still possible for you to share Objects in a static variable in your EventListener.

          The design is modeled like the servlet API. This is akin to storing and retrieving data from the servletContext,session,request etc .

          Show
          Noble Paul added a comment - - edited I am not sure if we can apply such generality since Context is bound to be different for different invocations The Context itself will not be same for different invocations but the behavior of the new one is going to be consistent imagine we have a scope "dataimporter" and you can set an attribute at that scope by calling context.setAttribute( "any_name" ,the_value, "dataimporter" ); //some time later you can get back the same object Object the_value = context.getAttribute( "any_name" , "dataimporter" ); // do my ops here It is still possible for you to share Objects in a static variable in your EventListener. The design is modeled like the servlet API. This is akin to storing and retrieving data from the servletContext,session,request etc .
          Hide
          Karthik K added a comment -

          I agree with the comment regarding a custom scope for a Context . For those Context-s that need to be reused this could still be in scope before fetching to avoid recreating them again but I am more concerned about recreating EventListener-s .

          It is still possible for you to share Objects in a static variable in your EventListener.
          The design is modeled like the servlet API. This is akin to storing and retrieving data from the servletContext,session,request etc .

          Sharing via static variables does not seem to be the cleanest way to design . (What if there are 2 eventListeners one for start and another for end inheriting from a common class that has shared attributes. Sharing via static variables ( in the base class) brings unpredictable behavior / and a code difficult to maintain . )

          Imagine a requesthandler in the Servlet having no state and being instantiated every time for every request. By recreating EventListener-s , we have a similar analogy . That would make (it already does and I am working on patched version of Solr) development harder to share any state between successive calls.

          Show
          Karthik K added a comment - I agree with the comment regarding a custom scope for a Context . For those Context-s that need to be reused this could still be in scope before fetching to avoid recreating them again but I am more concerned about recreating EventListener-s . It is still possible for you to share Objects in a static variable in your EventListener. The design is modeled like the servlet API. This is akin to storing and retrieving data from the servletContext,session,request etc . Sharing via static variables does not seem to be the cleanest way to design . (What if there are 2 eventListeners one for start and another for end inheriting from a common class that has shared attributes. Sharing via static variables ( in the base class) brings unpredictable behavior / and a code difficult to maintain . ) Imagine a requesthandler in the Servlet having no state and being instantiated every time for every request. By recreating EventListener-s , we have a similar analogy . That would make (it already does and I am working on patched version of Solr) development harder to share any state between successive calls.
          Hide
          Karthik K added a comment -

          Any idea when this patch is going to be committed.

          Show
          Karthik K added a comment - Any idea when this patch is going to be committed.
          Hide
          Noble Paul added a comment -

          hi Kay, I am -1 on giving special treatment to an EventListener. The behavior has to be consistent w/ that of other components.

          Show
          Noble Paul added a comment - hi Kay, I am -1 on giving special treatment to an EventListener. The behavior has to be consistent w/ that of other components.
          Noble Paul made changes -
          Link This issue relates to SOLR-988 [ SOLR-988 ]
          Hide
          Shalin Shekhar Mangar added a comment -

          The rules about modification apply well enough for interfaces and abstract classes. Having an abstract class limits the object hierarchy of the handler unnecessary when it could be more flexible through the interfaces.

          Not really, since you don't necessarily need to keep the new methods abstract, they can be a no-op. Thus you maintain binary compatibility with old code. One doesn't have this luxury with interfaces.

          Imagine a requesthandler in the Servlet having no state and being instantiated every time for every request. By recreating EventListener-s , we have a similar analogy. That would make (it already does and I am working on patched version of Solr) development harder to share any state between successive calls.

          I don't think that is an apples to apples comparison. Servlets are supposed to be called a lot many times per second. Also, containers may create many instances of servlets.

          Use of the reflection API ( Class.forName() , Class.newInstance() ) in any piece of code that is going to be executed frequently ( delta import in this case ) is hardly the best way to scale. I can come up with a sample unit test case for that - but I think it is moot.

          I try to stay away from micro-optimizations (which is what I felt this issue was about) as they don't add any value to the system. One wouldn't even be able to see any measurable performance difference unless one calls delta-imports hundreds of times per second. Even if someone did, the bottleneck would probably be the database rather than creation of event listeners.

          If this issue is more about maintaining state rather than the optimization, can you see if SOLR-988 can solve your problem? Most objects in DIH are created per import request and I'd like to keep it that way. I agree with Noble that EventListener cannot be seen as an exception. There are enough inconsistencies already and I'd like to cut them down rather than add one more exception.

          Show
          Shalin Shekhar Mangar added a comment - The rules about modification apply well enough for interfaces and abstract classes. Having an abstract class limits the object hierarchy of the handler unnecessary when it could be more flexible through the interfaces. Not really, since you don't necessarily need to keep the new methods abstract, they can be a no-op. Thus you maintain binary compatibility with old code. One doesn't have this luxury with interfaces. Imagine a requesthandler in the Servlet having no state and being instantiated every time for every request. By recreating EventListener-s , we have a similar analogy. That would make (it already does and I am working on patched version of Solr) development harder to share any state between successive calls. I don't think that is an apples to apples comparison. Servlets are supposed to be called a lot many times per second. Also, containers may create many instances of servlets. Use of the reflection API ( Class.forName() , Class.newInstance() ) in any piece of code that is going to be executed frequently ( delta import in this case ) is hardly the best way to scale. I can come up with a sample unit test case for that - but I think it is moot. I try to stay away from micro-optimizations (which is what I felt this issue was about) as they don't add any value to the system. One wouldn't even be able to see any measurable performance difference unless one calls delta-imports hundreds of times per second. Even if someone did, the bottleneck would probably be the database rather than creation of event listeners. If this issue is more about maintaining state rather than the optimization, can you see if SOLR-988 can solve your problem? Most objects in DIH are created per import request and I'd like to keep it that way. I agree with Noble that EventListener cannot be seen as an exception. There are enough inconsistencies already and I'd like to cut them down rather than add one more exception.
          Hide
          Shalin Shekhar Mangar added a comment -

          An alternative exists for maintaining state and EventListeners should not be treated in a special manner. Therefore I'm marking this as "Won't Fix".

          Show
          Shalin Shekhar Mangar added a comment - An alternative exists for maintaining state and EventListeners should not be treated in a special manner. Therefore I'm marking this as "Won't Fix".
          Shalin Shekhar Mangar made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Won't Fix [ 2 ]
          Hide
          Grant Ingersoll added a comment -

          Bulk close for Solr 1.4

          Show
          Grant Ingersoll added a comment - Bulk close for Solr 1.4
          Grant Ingersoll made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Unassigned
              Reporter:
              Karthik K
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - 2h
                2h
                Remaining:
                Remaining Estimate - 2h
                2h
                Logged:
                Time Spent - Not Specified
                Not Specified

                  Development