Solr
  1. Solr
  2. SOLR-938

DataImportHandler: Add close hooks to the completion of a full-import process

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.4
    • Labels:
      None
    • Environment:

      JRE 6, Tomcat 6

      Description

      Adding a new feature that we need for continuation of a workflow based on a full-import procedure. When a full-import completes , the pre-defined hook method is called at the end of the same.

      Implementations that need to notified , need to implement the new interface - DataImportCloseHook and register themselves in the data config file as follows.

      <dataConfig>

      .
      .
      .
      .
      <closeHook type="com.myorg.fullimporter.closeHookImpl1" />
      <closeHook type="com.myorg.fullimporter.closeHookImpl2" />
      </dataConfig>

      A single instance of them is created (as part of DataImporter) during instantiation and the same insance is reused across multiple invocations.

      Since the dataImporter is protected by a thread-lock - there is no thred-safe guarantee for the implementations of the full-import close hook.

      1. SOLR-938.patch
        6 kB
        Shalin Shekhar Mangar
      2. SOLR-938.patch
        9 kB
        Noble Paul
      3. SOLR-938.patch
        7 kB
        Noble Paul
      4. SOLR-938.patch
        10 kB
        Karthik K

        Issue Links

          Activity

          Hide
          Karthik K added a comment -

          DataImportCloseHook.java : New interface to be implemented for handling closing hooks

          DataImporter.java / DataConfig.java : add new close hook interfaces.

          TestDocBuilder.java : have a sample schema (optional closehook type=" " ) and let TestDocBuilder parse the same.

          Show
          Karthik K added a comment - DataImportCloseHook.java : New interface to be implemented for handling closing hooks DataImporter.java / DataConfig.java : add new close hook interfaces. TestDocBuilder.java : have a sample schema (optional closehook type=" " ) and let TestDocBuilder parse the same.
          Hide
          Shalin Shekhar Mangar added a comment -

          Kay, you can also extend JdbcDataSource, override the close method and use this data source for your root entity.

          Show
          Shalin Shekhar Mangar added a comment - Kay, you can also extend JdbcDataSource, override the close method and use this data source for your root entity.
          Hide
          Karthik K added a comment -

          I am a little bit apprehensive about inheriting from JdbcDataSource since inheritance brings in tighter coupling. (The most important assumption about JdbcDataSource overriding is that the sub class' close() method calls super.close() - to avoid resource leaks).

          And also - doing a full-import a given query is different from closing the jdbc connection altogether ( even though the impl. might converge in their meaning ).

          Instead - from a workflow perspective - all I need is a pluggable independent hook without disrupting DIH but a notification from the same after its completion to move onto the next.

          Show
          Karthik K added a comment - I am a little bit apprehensive about inheriting from JdbcDataSource since inheritance brings in tighter coupling. (The most important assumption about JdbcDataSource overriding is that the sub class' close() method calls super.close() - to avoid resource leaks). And also - doing a full-import a given query is different from closing the jdbc connection altogether ( even though the impl. might converge in their meaning ). Instead - from a workflow perspective - all I need is a pluggable independent hook without disrupting DIH but a notification from the same after its completion to move onto the next.
          Hide
          Noble Paul added a comment - - edited

          I wish to recommend the following. It is better not to limit this to close event. I try to minimize the no:of public interfaces. DIH is designed keeping non-java users also in mind. As far as possible users should not be forced to use java.

          It can better be called an EventListener

          public interface EventListener {
              public void onEvent(Context ctx );    
          }
          

          It could be registered as something like

          <document onImportEnd="script:endImport" onImportStart="com.foo.Foo" >
           <!-- entitiies go here-->.
          </document>
          

          Single method interface is preferred because it should be possible to write the implementation in scripting languages

          Show
          Noble Paul added a comment - - edited I wish to recommend the following. It is better not to limit this to close event. I try to minimize the no:of public interfaces. DIH is designed keeping non-java users also in mind. As far as possible users should not be forced to use java. It can better be called an EventListener public interface EventListener { public void onEvent(Context ctx ); } It could be registered as something like <document onImportEnd= "script:endImport" onImportStart= "com.foo.Foo" > <!-- entitiies go here-->. </document> Single method interface is preferred because it should be possible to write the implementation in scripting languages
          Hide
          Noble Paul added a comment -

          script is not supported

          Show
          Noble Paul added a comment - script is not supported
          Hide
          Karthik K added a comment -

          Noble - In the revised patch - did you mean to add EventListener to the source control as well since it would not compile at this point.

          Also - with invokeEvent - I believe it might be better to reuse the EventListener instance instead of recreating them again, for every notification, since the ClassLoading and the instance creation might be expensive (especially Class Loading). Storing the information during the runtime might be useful for various event notification purposes.

          Show
          Karthik K added a comment - Noble - In the revised patch - did you mean to add EventListener to the source control as well since it would not compile at this point. Also - with invokeEvent - I believe it might be better to reuse the EventListener instance instead of recreating them again, for every notification, since the ClassLoading and the instance creation might be expensive (especially Class Loading). Storing the information during the runtime might be useful for various event notification purposes.
          Hide
          Noble Paul added a comment -

          I missed the file. this patch has it

          Also - with invokeEvent - I believe it might be better to reuse the EventListener instance instead of recreating them again, for every notification, since the ClassLoading ...

          I guess caching is pointless for a class which is used only twice for a full import

          Show
          Noble Paul added a comment - I missed the file. this patch has it Also - with invokeEvent - I believe it might be better to reuse the EventListener instance instead of recreating them again, for every notification, since the ClassLoading ... I guess caching is pointless for a class which is used only twice for a full import
          Hide
          Karthik K added a comment -
          I guess caching is pointless for a class which is used only twice for a full import

          We have a much larger query , for which we do a full-import . We split the data by the primary key range and perform multiple smaller 'full-import's consolidated together to form a full-import . Hence the necessity for retaining the instance for the event handlers.

          But I am ok with tracking that with a separate jira - if needed. When can we get this committed the current patch onto the trunk for now.

          Show
          Karthik K added a comment - I guess caching is pointless for a class which is used only twice for a full import We have a much larger query , for which we do a full-import . We split the data by the primary key range and perform multiple smaller 'full-import's consolidated together to form a full-import . Hence the necessity for retaining the instance for the event handlers. But I am ok with tracking that with a separate jira - if needed. When can we get this committed the current patch onto the trunk for now.
          Hide
          Shalin Shekhar Mangar added a comment -
          1. Bringing patch in sync with trunk.
          2. Added a test case

          I plan to commit shortly.

          Show
          Shalin Shekhar Mangar added a comment - Bringing patch in sync with trunk. Added a test case I plan to commit shortly.
          Hide
          Shalin Shekhar Mangar added a comment -

          Committed revision 732695.

          Thanks Kay and Noble!

          Show
          Shalin Shekhar Mangar added a comment - Committed revision 732695. Thanks Kay and Noble!
          Hide
          Karthik K added a comment -

          Thanks Shalin / Noble.

          Full Import and Delta Import are functionally 2 different things. We probably need a way to distinguish the event handlers as onFullImportStart/End and onDeltaImportStart/End since they represent different events. May be there could be a separate jira to track this.

          Show
          Karthik K added a comment - Thanks Shalin / Noble. Full Import and Delta Import are functionally 2 different things. We probably need a way to distinguish the event handlers as onFullImportStart/End and onDeltaImportStart/End since they represent different events. May be there could be a separate jira to track this.
          Hide
          Noble Paul added a comment - - edited

          =Context#currentProcess() can tell you what is going on now.

          Show
          Noble Paul added a comment - - edited =Context#currentProcess() can tell you what is going on now.

            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Karthik K
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - 48h
                48h
                Remaining:
                Remaining Estimate - 48h
                48h
                Logged:
                Time Spent - Not Specified
                Not Specified

                  Development