SOLR-1426

Allow delta-import to run continuously until aborted

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.4
    • Fix Version/s: None
    • Labels:
      None

      Description

      Modify the delta-import so that it takes a perpetual flag that makes it run continuously until it is aborted.

      http://localhost:8985/solr/select/?command=delta-import&clean=false&qt=/dataimport&commit=true&perpetual=true

      With perpetual=true, the delta import keeps running, pausing for a few seconds between query runs.

      The only way to stop delta import will be to explicitly issue an abort like so:-

      http://localhost:8985/solr/tickets/select/?command=abort
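For illustration only, here is a minimal Java sketch that assembles the two request URLs above from their parameters. The class and method names (`DihUrls`, `perpetualDeltaImport`, `abort`) are hypothetical and not part of Solr, and `perpetual` is the flag proposed in this issue, not an existing DIH parameter. The abort request mirrors the description and carries only `command=abort`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that assembles DIH request URLs like those in the description. */
final class DihUrls {
    private DihUrls() {}

    /** Appends URL-encoded query parameters to a handler URL, in insertion order. */
    static String build(String handlerUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(handlerUrl);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator)
               .append(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8))
               .append('=')
               .append(URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
            separator = '&';
        }
        return url.toString();
    }

    /** Delta-import request carrying the proposed (never merged) perpetual flag. */
    static String perpetualDeltaImport(String handlerUrl) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("command", "delta-import");
        params.put("clean", "false");
        params.put("qt", "/dataimport");
        params.put("commit", "true");
        params.put("perpetual", "true");
        return build(handlerUrl, params);
    }

    /** Abort request, the only way to stop a perpetual import under this proposal. */
    static String abort(String handlerUrl) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("command", "abort");
        return build(handlerUrl, params);
    }
}
```

Because values are URL-encoded, `qt=/dataimport` is emitted as `qt=%2Fdataimport`, which decodes to the same value on the server side.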

        Activity

        James Dyer added a comment -

        Closing for now as this doesn't seem like the best approach to handling NRT with DIH. Can be reopened if someone wants to pursue this again.

        Abdul Chaudhry added a comment -

        I would avoid using the DIH for incremental updates. You need to be careful synchronizing with an eventually consistent database. I would go to the 'source' that manages the updates/inserts before they even get to the database and push to SOLR from the source.

        James Dyer added a comment -

        This sort of thing is needed for sure, especially now that we have such good NRT support in 4.0. But the patch here is shortsighted, as it works only with the delta-import command (you can do incremental updates with "full-import"; often it's the better choice). I'm also not sure I like the approach of putting the DIH handler thread in a perpetual loop and having it sleep a few seconds between iterations.

        Unless someone objects, I want to mark this as "won't fix/duplicate" and I think we need to work on SOLR-2305 or something like it instead.

        Hoss Man added a comment -
        • There is no indication that anyone is actively working on this issue, so removing 4.0 from the fixVersion.
        • Assigning in hopes he can assess the current patch to possibly revisit the issue
        Robert Muir added a comment -

        rmuir20120906-bulk-40-change

        Hoss Man added a comment -

        Bulk fixing the version info for 4.0-ALPHA and 4.0. All affected issues have "hoss20120711-bulk-40-change" in a comment.

        Robert Muir added a comment -

        3.4 -> 3.5

        Robert Muir added a comment -

        Bulk move 3.2 -> 3.3

        Hoss Man added a comment -

        Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

        http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

        Selection criteria were "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. Email notifications were suppressed.

        A unique token for finding these 240 issues in the future: hossversioncleanup20100527

        Noble Paul added a comment -

        ${dataimporter.last_index_time} should also work.

        Abdul Chaudhry added a comment -

        The SOLR-783 fix seems to force you to use the entity name with the last_index_time

        My fix for this was to change the deltaQuery like so :-

        WHERE updated_at > DATE_SUB('${dataimporter.[name of entity].last_index_time}',INTERVAL 10 SECOND)

        Abdul Chaudhry added a comment -

        NOTE: the last_index_time is broken with the perpetual patch

        I hacked around this by changing the data-config.xml file for the deltaQuery to do something like this:-

        WHERE updated_at > DATE_SUB('${dataimporter.last_index_time}',INTERVAL 10 SECOND)

        This is because of the time discrepancy between the sleep and the writer's last_index_time.
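As a rough illustration of what the 10-second overlap does, the following Java sketch performs the same subtraction that DATE_SUB applies in the query. The class name `DeltaWindow` is hypothetical, and the `yyyy-MM-dd HH:mm:ss` timestamp format is an assumption about how last_index_time is stored. Rows updated shortly before the recorded time are picked up again, which trades some duplicate work for not missing updates.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper showing the overlap arithmetic behind the DATE_SUB workaround. */
final class DeltaWindow {
    // Assumed timestamp layout (year-month-day hour:minute:second); adjust if yours differs.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    private DeltaWindow() {}

    /** Returns lastIndexTime moved back by overlapSeconds, in the same format. */
    static String lowerBound(String lastIndexTime, long overlapSeconds) {
        return LocalDateTime.parse(lastIndexTime, FORMAT)
                .minusSeconds(overlapSeconds)
                .format(FORMAT);
    }

    /** Builds the equivalent WHERE clause with the subtraction already applied. */
    static String whereClause(String lastIndexTime, long overlapSeconds) {
        // Illustration only: real code should bind the timestamp as a query parameter.
        return "WHERE updated_at > '" + lowerBound(lastIndexTime, overlapSeconds) + "'";
    }
}
```

For example, a last_index_time of `2009-10-01 12:00:05` with a 10-second overlap gives a lower bound of `2009-10-01 11:59:55`.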

        However, it looks like delta-import is broken in the latest build of Solr trunk (revision 820731). It looks like the lastIndexTime in the DataImporter is not populated after a delta, so if you use ${dataimporter.last_index_time}, the deltaQuery uses the wrong time.

        I am going to wait until delta-import is fixed before I update a patch.

        Abdul Chaudhry added a comment -

        The perpetual option only makes sense for one command; that is the delta-import command. I could not see a compelling use case for using perpetual with any other command.

        The abort should stop any in-flight delta-import which is the current behaviour with the patch.

        The sleep interval should be set using something like "perpetual.delay" and default to a reasonable value such as 3 secs.
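A minimal sketch of how such a setting might be read, assuming request parameters arrive as a string map and that "perpetual.delay" is expressed in whole seconds (the comment above does not fix the unit). The class name `PerpetualDelay` is hypothetical and this is not code from the patch.

```java
import java.util.Map;

/** Hypothetical reader for the suggested "perpetual.delay" request parameter. */
final class PerpetualDelay {
    /** Default of 3 seconds, matching the value suggested in the comment. */
    static final long DEFAULT_DELAY_MS = 3000L;

    private PerpetualDelay() {}

    /** Returns the delay in milliseconds, falling back to the default on bad input. */
    static long delayMillis(Map<String, String> params) {
        String raw = params.get("perpetual.delay");
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_DELAY_MS;
        }
        try {
            long seconds = Long.parseLong(raw.trim()); // assumption: value is in seconds
            return seconds > 0 ? seconds * 1000L : DEFAULT_DELAY_MS;
        } catch (NumberFormatException e) {
            return DEFAULT_DELAY_MS; // non-numeric input: keep the default
        }
    }
}
```

Falling back to the default on missing, non-numeric, or non-positive values keeps a typo in the request from turning the loop into a busy spin.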

        Noble Paul added a comment -

        I am not against the idea itself. I am just calling for a consensus. This is something we can consider.
        There are a few things to consider:

        • the time interval, as you mentioned
        • there should be a way to stop any perpetual operation (without aborting the existing one)
        • it should not be just for one command; it should be independent of the command name
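The three points above could be sketched roughly as follows. This is a hypothetical illustration, not the attached patch and not DIH code: `PerpetualRunner` and `requestStop` are invented names, the command is modelled as a plain `Runnable` so the loop does not depend on the command name, and a stop request lets the in-flight run finish rather than aborting it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical command-agnostic loop with a configurable pause and a graceful stop. */
final class PerpetualRunner implements Runnable {
    private final Runnable command;   // any import command, not only delta-import
    private final long delayMillis;   // pause between iterations
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    PerpetualRunner(Runnable command, long delayMillis) {
        this.command = command;
        this.delayMillis = delayMillis;
    }

    /** Ends the loop after the current iteration; the running command is not interrupted. */
    void requestStop() {
        stopRequested.set(true);
    }

    @Override
    public void run() {
        while (!stopRequested.get()) {
            command.run();
            if (stopRequested.get()) {
                break; // stop was requested during the run: skip the final sleep
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and exit
                return;
            }
        }
    }
}
```

Aborting an in-flight import would remain a separate operation; this sketch only covers ending the repetition.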
        Abdul Chaudhry added a comment -

        You can run a crontab every minute but I need near real-time changes mirrored from a set of tables in a database to a search index.

        You should be aware that Lucene 2.9 includes what it calls near-real-time search capabilities. If these are included in Solr 1.4, the use case for delta-import will probably change from running every few hours or minutes (which is probably what you are used to right now) to running every few seconds. In that case, a crontab entry that runs every minute is too long to wait, and writing a script to call curl every few seconds will seem like an excessive use of system resources.
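For comparison, a client-side alternative to cron or a curl loop could look something like the sketch below, which uses the JDK's `ScheduledExecutorService` to fire a trigger with a fixed delay between runs. `DeltaImportPoller` is a hypothetical name, and the trigger is left as a `Runnable`; in practice it would issue the delta-import HTTP request.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical in-process poller that fires a delta-import trigger every few seconds. */
final class DeltaImportPoller implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** Runs the trigger immediately, then again 'delay' after each run completes. */
    ScheduledFuture<?> start(Runnable trigger, long delay, TimeUnit unit) {
        return scheduler.scheduleWithFixedDelay(() -> {
            try {
                trigger.run();
            } catch (RuntimeException e) {
                // Swallowed so one failed import does not cancel the schedule;
                // real code would log the failure here.
            }
        }, 0, delay, unit);
    }

    /** Stops scheduling further runs. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Using a fixed delay rather than a fixed rate means a slow import pushes the next one back instead of letting runs pile up.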

        So, in answer to your question, it is probably not a common use case now, but with Lucene 2.9 it will become one.

        Anyway, it's your call: take it or leave it.

        Noble Paul added a comment -

        Isn't it easily achieved by a cron job that continuously fires a delta-import?

        Is this a common enough use case to require making it part of DIH?

        Anyway, I shall move it to 1.5, because we are only doing bug fixes for 1.4 now.

        Abdul Chaudhry added a comment -

        Uploaded a patch that implements this feature.

        Ran all unit tests on my tree and they pass.

        The only thing I have hard-coded is the sleep interval which is :-
        Thread.sleep(3000)

        This should probably be configurable.


          People

          • Assignee:
            James Dyer
            Reporter:
            Abdul Chaudhry
          • Votes:
            4
            Watchers:
            6
