Solr
  1. Solr
  2. SOLR-793

set a commit time bounds in the <add> command

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: update
    • Labels:
      None

      Description

      Currently there are two options for how to handle commiting documents:
      1. the client explicitly starts the commit via <commit/>
      2. set an auto commit value on the server – clients can assume all documents will be commited within that time.

      However, this does not help in the case where the clients know what documents need updating quickly and others that could wait. I suggest adding:

       <add commitWithin="100">...
      

      to the update syntax so the client can schedule commits explicitly.

      1. SOLR-793-commitWithin.patch
        8 kB
        Ryan McKinley
      2. SOLR-793-commitWithin.patch
        8 kB
        Ryan McKinley
      3. SOLR-793-deadlock.patch
        1 kB
        Ryan McKinley

        Issue Links

          Activity

          Hide
          Ryan McKinley added a comment -

          Here is a patch that adds the functionality to the AddUpdateCommand. It is only supported in DirectUpdateHandler2.

          The only bits I'm not sure about are the thread synchronization issues in CommitTracker

            public void addedDocument( int commitWithin ) {
                docsSinceCommit++;
                lastAddedTime = System.currentTimeMillis();
                // maxDocs-triggered autoCommit
                if( docsUpperBound > 0 && (docsSinceCommit > docsUpperBound) ) {
                  scheduleCommitWithin( DOC_COMMIT_DELAY_MS );
                }
                
                // maxTime-triggered autoCommit
                long ctime = timeUpperBound;
                if( commitWithin > 0 && (ctime < 0 || commitWithin < ctime) ) {
                  ctime = commitWithin;
                }
                if( ctime > 0 ) {
                  scheduleCommitWithin( ctime );
                }
              }
          

          previously, the code to schedule a commit was inline – now this delegates to a synchronized method. I don't think it is an issue (and tests pass), but I want to make sure that gets double checked

          ------

          Also, any better ideas on what the parameter/argument should be called?

          Show
          Ryan McKinley added a comment - Here is a patch that adds the functionality to the AddUpdateCommand. It is only supported in DirectUpdateHandler2. The only bits I'm not sure about are the thread synchronization issues in CommitTracker public void addedDocument( int commitWithin ) { docsSinceCommit++; lastAddedTime = System .currentTimeMillis(); // maxDocs-triggered autoCommit if ( docsUpperBound > 0 && (docsSinceCommit > docsUpperBound) ) { scheduleCommitWithin( DOC_COMMIT_DELAY_MS ); } // maxTime-triggered autoCommit long ctime = timeUpperBound; if ( commitWithin > 0 && (ctime < 0 || commitWithin < ctime) ) { ctime = commitWithin; } if ( ctime > 0 ) { scheduleCommitWithin( ctime ); } } previously, the code to schedule a commit was inline – now this delegates to a synchronized method. I don't think it is an issue (and tests pass), but I want to make sure that gets double checked ------ Also, any better ideas on what the parameter/argument should be called?
          Hide
          Ryan McKinley added a comment -

          no real change, updating for /trunk

          I'd like to commit this soon, but would love for someone with better threading chops to look it over first

          Show
          Ryan McKinley added a comment - no real change, updating for /trunk I'd like to commit this soon, but would love for someone with better threading chops to look it over first
          Hide
          Noble Paul added a comment - - edited

          I somehow am confused with the usecase

          <add commitWithin="100">...
          

          If I have 10 docs with with a commitWithin="100" This may mean that there will be 10 commits because each <add> happened at different time.

          Wouldn't the user want to commit all the important documents at once . or wouldn't they want to commit tsay doc x,y,z first and then the rest of the docs?

          Show
          Noble Paul added a comment - - edited I somehow am confused with the usecase <add commitWithin= "100" >... If I have 10 docs with with a commitWithin="100" This may mean that there will be 10 commits because each <add> happened at different time. Wouldn't the user want to commit all the important documents at once . or wouldn't they want to commit tsay doc x,y,z first and then the rest of the docs?
          Hide
          Ryan McKinley added a comment -

          perhaps it is better to think of it as selectivly enabling the autoCommit feature.

          If I have 10 docs with with a commitWithin="100" This may mean that there will be 10 commits because each <add> happened at different time.

          assuming the last of the 10 docs were sent to solr within 100ms, then all of them would be committed at once. The commtWithin time is a maximum time, not a minimum.

          This is identical to how autoCommit works now – the advantage is that various documents could require different time bounds.

          Show
          Ryan McKinley added a comment - perhaps it is better to think of it as selectivly enabling the autoCommit feature. If I have 10 docs with with a commitWithin="100" This may mean that there will be 10 commits because each <add> happened at different time. assuming the last of the 10 docs were sent to solr within 100ms, then all of them would be committed at once. The commtWithin time is a maximum time, not a minimum. This is identical to how autoCommit works now – the advantage is that various documents could require different time bounds.
          Hide
          Mike Klaas added a comment -

          Hey Ryan,

          I think this is good functionality and will take a look at the synchro stuff in the next day or so. I feel somewhat reponsible, being the one who inflicted it on everyone

          Show
          Mike Klaas added a comment - Hey Ryan, I think this is good functionality and will take a look at the synchro stuff in the next day or so. I feel somewhat reponsible, being the one who inflicted it on everyone
          Hide
          Ryan McKinley added a comment -

          I feel somewhat reponsible, being the one who inflicted it on everyone

          Hey, I claim equal responsibility

          Show
          Ryan McKinley added a comment - I feel somewhat reponsible, being the one who inflicted it on everyone Hey, I claim equal responsibility
          Hide
          Mike Klaas added a comment -

          I don't see any issue with the code: adddedDocument is always called within a synchronized context anyway, after all.

          One question: right now you have it set to use the minimum of autocommit/maxTime and commitWithin on the update command. Might it be better to always use commitWithin, even if it greater than a specified maxTime? This would allow the insertion of "less important than normal" docs (right now, it seems only useful for the "more important" case)

          Show
          Mike Klaas added a comment - I don't see any issue with the code: adddedDocument is always called within a synchronized context anyway, after all. One question: right now you have it set to use the minimum of autocommit/maxTime and commitWithin on the update command. Might it be better to always use commitWithin, even if it greater than a specified maxTime? This would allow the insertion of "less important than normal" docs (right now, it seems only useful for the "more important" case)
          Hide
          Ryan McKinley added a comment -

          Thanks for checking this out Mike.

          I changed max time logic to use:

            long ctime = (commitWithin>0) ? commitWithin : timeUpperBound;
            if( ctime > 0 ) {
              scheduleCommitWithin( ctime );
            }
          

          This way, if you have a timeUpperBound set, it will use the passed in argument rather then the minimum of the two times.

          Show
          Ryan McKinley added a comment - Thanks for checking this out Mike. I changed max time logic to use: long ctime = (commitWithin>0) ? commitWithin : timeUpperBound; if ( ctime > 0 ) { scheduleCommitWithin( ctime ); } This way, if you have a timeUpperBound set, it will use the passed in argument rather then the minimum of the two times.
          Hide
          Ryan McKinley added a comment -

          This is a fix to possible deadlock caused by the previous patch.

          Show
          Ryan McKinley added a comment - This is a fix to possible deadlock caused by the previous patch.
          Hide
          Noble Paul added a comment -

          The XML API looks a bit ugly.
          Why do we need to add this like this <add commitWithin="6">

          One update request will have only one <add> . So it could have very well been added to the request params like waitFlush and waitSearcher.

          If at all it has to be kept in the xml it should be on a per document .

          The SolrInputDocument can have a 'commitWithin' attribute.

          Show
          Noble Paul added a comment - The XML API looks a bit ugly. Why do we need to add this like this <add commitWithin="6"> One update request will have only one <add> . So it could have very well been added to the request params like waitFlush and waitSearcher. If at all it has to be kept in the xml it should be on a per document . The SolrInputDocument can have a 'commitWithin' attribute.
          Hide
          Ryan McKinley added a comment -
          One update request will have only one <add>

          That is the existing limitation, but i don't see any reason there could not be multiple <add> statements within one request. similar to how we have multiple delete commands in one statement.

          Our existing parser supports this already, only we would need to add a new root element. This would allow a streaming client to post all commands sequentially to the server.

          The SolrInputDocument can have a 'commitWithin' attribute.

          I don't like that because the 'commitWithin' attribute is about the command, not the data. Attaching it to the 'add' command seems like the logical place for it.

          Show
          Ryan McKinley added a comment - One update request will have only one <add> That is the existing limitation, but i don't see any reason there could not be multiple <add> statements within one request. similar to how we have multiple delete commands in one statement. Our existing parser supports this already, only we would need to add a new root element. This would allow a streaming client to post all commands sequentially to the server. The SolrInputDocument can have a 'commitWithin' attribute. I don't like that because the 'commitWithin' attribute is about the command, not the data. Attaching it to the 'add' command seems like the logical place for it.
          Hide
          Noble Paul added a comment -

          we already have all the other parameters passed on as a request parameter , whether we like it or not

          Why don't we just stick to the convention.

          Show
          Noble Paul added a comment - we already have all the other parameters passed on as a request parameter , whether we like it or not Why don't we just stick to the convention.
          Hide
          Ryan McKinley added a comment -
          we already have all the other parameters passed on as a request parameter

          really? what about:

          public static final String OVERWRITE = "overwrite";
          public static final String OVERWRITE_COMMITTED = "overwriteCommitted";
          public static final String OVERWRITE_PENDING = "overwritePending";
          public static final String ALLOW_DUPS = "allowDups";

          they live with the <add>

          Show
          Ryan McKinley added a comment - we already have all the other parameters passed on as a request parameter really? what about: public static final String OVERWRITE = "overwrite"; public static final String OVERWRITE_COMMITTED = "overwriteCommitted"; public static final String OVERWRITE_PENDING = "overwritePending"; public static final String ALLOW_DUPS = "allowDups"; they live with the <add>
          Hide
          Ryan McKinley added a comment -

          Also, note that that XMLUpdate#processUpdate() does not have access to the request parameters while the add commands are being processed....

          Show
          Ryan McKinley added a comment - Also, note that that XMLUpdate#processUpdate() does not have access to the request parameters while the add commands are being processed....
          Hide
          Grant Ingersoll added a comment -

          Bulk close for Solr 1.4

          Show
          Grant Ingersoll added a comment - Bulk close for Solr 1.4

            People

            • Assignee:
              Unassigned
              Reporter:
              Ryan McKinley
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development