Solr / SOLR-11459

AddUpdateCommand#prevVersion is not cleared, which may cause problems for in-place updates of nonexistent documents


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 7.0
    • Fix Version/s: 7.3, 8.0
    • Component/s: SolrCloud
    • Labels: None

    Description

      I have a SolrCloud cluster (1 shard, m replicas) running Solr 6.6.0, and from time to time I run batches of 5-10k in-place updates.
      At one point I noticed that a job "hung": it started but could not finish for a long while.
      The logs were full of messages like:

       Missing update, on which current in-place update depends on, hasn't arrived. id=__, looking for version=___, last found version=0

       Tried to fetch document ___ from the leader, but the leader says document has been deleted. Deleting the document here and skipping this update: Last found version: 0, was looking for: ___
      

      Further analysis shows that:

      • Among the other updates there are 100-500 updates for nonexistent documents (something that I have to deal with).
      • The leader receives a batch of updates and executes them one by one. JavabinLoader, which is used to process the documents, reuses the same AddUpdateCommand instance for every update and just clears its state at the end of each one. The field AddUpdateCommand#prevVersion is not cleared.
      • When an update is an in-place update but the specified document does not exist, the update is processed as a regular atomic update (i.e. a new doc is created), yet prevVersion is still sent as the distrib.inplace.prevversion parameter in the sequential calls to every replica in DistributedUpdateProcessor. Because prevVersion was not cleared, it may contain the version from a previously processed update.
      • Each replica checks its own version of the document, which is 0 (because the doc does not exist), concludes that some updates were missed, and spends 5 seconds in DistributedUpdateProcessor#waitForDependentUpdates waiting for the missed updates (no luck). It also tries to get the "correct" version from the leader (no luck as well).
      • So each update for a nonexistent document costs m * 5 sec.
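
The sequence above can be reproduced in miniature with a stand-in class. This is only a sketch: `Cmd`, `forwardedPrevVersions`, `worstCaseWaitSeconds`, and the in-memory `index` map are hypothetical names and do not mirror Solr's real classes. The only details taken from the report are that one command instance is reused for the whole batch, that `clear()` leaves `prevVersion` untouched, and that each affected update costs roughly 5 seconds per replica.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StalePrevVersionSketch {
    // Hypothetical stand-in for a reusable update command.
    static class Cmd {
        String id;
        long version;
        long prevVersion = -1;

        // Mirrors the reported bug: prevVersion is not reset here.
        void clear() {
            id = null;
            version = 0;
        }
    }

    /** Returns the prevVersion that would be forwarded to replicas for each update. */
    static List<Long> forwardedPrevVersions(Map<String, Long> index, List<String> ids) {
        List<Long> forwarded = new ArrayList<>();
        Cmd cmd = new Cmd();          // one instance reused for the whole batch
        long nextVersion = 100;       // arbitrary starting version for the sketch
        for (String id : ids) {
            cmd.id = id;
            cmd.version = nextVersion++;
            Long existing = index.get(id);
            if (existing != null) {
                cmd.prevVersion = existing;   // genuine in-place update
            }
            // Missing doc: handled as a regular add, prevVersion is left as-is.
            forwarded.add(cmd.prevVersion);
            index.put(id, cmd.version);
            cmd.clear();
        }
        return forwarded;
    }

    /** Rough cost estimate from the report: m * 5 seconds per affected update. */
    static long worstCaseWaitSeconds(int missingDocs, int replicas) {
        return (long) missingDocs * replicas * 5;
    }

    public static void main(String[] args) {
        Map<String, Long> index = new HashMap<>(Map.of("a", 42L));
        // The second id does not exist, yet it inherits prevVersion 42 from "a".
        System.out.println(forwardedPrevVersions(index, List.of("a", "missing")));
        System.out.println(worstCaseWaitSeconds(500, 3) + " seconds");
    }
}
```

With 500 missing documents and, say, 3 replicas, the estimate comes to 7500 seconds, which is consistent with a job that appears to hang.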

      I worked around this with an explicit check that the doc exists, but it should probably be fixed.
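
A workaround of this kind could look roughly like the sketch below, which splits a batch into ids that exist and ids that do not before any in-place update is sent. The `exists` predicate is an assumption standing in for whatever lookup is actually used against the index; `splitByExistence` is a hypothetical helper, not part of Solr or SolrJ.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ExistenceCheckSketch {
    /** Splits ids into those that exist (key true) and those that do not (key false). */
    static Map<Boolean, List<String>> splitByExistence(List<String> ids, Predicate<String> exists) {
        return ids.stream().collect(Collectors.partitioningBy(exists));
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("a", "b");   // stand-in for a real existence lookup
        Map<Boolean, List<String>> split =
                splitByExistence(List.of("a", "x", "b"), known::contains);
        System.out.println("send as in-place updates: " + split.get(true));
        System.out.println("skip or send as full adds: " + split.get(false));
    }
}
```

The extra lookup costs one round trip per batch or per document, depending on how it is implemented, which is presumably still far cheaper than m * 5 seconds per missing document.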

      The obvious first guess is that prevVersion should be cleared in AddUpdateCommand#clear, but I have no clue how to test it.

      +++ solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java	(revision )
      @@ -78,6 +78,7 @@
            updateTerm = null;
            isLastDocInBatch = false;
            version = 0;
      +     prevVersion = -1;
          }
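
On the open question of testing: one possible approach is a unit-level check that dirties every field of a command, calls `clear()`, and asserts that each field is back at its initial value. The sketch below uses a hypothetical stand-in class (`Cmd`) rather than the real `AddUpdateCommand`. The field names follow the diff above, while the field types and the helper names `isPristine` and `dirtyThenClear` are assumptions made for illustration.

```java
public class ClearResetsStateSketch {
    // Hypothetical stand-in; field names follow the diff, types are assumed.
    static class Cmd {
        Object updateTerm;
        boolean isLastDocInBatch;
        long version;
        long prevVersion = -1;

        void clear() {
            updateTerm = null;
            isLastDocInBatch = false;
            version = 0;
            prevVersion = -1;   // the line added by the proposed patch
        }
    }

    /** True when every field is back at its initial value. */
    static boolean isPristine(Cmd c) {
        return c.updateTerm == null
                && !c.isLastDocInBatch
                && c.version == 0
                && c.prevVersion == -1;
    }

    /** Sets every field to a non-default value, then clears the command. */
    static Cmd dirtyThenClear() {
        Cmd c = new Cmd();
        c.updateTerm = "id:1";
        c.isLastDocInBatch = true;
        c.version = 7;
        c.prevVersion = 6;
        c.clear();
        return c;
    }

    public static void main(String[] args) {
        if (!isPristine(dirtyThenClear())) {
            throw new AssertionError("clear() left stale state behind");
        }
        System.out.println("clear() resets all fields");
    }
}
```

Removing the `prevVersion = -1` line from `clear()` makes the check fail, which is the behavior a regression test for this issue would need to pin down.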
      

          People

            Assignee: Mikhail Khludnev (mkhl)
            Reporter: Andrey Kudryavtsev (werder)
            Votes: 3
            Watchers: 6

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 0.5h
