Solr / SOLR-11459

AddUpdateCommand#prevVersion is not cleared, which may lead to problems for in-place updates of non-existent documents

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 7.0
    • Fix Version/s: 7.3, 8.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      I have a 1_shard / m_replicas SolrCloud cluster running Solr 6.6.0 and run batches of 5-10k in-place updates from time to time.
      At one point I noticed that a job "hangs": it started and could not finish for a while.
      The logs were full of messages like:

      Missing update, on which current in-place update depends on, hasn't arrived. id=__, looking for version=___, last found version=0

      Tried to fetch document ___ from the leader, but the leader says document has been deleted. Deleting the document here and skipping this update: Last found version: 0, was looking for: ___
      

      Further analysis shows that:

      • Among the other updates there are 100-500 updates for documents that do not exist (something I have to deal with).
      • The leader receives a batch of updates and executes them one by one. JavabinLoader, which processes the documents, reuses the same AddUpdateCommand instance for every update, just clearing its state at the end. The field AddUpdateCommand#prevVersion is not cleared.
      • If an update is an in-place update but the specified document does not exist, it is processed as a regular atomic update (i.e. a new doc is created). However, prevVersion is still sent as the distrib.inplace.prevversion parameter in the sequential calls to every replica in DistributedUpdateProcessor. Because prevVersion was not cleared, it may contain the version from a previously processed update.
      • Each replica checks its own version of the document, which is 0 (because the doc does not exist), concludes that some updates were missed, and spends 5 seconds in DistributedUpdateProcessor#waitForDependentUpdates waiting for them (no luck). It also tries to get the "correct" version from the leader (no luck either).
      • So each update for a non-existent document costs m * 5 sec.
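
      The mechanism described above can be illustrated with a small standalone sketch. It does not use Solr's classes: ReusableCommand, StaleStateDemo and their methods are hypothetical stand-ins, and the numbers passed to worstCaseWaitSeconds (300 missing documents, 3 replicas) are illustrative assumptions; only the 5-second wait per replica comes from the report.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical stand-in for a reusable update command; NOT Solr's AddUpdateCommand.
      class ReusableCommand {
          String id;
          long version;
          long prevVersion = -1; // -1 means "no in-place dependency"

          // Mirrors the reported behaviour: prevVersion is left untouched.
          void clearBuggy() {
              id = null;
              version = 0;
          }

          // Mirrors the proposed fix: prevVersion is reset as well.
          void clearFixed() {
              clearBuggy();
              prevVersion = -1;
          }
      }

      class StaleStateDemo {
          // Processes a batch with one reused command and returns the prevVersion
          // that would be forwarded to replicas for each update.
          // existingVersions[i] > 0: the doc exists with that version (in-place update).
          // existingVersions[i] == 0: the doc is missing (handled as a regular update).
          static List<Long> forwardedPrevVersions(long[] existingVersions, boolean fixed) {
              ReusableCommand cmd = new ReusableCommand();
              List<Long> forwarded = new ArrayList<>();
              for (int i = 0; i < existingVersions.length; i++) {
                  cmd.id = "doc" + i;
                  if (existingVersions[i] > 0) {
                      cmd.prevVersion = existingVersions[i]; // in-place update records its dependency
                  }
                  // For a missing doc nothing sets prevVersion, so whatever is left over is forwarded.
                  forwarded.add(cmd.prevVersion);
                  if (fixed) {
                      cmd.clearFixed();
                  } else {
                      cmd.clearBuggy();
                  }
              }
              return forwarded;
          }

          // Worst-case extra wait when every replica is called sequentially and
          // waits the full timeout for each update to a missing document.
          static long worstCaseWaitSeconds(int missingDocs, int replicas, int waitSecondsPerReplica) {
              return (long) missingDocs * replicas * waitSecondsPerReplica;
          }

          public static void main(String[] args) {
              long[] versions = {101L, 0L}; // an existing doc followed by a missing one
              System.out.println(forwardedPrevVersions(versions, false)); // [101, 101]
              System.out.println(forwardedPrevVersions(versions, true));  // [101, -1]
              System.out.println(worstCaseWaitSeconds(300, 3, 5));        // 4500
          }
      }
      ```

      With the buggy clear, the update for the missing document carries the previous update's version (101), which is what makes a replica believe it missed an update.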

      I worked around this by explicitly checking that the document exists, but it should probably be fixed.

      The obvious first guess is that prevVersion should be cleared in AddUpdateCommand#clear, but I have no clue how to test it.

      +++ solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java	(revision )
      @@ -78,6 +78,7 @@
            updateTerm = null;
            isLastDocInBatch = false;
            version = 0;
      +     prevVersion = -1;
          }
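
      One possible way to test a clear() method like this, sketched here under assumptions, is to compare a cleared instance against a freshly constructed one, field by field, using reflection. DemoCommand and ClearCheck are hypothetical names used only for illustration; DemoCommand copies the four field names visible in the diff but is not Solr's AddUpdateCommand, and this is not necessarily how the attached patch tests the fix.

      ```java
      import java.lang.reflect.Field;
      import java.lang.reflect.Modifier;
      import java.util.ArrayList;
      import java.util.List;
      import java.util.Objects;

      // Hypothetical command used only for illustration; not Solr's AddUpdateCommand.
      class DemoCommand {
          String updateTerm;
          boolean isLastDocInBatch;
          long version;
          long prevVersion = -1;

          void clear() {
              updateTerm = null;
              isLastDocInBatch = false;
              version = 0;
              prevVersion = -1; // without this line the check below reports "prevVersion"
          }
      }

      class ClearCheck {
          // Returns the names of instance fields declared on the class of 'fresh'
          // whose value in 'cleared' differs from the value in 'fresh'.
          static List<String> fieldsNotReset(Object fresh, Object cleared) {
              List<String> stale = new ArrayList<>();
              for (Field f : fresh.getClass().getDeclaredFields()) {
                  if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                      continue; // only per-instance state matters here
                  }
                  f.setAccessible(true);
                  try {
                      if (!Objects.equals(f.get(fresh), f.get(cleared))) {
                          stale.add(f.getName());
                      }
                  } catch (IllegalAccessException e) {
                      throw new IllegalStateException(e);
                  }
              }
              return stale;
          }

          public static void main(String[] args) {
              DemoCommand used = new DemoCommand();
              used.updateTerm = "t";
              used.isLastDocInBatch = true;
              used.version = 7;
              used.prevVersion = 6;
              used.clear();
              System.out.println(fieldsNotReset(new DemoCommand(), used)); // []

              DemoCommand leftover = new DemoCommand();
              leftover.prevVersion = 6; // simulates a clear() that forgot this field
              System.out.println(fieldsNotReset(new DemoCommand(), leftover)); // [prevVersion]
          }
      }
      ```

      A check of this shape also catches fields added later that clear() forgets to reset, at the cost of relying on reflection.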
      

        Attachments

        1. SOLR-11459.patch
          4 kB
          Mikhail Khludnev

              People

              • Assignee: Mikhail Khludnev (mkhl)
              • Reporter: Andrey Kudryavtsev (werder)
              • Votes: 3
              • Watchers: 6

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 0.5h