SOLR-13945

SPLITSHARD data loss due to "rollback"

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.4
    • Component/s: None
    • Labels: None

      Description

      1. As per SOLR-7673, SPLITSHARD issues a commit on the parent shard after the state change has already happened, i.e. after parent/subshard/subshard have gone from active/construction/construction to inactive/active/active. Please see https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java#L586-L588
      2. Due to SOLR-12509, there's now a cleanup/rollback method called "cleanupAfterFailure" in the finally block that resets the state to active/construction/construction. Please see: https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java#L657
      3. When the rollback in 2 is triggered by a failure of the commit in 1, any documents that were indexed into the subshards (which are already active by then) are lost once the parent becomes active again.
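
To make the sequence above concrete, here is a minimal toy model of it. This is not Solr code: the class `SplitRollbackSketch`, its `index` and `split` methods, and the hash-based routing are all hypothetical simplifications; only the state names and the name `cleanupAfterFailure` come from the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the reported sequence. It is NOT SplitShardCmd.
class SplitRollbackSketch {
    enum State { ACTIVE, CONSTRUCTION, INACTIVE }

    static final class Shard {
        State state;
        final List<String> docs = new ArrayList<>();
        Shard(State state) { this.state = state; }
    }

    final Shard parent = new Shard(State.ACTIVE);
    final List<Shard> subShards = new ArrayList<>(
            List.of(new Shard(State.CONSTRUCTION), new Shard(State.CONSTRUCTION)));

    // Assumption: updates go to the parent while it is ACTIVE, else to a subshard.
    void index(String doc) {
        if (parent.state == State.ACTIVE) {
            parent.docs.add(doc);
        } else {
            subShards.get(Math.floorMod(doc.hashCode(), subShards.size())).docs.add(doc);
        }
    }

    // Documents reachable through shards that are currently ACTIVE.
    List<String> visibleDocs() {
        List<String> visible = new ArrayList<>();
        if (parent.state == State.ACTIVE) visible.addAll(parent.docs);
        for (Shard s : subShards) {
            if (s.state == State.ACTIVE) visible.addAll(s.docs);
        }
        return visible;
    }

    void split(boolean parentCommitFails, String docArrivingMidSplit) {
        boolean success = false;
        try {
            // Copy the parent's documents into the subshards.
            for (String d : parent.docs) {
                subShards.get(Math.floorMod(d.hashCode(), subShards.size())).docs.add(d);
            }
            // State switch: active/construction/construction -> inactive/active/active.
            parent.state = State.INACTIVE;
            for (Shard s : subShards) s.state = State.ACTIVE;
            // A new document arrives and lands in an (already active) subshard.
            index(docArrivingMidSplit);
            // Step 1: the late commit on the parent shard.
            if (parentCommitFails) throw new IllegalStateException("commit on parent failed");
            success = true;
        } catch (IllegalStateException e) {
            // Swallowed here only to keep the demo short.
        } finally {
            if (!success) cleanupAfterFailure(); // Step 2: rollback
        }
    }

    // Rollback as described: reset to active/construction/construction, then drop the subshards.
    void cleanupAfterFailure() {
        parent.state = State.ACTIVE;
        for (Shard s : subShards) s.state = State.CONSTRUCTION;
        subShards.clear();
    }

    public static void main(String[] args) {
        SplitRollbackSketch cluster = new SplitRollbackSketch();
        cluster.index("doc-before-split");
        cluster.split(true, "doc-during-split");
        System.out.println(cluster.visibleDocs()); // prints [doc-before-split]
    }
}
```

In this model the document indexed after the state switch exists only in a subshard, so it vanishes once the rollback reactivates the parent and removes the subshards.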

      If my above understanding is correct, I am wondering:

      1. Why is a commit on the parent shard needed once the parent shard is inactive, the subshards are active, and the split operation has completed?
      2. This rollback looks very suspicious. If the subshards are already active and the parent is inactive, then what is the need for setting them back to construction? It seems like a crucial check is missing there. Also, why do we reset the subshard status back to construction instead of inactive? It is extremely misleading (and, frankly, ridiculous) for any external clusterstate monitoring tools to see the subshards go from CONSTRUCTION to ACTIVE to CONSTRUCTION and then disappear.
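
For illustration, the kind of check question 2 asks about could look roughly like the sketch below. It is only an assumption about what such a guard might be, not the fix that was committed and not Solr's API: `GuardedRollbackSketch`, `onFailure` and the `Action` enum are hypothetical names.

```java
import java.util.List;

// Hypothetical guard: decide whether a failed split should be rolled back at all.
class GuardedRollbackSketch {
    enum State { ACTIVE, CONSTRUCTION, INACTIVE }
    enum Action { ROLL_BACK, KEEP_SPLIT }

    static Action onFailure(State parent, List<State> subShards) {
        // If the state switch already happened (inactive/active/active), the
        // subshards may hold new documents, so rolling back would lose them.
        boolean alreadySwitched = parent == State.INACTIVE
                && !subShards.isEmpty()
                && subShards.stream().allMatch(s -> s == State.ACTIVE);
        return alreadySwitched ? Action.KEEP_SPLIT : Action.ROLL_BACK;
    }

    public static void main(String[] args) {
        // Failure after the switch: keep the split.
        System.out.println(onFailure(State.INACTIVE, List.of(State.ACTIVE, State.ACTIVE)));
        // Failure before the switch: rolling back is safe.
        System.out.println(onFailure(State.ACTIVE, List.of(State.CONSTRUCTION, State.CONSTRUCTION)));
    }
}
```

The point of the sketch is that the rollback decision depends on the states at the moment of failure, rather than being applied unconditionally from the finally block.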

        Attachments

        1. SOLR-13945.patch
          0.9 kB
          Ishan Chattopadhyaya
        2. SOLR-13945.patch
          0.8 kB
          Ishan Chattopadhyaya
        3. SOLR-13945.patch
          1 kB
          Andrzej Bialecki

            People

            • Assignee: Ishan Chattopadhyaya (ichattopadhyaya)
            • Reporter: Ishan Chattopadhyaya (ichattopadhyaya)
            • Votes: 0
            • Watchers: 9
