SOLR-3126

We should try to do a quick sync on std start up recovery before trying to do a full blown replication.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      This is just more efficient, especially on cluster shutdown/start, where the replicas may all be up to date and match anyway.

      1. SOLR-3126.patch
        13 kB
        Mark Miller
      2. SOLR-3126.patch
        6 kB
        Mark Miller

        Issue Links

          Activity

          Mark Miller created issue -
          Mark Miller added a comment -

          Current WIP.

          Still trying to track down an issue around FullSolrCloudTest#brindDownShardIndexSomeDocsAndRecover

          Mark Miller made changes -
          Field Original Value New Value
          Attachment SOLR-3126.patch [ 12514311 ]
          Mark Miller added a comment -

          Whoops - I was not building the leader URL correctly - fixed. I'll commit this soon.

          markrmiller committed 1244177 (2 files)
          Reviews: none

          SOLR-3126: We should try to do a quick sync on std start up recovery before trying to do a full blown replication.

          Mark Miller added a comment -

          Alright, this is in.

          Mark Miller made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Mark Miller added a comment -

          Actually, I should probably do one more thing here - wait to start the sync until we are sure the leader sees us as recovering.

          Mark Miller made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Mark Miller added a comment -

          Patch for this - I stop committing in the prep recovery cmd so that it can also be used in the sync case. In the replicate case, we do a prep recovery cmd and then an explicit commit.
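
          The ordering described in this comment might look roughly like the sketch below. It is an illustration only and not the code from the attached patch: `RecoverySteps`, `prepRecovery`, `peerSync`, `commit`, `replicate`, and `RecordingSteps` are hypothetical names invented for the example, and issuing the prep recovery cmd once before both paths is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the Solr code base.
final class RecoveryOrderSketch {

    /** Stand-in for the steps a recovering replica can take. */
    interface RecoverySteps {
        void prepRecovery();  // assumed: waits until the leader sees us as recovering, no commit
        boolean peerSync();   // quick sync; returns true if it brought us up to date
        void commit();        // explicit commit, only used on the replication path
        void replicate();     // full index replication
    }

    /** Try the quick sync first and fall back to full replication if it fails. */
    static String recover(RecoverySteps steps) {
        steps.prepRecovery();          // shared by both cases because it no longer commits
        if (steps.peerSync()) {
            return "synced";
        }
        steps.commit();                // replicate case: prep recovery, then explicit commit
        steps.replicate();
        return "replicated";
    }

    /** Test double that records the order of calls. */
    static final class RecordingSteps implements RecoverySteps {
        final List<String> calls = new ArrayList<>();
        private final boolean syncSucceeds;

        RecordingSteps(boolean syncSucceeds) {
            this.syncSucceeds = syncSucceeds;
        }

        @Override public void prepRecovery() { calls.add("prepRecovery"); }
        @Override public boolean peerSync() { calls.add("peerSync"); return syncSucceeds; }
        @Override public void commit() { calls.add("commit"); }
        @Override public void replicate() { calls.add("replicate"); }
    }
}
```

          With a `RecordingSteps` whose sync succeeds, only the prep recovery and sync steps run; when the sync fails, the commit and replication steps follow.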

          Mark Miller made changes -
          Attachment SOLR-3126.patch [ 12514564 ]
          markrmiller committed 1244281 (1 file)
          yonik committed 1244806 (2 files)
          Mark Miller added a comment -

          Hmm... somehow this has made regular replication recovery unstable in some situations (fairly often on Apache Jenkins, less often locally)... trying to figure out where/how.

          Mark Miller added a comment -

          I think I've made some progress on tracking this down. It looks like the 4 second wait we do, to make sure that no updates which started out seeing stale state are still finishing, might not be long enough after some stuff was rearranged. Boosting that wait is getting me better results - still testing though.

          yonik committed 1290938 (2 files)
          yonik committed 1290941 (1 file)
          Reviews: none

          SOLR-3126: restore old deletes via tlog so peersync won't reorder

          yonik committed 1291003 (1 file)
          Reviews: none

          SOLR-3126: test changes to handle deletes surviving restart

          yonik committed 1291350 (4 files)
          Reviews: none

          SOLR-3126: get num versions from updatelog, fail sync if comm fail when retrieving updates, use starting versions if syncing after startup, only sync first time in recovery loop, more sync logging
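
          The "only sync first time in recovery loop" item in this commit message could be pictured with a sketch like the one below. It is a guess at the shape of such a loop, not the committed code: `Attempt`, `peerSync`, `replicate`, `maxAttempts`, and `CountingAttempt` are hypothetical names, and the retry limit is an assumption.

```java
// Hypothetical sketch: none of these names come from the Solr code base.
final class RecoveryLoopSketch {

    /** Stand-in for the two ways a replica can catch up. */
    interface Attempt {
        boolean peerSync();   // quick sync; true on success
        boolean replicate();  // full replication; true on success
    }

    /**
     * Runs the recovery loop and returns the 1-based attempt that succeeded,
     * or -1 if every attempt failed. The quick sync is tried on the first
     * pass only; later passes go straight to replication.
     */
    static int recover(Attempt attempt, int maxAttempts) {
        boolean firstTime = true;
        for (int i = 1; i <= maxAttempts; i++) {
            if (firstTime) {
                firstTime = false;
                if (attempt.peerSync()) {
                    return i;
                }
            }
            if (attempt.replicate()) {
                return i;
            }
        }
        return -1;
    }

    /** Test double that counts how often each path is taken. */
    static final class CountingAttempt implements Attempt {
        private final boolean syncSucceeds;
        private final int replicateSucceedsOnCall; // 0 means replication never succeeds
        int syncCalls = 0;
        int replicateCalls = 0;

        CountingAttempt(boolean syncSucceeds, int replicateSucceedsOnCall) {
            this.syncSucceeds = syncSucceeds;
            this.replicateSucceedsOnCall = replicateSucceedsOnCall;
        }

        @Override public boolean peerSync() {
            syncCalls++;
            return syncSucceeds;
        }

        @Override public boolean replicate() {
            replicateCalls++;
            return replicateCalls == replicateSucceedsOnCall;
        }
    }
}
```

          The point of the flag is that a sync which has already failed once is not retried on every pass through the loop.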

          Yonik Seeley made changes -
          Link This issue is blocked by SOLR-3157 [ SOLR-3157 ]
          Hide
          Yonik Seeley added a comment -

          IMO the best way forward on this issue is to get sane logging so we can figure out what's happening to what core.

          Mark Miller made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Uwe Schindler made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Mark Miller
              Reporter:
              Mark Miller
            • Votes:
              0
              Watchers:
              1
