SOLR-14356

PeerSync should not fail with SocketTimeoutException from hanging nodes

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 8.6, 9.0
    • None
    • None

    Description

      Right now in PeerSync (during leader election), if requesting versions from a node throws an exception, we skip that node when the exception is one of the following types:

      • ConnectTimeoutException
      • NoHttpResponseException
      • SocketException

      Sometimes the other node hangs but still accepts connections. In that case a SocketTimeoutException is thrown, we consider the PeerSync process failed, and the whole shard stays leaderless forever (as long as the hung node is still there).

      We can't just blindly add SocketTimeoutException to the above list, since shalin mentioned that timeouts can sometimes happen for genuine reasons too, e.g. a temporary GC pause.
      I think the general idea here is that we obey the leaderVoteWait restriction and retry syncing with the other nodes when a connection/timeout exception happens.
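
      The retry idea can be sketched roughly as follows. This is only an illustration of the proposal, not the attached patch and not Solr's actual PeerSync code: the class, the VersionRequest interface, the Outcome enum and the pause parameter are all hypothetical, and the exception check matches the two HttpClient exceptions by simple class name purely to keep the sketch free of dependencies. It also assumes that a node still timing out once the leaderVoteWait budget is spent gets skipped like the other connection failures, which is one possible reading of the proposal.

```java
import java.net.SocketException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from Solr.
class PeerSyncRetrySketch {

    /** Hypothetical stand-in for "request recent versions from one replica". */
    public interface VersionRequest {
        void fetchVersions(String replica) throws Exception;
    }

    public enum Outcome { SYNCED, SKIPPED, TIMED_OUT, FAILED }

    /** Exception types from the issue for which the node is simply skipped. */
    public static boolean isSkippable(Exception e) {
        // Name matching stands in for the Apache HttpClient classes (assumption,
        // used only so this sketch compiles without extra dependencies).
        String name = e.getClass().getSimpleName();
        return e instanceof SocketException
                || name.equals("ConnectTimeoutException")
                || name.equals("NoHttpResponseException");
    }

    public static Outcome tryOnce(String replica, VersionRequest req) {
        try {
            req.fetchVersions(replica);
            return Outcome.SYNCED;
        } catch (SocketTimeoutException e) {
            return Outcome.TIMED_OUT; // may be a hung node or just a GC pause
        } catch (Exception e) {
            return isSkippable(e) ? Outcome.SKIPPED : Outcome.FAILED;
        }
    }

    /**
     * Retries replicas that timed out until the leaderVoteWait budget is spent.
     * Assumption: replicas still timing out after the budget are skipped.
     */
    public static boolean syncWithRetry(List<String> replicas, VersionRequest req,
                                        long leaderVoteWaitMs, long pauseMs) {
        long start = System.nanoTime();
        long budgetNanos = TimeUnit.MILLISECONDS.toNanos(leaderVoteWaitMs);
        List<String> pending = new ArrayList<>(replicas);
        while (!pending.isEmpty()) {
            List<String> timedOut = new ArrayList<>();
            for (String replica : pending) {
                Outcome outcome = tryOnce(replica, req);
                if (outcome == Outcome.FAILED) {
                    return false; // any other error still fails the sync
                }
                if (outcome == Outcome.TIMED_OUT) {
                    timedOut.add(replica);
                }
            }
            pending = timedOut;
            if (pending.isEmpty() || System.nanoTime() - start >= budgetNanos) {
                break; // all answered, or budget spent: stop waiting on hung nodes
            }
            try {
                Thread.sleep(pauseMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Replica "b" always times out, so it is retried until the budget ends.
        VersionRequest req = replica -> {
            if (replica.equals("b")) throw new SocketTimeoutException("no response");
        };
        System.out.println("synced = " + syncWithRetry(List.of("a", "b"), req, 50, 10));
    }
}
```

      With this shape, a short GC pause on a peer is absorbed by the retries, while a node that hangs for longer than leaderVoteWait no longer blocks the election indefinitely.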

      Attachments

        1. SOLR-14356.patch
          1 kB
          Cao Manh Dat
        2. SOLR-14356.patch
          1 kB
          Cao Manh Dat

        Activity

          People

            caomanhdat Cao Manh Dat
            caomanhdat Cao Manh Dat
