  1. Ignite
  2. IGNITE-9738

Client node can suddenly fail on start

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8
    • Component/s: None
    • Labels: None

    Description

      When a client joins a large topology, it can spend a long time waiting for TcpDiscoveryNodeAddFinishedMessage, and during that time it cannot send TcpDiscoveryClientMetricsUpdateMessage. As a result, the server can drop the client from the topology with the following message:

      Client node considered as unreachable and will be dropped from cluster, because no metrics update messages received in interval: TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by network problems or long GC pause on client node, try to increase this parameter. [nodeId=a3493895-7c13-403c-bf0d-94eab0000011, clientFailureDetectionTimeout=30000];
      
      

      We should send TcpDiscoveryClientMetricsUpdateMessage as soon as possible, without waiting for the join procedure to finish.
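
      The failure mode and the proposed fix can be illustrated with a small toy model. This is only a sketch and not Ignite code: the classes `ClientLivenessSketch` and `Server`, their methods, and the 45-second join duration are hypothetical, and the 2000 ms metrics interval is an assumed value. Only the 30000 ms timeout comes from the log message above. The model assumes the server drops a client once the gap since its last metrics update exceeds the timeout.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a server-side client liveness check. Hypothetical, not Ignite code. */
public class ClientLivenessSketch {
    /** Hypothetical server that remembers when each client last reported metrics. */
    static final class Server {
        private final long clientFailureDetectionTimeoutMs;
        private final Map<String, Long> lastMetricsUpdateMs = new HashMap<>();

        Server(long clientFailureDetectionTimeoutMs) {
            this.clientFailureDetectionTimeoutMs = clientFailureDetectionTimeoutMs;
        }

        /** The join request starts the liveness clock for the client. */
        void onClientJoinRequest(String clientId, long nowMs) {
            lastMetricsUpdateMs.put(clientId, nowMs);
        }

        /** A metrics update refreshes the clock for a known client. */
        void onMetricsUpdate(String clientId, long nowMs) {
            if (lastMetricsUpdateMs.containsKey(clientId)) {
                lastMetricsUpdateMs.put(clientId, nowMs);
            }
        }

        /** Returns true if the client is still in the topology after the check. */
        boolean checkClient(String clientId, long nowMs) {
            Long last = lastMetricsUpdateMs.get(clientId);
            if (last == null) {
                return false;
            }
            if (nowMs - last > clientFailureDetectionTimeoutMs) {
                lastMetricsUpdateMs.remove(clientId); // client considered unreachable
                return false;
            }
            return true;
        }
    }

    /** Simulates a join lasting joinDurationMs; returns whether the client survives it. */
    public static boolean clientSurvivesJoin(long timeoutMs, long joinDurationMs,
                                             long metricsIntervalMs, boolean sendMetricsDuringJoin) {
        Server server = new Server(timeoutMs);
        String clientId = "client-1";
        server.onClientJoinRequest(clientId, 0L);

        for (long now = metricsIntervalMs; now <= joinDurationMs; now += metricsIntervalMs) {
            if (sendMetricsDuringJoin) {
                server.onMetricsUpdate(clientId, now); // proposed behavior: report while joining
            }
            if (!server.checkClient(clientId, now)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Current behavior: no metrics until the join finishes.
        System.out.println(clientSurvivesJoin(30_000, 45_000, 2_000, false)); // prints false
        // Proposed behavior: metrics are sent during the join.
        System.out.println(clientSurvivesJoin(30_000, 45_000, 2_000, true));  // prints true
    }
}
```

      In this model a join that takes longer than the timeout always loses the client unless metrics updates start before the join completes, which is the change requested here.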

            People

              Assignee: Vladislav Pyatkov (v.pyatkov)
              Reporter: Vladislav Pyatkov (v.pyatkov)