Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
2.2.2, 2.4.0
None
Description
We rely heavily on preemptible worker machines in GCP/GCE. These machines disappear without closing their TCP connections to the master, so dead connections stay in the established state and accumulate on the master. Eventually new workers cannot connect because the master fails with "Too many open files".
To solve the problem we need to enable TCP keepalive on the RPC connections to the master, but this currently cannot be done via configuration.
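
For illustration only, here is a minimal sketch of what enabling TCP keepalive on a socket looks like at the OS level, which is the behavior requested above. It is not the project's RPC code or the eventual fix: the helper name `enable_keepalive` and the timing values are hypothetical, and the per-socket tuning options are only applied where the platform exposes them (they are Linux-specific in practice).

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive for `sock` (hypothetical helper, illustrative values).

    With keepalive on, the kernel probes an idle peer and drops the
    connection once the probes go unanswered, releasing its file descriptor.
    """
    # SO_KEEPALIVE is the portable switch that enables keepalive probes.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Optional tuning; these constants are not defined on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between successive probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes tolerated before the connection is dropped.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

Without explicit tuning, the operating system's default keepalive idle time applies, which on Linux is typically two hours, so a vanished worker's connection may still linger for a long time before it is reclaimed.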