Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Private Beta
None
None
Description
When starting up the Kudu cluster on bolt80, I often see a couple of nodes crash with errors like:
F0902 10:12:03.816699 37993 leader_election.cc:157] Check failed: _s.ok() Bad status: Network error: Unable to resolve address 'e1313.halxg.cloudera.com': Name or service not known
My guess is that startup produces a DNS storm of some kind, which somehow leads to spurious "host not found" errors. Whatever the cause, a failed name lookup during leader election should not crash the whole process.
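One way to avoid the crash is to treat resolution failure as a recoverable error: retry with backoff, and if it still fails, report it to the caller instead of CHECK-failing. The sketch below shows that pattern under stated assumptions. `ResolveResult`, `Resolver`, `ResolveWithRetry` and `TryStartElection` are hypothetical names, not Kudu's API (Kudu has its own `Status` type), and this is not necessarily how the issue was actually fixed. The resolver is injected so the retry logic can be exercised without real DNS. In production it would wrap a call such as `getaddrinfo()`.

```cpp
#include <chrono>
#include <functional>
#include <iostream>
#include <string>
#include <thread>

// Hypothetical stand-in for an error-carrying result type (not kudu::Status).
struct ResolveResult {
  bool ok;
  std::string message;
};

// Performs a single lookup attempt. Injectable so it can be faked in tests;
// real code would wrap getaddrinfo() here.
using Resolver = std::function<ResolveResult(const std::string& host)>;

// Retries transient resolution failures with exponential backoff and returns
// the last result to the caller instead of aborting the process.
ResolveResult ResolveWithRetry(const std::string& host, const Resolver& resolve,
                               int max_attempts,
                               std::chrono::milliseconds initial_backoff) {
  ResolveResult result{false, "no attempts made"};
  std::chrono::milliseconds backoff = initial_backoff;
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    result = resolve(host);
    if (result.ok) return result;
    if (attempt < max_attempts) {
      std::this_thread::sleep_for(backoff);  // back off before the next try
      backoff *= 2;
    }
  }
  return result;
}

// Hypothetical caller: on persistent failure, log and skip this election
// round (returning false) rather than crashing on a failed check.
bool TryStartElection(const std::string& peer_host, const Resolver& resolve) {
  ResolveResult r =
      ResolveWithRetry(peer_host, resolve, 3, std::chrono::milliseconds(10));
  if (!r.ok) {
    std::cerr << "Unable to resolve " << peer_host << ": " << r.message
              << "; will retry on a later election round\n";
    return false;
  }
  return true;
}
```

Retrying with backoff also spreads out lookups, which may ease pressure if a DNS storm at startup is really the cause. The key design point, though, is that the error propagates to a caller that can recover.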