  1. Mesos
  2. MESOS-10190

libprocess fails with "Failed to obtain the IP address for <uuid>" when using CNI on some hosts


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: 1.9.0
    • Fix Version/s: None
    • Component/s: executor
    • Labels: None

    Description

      Hello,

       

      We deployed CNI support, and 3 of our hosts (all identical) fail to start containers when CNI is enabled. The log shows:

      E0917 16:58:11.481551 16770 process.cpp:1153] EXIT with status 1: Failed to obtain the IP address for '7c4beac7-5385-4dfa-845a-beb01e13c77c'; the DNS service may not be able to resolve it: Name or service not known

      I tried forcing LIBPROCESS_IP through an environment variable, but Mesos overwrites it. I then rebuilt Mesos with additional debugging, and here is the log:

      Overwriting environment variable 'LIBPROCESS_IP' from '10.99.50.3' to '0.0.0.0'
      E0917 16:34:49.779429 31428 process.cpp:1153] EXIT with status 1: Failed to obtain the IP address for 'de65bbd8-b237-4884-ba87-7e13cb85078f'; the DNS service may not be able to resolve it: Name or service not known

      According to the code, LIBPROCESS_IP is expected to be set to 0.0.0.0 (MESOS-5127). I then tried to understand why libprocess attempts to resolve a container run UUID instead of the hostname. Here is the libprocess code:

       

      // Resolve the hostname if ip is 0.0.0.0 in case we actually have
      // a valid external IP address. Note that we need only one IP
      // address, so that other processes can send and receive and
      // don't get confused as to whom they are sending to.
      if (__address__.ip.isAny()) {
        char hostname[512];

        if (gethostname(hostname, sizeof(hostname)) < 0) {
          PLOG(FATAL) << "Failed to initialize, gethostname";
        }

        // Lookup an IP address of local hostname, taking the first result.
        Try<net::IP> ip = net::getIP(hostname, __address__.ip.family());

        if (ip.isError()) {
          EXIT(EXIT_FAILURE)
            << "Failed to obtain the IP address for '" << hostname << "';"
            << " the DNS service may not be able to resolve it: " << ip.error();
        }

        __address__.ip = ip.get();
      }
      

       

      This code looks perfectly fine, except that "gethostname" returns the container UUID instead of a hostname that resolves to a valid host IP address. How is that even possible?
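
      To check this outside of libprocess, the same two steps (read the hostname, then resolve it) can be reproduced with plain POSIX calls. The following is only a minimal sketch, not Mesos code: the helper names currentHostname and resolveIPv4 are hypothetical, it uses getaddrinfo directly rather than net::getIP, and it only looks at IPv4.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <string>

// Hypothetical helper (not part of libprocess): returns the hostname as
// seen by the calling process, or an empty string if gethostname fails.
std::string currentHostname() {
  char hostname[512];
  if (gethostname(hostname, sizeof(hostname)) < 0) {
    return "";
  }
  // gethostname may not NUL-terminate on truncation, so do it ourselves.
  hostname[sizeof(hostname) - 1] = '\0';
  return std::string(hostname);
}

// Hypothetical helper (not part of libprocess): returns the first IPv4
// address that `name` resolves to, or an empty string if the lookup fails.
std::string resolveIPv4(const std::string& name) {
  addrinfo hints{};
  hints.ai_family = AF_INET;
  hints.ai_socktype = SOCK_STREAM;

  addrinfo* result = nullptr;
  if (getaddrinfo(name.c_str(), nullptr, &hints, &result) != 0 ||
      result == nullptr) {
    return "";
  }

  // Take the first result, as the libprocess snippet above does.
  char buffer[INET_ADDRSTRLEN];
  const sockaddr_in* addr =
      reinterpret_cast<const sockaddr_in*>(result->ai_addr);
  const char* text =
      inet_ntop(AF_INET, &addr->sin_addr, buffer, sizeof(buffer));

  std::string ip = (text != nullptr) ? std::string(text) : std::string();
  freeaddrinfo(result);
  return ip;
}
```

      Calling resolveIPv4(currentHostname()) from a small program started inside the failing container should show which hostname the process actually sees and whether that name resolves there. An empty result would correspond to the "Name or service not known" failure in the logs above.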

       

      Any help would be greatly appreciated.

      Regards, Adam.


          People

            Assignee: Unassigned
            Reporter: acecile5555555