  1. ZooKeeper
  2. ZOOKEEPER-2298

zookeeper: Should retry on EAI_NONAME return from getaddrinfo()

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid

    Description

      The ZooKeeper interface is designed to retry (once per second for up to ten minutes) if one or more of the ZooKeeper hostnames cannot be resolved (see MESOS-1326 and MESOS-1523).

      However, the current implementation assumes that a DNS resolution failure is indicated by zookeeper_init() returning NULL with errno set to EINVAL (ZooKeeper translates getaddrinfo() failures into errno values). That assumption does not hold for every failure, because the current ZooKeeper code maps EAI_NONAME to ENOENT rather than EINVAL:

      static int getaddrinfo_errno(int rc) {
          switch(rc) {
          case EAI_NONAME:
      // ZOOKEEPER-1323 EAI_NODATA and EAI_ADDRFAMILY are deprecated in FreeBSD.
      #if defined EAI_NODATA && EAI_NODATA != EAI_NONAME
          case EAI_NODATA:
      #endif
              return ENOENT;
          case EAI_MEMORY:
              return ENOMEM;
          default:
              return EINVAL;
          }
      }
      

      getaddrinfo() returns EAI_NONAME when "the node or service is not known"; per discussion in MESOS-2186, this seems to happen intermittently due to DNS failures.
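
To make the mismatch concrete, here is a minimal sketch that copies the mapping quoted above into a standalone helper and pairs it with a caller-side check that only recognizes EINVAL. The names map_gai_error and einval_only_check are hypothetical and exist only for this illustration; the EINVAL-only check is an assumption about the caller's behavior based on the description above, not a copy of the actual code.

```c
/* Ask for the POSIX declarations (EAI_* constants) when not already set. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <errno.h>
#include <netdb.h>
#include <stdbool.h>

/* Copy of the mapping quoted above, renamed (hypothetical name) so it does
 * not clash with the static function in the ZooKeeper C client. */
static int map_gai_error(int rc) {
    switch (rc) {
    case EAI_NONAME:
#if defined EAI_NODATA && EAI_NODATA != EAI_NONAME
    case EAI_NODATA:
#endif
        return ENOENT;
    case EAI_MEMORY:
        return ENOMEM;
    default:
        return EINVAL;
    }
}

/* Assumed caller-side check: treats only EINVAL as a DNS resolution
 * failure, so an errno of ENOENT is not recognized as retryable. */
static bool einval_only_check(int err) {
    return err == EINVAL;
}
```

With these definitions, einval_only_check(map_gai_error(EAI_NONAME)) is false, while the same check on a code that falls through to the default branch, such as EAI_FAIL, is true.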

      Proposed fix: looking at errno is always going to be somewhat fragile, but if we're going to continue doing that, we should check for ENOENT as well as EINVAL.
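
A rough sketch of what the proposed check could look like, under stated assumptions: the initialization step is abstracted behind a hypothetical callback type (init_fn) that returns NULL and sets errno on failure, mirroring how zookeeper_init() is described above. The names is_retryable_resolution_error, init_with_retry, struct flaky and flaky_init are all made up for this example; flaky_init is only a test double that simulates intermittent DNS failures.

```c
/* Ask for the POSIX declarations (sleep) when not already set. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Proposed check: accept ENOENT as well as EINVAL as a sign that hostname
 * resolution failed and the attempt is worth retrying. */
static bool is_retryable_resolution_error(int err) {
    return err == EINVAL || err == ENOENT;
}

/* Hypothetical init callback: returns a handle, or NULL with errno set. */
typedef void *(*init_fn)(void *ctx);

/* Calls init up to max_attempts times, sleeping delay_seconds between
 * attempts, and gives up early on errors that are not resolution failures. */
static void *init_with_retry(init_fn init, void *ctx,
                             int max_attempts, unsigned delay_seconds) {
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        errno = 0;
        void *handle = init(ctx);
        if (handle != NULL) {
            return handle;
        }
        if (!is_retryable_resolution_error(errno)) {
            return NULL;
        }
        if (attempt + 1 < max_attempts && delay_seconds > 0) {
            sleep(delay_seconds);
        }
    }
    return NULL;
}

/* Test double: fails with ENOENT a fixed number of times, then succeeds. */
struct flaky {
    int failures_left;
    int calls;
};

static void *flaky_init(void *ctx) {
    struct flaky *f = ctx;
    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        errno = ENOENT;
        return NULL;
    }
    return f;
}
```

A real caller would wrap zookeeper_init() in the callback and pass 600 attempts with a one-second delay to match the "once per second for up to ten minutes" policy described above. The errno inspection remains as fragile as the proposal itself notes; the sketch only widens the set of values treated as retryable.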

          People

            Assignee: Unassigned
            Reporter: Neil Conway (neilc)