ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-1998

C library calls getaddrinfo unconditionally from zookeeper_interest

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Critical Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.5.0
    • Fix Version/s: 3.5.2, 3.6.0
    • Component/s: c client
    • Labels:
      None

      Description

      (commented this on ZOOKEEPER-338)

      I've just noticed that we call getaddrinfo from zookeeper_interest... on every call. So from zookeeper_interest we always call update_addrs:

      https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L2082

      which in turns unconditionally calls resolve_hosts:

      https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L787

      which does the unconditional calls to getaddrinfo:

      https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L648

      We should fix this since it'll make 3.5.0 slower for people relying on DNS. I think this is happened as part of ZOOKEEPER-107 in which the list of servers can be updated.

      cc: Alexander Shraer, Patrick Hunt, Flavio Junqueira

        Issue Links

          Activity

          Hide
          Joe Smith added a comment -

          I've filed MESOS-2681 which tracks Apache Mesos being affected by this as well in the 3.4.x branch.

          Show
          Joe Smith added a comment - I've filed MESOS-2681 which tracks Apache Mesos being affected by this as well in the 3.4.x branch.
          Hide
          Hongchao Deng added a comment -

          hey Raul Gutierrez Segales.
          One suggestion – when pointing Github link don't use "trunk" use specific commit hash in the URL. This would make it point to where you meant when trunk commits change.

          Show
          Hongchao Deng added a comment - hey Raul Gutierrez Segales . One suggestion – when pointing Github link don't use "trunk" use specific commit hash in the URL. This would make it point to where you meant when trunk commits change.
          Hide
          Marshall McMullen added a comment -

          Raul Gutierrez Segales - Looking at the 3.4 code I agree with you. It seems like we should only do the lookup when we are connecting.

          Show
          Marshall McMullen added a comment - Raul Gutierrez Segales - Looking at the 3.4 code I agree with you. It seems like we should only do the lookup when we are connecting.
          Hide
          Raul Gutierrez Segales added a comment -

          I actually didn't look at 3.4! Since the code looked new, I just guessed. But, it turns out the that getaddrs is only called from zookeeper_init in 3.4 (unless I am overlooking something):

          https://github.com/apache/zookeeper/blob/branch-3.4/src/c/src/zookeeper.c#L836

          Do you have thoughts on how we could avoid this? I suppose we could easily just check if the addrvec is the same and if it is bypass resolving the hosts.

          Well, but lookups should only happen upon reconnects right? When things are healthy (i.e.: connected), it doesn't matter if we have (even new) unresolved hosts. So I'd say we only call resolve_hosts() when a new connection is needed.. Thoughts?

          Show
          Raul Gutierrez Segales added a comment - I actually didn't look at 3.4! Since the code looked new, I just guessed. But, it turns out the that getaddrs is only called from zookeeper_init in 3.4 (unless I am overlooking something): https://github.com/apache/zookeeper/blob/branch-3.4/src/c/src/zookeeper.c#L836 Do you have thoughts on how we could avoid this? I suppose we could easily just check if the addrvec is the same and if it is bypass resolving the hosts. Well, but lookups should only happen upon reconnects right? When things are healthy (i.e.: connected), it doesn't matter if we have (even new) unresolved hosts. So I'd say we only call resolve_hosts() when a new connection is needed.. Thoughts?
          Hide
          Marshall McMullen added a comment -

          Raul Gutierrez Segales - yep, you're right. I added that code as part of ZOOKEEPER-107 working with Alexander Shraer. But if I recall correctly, the original code also unconditionally called resolve_hosts. Though I'd have to go look at the original code to confirm that. I'm guessing you've done that already and that it did not do that?

          Do you have thoughts on how we could avoid this? I suppose we could easily just check if the addrvec is the same and if it is bypass resolving the hosts.

          Show
          Marshall McMullen added a comment - Raul Gutierrez Segales - yep, you're right. I added that code as part of ZOOKEEPER-107 working with Alexander Shraer . But if I recall correctly, the original code also unconditionally called resolve_hosts. Though I'd have to go look at the original code to confirm that. I'm guessing you've done that already and that it did not do that? Do you have thoughts on how we could avoid this? I suppose we could easily just check if the addrvec is the same and if it is bypass resolving the hosts.

            People

            • Assignee:
              Raul Gutierrez Segales
              Reporter:
              Raul Gutierrez Segales
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:

                Development