Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-528

c client exists() call with watch on large number of nodes (>100k) causes connection loss

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Invalid
    • 3.2.0
    • 3.3.0
    • c client
    • None
    • workaround: the test environment in this case had a max heap of 64m, by increasing the max mem via -Xmx the performance issue was addressed and the test ran fine.

    Description

      If I create 100k nodes on /misc then

      CPPUNIT_ASSERT_EQUAL(0, zoo_get_children(zh2, "/misc", 0, &children));
      for (int i = 0; i < children.count; i++)

      { sprintf(path, "/misc/%s", children.data[i]); CPPUNIT_ASSERT_EQUAL(0, zoo_exists(zh2, path, 1, &stat)); CPPUNIT_ASSERT_EQUAL(0, zoo_wexists(zh3, path, watcher, &ctx3, &stat)); }

      around 47k or so through the loop the client fails with -4 (connection loss), the client timeout is 30 seconds. The server command port shows the following, so it looks like it's not the server but some issue with watcher reg on the c client?

      phunt@valhalla:~$ echo stat | nc localhost 22181
      Zookeeper version: 3.3.0--1, built on 07/22/2009 23:55 GMT
      Clients:
      /127.0.0.1:45729[1](queued=0,recved=100024,sent=0)
      /127.0.0.1:50229[1](queued=0,recved=0,sent=0)
      /127.0.0.1:45731[1](queued=0,recved=47116,sent=0)
      /127.0.0.1:45730[1](queued=0,recved=47117,sent=1)

      Latency min/avg/max: 0/196/1026
      Received: 194257
      Sent: 1
      Outstanding: 0
      Zxid: 0x186a4
      Mode: standalone
      Node count: 100005

      729 is a separate client - the one that created the nodes originally.

      731 and 730 are zh2/zh3 in the code.

      Attachments

        Activity

          People

            phunt Patrick D. Hunt
            phunt Patrick D. Hunt
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: