  1. Apache Storm
  2. STORM-2901

Reuse ZK connection for getKeySequenceNumber

    Details

      Description

      Currently, when our Nimbus restarts, it opens a large number of ZooKeeper connections within a few minutes, which puts real pressure on our ZooKeeper servers.

      I checked the logs and the code and found that when Nimbus restarts, in order to sync the local storm keys (in practice, the keys of valid topologies), it will:

      1. Check the difference between the local storm keys and the keys stored remotely in ZooKeeper.
      2. Set up a path for every valid storm key with a keySequenceNumber.
      3. To get each keySequenceNumber, the blobstore currently creates a new ZooKeeper client and connects to the ZooKeeper server.

      This is why thousands of connections are made. Our cluster runs 800+ topologies, which means at least 800 connections are opened where a single reused connection would do.
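
      The difference between the two approaches can be sketched as below. This is a minimal illustration, not Storm's actual code: FakeZkClient, sequenceNumberFor, syncWithClientPerKey and syncWithSharedClient are hypothetical names, and the fake client only counts how many connections would be opened instead of talking to a real ZooKeeper server.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class ZkReuseSketch {

    /** Hypothetical stand-in for a ZooKeeper client; it only counts connections. */
    static final class FakeZkClient implements AutoCloseable {
        FakeZkClient(AtomicInteger opened) {
            opened.incrementAndGet(); // each new client means one new connection
        }

        /** Pretend lookup; a real client would query ZooKeeper for the key. */
        int sequenceNumberFor(String key) {
            return 1;
        }

        @Override
        public void close() {
            // A real client would close its session here.
        }
    }

    /** Behaviour described in the report: a fresh client for every key. */
    static Map<String, Integer> syncWithClientPerKey(List<String> keys, AtomicInteger opened) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String key : keys) {
            try (FakeZkClient client = new FakeZkClient(opened)) {
                result.put(key, client.sequenceNumberFor(key));
            }
        }
        return result;
    }

    /** Proposed behaviour: one client shared across all keys. */
    static Map<String, Integer> syncWithSharedClient(List<String> keys, AtomicInteger opened) {
        Map<String, Integer> result = new LinkedHashMap<>();
        try (FakeZkClient client = new FakeZkClient(opened)) {
            for (String key : keys) {
                result.put(key, client.sequenceNumberFor(key));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 800 made-up keys, mirroring the cluster size mentioned in the report.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 800; i++) {
            keys.add("topology-" + i + "-stormconf.ser");
        }
        AtomicInteger perKey = new AtomicInteger();
        AtomicInteger shared = new AtomicInteger();
        syncWithClientPerKey(keys, perKey);
        syncWithSharedClient(keys, shared);
        System.out.println("client per key: " + perKey.get() + " connections");
        System.out.println("shared client:  " + shared.get() + " connections");
    }
}
```

      Both methods return the same key-to-sequence-number map; only the number of clients created differs, growing with the number of keys in the first case and staying at one in the second.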

       

      This is part of the Nimbus restart log:

      2018-01-18 12:51:57.031 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
      2018-01-18 12:51:57.032 o.a.s.s.o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=dx-data-rt-zk01:2181,dx-data-rt-zk02:2181,dx-data-rt-zk04:2181/mtstorm_101_dx_storm01 sessionTimeout=30000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@76513a57
      2018-01-18 12:51:57.032 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server dx-data-rt-zk04.dx.sankuai.com/10.32.157.254:2181. Will not attempt to authenticate using SASL (unknown error)
      2018-01-18 12:51:57.033 o.a.s.s.o.a.z.ClientCnxn [INFO] Socket connection established to dx-data-rt-zk04.dx.sankuai.com/10.32.157.254:2181, initiating session
      2018-01-18 12:51:57.034 o.a.s.s.o.a.z.ClientCnxn [INFO] Session establishment complete on server dx-data-rt-zk04.dx.sankuai.com/10.32.157.254:2181, sessionid = 0x45cd92f0cc7e938, negotiated timeout = 30000
      2018-01-18 12:51:57.034 o.a.s.s.o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
      2018-01-18 12:51:57.037 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl [INFO] backgroundOperationsLoop exiting
      2018-01-18 12:51:57.039 o.a.s.s.o.a.z.ZooKeeper [INFO] Session: 0x45cd92f0cc7e938 closed
      2018-01-18 12:51:57.039 o.a.s.s.o.a.z.ClientCnxn [INFO] EventThread shut down
      2018-01-18 12:51:57.040 o.a.s.cluster [INFO] setup-path/blobstore/app_waimairank_wm_recsys_user_block-4-1504509174-stormconf.ser/dx-data-rt-nimbus05.dx.sankuai.com:9827-1
      2018-01-18 12:51:57.051 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
      2018-01-18 12:51:57.051 o.a.s.s.o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=dx-data-rt-zk01:2181,dx-data-rt-zk02:2181,dx-data-rt-zk04:2181/mtstorm_101_dx_storm01 sessionTimeout=30000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@69c222d6
      2018-01-18 12:51:57.052 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server dx-data-rt-zk02.dx.sankuai.com/10.32.108.46:2181. Will not attempt to authenticate using SASL (unknown error)
      2018-01-18 12:51:57.053 o.a.s.s.o.a.z.ClientCnxn [INFO] Socket connection established to dx-data-rt-zk02.dx.sankuai.com/10.32.108.46:2181, initiating session
      2018-01-18 12:51:57.055 o.a.s.s.o.a.z.ClientCnxn [INFO] Session establishment complete on server dx-data-rt-zk02.dx.sankuai.com/10.32.108.46:2181, sessionid = 0x25cd386f245eb72, negotiated timeout = 30000
      2018-01-18 12:51:57.055 o.a.s.s.o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
      2018-01-18 12:51:57.058 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl [INFO] backgroundOperationsLoop exiting
      2018-01-18 12:51:57.061 o.a.s.s.o.a.z.ZooKeeper [INFO] Session: 0x25cd386f245eb72 closed
      2018-01-18 12:51:57.061 o.a.s.s.o.a.z.ClientCnxn [INFO] EventThread shut down
      2018-01-18 12:51:57.061 o.a.s.cluster [INFO] setup-path/blobstore/app_waimairank_waimai_rank_rt_pipeline_user_feature-12-1507516853-stormconf.ser/dx-data-rt-nimbus05.dx.sankuai.com:9827-1

            People

            • Assignee: Yuzhao Chen (danny0405)
            • Reporter: Yuzhao Chen (danny0405)
            • Votes: 0
            • Watchers: 2

                Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 5h 10m