  1. Apache Storm
  2. STORM-2674

NoNodeException when ZooKeeper tries to delete nodes

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0, 1.2.0, 1.1.2, 1.0.5
    • Component/s: None
    • Labels:
      None

      Description

      When the reportError function of StormClusterStateImpl is called, it fetches all the children of

      /storm/errors/<topo-id>/count/

      and deletes the oldest znodes so that only the latest 10 errors are kept. A NoNodeException can be thrown when one of those znodes has already been deleted by another executor:

      java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /errors/fastwc-halferrors-1-1501689263/count/e0000000562
          at org.apache.storm.utils.Utils$2.run(Utils.java:345)
          at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /errors/fastwc-halferrors-1-1501689263/count/e0000000562
          at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:489)
          at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:455)
          at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:98)
          at org.apache.storm.utils.Utils$2.run(Utils.java:335)
          ... 1 more
      Caused by: java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /errors/fastwc-halferrors-1-1501689263/count/e0000000562
          at org.apache.storm.utils.Utils.wrapInRuntime(Utils.java:413)
          at org.apache.storm.zookeeper.ClientZookeeper.deleteNode(ClientZookeeper.java:165)
          at org.apache.storm.cluster.ZKStateStorage.delete_node(ZKStateStorage.java:139)
          at org.apache.storm.cluster.StormClusterStateImpl.reportError(StormClusterStateImpl.java:655)
          at org.apache.storm.executor.error.ReportError.report(ReportError.java:69)
          at org.apache.storm.executor.bolt.BoltOutputCollectorImpl.reportError(BoltOutputCollectorImpl.java:154)
          at org.apache.storm.task.OutputCollector.reportError(OutputCollector.java:234)
          at org.apache.storm.topology.BasicOutputCollector.reportError(BasicOutputCollector.java:70)
          at org.apache.storm.starter.FastWordCountTopology$WordCount.execute(FastWordCountTopology.java:113)
          at org.apache.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50)
          at org.apache.storm.executor.bolt.BoltExecutor.tupleActionFn(BoltExecutor.java:125)
          at org.apache.storm.executor.Executor.onEvent(Executor.java:255)
          at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:476)
          ... 4 more
      Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /errors/fastwc-halferrors-1-1501689263/count/e0000000562
          at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
          at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
          at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
          at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:250)
          at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:244)
          at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
          at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:241)
          at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:225)
          at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:35)
          at org.apache.storm.zookeeper.ClientZookeeper.deleteNode(ClientZookeeper.java:158)
          ... 15 more
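
The race described above can be reproduced without a ZooKeeper cluster. The following is a minimal sketch, not the patch that was merged for this issue: FakeStore, the local NoNodeException class, prune, and ErrorPruneSketch are all hypothetical stand-ins invented for illustration, and the in-memory set only imitates the children of one error path. It shows two callers taking the same snapshot of children, both trying to trim to the latest 10 entries, and the second caller tolerating nodes that the first has already removed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

class ErrorPruneSketch {
    /** Hypothetical stand-in for ZooKeeper's KeeperException.NoNodeException. */
    static class NoNodeException extends Exception {
        NoNodeException(String path) {
            super("KeeperErrorCode = NoNode for " + path);
        }
    }

    /** Hypothetical in-memory stand-in for the znodes under one error path. */
    static class FakeStore {
        private final Set<String> children = new ConcurrentSkipListSet<>();

        void create(String name) {
            children.add(name);
        }

        List<String> getChildren() {
            return new ArrayList<>(children);
        }

        void delete(String name) throws NoNodeException {
            // Like ZooKeeper, deleting a missing node is reported as an error.
            if (!children.remove(name)) {
                throw new NoNodeException(name);
            }
        }
    }

    static final int MAX_ERRORS = 10;

    /**
     * Deletes the oldest entries of a (possibly stale) snapshot so that at most
     * `keep` remain. Returns how many nodes this caller actually deleted.
     */
    static int prune(FakeStore store, List<String> snapshot, int keep) {
        List<String> sorted = new ArrayList<>(snapshot);
        Collections.sort(sorted); // zero-padded sequence names sort oldest first
        int deleted = 0;
        for (int i = 0; i < sorted.size() - keep; i++) {
            try {
                store.delete(sorted.get(i));
                deleted++;
            } catch (NoNodeException e) {
                // Another caller already removed this node. The goal (node gone)
                // is met, so the exception is ignored instead of rethrown.
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        for (int i = 0; i < 12; i++) {
            store.create(String.format("e%010d", i));
        }
        // Two executors list the children before either one deletes anything.
        List<String> snapshotA = store.getChildren();
        List<String> snapshotB = store.getChildren();
        int a = prune(store, snapshotA, MAX_ERRORS);
        int b = prune(store, snapshotB, MAX_ERRORS);
        System.out.println(a + " " + b + " " + store.getChildren().size());
    }
}
```

Swallowing the exception is reasonable here only because the delete is idempotent in intent: the caller wants the node to be absent, and it is. Whether the real fix ignores the exception at this level or elsewhere is not stated in this issue.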
      

          Activity

          revans2 Robert Joseph Evans added a comment -

          Ethan Li,

          I merged this into master, but the issue also exists on 1.x-branch. Could you backport your fix to that branch? Sadly, because the file was moved, git cannot merge it cleanly.

          ethanli Ethan Li added a comment -

          Robert Joseph Evans Yes, sure. Will do.

          ethanli Ethan Li added a comment -

          On 1.x-branch: https://github.com/apache/storm/pull/2264

            People

            • Assignee: ethanli Ethan Li
            • Reporter: ethanli Ethan Li
            • Votes: 0
            • Watchers: 2

                Time Tracking

                • Estimated: Not Specified
                • Remaining: 0h
                • Logged: 1h 40m
