Uploaded image for project: 'Apache Storm'
  1. Apache Storm
  2. STORM-3096

blobstores deleted before topologies can be submitted

Log workAgile BoardRank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.0.0
    • None

    Description

      STORM-3053 attempted to fix the race condition where a nimbus timer causes doCleanup() to delete the blobs during topology submission.  After the fix went in, we still see the error occurring.  I tracked the problem down to idsOfTopologiesWithPrivateWorkerKeys() at https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L893. 

       

      The previous change to wait to delete topologies is useful, but should be moved after all the topologies are discovered.

       

      
       018-06-03 11:53:42.581 o.a.s.d.n.Nimbus pool-37-thread-1014 [INFO] Received topology submission for topology-testHardCoreFaultTolerance-4 (storm-0.10.2.y.248 JDK-1.8.0_131) with conf {topology.users=[hadoopqa@DEV.YGRID.YAHOO.COM, hadoopqa], topology.acker.executors=0, storm.zookeeper.superACL=sasl:gstorm, topology.workers=3, topology.submitter.principal=hadoopqa@DEV.YGRID.YAHOO.COM, topology.debug=true, topology.disable.loadaware.messaging=true, storm.zookeeper.topology.auth.payload=#########################################, topology.name=topology-testHardCoreFaultTolerance-4, storm.zookeeper.topology.auth.scheme=digest, topology.kryo.register={}, nimbus.task.timeout.secs=200, storm.id=topology-testHardCoreFaultTolerance-4-18-1528026822, topology.kryo.decorators=[], topology.eventlogger.executors=0, topology.submitter.user=hadoopqa, topology.max.task.parallelism=null}
       2018-06-03 11:53:42.591 o.a.s.d.n.Nimbus timer [INFO] Cleaning up topology-testHardCoreFaultTolerance-4-18-1528026822
       2018-06-03 11:53:42.597 o.a.s.d.n.Nimbus pool-37-thread-1014 [INFO] uploadedJar /home/y/var/storm/nimbus/inbox/stormjar-3c73de98-ced7-4fd0-86d9-8fba3e5100f1.jar
       2018-06-03 11:53:42.601 o.a.s.c.StormClusterStateImpl pool-37-thread-1014 [INFO] set-path: /blobstore/topology-testHardCoreFaultTolerance-4-18-1528026822-stormjar.jar/openqe82blue-n1.blue.ygrid.yahoo.com:50560-1
       2018-06-03 11:53:42.621 o.a.s.d.n.Nimbus timer [INFO] Exception {}
       org.apache.storm.utils.WrappedKeyNotFoundException: topology-testHardCoreFaultTolerance-4-18-1528026822-stormcode.ser
       at org.apache.storm.blobstore.LocalFsBlobStore.getStoredBlobMeta(LocalFsBlobStore.java:259) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.blobstore.LocalFsBlobStore.getBlob(LocalFsBlobStore.java:394) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.blobstore.BlobStore.readBlobTo(BlobStore.java:310) ~[storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.blobstore.BlobStore.readBlob(BlobStore.java:339) ~[storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.TopoCache.readTopology(TopoCache.java:67) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.readStormTopologyAsNimbus(Nimbus.java:680) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.rmDependencyJarsInTopology(Nimbus.java:2389) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2443) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$37(Nimbus.java:2730) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.StormTimer$1.run(StormTimer.java:111) [storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:227) [storm-client-2.0.0.y.jar:2.0.0.y]
       2018-06-03 11:53:42.871 o.a.s.c.StormClusterStateImpl pool-37-thread-1014 [INFO] set-path: /blobstore/topology-testHardCoreFaultTolerance-4-18-1528026822-stormconf.ser/openqe82blue-n1.blue.ygrid.yahoo.com:50560-1
       2018-06-03 11:53:42.881 o.a.s.c.StormClusterStateImpl pool-37-thread-1014 [INFO] set-path: /blobstore/topology-testHardCoreFaultTolerance-4-18-1528026822-stormcode.ser/openqe82blue-n1.blue.ygrid.yahoo.com:50560-1
       2018-06-03 11:53:42.886 o.a.s.d.n.Nimbus pool-37-thread-1023 [INFO] Created download session dd7fa916-e489-47a5-beea-ac3eba6ed905 for topology-testHardCoreFaultTolerance-0-14-1528026818-stormjar.jar
       2018-06-03 11:53:42.888 o.a.s.d.n.Nimbus pool-37-thread-1014 [WARN] Topology submission exception. (topology name='topology-testHardCoreFaultTolerance-4')
       org.apache.storm.utils.WrappedKeyNotFoundException: topology-testHardCoreFaultTolerance-4-18-1528026822-stormjar.jar
       at org.apache.storm.blobstore.LocalFsBlobStore.getStoredBlobMeta(LocalFsBlobStore.java:259) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.blobstore.LocalFsBlobStore.getBlobReplication(LocalFsBlobStore.java:423) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.getBlobReplicationCount(Nimbus.java:1499) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.waitForDesiredCodeReplication(Nimbus.java:1509) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.submitTopologyWithOpts(Nimbus.java:2982) [storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.generated.Nimbus$Processor$submitTopologyWithOpts.getResult(Nimbus.java:3508) [storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.generated.Nimbus$Processor$submitTopologyWithOpts.getResult(Nimbus.java:3487) [storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) [libthrift-0.11.0.jar:0.11.0]
       at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [libthrift-0.11.0.jar:0.11.0]
       at org.apache.storm.security.auth.sasl.SaslTransportPlugin$TUGIWrapProcessor.process(SaslTransportPlugin.java:147) [storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:291) [libthrift-0.11.0.jar:0.11.0]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
       at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
       2018-06-03 11:53:42.888 o.a.t.ProcessFunction pool-37-thread-1014 [ERROR] Internal error processing submitTopologyWithOpts
       org.apache.storm.utils.WrappedKeyNotFoundException: topology-testHardCoreFaultTolerance-4-18-1528026822-stormjar.jar
       at org.apache.storm.blobstore.LocalFsBlobStore.getStoredBlobMeta(LocalFsBlobStore.java:259) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.blobstore.LocalFsBlobStore.getBlobReplication(LocalFsBlobStore.java:423) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.getBlobReplicationCount(Nimbus.java:1499) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.waitForDesiredCodeReplication(Nimbus.java:1509) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.submitTopologyWithOpts(Nimbus.java:2982) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.generated.Nimbus$Processor$submitTopologyWithOpts.getResult(Nimbus.java:3508) ~[storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.generated.Nimbus$Processor$submitTopologyWithOpts.getResult(Nimbus.java:3487) ~[storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) [libthrift-0.11.0.jar:0.11.0]
       at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [libthrift-0.11.0.jar:0.11.0]
       at org.apache.storm.security.auth.sasl.SaslTransportPlugin$TUGIWrapProcessor.process(SaslTransportPlugin.java:147) [storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:291) [libthrift-0.11.0.jar:0.11.0]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
       at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            agresch Aaron Gresch Assign to me
            agresch Aaron Gresch
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - Not Specified
              Not Specified
              Remaining:
              Remaining Estimate - 0h
              0h
              Logged:
              Time Spent - 20m
              20m

              Slack

                Issue deployment