Apache Storm / STORM-3096

blobstores deleted before topologies can be submitted


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: None

      Description

      STORM-3053 attempted to fix the race condition in which a Nimbus timer runs doCleanup() and deletes a topology's blobs while that topology is still being submitted. After that fix went in, we still see the error. I tracked the problem down to idsOfTopologiesWithPrivateWorkerKeys() at https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L893.

       

      The change from STORM-3053 that waits before deleting topologies is useful, but the wait should be applied after all the topology IDs have been discovered.
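
      To illustrate the ordering problem, here is a minimal sketch, not the actual Nimbus code. The class, method names, and the 300-second delay are hypothetical, and it assumes topology IDs end in the submission time in epoch seconds, as the IDs in the log below appear to. If the age filter runs before the last ID source is merged in, an ID that only that source reports skips the wait entirely.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; all names here are hypothetical, not the Nimbus source.
final class CleanupOrderSketch {

    // Assumes IDs look like <name>-<counter>-<epochSeconds>.
    // Returns -1 when the suffix is not a number.
    static long submitTimeSecs(String topoId) {
        int dash = topoId.lastIndexOf('-');
        try {
            return Long.parseLong(topoId.substring(dash + 1));
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    // Keeps only IDs at least delaySecs old; IDs without a parseable time are kept.
    static Set<String> olderThan(Set<String> ids, long nowSecs, long delaySecs) {
        Set<String> ret = new HashSet<>();
        for (String id : ids) {
            long t = submitTimeSecs(id);
            if (t < 0 || nowSecs - t >= delaySecs) {
                ret.add(id);
            }
        }
        return ret;
    }

    // Problematic ordering: the age filter runs before the last source is merged,
    // so IDs found only in that source bypass the wait.
    static Set<String> filterThenMerge(Set<String> earlySources, Set<String> lateSource,
                                       Set<String> active, long nowSecs, long delaySecs) {
        Set<String> ret = olderThan(earlySources, nowSecs, delaySecs);
        ret.addAll(lateSource);
        ret.removeAll(active);
        return ret;
    }

    // Proposed ordering: merge every source first, then apply the age filter once.
    static Set<String> mergeThenFilter(Set<String> earlySources, Set<String> lateSource,
                                       Set<String> active, long nowSecs, long delaySecs) {
        Set<String> all = new HashSet<>(earlySources);
        all.addAll(lateSource);
        Set<String> ret = olderThan(all, nowSecs, delaySecs);
        ret.removeAll(active);
        return ret;
    }

    public static void main(String[] args) {
        String id = "topology-testHardCoreFaultTolerance-4-18-1528026822";
        long now = 1528026822L;                              // timer fires in the submission second
        Set<String> early = Collections.emptySet();          // blobs not stored yet
        Set<String> late = new HashSet<>(Arrays.asList(id)); // e.g. IDs with private worker keys
        Set<String> active = Collections.emptySet();         // topology not active yet
        System.out.println("filter then merge: " + filterThenMerge(early, late, active, now, 300L));
        System.out.println("merge then filter: " + mergeThenFilter(early, late, active, now, 300L));
    }
}
```

      With these inputs, filterThenMerge() returns the in-flight topology ID as a cleanup candidate, while mergeThenFilter() returns an empty set until the delay has elapsed.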

       

      
       2018-06-03 11:53:42.581 o.a.s.d.n.Nimbus pool-37-thread-1014 [INFO] Received topology submission for topology-testHardCoreFaultTolerance-4 (storm-0.10.2.y.248 JDK-1.8.0_131) with conf {topology.users=[hadoopqa@DEV.YGRID.YAHOO.COM, hadoopqa], topology.acker.executors=0, storm.zookeeper.superACL=sasl:gstorm, topology.workers=3, topology.submitter.principal=hadoopqa@DEV.YGRID.YAHOO.COM, topology.debug=true, topology.disable.loadaware.messaging=true, storm.zookeeper.topology.auth.payload=#########################################, topology.name=topology-testHardCoreFaultTolerance-4, storm.zookeeper.topology.auth.scheme=digest, topology.kryo.register={}, nimbus.task.timeout.secs=200, storm.id=topology-testHardCoreFaultTolerance-4-18-1528026822, topology.kryo.decorators=[], topology.eventlogger.executors=0, topology.submitter.user=hadoopqa, topology.max.task.parallelism=null}
       2018-06-03 11:53:42.591 o.a.s.d.n.Nimbus timer [INFO] Cleaning up topology-testHardCoreFaultTolerance-4-18-1528026822
       2018-06-03 11:53:42.597 o.a.s.d.n.Nimbus pool-37-thread-1014 [INFO] uploadedJar /home/y/var/storm/nimbus/inbox/stormjar-3c73de98-ced7-4fd0-86d9-8fba3e5100f1.jar
       2018-06-03 11:53:42.601 o.a.s.c.StormClusterStateImpl pool-37-thread-1014 [INFO] set-path: /blobstore/topology-testHardCoreFaultTolerance-4-18-1528026822-stormjar.jar/openqe82blue-n1.blue.ygrid.yahoo.com:50560-1
       2018-06-03 11:53:42.621 o.a.s.d.n.Nimbus timer [INFO] Exception {}
       org.apache.storm.utils.WrappedKeyNotFoundException: topology-testHardCoreFaultTolerance-4-18-1528026822-stormcode.ser
       at org.apache.storm.blobstore.LocalFsBlobStore.getStoredBlobMeta(LocalFsBlobStore.java:259) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.blobstore.LocalFsBlobStore.getBlob(LocalFsBlobStore.java:394) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.blobstore.BlobStore.readBlobTo(BlobStore.java:310) ~[storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.blobstore.BlobStore.readBlob(BlobStore.java:339) ~[storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.TopoCache.readTopology(TopoCache.java:67) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.readStormTopologyAsNimbus(Nimbus.java:680) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.rmDependencyJarsInTopology(Nimbus.java:2389) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2443) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$37(Nimbus.java:2730) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.StormTimer$1.run(StormTimer.java:111) [storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:227) [storm-client-2.0.0.y.jar:2.0.0.y]
       2018-06-03 11:53:42.871 o.a.s.c.StormClusterStateImpl pool-37-thread-1014 [INFO] set-path: /blobstore/topology-testHardCoreFaultTolerance-4-18-1528026822-stormconf.ser/openqe82blue-n1.blue.ygrid.yahoo.com:50560-1
       2018-06-03 11:53:42.881 o.a.s.c.StormClusterStateImpl pool-37-thread-1014 [INFO] set-path: /blobstore/topology-testHardCoreFaultTolerance-4-18-1528026822-stormcode.ser/openqe82blue-n1.blue.ygrid.yahoo.com:50560-1
       2018-06-03 11:53:42.886 o.a.s.d.n.Nimbus pool-37-thread-1023 [INFO] Created download session dd7fa916-e489-47a5-beea-ac3eba6ed905 for topology-testHardCoreFaultTolerance-0-14-1528026818-stormjar.jar
       2018-06-03 11:53:42.888 o.a.s.d.n.Nimbus pool-37-thread-1014 [WARN] Topology submission exception. (topology name='topology-testHardCoreFaultTolerance-4')
       org.apache.storm.utils.WrappedKeyNotFoundException: topology-testHardCoreFaultTolerance-4-18-1528026822-stormjar.jar
       at org.apache.storm.blobstore.LocalFsBlobStore.getStoredBlobMeta(LocalFsBlobStore.java:259) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.blobstore.LocalFsBlobStore.getBlobReplication(LocalFsBlobStore.java:423) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.getBlobReplicationCount(Nimbus.java:1499) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.waitForDesiredCodeReplication(Nimbus.java:1509) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.submitTopologyWithOpts(Nimbus.java:2982) [storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.generated.Nimbus$Processor$submitTopologyWithOpts.getResult(Nimbus.java:3508) [storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.generated.Nimbus$Processor$submitTopologyWithOpts.getResult(Nimbus.java:3487) [storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) [libthrift-0.11.0.jar:0.11.0]
       at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [libthrift-0.11.0.jar:0.11.0]
       at org.apache.storm.security.auth.sasl.SaslTransportPlugin$TUGIWrapProcessor.process(SaslTransportPlugin.java:147) [storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:291) [libthrift-0.11.0.jar:0.11.0]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
       at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
       2018-06-03 11:53:42.888 o.a.t.ProcessFunction pool-37-thread-1014 [ERROR] Internal error processing submitTopologyWithOpts
       org.apache.storm.utils.WrappedKeyNotFoundException: topology-testHardCoreFaultTolerance-4-18-1528026822-stormjar.jar
       at org.apache.storm.blobstore.LocalFsBlobStore.getStoredBlobMeta(LocalFsBlobStore.java:259) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.blobstore.LocalFsBlobStore.getBlobReplication(LocalFsBlobStore.java:423) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.getBlobReplicationCount(Nimbus.java:1499) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.waitForDesiredCodeReplication(Nimbus.java:1509) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.daemon.nimbus.Nimbus.submitTopologyWithOpts(Nimbus.java:2982) ~[storm-server-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.generated.Nimbus$Processor$submitTopologyWithOpts.getResult(Nimbus.java:3508) ~[storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.storm.generated.Nimbus$Processor$submitTopologyWithOpts.getResult(Nimbus.java:3487) ~[storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) [libthrift-0.11.0.jar:0.11.0]
       at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [libthrift-0.11.0.jar:0.11.0]
       at org.apache.storm.security.auth.sasl.SaslTransportPlugin$TUGIWrapProcessor.process(SaslTransportPlugin.java:147) [storm-client-2.0.0.y.jar:2.0.0.y]
       at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:291) [libthrift-0.11.0.jar:0.11.0]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
       at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]

              People

              • Assignee: Aaron Gresch (agresch)
              • Reporter: Aaron Gresch (agresch)

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 20m