-
Type:
Bug
-
Status: Resolved
-
Priority:
Critical
-
Resolution: Fixed
-
Affects Version/s: 2.0.0, 1.x
-
Component/s: storm-core, storm-server
-
Labels:
For now, when a topology is reassigned or killed for a cluster, supervisor will delete 4 files for an overdue storm:
- storm-code
- storm-ser
- storm-jar
- LocalAssignment
Slot.java
static DynamicState cleanupCurrentContainer(DynamicState dynamicState, StaticState staticState, MachineState nextState) throws Exception {
assert(dynamicState.container != null);
assert(dynamicState.currentAssignment != null);
assert(dynamicState.container.areAllProcessesDead());
dynamicState.container.cleanUp();
staticState.localizer.releaseSlotFor(dynamicState.currentAssignment, staticState.port);
DynamicState ret = dynamicState.withCurrentAssignment(null, null);
if (nextState != null)
return ret;
}
But we do not make a transaction to do this, if an exception occurred during deleting storm-code/ser/jar, an overdue local assignment will be left on disk.
Then when supervisor restart from the exception above, the slots will be initial and container will be recovered from LocalAssignments, the blob store will fetch the files from Nimbus/Master, but will get a KeyNotFoundException, and supervisor collapses again.
This will happens continuously and supervisor will never recover until we clean up all the local assignments manually.
This is the stack:
2017-12-27 14:15:04.434 o.a.s.l.AsyncLocalizer [INFO] Cleaning up unused topologies in /opt/meituan/storm/data/supervisor/stormdist
2017-12-27 14:15:04.434 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /opt/meituan/storm/data/supervisor/stormdist/app_dpsr_realtime_shop_vane_allcates-14-1513685785
2017-12-27 14:15:04.445 o.a.s.d.s.Slot [INFO] STATE EMPTY msInState: 109 -> WAITING_FOR_BASIC_LOCALIZATION msInState: 1
2017-12-27 14:15:04.471 o.a.s.d.s.Supervisor [INFO] Starting supervisor with id 255d3fed-f3ee-4c7e-8a08-b693c9a6a072 at host gq-data-rt48.gq.sankuai.com.
2017-12-27 14:15:04.502 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormjar.jar from blob store.
org.apache.storm.generated.KeyNotFoundException: null
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:123) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
2017-12-27 14:15:04.611 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormjar.jar from blob store.
org.apache.storm.generated.KeyNotFoundException: null
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:123) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
2017-12-27 14:15:04.718 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormcode.ser from blob store.
org.apache.storm.generated.KeyNotFoundException: null
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:124) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
2017-12-27 14:15:04.825 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormcode.ser from blob store.
org.apache.storm.generated.KeyNotFoundException: null
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:124) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
2017-12-27 14:15:04.932 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormconf.ser from blob store.
org.apache.storm.generated.KeyNotFoundException: null
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:125) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
2017-12-27 14:15:05.039 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormconf.ser from blob store.
org.apache.storm.generated.KeyNotFoundException: null
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:125) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
2017-12-27 14:15:05.140 o.a.s.u.Utils [INFO] Could not extract resources from /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormjar.jar
2017-12-27 14:15:05.142 o.a.s.d.s.Slot [INFO] STATE WAITING_FOR_BASIC_LOCALIZATION msInState: 697 -> WAITING_FOR_BLOB_LOCALIZATION msInState: 0
2017-12-27 14:15:05.142 o.a.s.l.AsyncLocalizer [WARN] Caught Exception While Downloading (rethrowing)...
java.io.FileNotFoundException: File '/opt/meituan/storm/data/supervisor/stormdist/app_dpsr_realtime_shop_vane_allcates-14-1513685785/stormconf.ser' does not exist
at org.apache.storm.shade.org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:292) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.shade.org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1815) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfGivenPath(ConfigUtils.java:264) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfImpl(ConfigUtils.java:376) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.ConfigUtils.readSupervisorStormConf(ConfigUtils.java:370) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBlobs.call(AsyncLocalizer.java:226) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBlobs.call(AsyncLocalizer.java:213) ~[storm-core-1.1.2-mt001.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
- is duplicated by
-
STORM-2878 Supervisor collapse continuously when there is a expired assignment for overdue storm
-
- Closed
-
- links to