Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
2.0.0, 1.x
Description
For now, when a topology is reassigned or killed for a cluster, supervisor will delete 4 files for an overdue storm:
- storm-code
- storm-ser
- storm-jar
- LocalAssignment
Slot.java
static DynamicState cleanupCurrentContainer(DynamicState dynamicState, StaticState staticState, MachineState nextState) throws Exception {
assert(dynamicState.container != null);
assert(dynamicState.currentAssignment != null);
assert(dynamicState.container.areAllProcessesDead());
dynamicState.container.cleanUp();
staticState.localizer.releaseSlotFor(dynamicState.currentAssignment, staticState.port);
DynamicState ret = dynamicState.withCurrentAssignment(null, null);
if (nextState != null)
return ret;
}
But we do not make a transaction to do this, if an exception occurred during deleting storm-code/ser/jar, an overdue local assignment will be left on disk.
Then when supervisor restart from the exception above, the slots will be initial and container will be recovered from LocalAssignments, the blob store will fetch the files from Nimbus/Master, but will get a KeyNotFoundException, and supervisor collapses again.
This will happens continuously and supervisor will never recover until we clean up all the local assignments manually.
This is the stack:
2017-12-27 14:15:04.434 o.a.s.l.AsyncLocalizer [INFO] Cleaning up unused topologies in /opt/meituan/storm/data/supervisor/stormdist
2017-12-27 14:15:04.434 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /opt/meituan/storm/data/supervisor/stormdist/app_dpsr_realtime_shop_vane_allcates-14-1513685785
2017-12-27 14:15:04.445 o.a.s.d.s.Slot [INFO] STATE EMPTY msInState: 109 -> WAITING_FOR_BASIC_LOCALIZATION msInState: 1
2017-12-27 14:15:04.471 o.a.s.d.s.Supervisor [INFO] Starting supervisor with id 255d3fed-f3ee-4c7e-8a08-b693c9a6a072 at host gq-data-rt48.gq.sankuai.com.
2017-12-27 14:15:04.502 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormjar.jar from blob store.
org.apache.storm.generated.KeyNotFoundException: null
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:123) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
2017-12-27 14:15:04.611 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormjar.jar from blob store.
org.apache.storm.generated.KeyNotFoundException: null
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:123) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
2017-12-27 14:15:04.718 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormcode.ser from blob store.
org.apache.storm.generated.KeyNotFoundException: null
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:124) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
2017-12-27 14:15:04.825 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormcode.ser from blob store.
org.apache.storm.generated.KeyNotFoundException: null
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:124) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
2017-12-27 14:15:04.932 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormconf.ser from blob store.
org.apache.storm.generated.KeyNotFoundException: null
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:125) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
2017-12-27 14:15:05.039 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormconf.ser from blob store.
org.apache.storm.generated.KeyNotFoundException: null
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:125) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
2017-12-27 14:15:05.140 o.a.s.u.Utils [INFO] Could not extract resources from /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormjar.jar
2017-12-27 14:15:05.142 o.a.s.d.s.Slot [INFO] STATE WAITING_FOR_BASIC_LOCALIZATION msInState: 697 -> WAITING_FOR_BLOB_LOCALIZATION msInState: 0
2017-12-27 14:15:05.142 o.a.s.l.AsyncLocalizer [WARN] Caught Exception While Downloading (rethrowing)...
java.io.FileNotFoundException: File '/opt/meituan/storm/data/supervisor/stormdist/app_dpsr_realtime_shop_vane_allcates-14-1513685785/stormconf.ser' does not exist
at org.apache.storm.shade.org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:292) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.shade.org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1815) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfGivenPath(ConfigUtils.java:264) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfImpl(ConfigUtils.java:376) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.utils.ConfigUtils.readSupervisorStormConf(ConfigUtils.java:370) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBlobs.call(AsyncLocalizer.java:226) ~[storm-core-1.1.2-mt001.jar:?]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBlobs.call(AsyncLocalizer.java:213) ~[storm-core-1.1.2-mt001.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
Attachments
Issue Links
- is duplicated by
-
STORM-2878 Supervisor collapse continuously when there is a expired assignment for overdue storm
- Closed
- links to