Details
-
Sub-task
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
Description
BroadcastReduceTest test fails intermittent. But if I re-run, it passes sometimes.
When failure happens, I looked the log, there are some failed evalutors even the tasks are completed.
Looks like there is an exception:
ExceptionCaught ObjectDisposedException encountered error [System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'System.Net.Sockets.NetworkStream'.
Here are more details in a failed evaluator log:
INFO: Task Call Finished
Org.Apache.REEF.Common.Runtime.Evaluator.EvaluatorRuntime Information: 0 : 2016-03-30T16:52:01.7115823-07:00 0004
INFO: Evaluator state : RUNNING
Org.Apache.REEF.Common.Runtime.Evaluator.HeartBeatManager Information: 0 : 2016-03-30T16:52:01.7115823-07:00 0004
INFO: Triggered a heartbeat: EvaluatorHeartbeatProto: task_id=[SlaveTask-SlaveTaskContext-3], task_status=[DONE], task_message=[], evaluator_status=[RUNNING], context_status=[evaluator Node-6-1459381912995 has context SlaveTaskContext-3 in state READY with recovery flag False], timestamp=[1459381921711], recoveryFlag =[False].
Org.Apache.REEF.Network.Naming.NameClient Information: 0 : 2016-03-30T16:52:01.7125833-07:00 0004
INFO: Unregistering id: SlaveTask-SlaveTaskContext-3
Org.Apache.REEF.Wake.Remote.Impl.StreamingLink`1[[Org.Apache.REEF.Wake.Remote.IRemoteEvent`1[[Org.Apache.REEF.Network.NetworkService.NsMessage`1[[Org.Apache.REEF.Network.Group.Driver.Impl.GeneralGroupCommunicationMessage, Org.Apache.REEF.Network, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]], Org.Apache.REEF.Network, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]], Org.Apache.REEF.Wake, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]] Warning: 0 : 2016-03-30T16:52:01.7235831-07:00 0007
WARNING: In ReadAsync function unable to read the message.
Org.Apache.REEF.Wake.Remote.Impl.StreamingTransportServer`1 Information: 0 : 2016-03-30T16:52:01.7255836-07:00 0005
INFO: StreamingTransportServer has been closed.
Org.Apache.REEF.Network.NetworkService.StreamingNetworkService`1 Information: 0 : 2016-03-30T16:52:01.7570926-07:00 0004
INFO: Disposed of network service
Org.Apache.REEF.Wake.Remote.Impl.StreamingLink`1[[Org.Apache.REEF.Wake.Remote.IRemoteEvent`1[[Org.Apache.REEF.Network.NetworkService.NsMessage`1[[Org.Apache.REEF.Network.Group.Driver.Impl.GeneralGroupCommunicationMessage, Org.Apache.REEF.Network, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]], Org.Apache.REEF.Network, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]], Org.Apache.REEF.Wake, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]] Error: 0 : 2016-03-30T16:52:01.7600922-07:00 0007
ERROR: ExceptionCaught ObjectDisposedException encountered error [System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'System.Net.Sockets.NetworkStream'.
at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult)
at System.Threading.Tasks.TaskFactory`1.FromAsyncTrimPromise`1.Complete(TInstance thisRef, Func`3 endMethod, IAsyncResult asyncResult, Boolean requiresSynchronization)
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Org.Apache.REEF.Wake.Remote.Impl.StreamDataReader.<ReadAsync>d__25.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Org.Apache.REEF.Wake.Remote.Impl.StreamDataReader.<ReadInt32Async>d__14.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Org.Apache.REEF.Network.NetworkService.Codec.NsMessageStreamingCodec`1.<ReadAsync>d__0.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Org.Apache.REEF.Wake.Remote.Impl.RemoteEventStreamingCodec`1.<ReadAsync>d__0.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Org.Apache.REEF.Wake.Remote.Impl.StreamingLink`1.<ReadAsync>d__3.MoveNext()] with mesage [Cannot access a disposed object.
Object name: 'System.Net.Sockets.NetworkStream'.] and stack trace [ at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult)
at System.Threading.Tasks.TaskFactory`1.FromAsyncTrimPromise`1.Complete(TInstance thisRef, Func`3 endMethod, IAsyncResult asyncResult, Boolean requiresSynchronization)
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Org.Apache.REEF.Wake.Remote.Impl.StreamDataReader.<ReadAsync>d__25.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Org.Apache.REEF.Wake.Remote.Impl.StreamDataReader.<ReadInt32Async>d__14.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Org.Apache.REEF.Network.NetworkService.Codec.NsMessageStreamingCodec`1.<ReadAsync>d__0.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Org.Apache.REEF.Wake.Remote.Impl.RemoteEventStreamingCodec`1.<ReadAsync>d__0.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Org.Apache.REEF.Wake.Remote.Impl.StreamingLink`1.<ReadAsync>d__3.MoveNext()]
Org.Apache.REEF.Wake.Remote.Impl.StreamDataReader Error: 0 : 2016-03-30T16:52:01.7716008-07:00 0007
ERROR: ExceptionThrowing Exception encountered error [System.Exception: No bytes read] with mesage [No bytes read] and stack trace []
Org.Apache.REEF.Wake.Remote.Impl.StreamingLink`1[[Org.Apache.REEF.Wake.Remote.IRemoteEvent`1[[Org.Apache.REEF.Network.NetworkService.NsMessage`1[[Org.Apache.REEF.Network.Group.Driver.Impl.GeneralGroupCommunicationMessage, Org.Apache.REEF.Network, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]], Org.Apache.REEF.Network, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]], Org.Apache.REEF.Wake, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]] Warning: 0 : 2016-03-30T16:52:01.7716008-07:00 0007
WARNING: In ReadAsync function unable to read the message.
Org.Apache.REEF.Wake.Remote.Impl.StreamingLink`1[[Org.Apache.REEF.Wake.Remote.IRemoteEvent`1[[Org.Apache.REEF.Network.NetworkService.NsMessage`1[[Org.Apache.REEF.Network.Group.Driver.Impl.GeneralGroupCommunicationMessage, Org.Apache.REEF.Network, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]], Org.Apache.REEF.Network, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]], Org.Apache.REEF.Wake, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]] Error: 0 : 2016-03-30T16:52:01.7726012-07:00 0007
ERROR: ExceptionCaught Exception encountered error [System.Exception: No bytes read
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at Org.Apache.REEF.Utilities.Diagnostics.Exceptions.Throw(Exception exception, String message, Logger logger)
at Org.Apache.REEF.Utilities.Diagnostics.Exceptions.Throw(Exception exception, Logger logger)
at Org.Apache.REEF.Wake.Remote.Impl.StreamDataReader.<ReadInt32Async>d__14.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Org.Apache.REEF.Network.NetworkService.Codec.NsMessageStreamingCodec`1.<ReadAsync>d__0.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Org.Apache.REEF.Wake.Remote.Impl.RemoteEventStreamingCodec`1.<ReadAsync>d__0.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Org.Apache.REEF.Wake.Remote.Impl.StreamingLink`1.<ReadAsync>d__3.MoveNext()] with mesage [No bytes read] and stack trace [ at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at Org.Apache.REEF.Utilities.Diagnostics.Exceptions.Throw(Exception exception, String message, Logger logger)
at Org.Apache.REEF.Utilities.Diagnostics.Exceptions.Throw(Exception exception, Logger logger)
at Org.Apache.REEF.Wake.Remote.Impl.StreamDataReader.<ReadInt32Async>d__14.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Org.Apache.REEF.Network.NetworkService.Codec.NsMessageStreamingCodec`1.<ReadAsync>d__0.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Org.Apache.REEF.Wake.Remote.Impl.RemoteEventStreamingCodec`1.<ReadAsync>d__0.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Org.Apache.REEF.Wake.Remote.Impl.StreamingLink`1.<ReadAsync>d__3.MoveNext()]
Org.Apache.REEF.Common.Runtime.Evaluator.ReefMessageProtoObserver Information: 0 : 2016-03-30T16:52:01.7936021-07:00 0005
INFO: receive a ReefMessage from Org.Apache.REEF.Common.Protobuf.ReefProtocol.REEFMessage Driver at socket://10.130.68.76:9693.
Org.Apache.REEF.Common.Runtime.Evaluator.ReefMessageProtoObserver Information: 0 : 2016-03-30T16:52:01.7946028-07:00 0005
INFO: Control protobuf to remove context SlaveTaskContext-3
Org.Apache.REEF.Common.Runtime.Evaluator.EvaluatorRuntime Information: 0 : 2016-03-30T16:52:01.7996021-07:00 0005
INFO: Received a REEFMessage with EvaluatorControl
Org.Apache.REEF.Common.Runtime.Evaluator.EvaluatorRuntime Information: 0 : 2016-03-30T16:52:01.8016030-07:00 0005
INFO: Handle Evaluator control message
Org.Apache.REEF.Common.Runtime.Evaluator.EvaluatorRuntime Information: 0 : 2016-03-30T16:52:01.8026027-07:00 0005
INFO: Send task control message to ContextManager
Org.Apache.REEF.Common.Runtime.Evaluator.Context.ContextManager Information: 0 : 2016-03-30T16:52:01.8056029-07:00 0005
INFO: RemoveContext with id SlaveTaskContext-3
Org.Apache.REEF.Common.Runtime.Evaluator.Context.ContextRuntime Warning: 0 : 2016-03-30T16:52:01.8386042-07:00 0005
WARNING: Shutting down an task because the underlying context is being closed.
Org.Apache.REEF.Common.Runtime.Evaluator.Task.TaskRuntime Information: 0 : 2016-03-30T16:52:01.8396050-07:00 0005
INFO: Trying to close Task SlaveTask-SlaveTaskContext-3
Org.Apache.REEF.Common.Runtime.Evaluator.Task.TaskRuntime Warning: 0 : 2016-03-30T16:52:01.8406054-07:00 0005
WARNING: Trying to close an task that is in Done state. Ignored.
Org.Apache.REEF.Common.Context.Defaults.DefaultContextStopHandler Information: 0 : 2016-03-30T16:52:01.8416044-07:00 0005
INFO: DefaultContextStopHandler received for context: SlaveTaskContext-3
Org.Apache.REEF.Network.NetworkService.StreamingNetworkService`1 Information: 0 : 2016-03-30T16:52:01.8416044-07:00 0005
INFO: Disposed of network service
Org.Apache.REEF.Common.Runtime.Evaluator.EvaluatorRuntime Information: 0 : 2016-03-30T16:52:01.8426035-07:00 0005
INFO: Context stack is empty, done
Org.Apache.REEF.Common.Runtime.Evaluator.EvaluatorRuntime Information: 0 : 2016-03-30T16:52:01.8426035-07:00 0005
INFO: Evaluator state : DONE
Org.Apache.REEF.Common.Runtime.Evaluator.HeartBeatManager Information: 0 : 2016-03-30T16:52:01.8426035-07:00 0005
INFO: Triggered a heartbeat: EvaluatorHeartbeatProto: task_id=[], task_status=[], task_message=[], evaluator_status=[DONE], context_status=[], timestamp=[1459381921842], recoveryFlag =[False].
Org.Apache.REEF.Common.Runtime.Evaluator.EvaluatorRuntime Information: 0 : 2016-03-30T16:52:01.8466051-07:00 0001
INFO: Runtime stop
Org.Apache.REEF.Common.Runtime.Evaluator.EvaluatorRuntime Information: 0 : 2016-03-30T16:52:01.8476052-07:00 0001
INFO: EvaluatorRuntime shutdown complete
Org.Apache.REEF.Evaluator.Evaluator Information: 0 : 2016-03-30T16:52:01.8476052-07:00 0001
INFO: Evaluator is returned from Run()
Attachments
Issue Links
- is superceded by
-
REEF-976 Fix broken C# Tests caused by race condition of local RM
- Resolved