Uploaded image for project: 'REEF'
  1. REEF
  2. REEF-1323 Fix failing tests for .NET code
  3. REEF-1302

BroadcastReduceTest fail intermitently

Attach filesAttach ScreenshotVotersWatch issueWatchersLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.15
    • Labels:
      None

      Description

      BroadcastReduceTest test fails intermittent. But if I re-run, it passes sometimes.
      When failure happens, I looked the log, there are some failed evalutors even the tasks are completed.

      Looks like there is an exception:
      ExceptionCaught ObjectDisposedException encountered error [System.ObjectDisposedException: Cannot access a disposed object.
      Object name: 'System.Net.Sockets.NetworkStream'.

      Here are more details in a failed evaluator log:
      INFO: Task Call Finished
      Org.Apache.REEF.Common.Runtime.Evaluator.EvaluatorRuntime Information: 0 : 2016-03-30T16:52:01.7115823-07:00 0004
      INFO: Evaluator state : RUNNING
      Org.Apache.REEF.Common.Runtime.Evaluator.HeartBeatManager Information: 0 : 2016-03-30T16:52:01.7115823-07:00 0004
      INFO: Triggered a heartbeat: EvaluatorHeartbeatProto: task_id=[SlaveTask-SlaveTaskContext-3], task_status=[DONE], task_message=[], evaluator_status=[RUNNING], context_status=[evaluator Node-6-1459381912995 has context SlaveTaskContext-3 in state READY with recovery flag False], timestamp=[1459381921711], recoveryFlag =[False].
      Org.Apache.REEF.Network.Naming.NameClient Information: 0 : 2016-03-30T16:52:01.7125833-07:00 0004
      INFO: Unregistering id: SlaveTask-SlaveTaskContext-3
      Org.Apache.REEF.Wake.Remote.Impl.StreamingLink`1[[Org.Apache.REEF.Wake.Remote.IRemoteEvent`1[[Org.Apache.REEF.Network.NetworkService.NsMessage`1[[Org.Apache.REEF.Network.Group.Driver.Impl.GeneralGroupCommunicationMessage, Org.Apache.REEF.Network, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]], Org.Apache.REEF.Network, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]], Org.Apache.REEF.Wake, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]] Warning: 0 : 2016-03-30T16:52:01.7235831-07:00 0007
      WARNING: In ReadAsync function unable to read the message.
      Org.Apache.REEF.Wake.Remote.Impl.StreamingTransportServer`1 Information: 0 : 2016-03-30T16:52:01.7255836-07:00 0005
      INFO: StreamingTransportServer has been closed.
      Org.Apache.REEF.Network.NetworkService.StreamingNetworkService`1 Information: 0 : 2016-03-30T16:52:01.7570926-07:00 0004
      INFO: Disposed of network service
      Org.Apache.REEF.Wake.Remote.Impl.StreamingLink`1[[Org.Apache.REEF.Wake.Remote.IRemoteEvent`1[[Org.Apache.REEF.Network.NetworkService.NsMessage`1[[Org.Apache.REEF.Network.Group.Driver.Impl.GeneralGroupCommunicationMessage, Org.Apache.REEF.Network, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]], Org.Apache.REEF.Network, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]], Org.Apache.REEF.Wake, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]] Error: 0 : 2016-03-30T16:52:01.7600922-07:00 0007
      ERROR: ExceptionCaught ObjectDisposedException encountered error [System.ObjectDisposedException: Cannot access a disposed object.
      Object name: 'System.Net.Sockets.NetworkStream'.
      at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult)
      at System.Threading.Tasks.TaskFactory`1.FromAsyncTrimPromise`1.Complete(TInstance thisRef, Func`3 endMethod, IAsyncResult asyncResult, Boolean requiresSynchronization)
      — End of stack trace from previous location where exception was thrown —
      at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
      at Org.Apache.REEF.Wake.Remote.Impl.StreamDataReader.<ReadAsync>d__25.MoveNext()
      — End of stack trace from previous location where exception was thrown —
      at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
      at Org.Apache.REEF.Wake.Remote.Impl.StreamDataReader.<ReadInt32Async>d__14.MoveNext()
      — End of stack trace from previous location where exception was thrown —
      at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
      at Org.Apache.REEF.Network.NetworkService.Codec.NsMessageStreamingCodec`1.<ReadAsync>d__0.MoveNext()
      — End of stack trace from previous location where exception was thrown —
      at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
      at Org.Apache.REEF.Wake.Remote.Impl.RemoteEventStreamingCodec`1.<ReadAsync>d__0.MoveNext()
      — End of stack trace from previous location where exception was thrown —
      at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
      at Org.Apache.REEF.Wake.Remote.Impl.StreamingLink`1.<ReadAsync>d__3.MoveNext()] with mesage [Cannot access a disposed object.
      Object name: 'System.Net.Sockets.NetworkStream'.] and stack trace [ at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult)
      at System.Threading.Tasks.TaskFactory`1.FromAsyncTrimPromise`1.Complete(TInstance thisRef, Func`3 endMethod, IAsyncResult asyncResult, Boolean requiresSynchronization)
      — End of stack trace from previous location where exception was thrown —
      at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
      at Org.Apache.REEF.Wake.Remote.Impl.StreamDataReader.<ReadAsync>d__25.MoveNext()
      — End of stack trace from previous location where exception was thrown —
      at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
      at Org.Apache.REEF.Wake.Remote.Impl.StreamDataReader.<ReadInt32Async>d__14.MoveNext()
      — End of stack trace from previous location where exception was thrown —
      at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
      at Org.Apache.REEF.Network.NetworkService.Codec.NsMessageStreamingCodec`1.<ReadAsync>d__0.MoveNext()
      — End of stack trace from previous location where exception was thrown —
      at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
      at Org.Apache.REEF.Wake.Remote.Impl.RemoteEventStreamingCodec`1.<ReadAsync>d__0.MoveNext()
      — End of stack trace from previous location where exception was thrown —
      at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
      at Org.Apache.REEF.Wake.Remote.Impl.StreamingLink`1.<ReadAsync>d__3.MoveNext()]
      Org.Apache.REEF.Wake.Remote.Impl.StreamDataReader Error: 0 : 2016-03-30T16:52:01.7716008-07:00 0007
      ERROR: ExceptionThrowing Exception encountered error [System.Exception: No bytes read] with mesage [No bytes read] and stack trace []
      Org.Apache.REEF.Wake.Remote.Impl.StreamingLink`1[[Org.Apache.REEF.Wake.Remote.IRemoteEvent`1[[Org.Apache.REEF.Network.NetworkService.NsMessage`1[[Org.Apache.REEF.Network.Group.Driver.Impl.GeneralGroupCommunicationMessage, Org.Apache.REEF.Network, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]], Org.Apache.REEF.Network, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]], Org.Apache.REEF.Wake, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]] Warning: 0 : 2016-03-30T16:52:01.7716008-07:00 0007
      WARNING: In ReadAsync function unable to read the message.
      Org.Apache.REEF.Wake.Remote.Impl.StreamingLink`1[[Org.Apache.REEF.Wake.Remote.IRemoteEvent`1[[Org.Apache.REEF.Network.NetworkService.NsMessage`1[[Org.Apache.REEF.Network.Group.Driver.Impl.GeneralGroupCommunicationMessage, Org.Apache.REEF.Network, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]], Org.Apache.REEF.Network, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]], Org.Apache.REEF.Wake, Version=0.15.0.0, Culture=neutral, PublicKeyToken=c27bf5b2e9a7ddb9]] Error: 0 : 2016-03-30T16:52:01.7726012-07:00 0007
      ERROR: ExceptionCaught Exception encountered error [System.Exception: No bytes read
      at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
      at Org.Apache.REEF.Utilities.Diagnostics.Exceptions.Throw(Exception exception, String message, Logger logger)
      at Org.Apache.REEF.Utilities.Diagnostics.Exceptions.Throw(Exception exception, Logger logger)
      at Org.Apache.REEF.Wake.Remote.Impl.StreamDataReader.<ReadInt32Async>d__14.MoveNext()
      — End of stack trace from previous location where exception was thrown —
      at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
      at Org.Apache.REEF.Network.NetworkService.Codec.NsMessageStreamingCodec`1.<ReadAsync>d__0.MoveNext()
      — End of stack trace from previous location where exception was thrown —
      at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
      at Org.Apache.REEF.Wake.Remote.Impl.RemoteEventStreamingCodec`1.<ReadAsync>d__0.MoveNext()
      — End of stack trace from previous location where exception was thrown —
      at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
      at Org.Apache.REEF.Wake.Remote.Impl.StreamingLink`1.<ReadAsync>d__3.MoveNext()] with mesage [No bytes read] and stack trace [ at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
      at Org.Apache.REEF.Utilities.Diagnostics.Exceptions.Throw(Exception exception, String message, Logger logger)
      at Org.Apache.REEF.Utilities.Diagnostics.Exceptions.Throw(Exception exception, Logger logger)
      at Org.Apache.REEF.Wake.Remote.Impl.StreamDataReader.<ReadInt32Async>d__14.MoveNext()
      — End of stack trace from previous location where exception was thrown —
      at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
      at Org.Apache.REEF.Network.NetworkService.Codec.NsMessageStreamingCodec`1.<ReadAsync>d__0.MoveNext()
      — End of stack trace from previous location where exception was thrown —
      at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
      at Org.Apache.REEF.Wake.Remote.Impl.RemoteEventStreamingCodec`1.<ReadAsync>d__0.MoveNext()
      — End of stack trace from previous location where exception was thrown —
      at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
      at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
      at Org.Apache.REEF.Wake.Remote.Impl.StreamingLink`1.<ReadAsync>d__3.MoveNext()]
      Org.Apache.REEF.Common.Runtime.Evaluator.ReefMessageProtoObserver Information: 0 : 2016-03-30T16:52:01.7936021-07:00 0005
      INFO: receive a ReefMessage from Org.Apache.REEF.Common.Protobuf.ReefProtocol.REEFMessage Driver at socket://10.130.68.76:9693.
      Org.Apache.REEF.Common.Runtime.Evaluator.ReefMessageProtoObserver Information: 0 : 2016-03-30T16:52:01.7946028-07:00 0005
      INFO: Control protobuf to remove context SlaveTaskContext-3
      Org.Apache.REEF.Common.Runtime.Evaluator.EvaluatorRuntime Information: 0 : 2016-03-30T16:52:01.7996021-07:00 0005
      INFO: Received a REEFMessage with EvaluatorControl
      Org.Apache.REEF.Common.Runtime.Evaluator.EvaluatorRuntime Information: 0 : 2016-03-30T16:52:01.8016030-07:00 0005
      INFO: Handle Evaluator control message
      Org.Apache.REEF.Common.Runtime.Evaluator.EvaluatorRuntime Information: 0 : 2016-03-30T16:52:01.8026027-07:00 0005
      INFO: Send task control message to ContextManager
      Org.Apache.REEF.Common.Runtime.Evaluator.Context.ContextManager Information: 0 : 2016-03-30T16:52:01.8056029-07:00 0005
      INFO: RemoveContext with id SlaveTaskContext-3
      Org.Apache.REEF.Common.Runtime.Evaluator.Context.ContextRuntime Warning: 0 : 2016-03-30T16:52:01.8386042-07:00 0005
      WARNING: Shutting down an task because the underlying context is being closed.
      Org.Apache.REEF.Common.Runtime.Evaluator.Task.TaskRuntime Information: 0 : 2016-03-30T16:52:01.8396050-07:00 0005
      INFO: Trying to close Task SlaveTask-SlaveTaskContext-3
      Org.Apache.REEF.Common.Runtime.Evaluator.Task.TaskRuntime Warning: 0 : 2016-03-30T16:52:01.8406054-07:00 0005
      WARNING: Trying to close an task that is in Done state. Ignored.
      Org.Apache.REEF.Common.Context.Defaults.DefaultContextStopHandler Information: 0 : 2016-03-30T16:52:01.8416044-07:00 0005
      INFO: DefaultContextStopHandler received for context: SlaveTaskContext-3
      Org.Apache.REEF.Network.NetworkService.StreamingNetworkService`1 Information: 0 : 2016-03-30T16:52:01.8416044-07:00 0005
      INFO: Disposed of network service
      Org.Apache.REEF.Common.Runtime.Evaluator.EvaluatorRuntime Information: 0 : 2016-03-30T16:52:01.8426035-07:00 0005
      INFO: Context stack is empty, done
      Org.Apache.REEF.Common.Runtime.Evaluator.EvaluatorRuntime Information: 0 : 2016-03-30T16:52:01.8426035-07:00 0005
      INFO: Evaluator state : DONE
      Org.Apache.REEF.Common.Runtime.Evaluator.HeartBeatManager Information: 0 : 2016-03-30T16:52:01.8426035-07:00 0005
      INFO: Triggered a heartbeat: EvaluatorHeartbeatProto: task_id=[], task_status=[], task_message=[], evaluator_status=[DONE], context_status=[], timestamp=[1459381921842], recoveryFlag =[False].
      Org.Apache.REEF.Common.Runtime.Evaluator.EvaluatorRuntime Information: 0 : 2016-03-30T16:52:01.8466051-07:00 0001
      INFO: Runtime stop
      Org.Apache.REEF.Common.Runtime.Evaluator.EvaluatorRuntime Information: 0 : 2016-03-30T16:52:01.8476052-07:00 0001
      INFO: EvaluatorRuntime shutdown complete
      Org.Apache.REEF.Evaluator.Evaluator Information: 0 : 2016-03-30T16:52:01.8476052-07:00 0001
      INFO: Evaluator is returned from Run()

        Attachments

        Issue Links

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              juliaw Julia Wang

              Dates

              • Created:
                Updated:
                Resolved:

                Issue deployment