REEF-2017

Org.Apache.REEF.IO.FileSystem.AzureBlob produces Error 503 (server unavailable) when reading data from Azure Blob into >=80 evaluators


    Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 0.17
    • Fix Version/s: 0.17
    • Component/s: REEF.NET IO
    • Labels: None

      Description

      Azure Storage throws Microsoft.WindowsAzure.Storage.StorageException with error 503 (Server Unavailable) when I run a job that downloads data partitions from Azure Storage to 80 or more evaluators. The error does not occur with 64 evaluators. The full stack trace is below.


      Org.Apache.REEF.IMRU.OnREEF.Driver.IMRUDriver`4[[Microsoft.MachineLearning.Distributed.Core.Trainers.KMeans.InputOutput.KMeansInputOutput, Microsoft.MachineLearning.Distributed.Core, Version=0.3.0.0, Culture=neutral, PublicKeyToken=null],[Microsoft.MachineLearning.Distributed.Core.Trainers.KMeans.InputOutput.KMeansInputOutput, Microsoft.MachineLearning.Distributed.Core, Version=0.3.0.0, Culture=neutral, PublicKeyToken=null],[Microsoft.MachineLearning.Runtime.IPredictor, Microsoft.MachineLearning.Core, Version=3.9.290.3615, Culture=neutral, PublicKeyToken=d353f9ba84f0e281],[Microsoft.MachineLearning.Distributed.Core.Common.IPipeline, Microsoft.MachineLearning.Distributed.Core, Version=0.3.0.0, Culture=neutral, PublicKeyToken=null]] Warning: 0 : 2018-05-11T00:59:28.4674513+00:00 0031 : WARNING: Received IFailedEvaluator bf0bcb92-5773-448d-bffa-6c478b619beb from endpoint unknown_endpoint with systemState WaitingForEvaluator in retry# 0 with Exception:
      Org.Apache.REEF.Driver.Evaluator.EvaluatorException: One or more errors occurred.
      ---> System.AggregateException: One or more errors occurred.
      ---> Microsoft.WindowsAzure.Storage.StorageException: The remote server returned an error: (503) Server Unavailable.
      ---> System.Net.WebException: The remote server returned an error: (503) Server Unavailable.
      at Microsoft.WindowsAzure.Storage.Shared.Protocol.HttpResponseParsers.ProcessExpectedStatusCodeNoException[T](HttpStatusCode expectedStatusCode, HttpStatusCode actualStatusCode, T retVal, StorageCommandBase`1 cmd, Exception ex)
      at Microsoft.WindowsAzure.Storage.Blob.CloudBlob.<>c_DisplayClass1e.<GetBlobImpl>b_1b(RESTCommand`1 cmd, HttpWebResponse resp, Exception ex, OperationContext ctx)
      at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.EndGetResponse[T](IAsyncResult getResponseResult)
      --- End of inner exception stack trace ---
      at Microsoft.WindowsAzure.Storage.Core.Util.StorageAsyncResult`1.End()
      at Microsoft.WindowsAzure.Storage.Core.Util.AsyncExtensions.<>c_DisplayClass4.<CreateCallbackVoid>b_3(IAsyncResult ar)
      --- End of inner exception stack trace ---
      at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
      at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
      at Org.Apache.REEF.IO.FileSystem.AzureBlob.AzureCloudBlockBlob.DownloadToFile(String path, FileMode mode)
      at Org.Apache.REEF.IO.PartitionedData.FileSystem.FileSystemInputPartition`1.Download()
      at Org.Apache.REEF.IO.PartitionedData.FileSystem.FileSystemInputPartition`1.Cache()
      at Org.Apache.REEF.IMRU.OnREEF.Driver.DataLoadingContext`1.OnNext(IContextStart value)
      at Org.Apache.REEF.Common.Runtime.Evaluator.Context.ContextLifeCycle.Start()
      at Org.Apache.REEF.Common.Runtime.Evaluator.Context.ContextRuntime..ctor(IInjector serviceInjector, IConfiguration contextConfiguration, Optional`1 parentContext)
      --- End of inner exception stack trace ---.
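
      The trace shows the 503 surfacing from AzureCloudBlockBlob.DownloadToFile while each evaluator caches its partition, but it does not say why the service returned it. The gap between 64 and 80 evaluators suggests the failure is load related, and storage services commonly answer 503 when they throttle concurrent requests. That cause is an assumption, not something this report confirms. Under that assumption, one possible mitigation is to retry the download with exponential backoff and random jitter, so that evaluators that fail together do not all retry at the same moment. The sketch below is language-neutral Python rather than REEF.NET code, and every name in it (TransientError, download_with_backoff, the parameters) is hypothetical.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a transient failure such as an HTTP 503."""


def download_with_backoff(download, max_attempts=5, base_delay=0.5,
                          max_delay=30.0, sleep=time.sleep, rng=random.random):
    """Call download() until it succeeds or max_attempts is used up.

    download is any zero-argument callable that raises TransientError on a
    retryable failure. sleep and rng are injectable so the wait schedule
    can be checked without real delays.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return download()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure
            # Full jitter: wait a random fraction of an exponentially
            # growing ceiling, capped at max_delay seconds.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * ceiling)
```

      The jitter matters here because all evaluators start downloading at roughly the same time. A fixed retry delay would only move the burst of requests later instead of spreading it out.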

            People

            • Assignee: Dwaipayan Mukhopadhyay (dwaijam)
            • Reporter: Najeeb Kazmi (nhkazmi)