AURORA-1993

Aurora crashes when handling an unknown custom resource

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.17.0
    • Component/s: None
    • Labels: None

    Description

      When we declared network bandwidth as a custom resource in Mesos, Aurora crashed with the following stack trace:

      Jul 18, 2018 1:35:19 PM com.google.common.util.concurrent.ServiceManager$ServiceListener failed
      SEVERE: Service SlotSizeCounterService [FAILED] has failed in the RUNNING state.
      java.lang.NullPointerException: Unknown Mesos resource: name: "network_bandwidth"
      type: SCALAR
      scalar {
      value: 2000.0
      }
      role: "*"
      11: "\n\adefault"
      at java.util.Objects.requireNonNull(Objects.java:228)
      at org.apache.aurora.scheduler.resources.ResourceType.fromResource(ResourceType.java:355)
      at org.apache.aurora.scheduler.resources.ResourceManager.lambda$static$0(ResourceManager.java:52)
      at com.google.common.collect.Iterators$7.computeNext(Iterators.java:675)
      at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
      at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
      at java.util.Iterator.forEachRemaining(Iterator.java:115)
      at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
      at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
      at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
      at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
      at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
      at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
      at org.apache.aurora.scheduler.resources.ResourceManager.bagFromResources(ResourceManager.java:274)
      at org.apache.aurora.scheduler.resources.ResourceManager.bagFromMesosResources(ResourceManager.java:239)
      at org.apache.aurora.scheduler.stats.AsyncStatsModule$OfferAdapter.get(AsyncStatsModule.java:153)
      at org.apache.aurora.scheduler.stats.SlotSizeCounter.run(SlotSizeCounter.java:168)
      at org.apache.aurora.scheduler.stats.AsyncStatsModule$SlotSizeCounterService.runOneIteration(AsyncStatsModule.java:130)
      at com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:189)
      at com.google.common.util.concurrent.Callables$3.run(Callables.java:100)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      E0718 13:35:19.240 [SlotSizeCounterService RUNNING, GuavaUtils$LifecycleShutdownListener:55] Service: SlotSizeCounterService [FAILED] faile
      I0718 13:35:19.240 [SlotSizeCounterService RUNNING, Lifecycle:84] Shutting down application
      I0718 13:35:19.240 [SlotSizeCounterService RUNNING, ShutdownRegistry$ShutdownRegistryImpl:77] Executing 4 shutdown commands.
      I0718 13:35:19.243 [SlotSizeCounterService RUNNING, StateMachine$Builder:389] SchedulerLifecycle state machine transition ACTIVE -> DEAD
      I0718 13:35:19.249073 331 sched.cpp:2021] Asked to stop the driver
      I0718 13:35:19.249344 30748 sched.cpp:1203] Stopping framework 2a905643-b76f-4f17-a406-524d406f49f8-0000
      I0718 13:35:19.249 [SlotSizeCounterService RUNNING, StateMachine$Builder:389] storage state machine transition READY -> STOPPED
      I0718 13:35:19.250 [BlockingDriverJoin, SchedulerLifecycle$6:267] Driver exited, terminating lifecycle.
      I0718 13:35:19.250 [BlockingDriverJoin, StateMachine$Builder:389] SchedulerLifecycle state machine transition DEAD -> DEAD
      I0718 13:35:19.250 [BlockingDriverJoin, SchedulerLifecycle$7:287] Shutdown already invoked, ignoring extra call.
      I0718 13:35:19.255 [CronLifecycle STOPPING, CronLifecycle:90] Shutting down Quartz cron scheduler.
      I0718 13:35:19.255 [CronLifecycle STOPPING, QuartzScheduler:694] Scheduler QuartzScheduler_$_aurora-cron-1 shutting down.
      I0718 13:35:19.255 [CronLifecycle STOPPING, QuartzScheduler:613] Scheduler QuartzScheduler_$_aurora-cron-1 paused.
      I0718 13:35:19.255 [CronLifecycle STOPPING, QuartzScheduler:771] Scheduler QuartzScheduler_$_aurora-cron-1 shutdown complete.
      E0718 13:35:19.945 [AsyncProcessor-0, AsyncUtil:159] java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Driver is no
      

      It would be great if Aurora could handle custom resources, or at least not crash when an offer contains one.
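
      The stack trace shows the scheduler failing inside a null check while mapping an offered resource name to a known type. The following is a minimal sketch of the "at least not crash" behavior, not Aurora's actual code: `KnownType`, `ScalarResource`, and `bagFromResources` here are hypothetical stand-ins, and the sketch assumes that silently skipping an unrecognized resource is acceptable when aggregating an offer.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class UnknownResourceSketch {

    /** Hypothetical stand-in for the resource types a scheduler understands. */
    enum KnownType {
        CPUS("cpus"), RAM_MB("mem"), DISK_MB("disk");

        private static final Map<String, KnownType> BY_NAME = new HashMap<>();
        static {
            for (KnownType t : values()) {
                BY_NAME.put(t.mesosName, t);
            }
        }

        final String mesosName;

        KnownType(String mesosName) {
            this.mesosName = mesosName;
        }

        /** Returns an empty Optional instead of throwing for an unknown name. */
        static Optional<KnownType> fromName(String name) {
            return Optional.ofNullable(BY_NAME.get(name));
        }
    }

    /** Hypothetical simplified scalar resource: a name and a value. */
    static final class ScalarResource {
        final String name;
        final double value;

        ScalarResource(String name, double value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Sums offered resources by known type and skips anything unrecognized. */
    static Map<KnownType, Double> bagFromResources(List<ScalarResource> resources) {
        Map<KnownType, Double> bag = new EnumMap<>(KnownType.class);
        for (ScalarResource r : resources) {
            Optional<KnownType> type = KnownType.fromName(r.name);
            if (!type.isPresent()) {
                // Unknown custom resource (e.g. network_bandwidth): ignore it.
                continue;
            }
            bag.merge(type.get(), r.value, Double::sum);
        }
        return bag;
    }

    public static void main(String[] args) {
        List<ScalarResource> offer = Arrays.asList(
                new ScalarResource("cpus", 4.0),
                new ScalarResource("mem", 8192.0),
                new ScalarResource("network_bandwidth", 2000.0));
        // Only the known resource types end up in the bag.
        System.out.println(bagFromResources(offer));
    }
}
```

      Returning an `Optional` from the lookup leaves the decision to the caller: a stats collector can skip the resource as above, while a code path that really needs a known type can still fail explicitly.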

      We are using version 0.16.0.

       

      https://mesos.slack.com/archives/C1KR1PRP1/p1532013001000626

          People

            Assignee: Unassigned
            Reporter: Clément Michaud (clems4ever)
            Votes: 0
            Watchers: 2
