Uploaded image for project: 'Aurora'
  1. Aurora
  2. AURORA-1949

PreemptionVictimFilterImpl comparator violates transitivity causing exceptions

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Scheduler
    • Labels:
      None

      Description

      The PreemptionVictimFilterImpl uses a comparator to sort ResourceBags in order to preempt the biggest tasks first when searching for a victim. However, the current implementation causes an exception which causes the Scheduler to fail:

      SEVERE: Service PreemptorService [FAILED] has failed in the RUNNING state.
      java.lang.IllegalArgumentException: Comparison method violates its general contract!
              at java.util.TimSort.mergeLo(TimSort.java:777)
              at java.util.TimSort.mergeAt(TimSort.java:514)
              at java.util.TimSort.mergeCollapse(TimSort.java:441)
              at java.util.TimSort.sort(TimSort.java:245)
              at java.util.Arrays.sort(Arrays.java:1438)
              at com.google.common.collect.Ordering.immutableSortedCopy(Ordering.java:882)
              at org.apache.aurora.scheduler.preemptor.PreemptionVictimFilter$PreemptionVictimFilterImpl.filterPreemptionVictims(PreemptionVictimFilter.java:210)
              at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.lambda$run$0(PendingTaskProcessor.java:178)
              at org.apache.aurora.scheduler.storage.db.DbStorage.read(DbStorage.java:147)
              at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101)
              at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83)
              at org.apache.aurora.scheduler.storage.log.LogStorage.read(LogStorage.java:562)
              at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.read(CallOrderEnforcingStorage.java:113)
              at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.run(PendingTaskProcessor.java:135)
              at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83)
              at org.apache.aurora.scheduler.preemptor.PreemptorModule$PreemptorService.runOneIteration(PreemptorModule.java:205)
              at com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:188)
              at com.google.common.util.concurrent.Callables$4.run(Callables.java:122)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      

      Looking at the code, it seems it violates transitivity:

          @VisibleForTesting
          static final Ordering<ResourceBag> ORDER = new Ordering<ResourceBag>() {
            @Override
            public int compare(ResourceBag left, ResourceBag right) {
              Set<ResourceType> types = ImmutableSet.<ResourceType>builder()
                  .addAll(left.streamResourceVectors().map(e -> e.getKey()).iterator())
                  .addAll(right.streamResourceVectors().map(e -> e.getKey()).iterator())
                  .build();
      
              boolean allZero = true;
              boolean allGreaterOrEqual = true;
              boolean allLessOrEqual = true;
      
              for (ResourceType type : types) {
                int compare = left.valueOf(type).compareTo(right.valueOf(type));
                if (compare != 0) {
                  allZero = false;
                }
      
                if (compare < 0) {
                  allGreaterOrEqual = false;
                }
      
                if (compare > 0) {
                  allLessOrEqual = false;
                }
              }
      
              if (allZero) {
                return 0;
              }
      
              if (allGreaterOrEqual) {
                return 1;
              }
      
              if (allLessOrEqual) {
                return -1;
              }
      
              return 0;
            }
          };
      

      The example below illustrates the error:

      Resource:    X Y Z
      Bag A:       2 0 2
      Bag B:       1 2 1
      Bag C:       2 2 1
      

      We can see that A = B, B < C, and C = A which would cause an exception.

      There are a couple of routes we can take to solve this. What we really want is to be able to define partial ordering (comparator does total ordering) so we can do a topological sort. Additionally, we can give priority to particular resources (Dominant Resource Fairness).

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                jordanly Jordan Ly
                Reporter:
                jordanly Jordan Ly
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated: