Flink / FLINK-17341

freeSlot in TaskExecutor.closeJobManagerConnection causes ConcurrentModificationException

Details

    Description

      TaskExecutor may free a slot while iterating over the active slots in closeJobManagerConnection. Freeing a slot modifies TaskSlotTable.slotsPerJob, which the iterator is still traversing, so the next iterator call throws a ConcurrentModificationException.

      Iterator<AllocationID> activeSlots = taskSlotTable.getActiveSlots(jobId);

      final FlinkException freeingCause = new FlinkException("Slot could not be marked inactive.");

      while (activeSlots.hasNext()) {
          AllocationID activeSlot = activeSlots.next();

          try {
              if (!taskSlotTable.markSlotInactive(activeSlot, taskManagerConfiguration.getTimeout())) {
                  // Frees the slot while activeSlots is still being iterated.
                  freeSlotInternal(activeSlot, freeingCause);
              }
          } catch (SlotNotFoundException e) {
              log.debug("Could not mark the slot {} inactive.", jobId, e);
          }
      }
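
      The failure mode can be reproduced without Flink. The sketch below is a minimal stand-in, assuming a plain HashSet of strings in place of the per-job slot set; the class name SlotIterationRepro and the method freeWhileIterating are hypothetical and are not part of TaskSlotTable.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical stand-in for the slot bookkeeping; this is not Flink's TaskSlotTable.
class SlotIterationRepro {

    // Returns true if removing an element during iteration triggers a
    // ConcurrentModificationException.
    static boolean freeWhileIterating() {
        Set<String> slotsOfJob = new HashSet<>();
        slotsOfJob.add("allocation-1");
        slotsOfJob.add("allocation-2");
        slotsOfJob.add("allocation-3");

        Iterator<String> activeSlots = slotsOfJob.iterator();
        try {
            while (activeSlots.hasNext()) {
                String slot = activeSlots.next();
                // Mimics freeing a slot: mutates the set directly instead of
                // going through the iterator.
                slotsOfJob.remove(slot);
            }
        } catch (ConcurrentModificationException e) {
            // The fail-fast iterator detects the modification on its next call.
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + freeWhileIterating());
    }
}
```

      The exception surfaces on the iterator call that follows the removal, which matches the stack trace in this report, where it is raised from inside hasNext of the slot iterator.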
      

      Error log:

      2020-04-21 23:37:11,363 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor                - Caught exception while executing runnable in main thread.
      java.util.ConcurrentModificationException
          at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
          at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
          at org.apache.flink.runtime.taskexecutor.slot.TaskSlotTable$TaskSlotIterator.hasNext(TaskSlotTable.java:698)
          at org.apache.flink.runtime.taskexecutor.slot.TaskSlotTable$AllocationIDIterator.hasNext(TaskSlotTable.java:652)
          at org.apache.flink.runtime.taskexecutor.TaskExecutor.closeJobManagerConnection(TaskExecutor.java:1314)
          at org.apache.flink.runtime.taskexecutor.TaskExecutor.access$1300(TaskExecutor.java:149)
          at org.apache.flink.runtime.taskexecutor.TaskExecutor$JobLeaderListenerImpl.lambda$jobManagerLostLeadership$1(TaskExecutor.java:1726)
          at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
          at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
          at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
          at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
          at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
          at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
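
      One common way to avoid this kind of exception is to iterate over a snapshot of the IDs rather than over the live collection. The sketch below illustrates that pattern under stated assumptions: it uses a plain Set of strings, the names SnapshotIterationSketch and freeAllSlots are hypothetical, and it is not necessarily the change that was applied in Flink.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the snapshot pattern; not Flink code.
class SnapshotIterationSketch {

    // Frees every slot in the given set and returns how many were freed.
    static int freeAllSlots(Set<String> slotsOfJob) {
        // Copy first so the removals below never touch the collection
        // that is being iterated.
        List<String> snapshot = new ArrayList<>(slotsOfJob);
        int freed = 0;
        for (String slot : snapshot) {
            if (slotsOfJob.remove(slot)) {
                freed++;
            }
        }
        return freed;
    }

    public static void main(String[] args) {
        Set<String> slots = new HashSet<>();
        slots.add("allocation-1");
        slots.add("allocation-2");
        slots.add("allocation-3");
        // Prints "3 freed, 0 left"
        System.out.println(freeAllSlots(slots) + " freed, " + slots.size() + " left");
    }
}
```

      The copy costs one extra allocation per call, which is usually negligible for the handful of slots a job holds on one TaskExecutor.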
      

            People

              mapohl Matthias Pohl
              huwh