Beam / BEAM-7144

Job re-scale fails on Flink >= 1.6 with certain values of maxParallelism

Details

    • Type: Bug
    • Status: Triage Needed
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.11.0
    • Fix Version/s: 2.14.0
    • Component/s: runner-flink
    • Labels: None

    Description

      I am unable to rescale a job after moving it to Flink runner 1.7. What I am doing is:

      1. Recompile the job code, changing only the Flink runner version from 1.5 to 1.7
      2. Run the streaming job with parallelism 112 and maxParallelism 448
      3. Wait until a checkpoint is taken
      4. Stop the job
      5. Run the job again with parallelism 224 and the checkpoint path to restore from
      6. The job fails

      The same happens if I try to increase parallelism. This procedure works for the same job compiled with Flink runner 1.5 and run on Flink 1.5.0. It fails with runner 1.7 on Flink 1.7.2.
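
For readers reproducing the steps above, it may help to see what a re-scale asks of Flink: keyed state is split into maxParallelism key groups, and each subtask restores a contiguous range of them. The sketch below is an illustration only, not Flink code. It assumes the integer arithmetic used by Flink's KeyGroupRangeAssignment as I understand it, and the function names `key_group_range` and `owner_of` are hypothetical. It does not explain the failure itself; it only shows which key groups a subtask would read on restore.

```python
def key_group_range(max_parallelism: int, parallelism: int, operator_index: int) -> range:
    """Key groups owned by the 0-based subtask `operator_index`.

    Assumed to mirror Flink's key-group range arithmetic (integer division).
    """
    if not 0 <= operator_index < parallelism <= max_parallelism:
        raise ValueError("need 0 <= operator_index < parallelism <= max_parallelism")
    # First key group: ceil(operator_index * max_parallelism / parallelism)
    start = (operator_index * max_parallelism + parallelism - 1) // parallelism
    # Last key group (inclusive)
    end = ((operator_index + 1) * max_parallelism - 1) // parallelism
    return range(start, end + 1)


def owner_of(key_group: int, max_parallelism: int, parallelism: int) -> int:
    """0-based subtask index that owns `key_group` at the given parallelism."""
    return key_group * parallelism // max_parallelism


if __name__ == "__main__":
    MAX_PARALLELISM = 448  # value from the report
    # Subtask labelled (83/224) in the trace, assuming that label is 1-based.
    new_groups = key_group_range(MAX_PARALLELISM, 224, 82)
    print("key groups restored by new subtask:", list(new_groups))
    for kg in new_groups:
        print("key group", kg, "was written by old subtask index",
              owner_of(kg, MAX_PARALLELISM, 112))
```

Under these assumptions, going from parallelism 112 to 224 with maxParallelism 448 halves each subtask's share from four key groups to two, so the subtask at index 82 would read key groups 164 and 165, both written by the old subtask at index 41.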

      Exception is:

      java.lang.Exception: Exception while creating StreamOperatorStateContext.
      at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
      at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
      at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
      at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
      at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for WindowDoFnOperator_2b6af61dc418f10e82551367a7e7f78e_(83/224) from any of the 1 provided restore options.
      at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137)
      at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:284)
      at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
      ... 5 more
      Caused by: java.lang.IndexOutOfBoundsException: Index: 101, Size: 0
      at java.util.ArrayList.rangeCheck(ArrayList.java:653)
      at java.util.ArrayList.get(ArrayList.java:429)
      at com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
      at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:805)
      at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:759)
      at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:315)
      at org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:73)
      at org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297)
      at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.readKeyGroupStateData(HeapKeyedStateBackend.java:492)
      at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.readStateHandleStateData(HeapKeyedStateBackend.java:453)
      at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:410)
      at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:358)
      at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:104)
      at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151)
      at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123)
      ... 7 more

            People

              Assignee: Maximilian Michels (mxm)
              Reporter: Jozef Vilcek (JozoVilcek)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h