Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-7144

Job re-scale fails on Flink >= 1.6 with certain values of maxParallelism

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.11.0
    • Fix Version/s: 2.14.0
    • Component/s: runner-flink
    • Labels:
      None

      Description

      I am unable to rescale job after moving it to flink runner 1.7. What I am doing is:

      1. Recompile job code just with swapped flink runner version 1.5 -> 1.7
      2. Run streaming job with parallelism 112 and maxParallelism 448
      3. Wait until checkpoint is taken
      4. Stop job
      5. Run job again with parallelims 224 and checpooint path to restore from
      6. Job fails

      The same happens if I try to increase parallelims. This procedure works for the same job compiled with flink runner 1.5 and run on 1.5.0. Fails with runner 1.7 on flink 1.7.2

      Exception is:

      java.lang.Exception: Exception while creating StreamOperatorStateContext.
      at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
      at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
      at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
      at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
      at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for WindowDoFnOperator_2b6af61dc418f10e82551367a7e7f78e_(83/224) from any of the 1 provided restore options.
      at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137)
      at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:284)
      at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
      ... 5 more
      Caused by: java.lang.IndexOutOfBoundsException: Index: 101, Size: 0
      at java.util.ArrayList.rangeCheck(ArrayList.java:653)
      at java.util.ArrayList.get(ArrayList.java:429)
      at com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
      at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:805)
      at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:759)
      at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:315)
      at org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:73)
      at org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297)
      at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.readKeyGroupStateData(HeapKeyedStateBackend.java:492)
      at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.readStateHandleStateData(HeapKeyedStateBackend.java:453)
      at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:410)
      at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:358)
      at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:104)
      at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151)
      at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123)
      ... 7 more

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                mxm Maximilian Michels
                Reporter:
                JozoVilcek Jozef Vilcek
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 3h
                  3h