Flink / FLINK-5462

Flink job fails due to java.util.concurrent.CancellationException while snapshotting

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: 1.2.0
    • None
    • None

    Description

      I'm using Flink 699f4b0.
      My restored, rescaled Flink job failed while creating a checkpoint with the following exception:

      2017-01-11 18:46:49,853 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 3 @ 1484160409846
      2017-01-11 18:49:50,111 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1) (2accc6ca2727c4f7ec963318fbd237e9) switched from RUNNING to FAILED.
      AsynchronousException{java.lang.Exception: Could not materialize checkpoint 3 for operator TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1).}
              at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
              at java.util.concurrent.FutureTask.run(FutureTask.java:262)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.Exception: Could not materialize checkpoint 3 for operator TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1).
              ... 6 more
      Caused by: java.util.concurrent.CancellationException
              at java.util.concurrent.FutureTask.report(FutureTask.java:121)
              at java.util.concurrent.FutureTask.get(FutureTask.java:188)
              at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
              at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:899)
              ... 5 more
      2017-01-11 18:49:50,113 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job Generate Event Window stream (90859d392c1da472e07695f434b332ef) switched from state RUNNING to FAILING.
      AsynchronousException{java.lang.Exception: Could not materialize checkpoint 3 for operator TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1).}
              at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
              at java.util.concurrent.FutureTask.run(FutureTask.java:262)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.Exception: Could not materialize checkpoint 3 for operator TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1).
              ... 6 more
      Caused by: java.util.concurrent.CancellationException
              at java.util.concurrent.FutureTask.report(FutureTask.java:121)
              at java.util.concurrent.FutureTask.get(FutureTask.java:188)
              at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
              at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:899)
              ... 5 more
      2017-01-11 18:49:50,122 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Custom Source -> Timestamps/Watermarks (1/2) (e52c1211b5693552f5908b0082c80882) switched from RUNNING to CANCELING.
      

      No other messages were logged around that time.
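
For reference, the innermost frames of the trace (`FutureTask.get` called from `FutureUtil.runIfNotDoneAndGet`) match standard JDK behaviour: `get()` on a `FutureTask` that has already been cancelled throws `CancellationException`. The sketch below is plain JDK code, not Flink code. The local `runIfNotDoneAndGet` is a hypothetical stand-in that only assumes the shape suggested by the method name, and the class name and the `"state handle"` value are made up for illustration. It does not show what cancelled the snapshot future in this job.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.RunnableFuture;

public class CancelledSnapshotSketch {

    // Hypothetical stand-in for the helper named in the stack trace
    // (assumed shape): run the future if it is not done, then fetch its result.
    static <T> T runIfNotDoneAndGet(RunnableFuture<T> future)
            throws ExecutionException, InterruptedException {
        if (!future.isDone()) {
            future.run();
        }
        return future.get();
    }

    // Returns true if fetching the result of a cancelled future
    // raises CancellationException.
    static boolean cancelledFutureThrows() throws Exception {
        RunnableFuture<String> snapshot = new FutureTask<>(() -> "state handle");

        // Some other party cancels the future before its result is fetched.
        // A cancelled FutureTask reports isDone() == true, so run() is skipped.
        snapshot.cancel(true);

        try {
            runIfNotDoneAndGet(snapshot);
            return false;
        } catch (CancellationException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cancelled future throws: " + cancelledFutureThrows());
    }
}
```

So the exception itself only says that the snapshot future was cancelled before the asynchronous checkpoint thread collected its result. The party that cancelled it has to be found elsewhere in the logs.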

      Attachments

        1. application-1484132267957-0005
          1.46 MB
          Robert Metzger

          People

            Assignee: Unassigned
            Reporter: Robert Metzger (rmetzger)
            Votes: 0
            Watchers: 2