Details
-
Sub-task
-
Status: Closed
-
Blocker
-
Resolution: Fixed
-
1.3.2, 1.4.0
-
None
-
Yarn, Flink 1.3.2, HDFS, FsStateBackend
Description
My job failed due to failure of cassandra. I have enabled ExternalizedCheckpoints. But when job tried to restore from that checkpoint it's failing continuously with following error.
2017-10-04 09:39:20,611 INFO org.apache.flink.runtime.taskmanager.Task - KeyedCEPPatternOperator -> Map (1/2) (8ff7913f820ead571c8b54ccc6b16045) switched from RUNNING to FAILED. java.lang.IllegalStateException: Could not initialize keyed state backend. at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:217) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.StreamCorruptedException: invalid type code: 00 at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2519) at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2553) at java.io.ObjectInputStream$BlockDataInputStream.skipBlockData(ObjectInputStream.java:2455) at java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1951) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeCondition(NFA.java:1211) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeStates(NFA.java:1169) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:957) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:852) at org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$StateTableByKeyGroupReaderV2V3.readMappingsInKeyGroup(StateTableByKeyGroupReaders.java:132) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:518) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:397) at org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:772) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:311) ... 6 more
I have tried to start new job also after failure with parameter
-s [checkpoint meta data path]