Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Resolved
-
1.5.4
-
None
-
AWS EMR 5.17 (upgraded to Flink 1.5.4)
3 task managers, 1 job manager running in YARN in Hadoop
Running on Amazon Linux with OpenJDK 1.8
Description
I'm having trouble (which usually occurs after an hour of processing in a StreamExecutionEnvironment) where I get this failure message. I'm at a loss for what is causing it. I'm running this in AWS on EMR 5.17 with 3 task managers and a job manager running in a YARN cluster and I've upgraded my flink libraries to 1.5.4 to bypass another serialization issue and the kerberos auth issues.
The avro classes that are being deserialized were generated with avro 1.8.2.
2018-10-22 16:12:10,680 [INFO ] class=o.a.flink.runtime.taskmanager.Task thread="Calculate Estimated NAV -> Split into single messages (3/10)" Calculate Estimated NAV -> Split into single messages (3/10) (de7d8fa77 84903a475391d0168d56f2e) switched from RUNNING to FAILED. java.io.EOFException: null at org.apache.flink.core.memory.DataInputDeserializer.readLong(DataInputDeserializer.java:219) at org.apache.flink.core.memory.DataInputDeserializer.readDouble(DataInputDeserializer.java:138) at org.apache.flink.formats.avro.utils.DataInputDecoder.readDouble(DataInputDecoder.java:70) at org.apache.avro.io.ResolvingDecoder.readDouble(ResolvingDecoder.java:190) at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:186) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179) at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:116) at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222) at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175) at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:266) at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:177) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179) at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:116) at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222) at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145) at org.apache.flink.formats.avro.typeutils.AvroSerializer.deserialize(AvroSerializer.java:172) at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:208) at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:49) at org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:140) at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:208) at org.apache.flink.streaming.runtime.tasks.TwoInputStreamTask.run(TwoInputStreamTask.java:116) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712) at java.lang.Thread.run(Thread.java:748)
Do you have any ideas on how I could further troubleshoot this issue?
Attachments
Issue Links
- is caused by
-
FLINK-10469 FileChannel may not write the whole buffer in a single call to FileChannel.write(Buffer buffer)
- Closed