Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
Description
Submitting a job on YARN with HA enabled can lead to the following exception:
org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot load user class: org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09
ClassLoader info: URL ClassLoader:
    file: '/tmp/blobStore-ccec0f4a-3e07-455f-945b-4fcd08f5bac1/cache/blob_7fafffe9595cd06aff213b81b5da7b1682e1d6b0' (invalid JAR: zip file is empty)
Class not resolvable through given classloader.
    at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:207)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:222)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:588)
    at java.lang.Thread.run(Thread.java:745)
Some job information, including the blob ids, is stored in ZooKeeper. The actual blobs are stored in a dedicated BlobStore if the recovery mode is set to ZooKeeper. This BlobStore is typically located in a file system such as HDFS. When the cluster is shut down, the BlobStore's path is deleted. When the cluster is then restarted, recovering jobs cannot be restored, because the blob ids stored in ZooKeeper now point to deleted files.
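For illustration, a minimal, hypothetical Java sketch of the failure mode follows. It is not Flink's actual recovery code; the class name, recovery path, and blob key below are made up (the key is copied from the stack trace above). It only shows the core problem: blob keys recovered from ZooKeeper still name files under a BlobStore directory that was removed at shutdown.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: stands in for the recovery step that resolves blob keys
// (read back from ZooKeeper) against the file-system BlobStore.
public class BlobRecoverySketch {

    // Stand-in for the blob keys that ZooKeeper returns for a recovered job.
    static List<String> recoveredBlobKeys() {
        return Arrays.asList("blob_7fafffe9595cd06aff213b81b5da7b1682e1d6b0");
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Assumed BlobStore recovery directory (e.g. on HDFS or a local FS).
        // In the reported bug, this directory is deleted when the cluster shuts down.
        File recoveryDir = new File("/flink/recovery/blobstore/cache");

        for (String key : recoveredBlobKeys()) {
            File blob = new File(recoveryDir, key);
            if (!blob.exists() || blob.length() == 0) {
                // The missing or empty file is what later surfaces as
                // "invalid JAR: zip file is empty" when the task tries to
                // load user classes from the recovered blob.
                throw new FileNotFoundException(
                        "Blob key from ZooKeeper points to a deleted file: " + blob);
            }
        }
    }
}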
Attachments
Issue Links
- is duplicated by
    FLINK-4182 HA recovery not working properly under ApplicationMaster failures (Closed)
- relates to
    FLINK-4166 Generate automatic different namespaces in Zookeeper for Flink applications (Closed)