Details
Type: Bug
Status: Reopened
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.8.1, 1.9.0
Environment: Flink 1.8.1 and 1.9.0, JVM 8
Description
Suppose state.backend.rocksdb.localdir is configured as:
state.backend.rocksdb.localdir: /flink/tmp
If I then run rm -rf /flink/tmp/job_* on a host while a Flink application is running, I observe the following:
- throughput of my operators running on that host will drop to zero
- the application will not fail or restart
- the task manager will not fail or restart
- in most cases there is nothing in the logs to indicate a failure (I've run this several times and only once seen an exception; I suspect I happened to delete the directories while a checkpoint was in progress)
The desired behaviour here would be to throw an exception and crash, instead of silently dropping throughput to zero. Restarting the task manager resolves the issue.
I only tried this on Flink 1.8.1 and 1.9.0.
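
To illustrate the fail-fast behaviour requested above, here is a minimal sketch of a standalone check. It is not Flink or RocksDB code: the names snapshot_job_dirs, verify_job_dirs and LocalDirMissingError are hypothetical, and the only detail taken from this report is that the working directories live under the configured local directory and match job_*. The sketch records which job_* directories exist and raises an exception as soon as one of them is gone, rather than carrying on silently.

```python
import glob
import os


class LocalDirMissingError(RuntimeError):
    """Raised when a previously seen working directory has disappeared."""


def snapshot_job_dirs(local_dir):
    """Return the sorted list of job_* directories currently under local_dir."""
    pattern = os.path.join(local_dir, "job_*")
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))


def verify_job_dirs(expected_dirs):
    """Fail loudly if any directory from an earlier snapshot no longer exists."""
    missing = [d for d in expected_dirs if not os.path.isdir(d)]
    if missing:
        raise LocalDirMissingError(
            "local working directories disappeared: " + ", ".join(missing)
        )
```

A caller would take the snapshot once the job is running, for example dirs = snapshot_job_dirs("/flink/tmp"), and then call verify_job_dirs(dirs) periodically. After rm -rf /flink/tmp/job_* the next call raises LocalDirMissingError, which is the kind of visible failure this report asks for.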