Flink / FLINK-14110

Deleting state.backend.rocksdb.localdir causes silent failure

Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.1, 1.9.0
    • None
    • None
    • Environment: Flink 1.8.1 and 1.9.0, JVM 8

    Description

      Suppose state.backend.rocksdb.localdir is configured as:

      state.backend.rocksdb.localdir: /flink/tmp
      

      If I then run rm -rf /flink/tmp/job_* on a host while a Flink application is running, I observe the following:

      • throughput of the operators running on that host drops to zero
      • the application does not fail or restart
      • the task manager does not fail or restart
      • in most cases nothing in the logs indicates a failure (I have run this several times and seen an exception only once; I suspect I happened to delete the directories while a checkpoint was in progress)

      The desired behaviour is for the task to throw an exception and crash rather than silently drop throughput to zero. Restarting the Task Manager resolves the issue.
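
      To make the desired behaviour concrete, here is a minimal sketch of a fail-fast check, assuming a periodic existence check on the working directory is acceptable. LocalDirWatchdog and check are hypothetical names used only for illustration; they are not Flink or RocksDB APIs, and this is not how Flink handles the directory today. The directory to watch is passed in by the caller, since the report does not give the exact per-job directory names.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of Flink): fails loudly when a working directory disappears. */
final class LocalDirWatchdog {

    private LocalDirWatchdog() {}

    /** Throws if the path is no longer an existing, writable directory. */
    static void check(Path dir) {
        if (!Files.isDirectory(dir) || !Files.isWritable(dir)) {
            throw new IllegalStateException(
                "Local state directory is missing or not writable: " + dir);
        }
    }

    /** Polls the given directory once per second until the check fails. */
    public static void main(String[] args) throws InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: LocalDirWatchdog <directory>");
            System.exit(2);
        }
        Path dir = Paths.get(args[0]);
        while (true) {
            // An uncaught exception here ends the main thread, and with it the JVM,
            // instead of leaving the process running with zero throughput.
            check(dir);
            Thread.sleep(1_000);
        }
    }
}
```

      Running this against a scratch directory and then deleting that directory from another shell should terminate the process with the exception, which is the kind of visible failure requested above.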

      I only tried this on Flink 1.8.1 and 1.9.0.

          People

            Assignee: Unassigned
            Reporter: Aaron Levin (aaronlevin)
            Votes: 0
            Watchers: 5
