Flink / FLINK-8559

Exceptions in RocksDBIncrementalSnapshotOperation#takeSnapshot cause job to get stuck

    Details

      Description

      In RocksDBKeyedStateBackend#snapshotIncrementally we can find this code:
       

      final RocksDBIncrementalSnapshotOperation<K> snapshotOperation =
      	new RocksDBIncrementalSnapshotOperation<>(
      		this,
      		checkpointStreamFactory,
      		checkpointId,
      		checkpointTimestamp);
      
      snapshotOperation.takeSnapshot();
      
      return new FutureTask<KeyedStateHandle>(
      	new Callable<KeyedStateHandle>() {
      		@Override
      		public KeyedStateHandle call() throws Exception {
      			return snapshotOperation.materializeSnapshot();
      		}
      	}
      ) {
      	@Override
      	public boolean cancel(boolean mayInterruptIfRunning) {
      		snapshotOperation.stop();
      		return super.cancel(mayInterruptIfRunning);
      	}
      
      	@Override
      	protected void done() {
      		snapshotOperation.releaseResources(isCancelled());
      	}
      };
      

      The constructor of RocksDBIncrementalSnapshotOperation calls acquireResource() on the RocksDB ResourceGuard. The only place these resources are released is the done() callback of the returned FutureTask. If snapshotOperation.takeSnapshot() fails with an exception, the FutureTask is never created, so the resources are never released. When the task is then shut down because of the exception, it gets stuck while releasing RocksDB.
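
One possible way to close the leak is to release the resources explicitly when takeSnapshot() throws, before the exception propagates. The following is a minimal, self-contained sketch of that idea and not the change that was committed to Flink. Guard, SnapshotOperation, and the failOnTake flag are hypothetical stand-ins for ResourceGuard and RocksDBIncrementalSnapshotOperation, simplified so the example runs without Flink or RocksDB on the classpath.

```java
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only: simplified stand-ins, not Flink's real classes. */
final class SnapshotReleaseSketch {

    /** Hypothetical stand-in for the RocksDB resource guard: counts open leases. */
    static final class Guard {
        final AtomicInteger leases = new AtomicInteger();

        void acquire() { leases.incrementAndGet(); }

        void release() { leases.decrementAndGet(); }
    }

    /** Hypothetical stand-in for RocksDBIncrementalSnapshotOperation. */
    static final class SnapshotOperation {
        private final Guard guard;
        private final boolean failOnTake; // assumption: lets the sketch simulate a failure
        private boolean released;

        SnapshotOperation(Guard guard, boolean failOnTake) {
            this.guard = guard;
            this.failOnTake = failOnTake;
            guard.acquire(); // resource is acquired in the constructor, as in the report
        }

        void takeSnapshot() throws Exception {
            if (failOnTake) {
                throw new Exception("simulated takeSnapshot failure");
            }
        }

        String materializeSnapshot() { return "state-handle"; }

        /** Idempotent, so releasing from both the catch block and done() would be safe. */
        void releaseResources() {
            if (!released) {
                released = true;
                guard.release();
            }
        }
    }

    static FutureTask<String> snapshotIncrementally(Guard guard, boolean failOnTake)
            throws Exception {
        final SnapshotOperation op = new SnapshotOperation(guard, failOnTake);
        try {
            op.takeSnapshot();
        } catch (Exception e) {
            // No FutureTask exists yet, so done() will never run: release here.
            op.releaseResources();
            throw e;
        }
        return new FutureTask<String>(op::materializeSnapshot) {
            @Override
            protected void done() {
                op.releaseResources();
            }
        };
    }

    public static void main(String[] args) {
        Guard guard = new Guard();
        try {
            snapshotIncrementally(guard, true);
        } catch (Exception e) {
            System.out.println("open leases after failure: " + guard.leases.get());
        }
    }
}
```

With the catch block in place, a failing takeSnapshot() leaves no open lease on the guard, so a later shutdown that waits for all leases to be returned is not blocked by this operation. On the success path the release still happens in done(), as in the original code.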

              People

              • Assignee: Chesnay Schepler (zentol)
              • Reporter: Chesnay Schepler (zentol)