Details
- Bug
- Status: Closed
- Minor
- Resolution: Information Provided
- 1.11.3, 1.12.2, 1.13.0
- None
Description
1. Bug description:
When RocksDB takes a checkpoint, it may get stuck in the `WaitUntilFlushWouldNotStallWrites` method.
2. Simple analysis of the reasons:
2.1 Configuration parameters:
# Flink yaml:
state.backend.rocksdb.predefined-options: SPINNING_DISK_OPTIMIZED_HIGH_MEM
state.backend.rocksdb.compaction.style: UNIVERSAL
# corresponding RocksDB config
Compaction Style : Universal
max_write_buffer_number : 4
min_write_buffer_number_to_merge : 3
A checkpoint is usually very fast. When a checkpoint runs, `WaitUntilFlushWouldNotStallWrites` is called. If there are 2 immutable MemTables, which is fewer than `min_write_buffer_number_to_merge` (3), they will not be flushed. The following check is still entered, however:
// method: GetWriteStallConditionAndCause
if (mutable_cf_options.max_write_buffer_number > 3 &&
    num_unflushed_memtables >= mutable_cf_options.max_write_buffer_number - 1) {
  return {WriteStallCondition::kDelayed, WriteStallCause::kMemtableLimit};
}
The checkpoint assumes that a FlushJob is pending and will clear the stall condition, but no FlushJob has been scheduled, so it waits indefinitely.
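The mismatch between the two thresholds can be sketched in a few lines. This is a simplified model, not RocksDB code: `MemtableOptions`, `PredictedStall` and `FlushWouldBeScheduled` are hypothetical names, and the `+ 1` reflects the assumption that the waiter evaluates the check as if one more memtable had been added.

```cpp
// Simplified stand-in for the RocksDB enum used in the snippet above.
enum class WriteStallCondition { kNormal, kDelayed };

// Hypothetical struct holding only the two options from the configuration above.
struct MemtableOptions {
  int max_write_buffer_number;
  int min_write_buffer_number_to_merge;
};

// Models only the memtable-limit branch quoted above.
WriteStallCondition MemtableStallCondition(int num_unflushed_memtables,
                                           const MemtableOptions& opts) {
  if (opts.max_write_buffer_number > 3 &&
      num_unflushed_memtables >= opts.max_write_buffer_number - 1) {
    return WriteStallCondition::kDelayed;
  }
  return WriteStallCondition::kNormal;
}

// Assumption: the waiter asks what would happen with one more memtable.
WriteStallCondition PredictedStall(int num_immutable,
                                   const MemtableOptions& opts) {
  return MemtableStallCondition(num_immutable + 1, opts);
}

// Immutable memtables are flushed only once this threshold is reached.
bool FlushWouldBeScheduled(int num_immutable, const MemtableOptions& opts) {
  return num_immutable >= opts.min_write_buffer_number_to_merge;
}
```

With `max_write_buffer_number = 4` and `min_write_buffer_number_to_merge = 3`, two immutable MemTables give a predicted `kDelayed` stall while `FlushWouldBeScheduled` is still false, which is the state in which the waiter has nothing to wake it up.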
2.2 solution:
Add a restriction: wait only if the number of immutable MemTables >= `min_write_buffer_number_to_merge`.
The RocksDB community has fixed this bug: https://github.com/facebook/rocksdb/pull/7921
2.3 Code that can reproduce the bug:
3. Interesting point
This bug is triggered only when the number of sorted runs >= `level0_file_num_compaction_trigger`, because `WaitUntilFlushWouldNotStallWrites` otherwise breaks out of its wait loop:
if (cfd->imm()->NumNotFlushed() <
        cfd->ioptions()->min_write_buffer_number_to_merge &&
    vstorage->l0_delay_trigger_count() <
        mutable_cf_options.level0_file_num_compaction_trigger) {
  break;
}
With Universal compaction, `l0_delay_trigger_count() >= level0_file_num_compaction_trigger` can hold, so the loop does not break and the bug is triggered.
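The early-exit condition can be isolated as a small predicate to see when it protects the waiter and when it does not. This is a sketch only: `EarlyExitOptions` and `BreaksOutOfWaitLoop` are hypothetical names, and the plain integer arguments stand in for the values returned by `NumNotFlushed()` and `l0_delay_trigger_count()`.

```cpp
// Hypothetical struct with only the two options the early-exit check reads.
struct EarlyExitOptions {
  int min_write_buffer_number_to_merge;
  int level0_file_num_compaction_trigger;
};

// Returns true when the wait loop would break before waiting on a flush.
// Both conditions must hold, so a high L0 / sorted-run count alone is
// enough to keep the waiter inside the loop.
bool BreaksOutOfWaitLoop(int num_not_flushed, int l0_delay_trigger_count,
                         const EarlyExitOptions& opts) {
  return num_not_flushed < opts.min_write_buffer_number_to_merge &&
         l0_delay_trigger_count < opts.level0_file_num_compaction_trigger;
}
```

With 2 unflushed MemTables and `min_write_buffer_number_to_merge = 3`, the predicate is true as long as `l0_delay_trigger_count` stays below the trigger and becomes false once it reaches it, at which point the waiter can end up blocked on a flush that is never scheduled.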
Issue Links
- is fixed by
  - FLINK-14482 Bump up rocksdb version (Closed)