Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Information Provided
Affects Version/s: 1.11.3, 1.12.2, 1.13.0
Fix Version/s: None
Component/s: Runtime / State Backends
Labels:
- auto-deprioritized-critical
- auto-deprioritized-major

Description

1. Bug description:

When a RocksDB checkpoint is taken, it may get stuck in the `WaitUntilFlushWouldNotStallWrites` method.

2. Simple analysis of the reasons:

2.1 Configuration parameters:

# Flink YAML:
state.backend.rocksdb.predefined-options: SPINNING_DISK_OPTIMIZED_HIGH_MEM
state.backend.rocksdb.compaction.style: UNIVERSAL

# Corresponding RocksDB configuration:
Compaction Style : Universal

max_write_buffer_number : 4
min_write_buffer_number_to_merge : 3

Checkpoints are usually very fast. When a checkpoint is executed, `WaitUntilFlushWouldNotStallWrites` is called. If there are 2 immutable MemTables, which is fewer than `min_write_buffer_number_to_merge` (3), they will not be flushed, yet the following branch still reports a write stall:

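// method: GetWriteStallConditionAndCause
// Writes are delayed once the number of unflushed MemTables reaches
// max_write_buffer_number - 1 (only when max_write_buffer_number > 3).
if (mutable_cf_options.max_write_buffer_number > 3 &&
    num_unflushed_memtables >=
        mutable_cf_options.max_write_buffer_number - 1) {
  return {WriteStallCondition::kDelayed, WriteStallCause::kMemtableLimit};
}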

code link: https://github.com/facebook/rocksdb/blob/fbed72f03c3d9e4fdca3e5993587ef2559ba6ab9/db/column_family.cc#L847
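The arithmetic behind this, as a worked check. It assumes (based on the linked RocksDB revision, not stated in this issue) that `WaitUntilFlushWouldNotStallWrites` evaluates the stall condition for one extra MemTable, i.e. passes `NumNotFlushed() + 1` as `num_unflushed_memtables`. Here $N_{imm}$ is the number of immutable MemTables:

```latex
\begin{aligned}
\text{stall check: }   & N_{imm} + 1 = 2 + 1 = 3 \;\ge\; \texttt{max\_write\_buffer\_number} - 1 = 4 - 1 = 3
                        && \Rightarrow \text{kDelayed} \\
\text{flush trigger: } & N_{imm} = 2 \;<\; \texttt{min\_write\_buffer\_number\_to\_merge} = 3
                        && \Rightarrow \text{no flush job}
\end{aligned}
```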

The checkpoint therefore waits for a flush job to clear the stall, but no flush job exists, so it waits forever.
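A minimal sketch of the mismatch, sweeping the number of immutable MemTables under the configuration above. `MemtableOptions`, `MemtableLimitDelays`, `FlushWouldBeScheduled` and `CheckpointHangs` are hypothetical names, not RocksDB APIs. The sketch assumes the wait loop checks `NumNotFlushed() + 1`, simplifies flush scheduling to "immutable count >= `min_write_buffer_number_to_merge`", and ignores the early `break` discussed in section 3.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the RocksDB options involved.
struct MemtableOptions {
  int max_write_buffer_number = 4;           // SPINNING_DISK_OPTIMIZED_HIGH_MEM
  int min_write_buffer_number_to_merge = 3;  // SPINNING_DISK_OPTIMIZED_HIGH_MEM
};

// Mirrors the memtable-limit branch of GetWriteStallConditionAndCause.
bool MemtableLimitDelays(int num_unflushed_memtables, const MemtableOptions& o) {
  return o.max_write_buffer_number > 3 &&
         num_unflushed_memtables >= o.max_write_buffer_number - 1;
}

// Simplification: a flush job is only picked once enough immutable
// MemTables have accumulated to be merged.
bool FlushWouldBeScheduled(int num_immutable, const MemtableOptions& o) {
  assert(num_immutable >= 0);
  return num_immutable >= o.min_write_buffer_number_to_merge;
}

// The checkpoint hangs when it sees a stall that no flush job will clear.
// Assumption: the wait loop evaluates one extra MemTable (NumNotFlushed() + 1).
bool CheckpointHangs(int num_immutable, const MemtableOptions& o) {
  assert(num_immutable >= 0);
  return MemtableLimitDelays(num_immutable + 1, o) &&
         !FlushWouldBeScheduled(num_immutable, o);
}
```

With the default `MemtableOptions`, `CheckpointHangs` is false for 0, 1 and 3 immutable MemTables and true only for 2: that is the single count where the stall is reported but the flush trigger is not reached.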

2.2 Solution:

Add a stricter condition: only wait when the number of immutable MemTables is >= `min_write_buffer_number_to_merge`.

The RocksDB community has fixed this bug: https://github.com/facebook/rocksdb/pull/7921
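The restriction proposed above can be sketched as an extra term in the wait predicate. This models the reporter's proposal, not necessarily the exact change merged in the RocksDB PR; `MemtableOptions`, `WouldStall`, `ShouldWaitOriginal` and `ShouldWaitProposed` are hypothetical names, and the `+ 1` in the stall check is an assumption about the wait loop.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the RocksDB options involved.
struct MemtableOptions {
  int max_write_buffer_number = 4;
  int min_write_buffer_number_to_merge = 3;
};

// Memtable-limit stall check, evaluated for one extra MemTable as in the
// wait loop (assumption: NumNotFlushed() + 1).
bool WouldStall(int num_immutable, const MemtableOptions& o) {
  assert(num_immutable >= 0);
  int unflushed = num_immutable + 1;
  return o.max_write_buffer_number > 3 &&
         unflushed >= o.max_write_buffer_number - 1;
}

// Original behaviour: wait whenever a stall would be entered.
bool ShouldWaitOriginal(int num_immutable, const MemtableOptions& o) {
  return WouldStall(num_immutable, o);
}

// Proposed restriction: also require enough immutable MemTables for a
// flush job to exist, so there is actually something to wait for.
bool ShouldWaitProposed(int num_immutable, const MemtableOptions& o) {
  return WouldStall(num_immutable, o) &&
         num_immutable >= o.min_write_buffer_number_to_merge;
}
```

For 2 immutable MemTables the original predicate waits while the proposed one does not; for 3 both wait, and in that case a flush job does exist to eventually clear the stall.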

2.3 Code that can reproduce the bug:

https://github.com/1996fanrui/fanrui-learning/blob/flink-1.12/module-java/src/main/java/com/dream/rocksdb/RocksDBCheckpointStuck.java

3. Interesting point

This bug is triggered only when the number of sorted runs >= `level0_file_num_compaction_trigger`.

This is because `WaitUntilFlushWouldNotStallWrites` contains an early `break`:

if (cfd->imm()->NumNotFlushed() <
        cfd->ioptions()->min_write_buffer_number_to_merge &&
    vstorage->l0_delay_trigger_count() <
        mutable_cf_options.level0_file_num_compaction_trigger) {
  break;
}

code link: https://github.com/facebook/rocksdb/blob/fbed72f03c3d9e4fdca3e5993587ef2559ba6ab9/db/db_impl/db_impl_compaction_flush.cc#L1974

With universal compaction, `l0_delay_trigger_count()` reflects the number of sorted runs and can be >= `level0_file_num_compaction_trigger`, so the `break` is skipped and the bug is triggered.
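A minimal sketch of one iteration of the wait loop, combining the early `break` with the memtable-limit stall check. `CfOptions`, `CfState` and `WaitLoopWouldWait` are hypothetical names; `level0_file_num_compaction_trigger = 4` is an assumed value for illustration, L0-file and pending-compaction stall causes are ignored, and the `+ 1` on the MemTable count is an assumption about the linked revision.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins; not RocksDB types.
struct CfOptions {
  int max_write_buffer_number = 4;
  int min_write_buffer_number_to_merge = 3;
  int level0_file_num_compaction_trigger = 4;  // assumed value for illustration
};

struct CfState {
  int num_not_flushed;         // models cfd->imm()->NumNotFlushed()
  int l0_delay_trigger_count;  // models vstorage->l0_delay_trigger_count()
};

// Returns true if this iteration of the wait loop would go on to wait
// (memtable-limit stall only; other stall causes are ignored).
bool WaitLoopWouldWait(const CfState& s, const CfOptions& o) {
  assert(s.num_not_flushed >= 0 && s.l0_delay_trigger_count >= 0);
  // Early exit: below both the flush trigger and the compaction trigger.
  if (s.num_not_flushed < o.min_write_buffer_number_to_merge &&
      s.l0_delay_trigger_count < o.level0_file_num_compaction_trigger) {
    return false;
  }
  // Assumption: the stall check is evaluated for one extra MemTable.
  int unflushed = s.num_not_flushed + 1;
  return o.max_write_buffer_number > 3 &&
         unflushed >= o.max_write_buffer_number - 1;
}
```

With 2 immutable MemTables and an L0 delay count of 1 the early `break` fires and the checkpoint proceeds; with the same MemTables and a count of 4 (as universal compaction can produce) the loop waits, and since no flush job is scheduled it never wakes up.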

Issue Links

is fixed by:

FLINK-14482 Bump up rocksdb version (Closed)

People

Assignee:: Unassigned

Reporter:: Rui Fan

Votes:: 0

Watchers:: 9

Dates

Created:: 11/Mar/21 04:03

Updated:: 08/Dec/21 06:53

Resolved:: 08/Dec/21 06:53