Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 1.4.0
Fix Version/s: None
Description
In larger deployments, we've observed that opening the block manager can take a very long time: tens of minutes, and sometimes hours. This is especially true as of 1.4, where the log block manager (LBM) tries to optimize on-disk data structures during startup.
The default time to Raft peer eviction is 5 minutes. If one node is restarted and LBM startup takes over 5 minutes, or if all nodes are restarted and LBM startup time varies by more than 5 minutes across them, the "slow" node could have all of its replicas evicted. Besides generating a lot of unnecessary re-replication work, this effectively defeats the LBM optimizations: it would have been equally slow (but more efficient) to reformat the node instead.
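To make the failure condition concrete, here is a minimal sketch of the timing argument above. It is a simplified model, not Kudu code: the function name, the node names, and the rule "a node is at risk if its startup lags the fastest node by more than the eviction timeout" are assumptions made for illustration. Nodes that were never restarted are modeled with a startup time of 0, which covers both the single-node and the all-nodes restart cases.

```python
from typing import Dict, List

# The 5-minute default time to Raft peer eviction described above, in seconds.
DEFAULT_EVICTION_TIMEOUT_S = 5 * 60


def nodes_at_risk(startup_s: Dict[str, float],
                  eviction_timeout_s: float = DEFAULT_EVICTION_TIMEOUT_S) -> List[str]:
    """Return nodes whose LBM startup lags the fastest node by more than the timeout.

    Hypothetical helper: startup_s maps a node name to its LBM startup time
    in seconds, with 0 for nodes that stayed up.
    """
    if not startup_s:
        return []
    fastest = min(startup_s.values())
    return sorted(node for node, t in startup_s.items()
                  if t - fastest > eviction_timeout_s)


# One node restarted with a 12-minute LBM startup; the others stayed up.
print(nodes_at_risk({"ts1": 0, "ts2": 0, "ts3": 12 * 60}))

# All nodes restarted; ts3 lags the fastest node by 7 minutes.
print(nodes_at_risk({"ts1": 20 * 60, "ts2": 21 * 60, "ts3": 27 * 60}))
```

In both example calls only `ts3` is flagged. In the second case every node is slow, but only the spread between nodes matters in this model.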
So, let's reorder startup such that LBM startup counts towards replica bootstrapping. One idea: adjust FsManager startup so that tablet-meta/cmeta files can be accessed early to construct bootstrapping replicas, and defer opening the block manager until after that.
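The proposed ordering could look roughly like the toy model below. It only illustrates the sequence of steps. The class and method names are hypothetical and do not correspond to the actual FsManager or tablet server APIs, and the assumption that metadata can be opened independently of the block manager is exactly what the idea above would need to establish.

```python
from typing import List


class StartupSketch:
    """Toy model of the proposed startup order; all names are hypothetical."""

    def __init__(self) -> None:
        self.events: List[str] = []
        self.bootstrapping: List[str] = []
        self.running: List[str] = []

    def open_metadata(self, tablet_ids: List[str]) -> None:
        # Step 1: access only tablet-meta/cmeta files, without the block manager.
        self.events.append("open_metadata")
        # Step 2: construct replicas in a bootstrapping state right away.
        self.bootstrapping = list(tablet_ids)
        self.events.append("register_bootstrapping_replicas")

    def open_block_manager(self) -> None:
        # Step 3: the slow part (LBM startup) now happens while replicas
        # are already registered as bootstrapping.
        self.events.append("open_block_manager")

    def finish_bootstrap(self) -> None:
        # Step 4: with block data available, replicas finish bootstrapping.
        self.running, self.bootstrapping = self.bootstrapping, []
        self.events.append("replicas_running")


def start(tablet_ids: List[str]) -> StartupSketch:
    server = StartupSketch()
    server.open_metadata(tablet_ids)
    server.open_block_manager()
    server.finish_bootstrap()
    return server


server = start(["tablet-a", "tablet-b"])
print(server.events)
```

The point of the ordering is that `open_block_manager` runs after the replicas have been registered, so the time it takes is spent inside bootstrapping and not before it.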