Affects Version/s: 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0
Fix Version/s: None
This issue is similar to
We introduced lazy index loading, which lets us skip checking the index files of all log segments when Kafka starts, and this greatly improved our Kafka startup speed.
Unfortunately, it also skips the index file check for the active segment, and the active segment still receives write requests from clients and from the replica fetcher threads.
This leads to the following situation: the index checks for all segments are skipped, no unflushed log segments need recovery, yet the index file of the active (last) segment is corrupted. When data is then appended to the active segment, the broker throws an error.
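The failure mode above can be reproduced outside Kafka with plain `java.nio`: if the memory map's position is left at its limit (the corrupted state described below), any append throws `BufferOverflowException`. This is a minimal standalone sketch, not Kafka code; the file name and sizes are illustrative.

```java
import java.io.IOException;
import java.nio.BufferOverflowException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapOverflowDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("offset-index", ".index");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map a preallocated index file, as Kafka does for segment indexes.
            MappedByteBuffer mmap = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
            // Simulate the corrupted state: position left at the limit
            // instead of at the end of the entries actually written.
            mmap.position(mmap.limit());
            try {
                mmap.putInt(42); // append one index field
                System.out.println("append succeeded");
            } catch (BufferOverflowException e) {
                System.out.println("BufferOverflowException on append");
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With the position at the limit there is no remaining space, so the `putInt` fails exactly like the append path in the broker.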
Below are the problems I encountered in the production environment:
When Kafka starts loading the log segments, I can see in the broker log that the memory-map position of both the time index and the offset index files is near the end of the file, even though far fewer index entries have actually been written. My guess is that this happens during the Kafka startup process itself: if the Kafka process is stopped before startup completes and then started again, the limit of the index file's memory map is left at the preallocated maximum file size instead of being trimmed to the size actually used, so on the next startup the memory-map position is set to that limit.
At this point, appending data to the active segment causes a java.nio.BufferOverflowException.
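A sanity check on startup could recover from this state by scanning the preallocated (zero-filled) index file for the first empty entry and resetting the position there, so appends land after the real entries. The sketch below is my illustration of that idea, not Kafka's actual recovery code; the 8-byte entry layout (relative offset plus file position) mirrors Kafka's offset index format.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class IndexTrimSketch {
    static final int ENTRY_SIZE = 8; // 4-byte relative offset + 4-byte file position

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("offset-index", ".index");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Preallocated, zero-filled index file with room for 10 entries.
            MappedByteBuffer mmap = ch.map(FileChannel.MapMode.READ_WRITE, 0, 10 * ENTRY_SIZE);
            // Write three real entries.
            for (int i = 1; i <= 3; i++) { mmap.putInt(i); mmap.putInt(i * 100); }
            // Simulate the bad restart: position wrongly restored to the limit.
            mmap.position(mmap.limit());
            // Recovery: count entries up to the first all-zero slot, then
            // reset the position to the end of the real entries.
            int entries = 0;
            while (entries < mmap.limit() / ENTRY_SIZE
                    && mmap.getInt(entries * ENTRY_SIZE) != 0) {
                entries++;
            }
            mmap.position(entries * ENTRY_SIZE);
            System.out.println("entries=" + entries + " position=" + mmap.position());
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

After the reset, appends go to byte 24 (three 8-byte entries) instead of overflowing at the limit.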
I agree with skipping the index check for all inactive segments, since they will no longer receive write requests, but for the active segment we do need to perform the index file check.
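The proposal above amounts to: keep lazy loading for every inactive segment, but always sanity-check the last (active) segment on startup. A rough sketch of that decision, using hypothetical stand-in types rather than Kafka's real `LogSegment` API:

```java
import java.util.List;

public class StartupIndexCheckSketch {
    // Hypothetical minimal stand-in for a log segment; names are
    // illustrative and do not match Kafka's actual classes.
    static class Segment {
        final long baseOffset;
        boolean indexChecked = false;
        Segment(long baseOffset) { this.baseOffset = baseOffset; }
        // In a real broker this would validate the index mmap position
        // against the entries actually written.
        void sanityCheckIndexes() { indexChecked = true; }
    }

    // Proposed behavior: only the active (last) segment is checked eagerly;
    // all other segments keep the lazy, skip-on-startup behavior.
    static void loadSegments(List<Segment> segments) {
        if (segments.isEmpty()) return;
        Segment active = segments.get(segments.size() - 1);
        active.sanityCheckIndexes();
    }

    public static void main(String[] args) {
        List<Segment> segs = List.of(new Segment(0), new Segment(100), new Segment(200));
        loadSegments(segs);
        System.out.println("active checked: " + segs.get(2).indexChecked);
        System.out.println("inactive checked: " + segs.get(0).indexChecked);
    }
}
```

This keeps the startup-speed win of lazy loading (one eager check instead of one per segment) while closing the corruption window on the only segment that still takes writes.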
Another situation: even after a clean shutdown, some factor can leave the active segment's index file with its memory-map position set to the limit, again resulting in a java.nio.BufferOverflowException on write.