Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- 3.0.0-alpha-4
- None
- Reviewed
Description
As noted in HBASE-27223, if a WAL write to HDFS fails, we cannot know whether the data has been persisted. AsyncFSWAL handles this by opening a new writer and writing the WAL entries again, with extra logic in WAL split and replay to deal with the resulting duplicate entries. FSHLog has no equivalent logic: when ProtobufLogWriter.append or ProtobufLogWriter.sync fails, FSHLog.sync immediately rethrows that exception. We should implement the same retry logic in FSHLog as in AsyncFSWAL, so that WAL.sync can only throw TimeoutIOException and we can uniformly abort the RegionServer when WAL.sync fails.
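To make the retry behaviour described above concrete, here is a minimal sketch, not the HBase implementation. Every name in it (`RetryingWalSketch`, `Writer`, `InMemoryWriter`, `SyncTimeoutException`) is hypothetical, and it bounds retries by an attempt count, whereas the real WAL is assumed to use a time-based timeout. It shows why a failed sync leads to a writer roll plus a re-append of all unacknowledged entries, and why duplicates can then appear across files.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

class RetryingWalSketch {
  /** Hypothetical minimal writer; stands in for a WAL file writer. */
  interface Writer {
    void append(String entry) throws IOException;
    void sync() throws IOException;
  }

  /** Hypothetical stand-in for the only exception sync() may surface. */
  static class SyncTimeoutException extends IOException {
    SyncTimeoutException(String msg, Throwable cause) { super(msg, cause); }
  }

  /** Test double: entries may reach "storage" even when sync reports failure. */
  static class InMemoryWriter implements Writer {
    final List<String> durable = new ArrayList<>();
    private final boolean failOnSync;
    InMemoryWriter(boolean failOnSync) { this.failOnSync = failOnSync; }
    @Override public void append(String entry) { durable.add(entry); }
    @Override public void sync() throws IOException {
      if (failOnSync) throw new IOException("simulated sync failure");
    }
  }

  private final Supplier<Writer> writerFactory;
  private final int maxAttempts;
  private final Deque<String> unacked = new ArrayDeque<>();
  private Writer writer;

  RetryingWalSketch(Supplier<Writer> writerFactory, int maxAttempts) {
    this.writerFactory = writerFactory;
    this.maxAttempts = maxAttempts;
    this.writer = writerFactory.get();
  }

  /** Buffers an entry; it stays unacknowledged until a sync succeeds. */
  void append(String entry) { unacked.addLast(entry); }

  int pendingCount() { return unacked.size(); }

  /** Writes and syncs all unacked entries, rolling the writer on any failure. */
  void sync() throws IOException {
    IOException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        for (String entry : unacked) writer.append(entry);
        writer.sync();
        unacked.clear(); // acknowledged only after a successful sync
        return;
      } catch (IOException e) {
        last = e;
        // We cannot tell what the old file holds, so open a new writer
        // and write every unacked entry again on the next attempt.
        writer = writerFactory.get();
      }
    }
    throw new SyncTimeoutException("sync gave up after " + maxAttempts + " attempts", last);
  }
}
```

With a first writer that fails on sync and a second that succeeds, both files end up holding the same entries, which is the duplicate case that split and replay must tolerate.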
The basic idea: because both FSHLog.RingBufferEventHandler and AsyncFSWAL.consumeExecutor are single-threaded, we can reuse the logic in AsyncFSWAL by moving most of its code up into AbstractFSWAL and adapting the SyncRunner in FSHLog to the logic of AsyncWriter.sync. Once that is done, most of the logic in AsyncFSWAL and FSHLog is unified; only how the writer is synced differs.
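The shape of that refactoring can be sketched as follows. This is an illustration under stated assumptions, not the actual AbstractFSWAL code: the class names `AbstractWalSketch`, `AsyncStyleWal`, and `BlockingStyleWal` are hypothetical, and the future-returning `doSync` hook is an assumed way of letting a blocking sync plug into logic written for an asynchronous one.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical shared base: all bookkeeping lives here, called from one consumer thread. */
abstract class AbstractWalSketch {
  // No synchronization: the sketch assumes a single consumer thread.
  private long highestAppendedTxid = 0;
  private long highestSyncedTxid = 0;

  /** Shared append path; returns the transaction id assigned to the entry. */
  final long append(String entry) {
    writeEntry(entry);
    return ++highestAppendedTxid;
  }

  /** Shared sync path; only doSync differs between implementations. */
  final long sync() {
    long target = highestAppendedTxid;
    highestSyncedTxid = doSync(target).join();
    return highestSyncedTxid;
  }

  long highestSyncedTxid() { return highestSyncedTxid; }

  protected abstract void writeEntry(String entry);

  /** The single implementation-specific hook: how the writer is synced. */
  protected abstract CompletableFuture<Long> doSync(long txid);
}

/** Async flavour: the sync completes on another thread. */
class AsyncStyleWal extends AbstractWalSketch {
  @Override protected void writeEntry(String entry) { /* buffer for async flush */ }
  @Override protected CompletableFuture<Long> doSync(long txid) {
    return CompletableFuture.supplyAsync(() -> txid);
  }
}

/** Blocking flavour: the sync runs inline and is wrapped in a completed future. */
class BlockingStyleWal extends AbstractWalSketch {
  @Override protected void writeEntry(String entry) { /* write to output stream */ }
  @Override protected CompletableFuture<Long> doSync(long txid) {
    // A real implementation would block on a flush here before completing.
    return CompletableFuture.completedFuture(txid);
  }
}
```

Callers see the same `append` and `sync` behaviour from either subclass, so any retry or roll logic placed in the base class would apply to both.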
Issue Links
- blocks: HBASE-27970 Make sync replication also work with FSHLog (Resolved)
- links to