Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 3.0.0-alpha-4
Description
As noted in HBASE-27223, if WAL.sync gets a timeout exception we should abort the region server, because WAL sync is designed to either succeed or die; there is no 'failure' outcome. This is usually not a big deal because the default value of AbstractFSWAL.WAL_SYNC_TIMEOUT_MS is very large (5 minutes), and the WAL system will usually abort the region server anyway if it cannot finish the sync within 5 minutes.
In the PR, the region server is always aborted only when WAL.sync times out in HRegion#doWALAppend. WALUtil.writeMarker just records internal state, so it does not seem necessary to always abort the region server when WAL.sync times out there; instead, the internal state transition that wrote the marker determines whether the region server is aborted.
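The distinction between the two call paths can be illustrated with a minimal, self-contained sketch. It is not the code from the PR: `Wal`, `Abortable`, `SyncTimeoutException`, `RecordingServer`, `appendAndSync`, and `writeMarker` below are hypothetical stand-ins invented for illustration, and the real HBase classes have different signatures. The sketch only assumes that a sync either returns or throws, and that the data-append path aborts on timeout while the marker path leaves the decision to its caller.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

final class WalSyncAbortSketch {

  /** Hypothetical stand-in for a WAL sync timeout; not the real HBase exception class. */
  static class SyncTimeoutException extends IOException {
    SyncTimeoutException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical minimal WAL: sync either returns normally or throws. */
  interface Wal {
    void sync(long txid) throws IOException;
  }

  /** Hypothetical stand-in for the region server's abort hook. */
  interface Abortable {
    void abort(String why, Throwable cause);
  }

  /** Records the first abort reason instead of stopping the process, so the sketch is testable. */
  static class RecordingServer implements Abortable {
    final AtomicReference<String> abortReason = new AtomicReference<>();

    @Override
    public void abort(String why, Throwable cause) {
      abortReason.compareAndSet(null, why);
    }

    boolean isAborted() {
      return abortReason.get() != null;
    }
  }

  /** Data path (analogous to a WAL append for user data): a sync timeout always aborts. */
  static void appendAndSync(Wal wal, Abortable server, long txid) throws IOException {
    try {
      wal.sync(txid);
    } catch (SyncTimeoutException e) {
      // "Succeed or die": the edit may or may not be durable, so the server must not continue.
      server.abort("WAL sync timed out on data append, txid=" + txid, e);
      throw e;
    }
  }

  /** Marker path: only propagates the timeout; the caller's state transition decides what to do. */
  static void writeMarker(Wal wal, long txid) throws IOException {
    wal.sync(txid);
  }

  public static void main(String[] args) {
    Wal timingOut = txid -> {
      throw new SyncTimeoutException("sync of txid " + txid + " timed out");
    };

    RecordingServer dataPathServer = new RecordingServer();
    try {
      appendAndSync(timingOut, dataPathServer, 1L);
    } catch (IOException e) {
      System.out.println("data path aborted: " + dataPathServer.isAborted());
    }

    RecordingServer markerPathServer = new RecordingServer();
    try {
      writeMarker(timingOut, 2L);
    } catch (IOException e) {
      // Here the caller could decide to abort, retry, or fail only the current operation.
      System.out.println("marker path aborted: " + markerPathServer.isAborted());
    }
  }
}
```

In this sketch the abort policy lives in the data path itself, while the marker path stays policy-free, which mirrors the description's point that the surrounding state transition should decide whether a marker sync timeout is fatal.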
Issue Links
- relates to: HBASE-28803 HBase Master stuck due to improper handling of WALSyncTimeoutException within UncheckedIOException (Resolved)