HBase / HBASE-27231

FSHLog should retry writing WAL entries when syncs to HDFS fail.

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha-4
    • Fix Version/s: 3.0.0-beta-1
    • Component/s: wal
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      As HBASE-27223 describes, if a WAL write to HDFS fails, we do not know whether the data has been persisted. AsyncFSWAL handles this by opening a new writer and writing the WAL entries again, with additional logic in WAL split and replay to deal with the resulting duplicate entries. FSHLog does not have this logic: when ProtobufLogWriter.append or ProtobufLogWriter.sync fails, FSHLog.sync immediately rethrows that exception. We should implement the same retry logic as in AsyncFSWAL, so that WAL.sync can only throw TimeoutIOException and we can uniformly abort the RegionServer when WAL.sync fails.
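
      The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the HBase implementation: SketchWriter, InMemoryWriter, RetryingWal, SketchTimeoutException and ReplaySketch are hypothetical names, a fixed attempt count stands in for the real timeout, and deduplication by sequence id is a simplification of what WAL split and replay actually do.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Supplier;

// Hypothetical stand-in for a WAL writer; not the HBase writer API.
interface SketchWriter {
  void append(long seqId) throws IOException;
  void sync() throws IOException;
}

// In-memory writer whose sync can be made to fail, to simulate an HDFS error.
final class InMemoryWriter implements SketchWriter {
  final List<Long> appended = new ArrayList<>();
  boolean failSync;

  @Override public void append(long seqId) { appended.add(seqId); }

  @Override public void sync() throws IOException {
    if (failSync) throw new IOException("simulated HDFS sync failure");
  }
}

// Hypothetical exception modelling the single failure mode left for sync.
final class SketchTimeoutException extends IOException {
  SketchTimeoutException(String msg, Throwable cause) { super(msg, cause); }
}

final class RetryingWal {
  private final Supplier<SketchWriter> writerFactory;
  private final int maxAttempts; // assumption: attempt count instead of a timeout
  private final Deque<Long> unacked = new ArrayDeque<>();
  private SketchWriter writer;

  RetryingWal(Supplier<SketchWriter> writerFactory, int maxAttempts) {
    this.writerFactory = writerFactory;
    this.maxAttempts = maxAttempts;
    this.writer = writerFactory.get();
  }

  // Entries stay in 'unacked' until a sync succeeds.
  void append(long seqId) { unacked.addLast(seqId); }

  void sync() throws SketchTimeoutException {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        for (long seqId : unacked) writer.append(seqId);
        writer.sync();
        unacked.clear(); // only now are the entries known to be durable
        return;
      } catch (IOException e) {
        // The old writer may or may not have persisted the entries, so open
        // a new writer and write every unacknowledged entry again.
        last = e;
        writer = writerFactory.get();
      }
    }
    throw new SketchTimeoutException("sync failed after " + maxAttempts + " attempts", last);
  }
}

final class ReplaySketch {
  // Replay must tolerate the same entry appearing in more than one WAL file;
  // here duplicates are dropped by sequence id.
  static List<Long> replay(List<List<Long>> walFiles) {
    TreeSet<Long> seen = new TreeSet<>();
    for (List<Long> file : walFiles) seen.addAll(file);
    return new ArrayList<>(seen);
  }
}
```

      In this sketch a failed sync leaves the same entries in both the old and the new writer, which is why the replay side has to deduplicate.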

      The basic idea is that, because both FSHLog.RingBufferEventHandler and AsyncFSWAL.consumeExecutor are single-threaded, we can reuse the logic in AsyncFSWAL: move most of the code in AsyncFSWAL up into AbstractFSWAL, and adapt the SyncRunner in FSHLog to the logic in AsyncWriter.sync. Once that is done, most of the logic in AsyncFSWAL and FSHLog is unified, and only the way the writer is synced differs.
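
      One way to picture this unification is a shared base class that owns the append and bookkeeping logic and leaves only the sync step to subclasses. The sketch below is an illustration under assumptions, not the actual AbstractFSWAL code: UnifiedWalSketch, AsyncStyleWal and BlockingStyleWal are hypothetical names, and the blocking variant simply wraps a blocking call in a future on a dedicated thread to mimic adapting a SyncRunner to a future-returning sync.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Shared logic; append and sync are assumed to be called from a single consumer thread.
abstract class UnifiedWalSketch {
  private final List<Long> pending = new ArrayList<>();
  private final AtomicLong highestSynced = new AtomicLong(-1L);

  final void append(long seqId) { pending.add(seqId); }

  // Common path: pick the highest buffered sequence id, delegate the sync
  // itself, then record the result when the future completes.
  final CompletableFuture<Long> sync() {
    if (pending.isEmpty()) {
      return CompletableFuture.completedFuture(highestSynced.get());
    }
    long target = pending.get(pending.size() - 1);
    pending.clear();
    return doSync(target)
        .thenApply(synced -> highestSynced.accumulateAndGet(synced, Math::max));
  }

  // The only step that differs between the two variants.
  protected abstract CompletableFuture<Long> doSync(long seqId);
}

// Variant whose writer is already asynchronous.
final class AsyncStyleWal extends UnifiedWalSketch {
  @Override protected CompletableFuture<Long> doSync(long seqId) {
    // A real async writer would complete this from an I/O callback.
    return CompletableFuture.supplyAsync(() -> seqId);
  }
}

// Variant whose writer blocks; a dedicated thread plays the SyncRunner role.
final class BlockingStyleWal extends UnifiedWalSketch implements AutoCloseable {
  private final ExecutorService syncRunner = Executors.newSingleThreadExecutor();

  @Override protected CompletableFuture<Long> doSync(long seqId) {
    return CompletableFuture.supplyAsync(() -> {
      blockingSync();
      return seqId;
    }, syncRunner);
  }

  // Placeholder for a blocking flush to the file system.
  private void blockingSync() { }

  @Override public void close() { syncRunner.shutdown(); }
}
```

      Because both variants return a future from the sync step, retry and acknowledgement handling can live once in the base class.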

            People

              Assignee: comnetwork chenglei
              Reporter: comnetwork chenglei