HBase / HBASE-27230

RegionServer should be aborted when WAL.sync throws TimeoutIOException


Details

      This change adds logic to WAL.sync:
      If WAL.sync gets a timeout exception, the TimeoutIOException is wrapped in a special WALSyncTimeoutIOException. When an upper layer, such as HRegion.doMiniBatchMutate (called by HRegion.batchMutate), catches this special exception, the region server is aborted.
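The flow in the release note (wrap the sync timeout in a special exception, then abort in the batch-mutate path) can be sketched in Java as below. This is a minimal, self-contained sketch, not HBase's implementation: `SketchWAL`, `SketchRegion`, `SketchRegionServer`, and `WalSyncTimeoutDemo` are hypothetical names, the local `TimeoutIOException` is a simplified stand-in for HBase's class of the same name, and the body of `WALSyncTimeoutIOException` is assumed. The sync timeout is modeled by waiting on a `CompletableFuture` with a bounded `get`.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Local stand-in for HBase's TimeoutIOException so the sketch compiles on its own.
class TimeoutIOException extends IOException {
  TimeoutIOException(String msg, Throwable cause) { super(msg, cause); }
}

// Special type named in the release note; this minimal body is an assumption.
class WALSyncTimeoutIOException extends IOException {
  WALSyncTimeoutIOException(Throwable cause) { super(cause); }
}

// Hypothetical WAL: sync() blocks on a pending sync future for at most syncTimeoutMs.
class SketchWAL {
  private final long syncTimeoutMs;

  SketchWAL(long syncTimeoutMs) { this.syncTimeoutMs = syncTimeoutMs; }

  void sync(CompletableFuture<Void> pendingSync) throws IOException {
    try {
      pendingSync.get(syncTimeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Wrap the timeout in the special type so callers can tell it apart from other IOExceptions.
      throw new WALSyncTimeoutIOException(
          new TimeoutIOException("WAL sync timed out after " + syncTimeoutMs + " ms", e));
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new InterruptedIOException("interrupted while waiting for WAL sync");
    } catch (ExecutionException e) {
      throw new IOException("WAL sync failed", e.getCause());
    }
  }
}

// Hypothetical stand-in for the region server; it only records that an abort was requested.
class SketchRegionServer {
  volatile boolean aborted;

  void abort(String why, Throwable cause) {
    aborted = true;
    System.out.println("ABORTING region server: " + why);
  }
}

// Hypothetical stand-in for HRegion's batch-mutate path.
class SketchRegion {
  private final SketchWAL wal;
  private final SketchRegionServer rs;

  SketchRegion(SketchWAL wal, SketchRegionServer rs) {
    this.wal = wal;
    this.rs = rs;
  }

  void batchMutate(CompletableFuture<Void> pendingSync) throws IOException {
    try {
      doMiniBatchMutate(pendingSync);
    } catch (WALSyncTimeoutIOException e) {
      // A WAL sync must succeed or the server dies; there is no "failed" outcome to report.
      rs.abort("WAL sync timeout", e);
      throw e;
    }
  }

  private void doMiniBatchMutate(CompletableFuture<Void> pendingSync) throws IOException {
    // Real edit handling is omitted; only the WAL sync step is modeled.
    wal.sync(pendingSync);
  }
}

class WalSyncTimeoutDemo {
  public static void main(String[] args) {
    SketchRegionServer rs = new SketchRegionServer();
    SketchRegion region = new SketchRegion(new SketchWAL(100), rs);
    try {
      // A future that is never completed simulates a sync that hangs.
      region.batchMutate(new CompletableFuture<>());
    } catch (IOException e) {
      System.out.println("batchMutate failed with " + e.getClass().getSimpleName());
    }
    System.out.println("aborted = " + rs.aborted);
  }
}
```

Running `WalSyncTimeoutDemo` prints the abort message, then reports the `WALSyncTimeoutIOException` and `aborted = true`. The exception is rethrown after the abort request so that the client call still fails instead of appearing to succeed.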

    Description

      As HBASE-27223 noted, if WAL.sync gets a timeout exception, we should abort the region server: by design, a WAL sync either succeeds or the server dies, and there is no 'failure' outcome. This is usually not a big deal because the default value of AbstractFSWAL.WAL_SYNC_TIMEOUT_MS is very large (5 minutes), so the WAL system itself will usually abort the region server if it cannot finish the sync within 5 minutes.

      In the PR, the region server is always aborted only when WAL.sync times out in HRegion#doWALAppend. WALUtil.writeMarker only records internal state, so there seems to be no need to always abort the region server when its WAL.sync times out; instead, the internal state transition determines whether the region server is aborted.
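The split described above, where a sync timeout on the data path always aborts while a marker write leaves the decision to the caller's state, can be sketched as follows. All names here (`WalSync`, `AbortRecorder`, `SelectiveAbortSketch`, `recordMarker`, and the `failureIsFatalInThisState` flag) are hypothetical; the methods only mirror the roles of `HRegion#doWALAppend` and `WALUtil.writeMarker` and do not reproduce their real signatures or bodies.

```java
import java.io.IOException;

// Stand-in for the exception type named in the issue; simplified for this sketch.
class WALSyncTimeoutIOException extends IOException {
  WALSyncTimeoutIOException(String msg) { super(msg); }
}

// Hypothetical hook standing in for WAL.sync, so the demo can simulate a timeout.
interface WalSync {
  void sync() throws IOException;
}

// Hypothetical region server stub that only records the abort request.
class AbortRecorder {
  boolean aborted;

  void abort(String why) { aborted = true; }
}

class SelectiveAbortSketch {
  // Data path (cf. HRegion#doWALAppend): a sync timeout always aborts the server.
  static void doWALAppend(WalSync wal, AbortRecorder rs) throws IOException {
    try {
      wal.sync();
    } catch (WALSyncTimeoutIOException e) {
      rs.abort("WAL sync timed out while appending edits");
      throw e;
    }
  }

  // Marker path (cf. WALUtil.writeMarker): no abort here; the exception just propagates.
  static void writeMarker(WalSync wal) throws IOException {
    wal.sync();
  }

  // Hypothetical caller: its own state decides whether a failed marker write is fatal.
  static void recordMarker(WalSync wal, AbortRecorder rs, boolean failureIsFatalInThisState) {
    try {
      writeMarker(wal);
    } catch (IOException e) {
      if (failureIsFatalInThisState) {
        rs.abort("marker write failed in a state that cannot tolerate it");
      }
      // Otherwise the caller keeps its previous state (assumption for this sketch).
    }
  }

  public static void main(String[] args) {
    WalSync timingOut = () -> { throw new WALSyncTimeoutIOException("sync timed out"); };

    AbortRecorder dataPath = new AbortRecorder();
    try {
      doWALAppend(timingOut, dataPath);
    } catch (IOException expected) {
      // The append fails and the server has been asked to abort.
    }

    AbortRecorder markerPath = new AbortRecorder();
    recordMarker(timingOut, markerPath, false);

    System.out.println("data path aborted: " + dataPath.aborted);     // true
    System.out.println("marker path aborted: " + markerPath.aborted); // false
  }
}
```

Keeping the abort decision out of `writeMarker` lets each caller apply its own state-machine rules, while the data path keeps the unconditional succeed-or-die behavior.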


            People

              Assignee: chenglei (comnetwork)
              Reporter: chenglei (comnetwork)
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved: