HBase / HBASE-7006

[MTTR] Improve Region Server Recovery Time - Distributed Log Replay

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.98.0, 0.95.1
    • Component/s: MTTR
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Distributed Log Replay Description:

      After a region server fails, we first assign each failed region to another region server, marking it in ZooKeeper as being in recovering state. A SplitLogWorker then replays edits from the failed region server's WALs (Write-Ahead Logs) directly into the region once it has been re-opened in its new location. While a region is in recovering state it accepts writes, but rejects reads (including Append and Increment) as well as region splits and merges.

      The feature piggybacks on the existing distributed log splitting framework and replays WAL edits directly to another region server instead of creating recovered.edits files.

      The advantages over the existing recovered.edits-based log splitting implementation:
      1) It eliminates the steps of writing and reading recovered.edits files. Thousands of recovered.edits files can be created and written concurrently during a region server recovery, and the many small random writes can degrade overall system performance.
      2) It allows writes even while a region is in recovering state; a failed-over region can accept writes again within seconds.
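
      The recovering-state semantics described above can be sketched as a toy model (hypothetical class names, not HBase's actual internals): writes are accepted while WAL edits are still being replayed, while reads are rejected until replay completes.

```python
class RegionInRecoveryError(Exception):
    """Raised when a read hits a region that is still replaying WAL edits."""


class Region:
    def __init__(self):
        self.recovering = True  # set while WAL edits are being replayed
        self.store = {}

    def put(self, key, value):
        # Writes are accepted even while the region is recovering.
        self.store[key] = value

    def get(self, key):
        # Reads (and read-modify-write ops like Append/Increment)
        # are rejected until replay finishes.
        if self.recovering:
            raise RegionInRecoveryError("region is recovering")
        return self.store.get(key)

    def finish_replay(self):
        # Called once all WAL edits have been replayed into the region.
        self.recovering = False


region = Region()
region.put("row1", "v1")            # accepted while recovering
try:
    region.get("row1")              # rejected while recovering
except RegionInRecoveryError:
    pass
region.finish_replay()
assert region.get("row1") == "v1"   # reads allowed after replay completes
```

This is only a model of the client-visible behavior; the actual gating happens inside the region server's request handling.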

      The feature can be enabled by setting hbase.master.distributed.log.replay to true (it is false by default).
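
      For example, the property can be set in hbase-site.xml (the standard HBase configuration file; the property name is taken from the release note above):

```xml
<!-- hbase-site.xml: enable distributed log replay (off by default) -->
<property>
  <name>hbase.master.distributed.log.replay</name>
  <value>true</value>
</property>
```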

      Description

      Just saw an interesting issue where a cluster went down hard and 30 nodes had 1700 WALs to replay. Replay took almost an hour. It looks like it could run faster, since much of the time is spent zk'ing and nn'ing.

      Putting in 0.96 so it gets a look at least. Can always punt.

      1. LogSplitting Comparison.pdf
        50 kB
        Jeffrey Zhong
      2. ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006-v2.pdf
        130 kB
        Jeffrey Zhong
      3. hbase-7006-combined.patch
        234 kB
        Jeffrey Zhong
      4. hbase-7006-combined-v1.patch
        246 kB
        Jeffrey Zhong
      5. hbase-7006-combined-v4.patch
        298 kB
        Jeffrey Zhong
      6. hbase-7006-combined-v5.patch
        307 kB
        Jeffrey Zhong
      7. hbase-7006-combined-v6.patch
        315 kB
        Jeffrey Zhong
      8. hbase-7006-combined-v7.patch
        315 kB
        Jeffrey Zhong
      9. hbase-7006-combined-v8.patch
        311 kB
        Jeffrey Zhong
      10. hbase-7006-combined-v9.patch
        312 kB
        Jeffrey Zhong
      11. hbase-7006-addendum.patch
        1.0 kB
        Jeffrey Zhong
      12. 7006-addendum-3.txt
        2 kB
        Ted Yu

        Issue Links

          Activity

          stack made changes -
          Link This issue depends upon HBASE-14028 [ HBASE-14028 ]
          Misty Stanley-Jones made changes -
          Link This issue relates to HBASE-11280 [ HBASE-11280 ]
          stack made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Nicolas Liochon made changes -
          Link This issue supercedes HBASE-6984 [ HBASE-6984 ]
          Ted Yu made changes -
          Link This issue relates to HBASE-8729 [ HBASE-8729 ]
          Jeffrey Zhong made changes -
          Link This issue relates to HBASE-8701 [ HBASE-8701 ]
          Jeffrey Zhong made changes -
          Link This issue relates to HBASE-8617 [ HBASE-8617 ]
          Ted Yu made changes -
          Attachment 7006-addendum-2.txt [ 12583683 ]
          Ted Yu made changes -
          Link This issue relates to HBASE-8573 [ HBASE-8573 ]
          Ted Yu made changes -
          Link This issue relates to HBASE-8568 [ HBASE-8568 ]
          Ted Yu made changes -
          Link This issue relates to HBASE-8575 [ HBASE-8575 ]
          Ted Yu made changes -
          Release Note edited (corrected the default-setting sentence: the feature is disabled by default and is enabled by setting hbase.master.distributed.log.replay to true)
          Ted Yu made changes -
          Attachment 7006-addendum-3.txt [ 12583690 ]
          Ted Yu made changes -
          Attachment 7006-addendum-2.txt [ 12583683 ]
          Ted Yu made changes -
          Attachment 7006-addendum-2.txt [ 12583669 ]
          Ted Yu made changes -
          Attachment 7006-addendum-2.txt [ 12583669 ]
          Ted Yu made changes -
          Link This issue is related to HBASE-8567 [ HBASE-8567 ]
          Ted Yu made changes -
          Link This issue is related to HBASE-8560 [ HBASE-8560 ]
          Jeffrey Zhong made changes -
          Attachment hbase-7006-addendum.patch [ 12583350 ]
          Jeffrey Zhong made changes -
          Release Note added (distributed log replay description); Description trimmed back to the original report, with the distributed log replay text moved into the Release Note
          stack made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 0.98.0 [ 12323143 ]
          Resolution Fixed [ 1 ]
          Jeffrey Zhong made changes -
          Attachment hbase-7006-combined-v9.patch [ 12583262 ]
          Jeffrey Zhong made changes -
          Attachment hbase-7006-combined-v8.patch [ 12582743 ]
          Jeffrey Zhong made changes -
          Attachment hbase-7006-combined-v7.patch [ 12582663 ]
          Jeffrey Zhong made changes -
          Description edited (minor wording changes to the distributed log replay section)
          Jeffrey Zhong made changes -
          Description edited (distributed log replay description appended to the original report)
          Jeffrey Zhong made changes -
          Issue Type Bug [ 1 ] New Feature [ 2 ]
          Jeffrey Zhong made changes -
          Summary changed: "[MTTR] Study distributed log splitting to see how we can make it faster" to "[MTTR] Improve Region Server Recovery Time - Distributed Log Replay"
          Jeffrey Zhong made changes -
          Attachment hbase-7006-combined-v6.patch [ 12582559 ]
          Jeffrey Zhong made changes -
          Attachment hbase-7006-combined-v3.patch [ 12580940 ]
          Jeffrey Zhong made changes -
          Attachment hbase-7006-combined-v5.patch [ 12582262 ]
          Jeffrey Zhong made changes -
          Attachment hbase-7006-combined-v4.patch [ 12581430 ]
          Jeffrey Zhong made changes -
          Attachment hbase-7006-combined-v4.patch [ 12581779 ]
          Jeffrey Zhong made changes -
          Attachment hbase-7006-combined-v2.patch [ 12580612 ]
          Jeffrey Zhong made changes -
          Attachment hbase-7006-combined-v4.patch [ 12581430 ]
          Jeffrey Zhong made changes -
          Attachment hbase-7006-combined-v3.patch [ 12580940 ]
          Jeffrey Zhong made changes -
          Attachment hbase-7006-combined-v2.patch [ 12580612 ]
          Jeffrey Zhong made changes -
          Attachment hbase-7006-combined-v1.patch [ 12580153 ]
          Jeffrey Zhong made changes -
          Attachment hbase-7006-combined.patch [ 12579497 ]
          Jeffrey Zhong made changes -
          Attachment hbase-7006-combined.patch [ 12579488 ]
          Jeffrey Zhong made changes -
          Attachment hbase-7006-combined.patch [ 12579488 ]
          Jeffrey Zhong made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Jeffrey Zhong made changes -
          Attachment ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006.pdf [ 12563207 ]
          stack made changes -
          Fix Version/s 0.95.1 [ 12324288 ]
          Fix Version/s 0.95.0 [ 12324094 ]
          stack made changes -
          Fix Version/s 0.95.0 [ 12324094 ]
          Fix Version/s 0.96.0 [ 12320040 ]
          stack made changes -
          Component/s MTTR [ 12320396 ]
          Jeffrey Zhong made changes -
          Priority Major [ 3 ] Critical [ 2 ]
          Enis Soztutar made changes -
          Link This issue is related to HBASE-7825 [ HBASE-7825 ]
          Jeffrey Zhong made changes -
          Attachment LogSplitting Comparison.pdf [ 12568897 ]
          stack made changes -
          Priority Critical [ 2 ] Major [ 3 ]
          Jeffrey Zhong made changes -
          Assignee Jeffrey Zhong [ jeffreyz ]
          Jeffrey Zhong made changes -
          Attachment ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006.pdf [ 12563207 ]
          Nicolas Liochon made changes -
          Link This issue is required by HBASE-5843 [ HBASE-5843 ]
          stack created issue -

            People

            • Assignee: Jeffrey Zhong
            • Reporter: stack
            • Votes: 0
            • Watchers: 28

              Dates

              • Created:
                Updated:
                Resolved:
